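I'm trying to build a list that groups each company's stock ticker with its news headlines and their corresponding dates.
The head of the data basically looks like this: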
{'ford-motor-co': "\n\n\n\n\n\nFord Unloads More Shares in Electric-Vehicle Startup
Rivian\nBy The Wall Street Journal\xa0-\xa020 hours ago\nFord sold 7 millionRivian
shares at a price of $26.88, the company says. That followed an 8-million-share
sale earlier in the week at about the same price.\n\n\n\n\n \nFord sells shares in
EV maker Rivian for $188.2 million\nBy Reuters\xa0-\xa0May 14, 2022 (Reuters) -
Ford Motor (NYSE:F) Co sold 7 million shares of electric carmaker Rivian Automotive
Inc for about $188.2 million, or $26.88 apiece, the U.S. automaker said in a filing
on Friday. Ford now... \n\n\n\n\n\n\n\nFord sells sh
I've managed to extract the dates and the stock ticker, but I can't figure out how to group each date with its associated headline.
import re

parsed_data = []
for stock, stock_news_table in stock_news_tables.items():
    date_data = re.findall(r'[A-Z][a-z]{2} \d{1,2}, \d{4}', str(stock_news_table))
    headline = stock_news_table
    #print(date_data)
    parsed_data.append([stock, date_data, headline])
The output so far looks like this. As you can see, the headlines are separated wherever there is a run of multiple newlines (\n\n\n\n):
[['ford-motor-co',
['May 14, 2022',
'May 14, 2022',
'May 13, 2022',
'May 13, 2022',
'May 13, 2022',
'May 13, 2022',
'May 12, 2022',
'May 12, 2022',
'May 12, 2022'],
"\n\n\n\n\n\nFord Unloads More Shares in Electric-Vehicle Startup Rivian\nBy The
Wall Street Journal\xa0-\xa020 hours ago\nFord sold 7 million Rivian shares at a
price of $26.88, the company says. That followed an 8-million-share sale earlier
in the week at about the same price.\n\n\n\n\n \nFord sells shares in EV maker
Rivian for $188.2 million\nBy Reuters\xa0-\xa0May 14, 2022 (Reuters) - Ford
Motor (NYSE:F) Co sold 7 million shares of electric carmaker Rivian Aut
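Reply from a uj5u.com user:
I managed to solve your problem using the dateparser natural-language date parser and two different regular expressions. Hopefully this is enough.
First, install dateparser: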
pip install dateparser
Then run the code:
import collections, re, dateparser
Stock = collections.namedtuple("Stock", ["name", "symbol", "headlines"])
# Remember: '.' does not match newlines by default (it is equivalent to '[^\n]').
headline_re = re.compile(r"\n\n ?\n(?P<headline>.+)\nBy .+?\xa0-\xa0(?P<date>[\w ,]+)")
symbol_re = re.compile(r"\(([A-Z]{1,4}:[A-Z]{1,4})\)")
input_data = {'ford-motor-co':(
"\n\n\n\n\n\nFord Unloads More Shares in Electric-Vehicle Startup "
"Rivian\nBy The Wall Street Journal\xa0-\xa020 hours ago\nFord sold 7 millionRivian "
"shares at a price of $26.88, the company says. That followed an 8-million-share "
"sale earlier in the week at about the same price.\n\n\n\n\n \nFord sells shares in "
"EV maker Rivian for $188.2 million\nBy Reuters\xa0-\xa0May 14, 2022 (Reuters) - "
"Ford Motor (NYSE:F) Co sold 7 million shares of electric carmaker Rivian Automotive "
"Inc for about $188.2 million, or $26.88 apiece, the U.S. automaker said in a filing "
"on Friday. Ford now... \n\n\n\n\n\n\n\nFord sells sh")}
stocks = []
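for name, data in input_data.items():
    headlines = []
    # Each match is one news item: a headline line followed by a "By <source> - <date>" line.
    for match in headline_re.finditer(data):
        date_str = match.group("date")
        date = dateparser.parse(date_str)
        headlines.append((match.group("headline"), date))
    # The first "(EXCHANGE:TICKER)" found in the text is used as the symbol.
    symbol = symbol_re.search(data).group(1)
    stocks.append(Stock(name, symbol, headlines))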
Output (stocks):
[Stock(name='ford-motor-co', symbol='NYSE:F', headlines=[('Ford Unloads More Shares in Electric-Vehicle Startup Rivian', datetime.datetime(2022, 5, 14, 20, 58, 28, 30552)), ('Ford sells shares in EV maker Rivian for $188.2 million', datetime.datetime(2022, 5, 14, 0, 0))])]
Please make sure the symbol regex is correct, because I'm not sure what the constraints on stock market symbols are.
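One way to check the symbol regex is to run it on a few short strings. The sketch below is only an illustration: find_symbol is a hypothetical helper name that is not part of the answer above, and the test sentences are shortened or made up. It returns a default instead of raising when no "(EXCHANGE:TICKER)" is present, which the bare symbol_re.search(data).group(1) call would not do.

```python
import re

# Same pattern as in the answer above: "(EXCHANGE:TICKER)", 1 to 4 capitals each.
symbol_re = re.compile(r"\(([A-Z]{1,4}:[A-Z]{1,4})\)")


def find_symbol(text, default=None):
    """Return the first 'EXCHANGE:TICKER' found in text, or default (hypothetical helper)."""
    match = symbol_re.search(text)
    return match.group(1) if match else default


with_symbol = "Ford Motor (NYSE:F) Co sold 7 million shares"
without_symbol = "Ford sold 7 million Rivian shares at a price of $26.88"
long_exchange = "Apple (NASDAQ:AAPL) rose"  # made-up sentence, six-letter prefix

print(find_symbol(with_symbol))     # NYSE:F
print(find_symbol(without_symbol))  # None
print(find_symbol(long_exchange))   # None
```

The last case shows that the {1,4} limit on the exchange prefix rejects a six-letter prefix such as NASDAQ, so the bounds may need widening depending on which exchanges appear in your data.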
Reply from a uj5u.com user:
You can use re.split.
The documentation says: if capturing parentheses are used in pattern, then the text of all groups in the pattern are also returned as part of the resulting list.
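A tiny illustration of that behaviour, on a made-up string rather than the scraped data (the variable names are only for this sketch):

```python
import re

sample = "alpha 2022 beta 2023 gamma"

# Without a capturing group, the matched separators are dropped.
dropped = re.split(r" \d{4} ", sample)
print(dropped)  # ['alpha', 'beta', 'gamma']

# With a capturing group, the captured text is kept in the result list.
kept = re.split(r" (\d{4}) ", sample)
print(kept)     # ['alpha', '2022', 'beta', '2023', 'gamma']
```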
So if you use r'([A-Z][a-z]{2} \d{1,2}, \d{4})':
import re

stock = 'ford-motor-co'
stock_news_table = """\n\n\n\n\n\nFord Unloads More Shares in Electric-Vehicle Startup
Rivian\nBy The Wall Street Journal\xa0-\xa020 hours ago\nFord sold 7 millionRivian
shares at a price of $26.88, the company says. That followed an 8-million-share
sale earlier in the week at about the same price.\n\n\n\n\n \nFord sells shares in
EV maker Rivian for $188.2 million\nBy Reuters\xa0-\xa0May 14, 2022 (Reuters) -
Ford Motor (NYSE:F) Co sold 7 million shares of electric carmaker Rivian Automotive
Inc for about $188.2 million, or $26.88 apiece, the U.S. automaker said in a filing
on Friday. Ford now... \n\n\n\n\n\n\n\nFord sells sh"""
date_data = re.split(r'([A-Z][a-z]{2} \d{1,2}, \d{4})' , str(stock_news_table))
headline = stock_news_table
date_data
will return
['\n\n\n\n\n\nFord Unloads More Shares in Electric-Vehicle Startup \nRivian\nBy The Wall Street Journal\xa0-\xa020 hours ago\nFord sold 7 millionRivian \nshares at a price of $26.88, the company says. That followed an 8-million-share \nsale earlier in the week at about the same price.\n\n\n\n\n \nFord sells shares in \nEV maker Rivian for $188.2 million\nBy Reuters\xa0-\xa0',
'May 14, 2022',
' (Reuters) - \nFord Motor (NYSE:F) Co sold 7 million shares of electric carmaker Rivian Automotive \nInc for about $188.2 million, or $26.88 apiece, the U.S. automaker said in a filing \non Friday. Ford now... \n\n\n\n\n\n\n\nFord sells sh']
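The split result alternates between text and date, so the pieces can be paired up afterwards. The following is a minimal sketch under some assumptions: the sample text is shortened and made up, and it assumes every item has the shape "headline, newline, By <source> - <date>", so the headline is the line just before the byline in the chunk that precedes each date.

```python
import re

date_re = r"([A-Z][a-z]{2} \d{1,2}, \d{4})"

# Shortened, made-up sample in the same shape as the scraped text.
text = (
    "\n\n\nFirst headline\nBy Reuters\xa0-\xa0May 14, 2022 (Reuters) - first body"
    "\n\n\nSecond headline\nBy Reuters\xa0-\xa0May 13, 2022 (Reuters) - second body"
)

parts = re.split(date_re, text)
# parts alternates: text, date, text, date, ..., text
dates = parts[1::2]
before = parts[0:-1:2]  # the chunk of text that precedes each date

grouped = []
for date, chunk in zip(dates, before):
    lines = [line.strip() for line in chunk.splitlines() if line.strip()]
    # The last line is the "By <source> -" byline; the headline is the line before it.
    headline = lines[-2] if len(lines) >= 2 else None
    grouped.append((date, headline))

print(grouped)
# [('May 14, 2022', 'First headline'), ('May 13, 2022', 'Second headline')]
```

Note that items with a relative date such as "20 hours ago" are not matched by this date pattern, so they would stay inside the chunk before the next absolute date and would need separate handling.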
