Creating a list of company news stories with matching dates

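I am trying to create a list that groups company stock tickers with news headlines and their corresponding dates.

The head of the data looks roughly like this: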

{'ford-motor-co': "\n\n\n\n\n\nFord Unloads More Shares in Electric-Vehicle Startup 
Rivian\nBy The Wall Street Journal\xa0-\xa020 hours ago\nFord sold 7 millionRivian 
shares at a price of $26.88, the company says. That followed an 8-million-share 
sale earlier in the week at about the same price.\n\n\n\n\n \nFord sells shares in 
EV maker Rivian for $188.2 million\nBy Reuters\xa0-\xa0May 14, 2022  (Reuters) - 
Ford Motor (NYSE:F) Co sold 7 million shares of electric carmaker Rivian Automotive 
Inc for about $188.2 million, or $26.88 apiece, the U.S. automaker said in a filing 
on Friday. Ford now... \n\n\n\n\n\n\n\nFord sells sh

I have managed to extract the dates and the stock tickers, but I can't figure out how to group each date with its associated news headline.

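import re

parsed_data = []

for stock, stock_news_table in stock_news_tables.items():
    # Collect every date of the form "May 14, 2022"
    date_data = re.findall(r'[A-Z][a-z]{2} \d{1,2}, \d{4}', str(stock_news_table))
    # The whole text block is stored here; headlines are not yet separated
    headline = stock_news_table
    # print(date_data)
    parsed_data.append([stock, date_data, headline])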

The output so far looks like this. As you can see, the headlines are separated wherever there are multiple newlines: \n\n\n\n .

 [['ford-motor-co',
  ['May 14, 2022',
   'May 14, 2022',
   'May 13, 2022',
   'May 13, 2022',
   'May 13, 2022',
   'May 13, 2022',
   'May 12, 2022',
   'May 12, 2022',
   'May 12, 2022'],
  "\n\n\n\n\n\nFord Unloads More Shares in Electric-Vehicle Startup Rivian\nBy The 
   Wall Street Journal\xa0-\xa020 hours ago\nFord sold 7 million Rivian shares at a 
   price of $26.88, the company says. That followed an 8-million-share sale earlier 
   in the week at about the same price.\n\n\n\n\n \nFord sells shares in EV maker 
   Rivian for $188.2 million\nBy Reuters\xa0-\xa0May 14, 2022  (Reuters) - Ford 
   Motor (NYSE:F) Co sold 7 million shares of electric carmaker Rivian Aut

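A uj5u.com user replied:

I managed to solve your problem using the dateparser natural-language date parser and two different regular expressions. Hopefully this is sufficient.

First, install dateparser:

pip install dateparser

Then run the code: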

import collections, re, dateparser
Stock = collections.namedtuple("Stock", ["name", "symbol", "headlines"])

# Remember, '.' does not match newlines by default, so '.+' is equivalent to '[^\n]+'
headline_re = re.compile(r"\n\n ?\n(?P<headline>.+)\nBy .+?\xa0-\xa0(?P<date>[\w ,]+)")
symbol_re = re.compile(r"\(([A-Z]{1,4}:[A-Z]{1,4})\)")
input_data = {'ford-motor-co':(
    "\n\n\n\n\n\nFord Unloads More Shares in Electric-Vehicle Startup "
    "Rivian\nBy The Wall Street Journal\xa0-\xa020 hours ago\nFord sold 7 millionRivian "
    "shares at a price of $26.88, the company says. That followed an 8-million-share "
    "sale earlier in the week at about the same price.\n\n\n\n\n \nFord sells shares in "
    "EV maker Rivian for $188.2 million\nBy Reuters\xa0-\xa0May 14, 2022  (Reuters) - "
    "Ford Motor (NYSE:F) Co sold 7 million shares of electric carmaker Rivian Automotive "
    "Inc for about $188.2 million, or $26.88 apiece, the U.S. automaker said in a filing "
    "on Friday. Ford now... \n\n\n\n\n\n\n\nFord sells sh")}

stocks = []
for name, data in input_data.items():
    headlines = []
    for match in headline_re.finditer(data):
        date_str = match.group("date")
        date = dateparser.parse(date_str)
        headlines.append((match.group("headline"), date))
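    # search() returns None when the text contains no "(EXCHANGE:TICKER)" marker
    symbol_match = symbol_re.search(data)
    symbol = symbol_match.group(1) if symbol_match else None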
    stocks.append(Stock(name, symbol, headlines))

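Output (stocks). The first timestamp depends on when the code is run, because the source text only says "20 hours ago":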

[Stock(name='ford-motor-co', symbol='NYSE:F', headlines=[('Ford Unloads More Shares in Electric-Vehicle Startup Rivian', datetime.datetime(2022, 5, 14, 20, 58, 28, 30552)), ('Ford sells shares in EV maker Rivian for $188.2 million', datetime.datetime(2022, 5, 14, 0, 0))])]

Please check that the symbol regex is correct, because I am not sure what constraints stock-market symbols follow.
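If the goal is the flat [stock, date, headline] grouping the question asks for, the same headline pattern can be applied with only the standard library. The following is a minimal sketch under that assumption: `to_rows` and `sample` are hypothetical names introduced here, the sample text is a shortened version of the Ford data, and dates are kept as raw strings (they could be passed to dateparser.parse afterwards).

```python
import re

# Headline line, followed by a "By <source> - <date>" byline (same idea as headline_re above).
HEADLINE_RE = re.compile(
    r"\n\n ?\n(?P<headline>.+)\nBy .+?\xa0-\xa0(?P<date>[\w ,]+)"
)

def to_rows(stock_news_tables):
    """Return one [stock, date, headline] row per headline found."""
    rows = []
    for stock, text in stock_news_tables.items():
        for match in HEADLINE_RE.finditer(text):
            # strip() removes the trailing spaces the date class can pick up
            rows.append([stock,
                         match.group("date").strip(),
                         match.group("headline").strip()])
    return rows

# Shortened, hypothetical sample in the same layout as the scraped text
sample = {
    "ford-motor-co": (
        "\n\n\n\n\n\nFord Unloads More Shares in Electric-Vehicle Startup Rivian"
        "\nBy The Wall Street Journal\xa0-\xa020 hours ago\nFord sold 7 million Rivian shares."
        "\n\n\n\n\n \nFord sells shares in EV maker Rivian for $188.2 million"
        "\nBy Reuters\xa0-\xa0May 14, 2022  (Reuters) - Ford Motor (NYSE:F) Co sold shares."
    )
}

rows = to_rows(sample)
for row in rows:
    print(row)
```

Each row keeps the date next to the headline it belongs to, including relative dates such as "20 hours ago" that a fixed "Mon DD, YYYY" pattern would miss.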

A uj5u.com user replied:

You can use re.split.

The documentation says: "If capturing parentheses are used in pattern, then the text of all groups in the pattern are also returned as part of the resulting list."

So if you use r'([A-Z][a-z]{2} \d{1,2}, \d{4})':
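A small illustration of that behaviour may help; the string `text` below is a made-up example, not taken from the question's data.

```python
import re

text = "a, b, c"

# Without a capturing group, the separators are discarded.
plain = re.split(r", ", text)

# With a capturing group, each separator is kept in the result list.
captured = re.split(r"(, )", text)

print(plain)
print(captured)
```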

import re

stock = 'ford-motor-co'
stock_news_table =  """\n\n\n\n\n\nFord Unloads More Shares in Electric-Vehicle Startup 
Rivian\nBy The Wall Street Journal\xa0-\xa020 hours ago\nFord sold 7 millionRivian 
shares at a price of $26.88, the company says. That followed an 8-million-share 
sale earlier in the week at about the same price.\n\n\n\n\n \nFord sells shares in 
EV maker Rivian for $188.2 million\nBy Reuters\xa0-\xa0May 14, 2022  (Reuters) - 
Ford Motor (NYSE:F) Co sold 7 million shares of electric carmaker Rivian Automotive 
Inc for about $188.2 million, or $26.88 apiece, the U.S. automaker said in a filing 
on Friday. Ford now... \n\n\n\n\n\n\n\nFord sells sh"""
date_data = re.split(r'([A-Z][a-z]{2} \d{1,2}, \d{4})' , str(stock_news_table))
headline = stock_news_table
date_data

This returns:

['\n\n\n\n\n\nFord Unloads More Shares in Electric-Vehicle Startup \nRivian\nBy The Wall Street Journal\xa0-\xa020 hours ago\nFord sold 7 millionRivian \nshares at a price of $26.88, the company says. That followed an 8-million-share \nsale earlier in the week at about the same price.\n\n\n\n\n \nFord sells shares in \nEV maker Rivian for $188.2 million\nBy Reuters\xa0-\xa0',
 'May 14, 2022',
 '  (Reuters) - \nFord Motor (NYSE:F) Co sold 7 million shares of electric carmaker Rivian Automotive \nInc for about $188.2 million, or $26.88 apiece, the U.S. automaker said in a filing \non Friday. Ford now... \n\n\n\n\n\n\n\nFord sells sh']
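Because the split result alternates between text and date, each date can be paired with the text that precedes it. The sketch below is one possible way to do that, not part of the original answer: `headlines_with_dates` and `sample` are hypothetical names, and it assumes each headline sits on a single line directly above its "By <source> - " byline, as in the original scraped text. Items with relative dates such as "20 hours ago" are not matched by this pattern, and a date appearing inside an article body would be paired incorrectly.

```python
import re

DATE_RE = r"([A-Z][a-z]{2} \d{1,2}, \d{4})"

def headlines_with_dates(text):
    """Pair each matched date with the headline line that precedes it."""
    parts = re.split(DATE_RE, text)
    results = []
    # parts alternates: text, date, text, date, ..., trailing text
    for before, date in zip(parts[0::2], parts[1::2]):
        lines = before.split("\n")
        # The last line is the byline; the headline is assumed to be the line above it.
        headline = lines[-2].strip() if len(lines) >= 2 else ""
        results.append((date, headline))
    return results

# Hypothetical sample in the same layout as the scraped text
sample = (
    "Headline A\nBy Reuters\xa0-\xa0May 14, 2022 body A"
    "\n\n\nHeadline B\nBy Reuters\xa0-\xa0May 13, 2022 body B"
)

pairs = headlines_with_dates(sample)
print(pairs)
```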
