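I am trying to scrape multiple news articles by reading their URLs from a DataFrame and converting that column to a list. But when I insert the results into a DataFrame, it only contains the last scraped value, even though print shows a different result for each article. My sample df looks like this:
df = {'data': ['https://www.liputan6.com/bisnis/read/4661489/erick-thohir-apresiasi-transformasi-digital-pos-indonesia',
               'https://ekonomi.bisnis.com/read/20211010/98/1452514/hari-pos-sedunia-pos-indonesia-kasih-diskon-70-persen-paket-kilat']}
This is my code: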
import pandas as pd
import newspaper
from newspaper import Article

df = pd.read_excel(' 1.xlsx')
urls = df['data'].to_list()
for url in urls:
    try:
        a = Article(url, language='id')
        a.download()
        a.parse()
        author = a.authors
        dates = a.publish_date
        add_data = a.additional_data
        text = a.text
        tag = a.tags
        title = a.title
        keywords = a.keywords
        new_df = pd.DataFrame({'author': [author]})  # wrapped in [] because there can be multiple authors
        print(author, dates, add_data, text, tag, title, keywords)
    except Exception as e:
        print(e)
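When I run print(author), it shows this result:
['S. Dian Andryanto', 'Reporter', 'Editor']
['Ali Akhmad Noor Hidayat', 'Reporter', 'Editor']
But when I insert the authors into the DataFrame, only the last value is kept:
new_data = {"author":['Ali Akhmad Noor Hidayat', 'Reporter', 'Editor']}
Can anyone explain how to insert all of my authors into the DataFrame?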
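A uj5u.com user replied:
You are iterating over the urls list and, on every pass, replacing new_df with a brand-new DataFrame, so only the last article survives. To avoid this, create a dictionary outside the loop, append to it inside the loop, and build the whole DataFrame once the loop has finished, as in the code below: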
import pandas as pd
import newspaper
from newspaper import Article

df = pd.read_excel(' 1.xlsx')
urls = df['data'].to_list()
all_authors = {"author": []}  # accumulates one author list per article
for url in urls:
    try:
        a = Article(url, language='id')
        a.download()
        a.parse()
        author = a.authors
        dates = a.publish_date
        add_data = a.additional_data
        text = a.text
        tag = a.tags
        title = a.title
        keywords = a.keywords
        all_authors['author'].append(author)  # keep this article's author list
    except Exception as e:
        print(e)

# Build the DataFrame once, after the loop, from everything collected
new_df = pd.DataFrame(data=all_authors)
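To see the difference without hitting the network, here is a minimal sketch that replays both patterns on the two author lists printed in the question. The names scraped_authors, overwritten_df and accumulated_df are made up for this illustration; the lists simply stand in for what a.authors returned.

```python
import pandas as pd

# Author lists copied from the question's printed output; they stand in
# for Article(...).authors so the example runs offline.
scraped_authors = [
    ['S. Dian Andryanto', 'Reporter', 'Editor'],
    ['Ali Akhmad Noor Hidayat', 'Reporter', 'Editor'],
]

# Pattern from the question: the DataFrame is rebuilt on every pass,
# so only the last article is left when the loop ends.
for author in scraped_authors:
    overwritten_df = pd.DataFrame({'author': [author]})

# Pattern from this answer: append inside the loop, build once afterwards.
all_authors = {'author': []}
for author in scraped_authors:
    all_authors['author'].append(author)
accumulated_df = pd.DataFrame(data=all_authors)

print(len(overwritten_df))   # number of rows kept by the overwriting loop
print(len(accumulated_df))   # number of rows kept by the accumulating loop
```

Each cell of the author column holds a whole Python list, which is what the original code was aiming for with the extra [].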
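A uj5u.com user replied:
Collect each new_df in a list and concatenate them at the end.
I modified your code slightly, because catching every exception is a bad idea; catch newspaper.ArticleException instead.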
import pandas as pd
import newspaper
from newspaper import Article

urls = ['https://www.liputan6.com/bisnis/read/4661489/erick-thohir-apresiasi-transformasi-digital-pos-indonesia',
        'https://ekonomi.bisnis.com/read/20211010/98/1452514/hari-pos-sedunia-pos-indonesia-kasih-diskon-70-persen-paket-kilat']
data = []  # one single-row DataFrame per successfully parsed article
for url in urls:
    try:
        a = Article(url, language='id')
        a.download()
        a.parse()
    except newspaper.ArticleException as e:
        # Only download/parse failures are caught; other errors still surface
        print(e)
    else:
        author = a.authors
        dates = a.publish_date
        add_data = a.additional_data
        text = a.text
        tag = a.tags
        title = a.title
        keywords = a.keywords
        new_df = pd.DataFrame({'author': [author]})
        data.append(new_df)

# ignore_index=True renumbers the rows 0..n-1 instead of repeating index 0
df = pd.concat(data, ignore_index=True)
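If one row per author is more useful than one list per article, a possible follow-up is to collect plain dicts and then call DataFrame.explode. This is only a sketch: the article column is a hypothetical running number added for illustration, and the author lists are again the ones printed in the question rather than live scraping results.

```python
import pandas as pd

# Author lists taken from the question's printed output (offline stand-in
# for Article(...).authors).
scraped_authors = [
    ['S. Dian Andryanto', 'Reporter', 'Editor'],
    ['Ali Akhmad Noor Hidayat', 'Reporter', 'Editor'],
]

# Collect one dict per article; 'article' is a hypothetical identifier.
rows = []
for i, authors in enumerate(scraped_authors):
    rows.append({'article': i, 'author': authors})

# One row per article, with the author list stored in a single cell.
df = pd.DataFrame(rows)

# One row per author: explode repeats the other columns for each list item.
per_author = df.explode('author').reset_index(drop=True)

print(df)
print(per_author)
```

Building the frame from a list of dicts also makes it easy to add further columns such as title or publish date later, since each article only needs more keys in its dict.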
