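Following on from my previous question, I want to iterate over a list of dictionaries and convert the output into a new DataFrame. Right now I have a CSV with two columns: one contains a keyword and the other contains a URL (see below).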
| Keyword | URL |
| -------- | -------------- |
| word1 | www.example.com/topic-1 |
| word2 | www.example.com/topic-2 |
| word3 | www.example.com/topic-3 |
| word4 | www.example.com/topic-4 |
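I have converted this CSV into a list of dictionaries and am trying to iterate over that list to count how often each keyword appears on the page at its URL.
My code can be seen in this colab notebook.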
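The notebook is not reproduced here. As a rough sketch of the conversion step described above (not the asker's actual code), `csv.DictReader` yields one dictionary per row, assuming the CSV uses the `Keyword` and `URL` headers shown in the table. The inline `CSV_TEXT` sample and the `load_rows` helper are assumptions made for illustration.

```python
import csv
import io

# Stand-in for the real CSV file; the contents mirror the table above.
CSV_TEXT = """Keyword,URL
word1,www.example.com/topic-1
word2,www.example.com/topic-2
word3,www.example.com/topic-3
word4,www.example.com/topic-4
"""


def load_rows(text):
    """Parse CSV text into a list of dictionaries keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


rows = load_rows(CSV_TEXT)
for row in rows:
    print(row["Keyword"], row["URL"])
```

For a file on disk, the same reader can wrap a handle opened with `open(path, newline="")` instead of `io.StringIO`.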
I would like the final output to look like this:
| Keyword | URL | Count |
|:---- |:------: | -----:|
| word1 | www.example.com/topic-1 | 1003 |
| word2 | www.example.com/topic-2 | 405 |
| word3 | www.example.com/topic-3 | 123 |
| word4 | www.example.com/topic-4 | 554 |
The "Count" column is the frequency of "word1" on "www.example.com/topic-1", and likewise for each other row.
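"Frequency" can be read in more than one way, and the choice changes the numbers. The sketch below contrasts a plain case-insensitive substring count with a whole-word count. Both helper names and the sample sentence are made up for illustration.

```python
import re


def count_substring(text, keyword):
    """Case-insensitive count of non-overlapping substring matches."""
    return text.lower().count(keyword.lower())


def count_whole_word(text, keyword):
    """Case-insensitive count of matches delimited by word boundaries."""
    pattern = r"\b" + re.escape(keyword) + r"\b"
    return len(re.findall(pattern, text, flags=re.IGNORECASE))


sample = "Code review: the codebase has code smells. CODE!"
print(count_substring(sample, "code"))   # 4 ("codebase" is included)
print(count_whole_word(sample, "code"))  # 3 ("codebase" is excluded)
```

The substring count is simpler but also matches inside longer words. The whole-word version assumes the keyword starts and ends with a letter, digit, or underscore, since `\b` only marks transitions between word and non-word characters.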
Any help is appreciated!
Answer from a uj5u.com user:
Use DataFrame.apply to create a new column from a function that reads the other columns of each row:
import pandas as pd
import requests

# Sample data: one keyword per URL.
df = pd.DataFrame({
    'Keyword': ['code', 'apply', 'midnight'],
    'URL': [
        'https://stackoverflow.com/questions/70581444/word-frequency-by-iterating-over-a-list-of-dictionaries-python/',
        'https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.apply.html',
        'https://stackoverflow.com/questions/62694219/minimum-number-of-platforms-required-for-a-railway-station',
    ],
})
print(df)
#     Keyword                                                URL
# 0      code  https://stackoverflow.com/questions/70581444/w...
# 1     apply  https://pandas.pydata.org/docs/reference/api/p...
# 2  midnight  https://stackoverflow.com/questions/62694219/m...

def get_count(row):
    # Fetch the page without following redirects.
    r = requests.get(row['URL'], allow_redirects=False)
    # Case-insensitive substring count over the raw response text (HTML markup included).
    count = r.text.lower().count(row['Keyword'].lower())
    return count

# axis=1 passes each row to get_count.
df['Count'] = df.apply(get_count, axis=1)
print(df)
#     Keyword                                                URL  Count
# 0      code  https://stackoverflow.com/questions/70581444/w...     32
# 1     apply  https://pandas.pydata.org/docs/reference/api/p...     32
# 2  midnight  https://stackoverflow.com/questions/62694219/m...     18
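For the list-of-dictionaries form mentioned in the question, the same count can be sketched without `DataFrame.apply`. This is only an illustration under stated assumptions: `add_counts`, the `fetch` parameter, and the fake pages are hypothetical names introduced here so the example runs offline, and a failed fetch is recorded as `None` rather than stopping the loop.

```python
def add_counts(rows, fetch):
    """Return new row dicts with a 'Count' key added.

    rows  -- list of dicts with 'Keyword' and 'URL' keys
    fetch -- callable taking a URL and returning the page text
             (hypothetical hook, so the network can be swapped out)
    """
    result = []
    for row in rows:
        try:
            text = fetch(row["URL"])
            count = text.lower().count(row["Keyword"].lower())
        except Exception:
            # Broad on purpose for this sketch: any fetch failure
            # is recorded as None instead of aborting the whole run.
            count = None
        result.append({**row, "Count": count})
    return result


# Fake pages stand in for real HTTP responses (assumption for the demo).
FAKE_PAGES = {
    "www.example.com/topic-1": "word1 and Word1 again",
    "www.example.com/topic-2": "nothing relevant here",
}


def fake_fetch(url):
    return FAKE_PAGES[url]  # raises KeyError for unknown URLs


rows = [
    {"Keyword": "word1", "URL": "www.example.com/topic-1"},
    {"Keyword": "word2", "URL": "www.example.com/topic-2"},
    {"Keyword": "word3", "URL": "www.example.com/topic-3"},
]
counted = add_counts(rows, fake_fetch)
for row in counted:
    print(row)
```

With real pages, `fetch` could be something like `lambda url: requests.get(url, timeout=10).text`, and `pd.DataFrame(counted)` would turn the result into a DataFrame with `Keyword`, `URL`, and `Count` columns. The URLs in the question have no scheme, which `requests` rejects, so they would likely need an `https://` prefix first.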
