Below is my code.
```python
import requests
import re
import pandas as pd
from bs4 import BeautifulSoup

r = requests.get("https://www.gutenberg.org/browse/scores/top")
soup = BeautifulSoup(r.content, "lxml")
List1 = soup.find_all('ol')
List1
newlist = []
for List in List1:
    ulList = List.find_all('li')
    extend_list = []
    for li in ulList:
        #extend_list = []
        for link in li.find_all('a'):
            a = link.get_text()
            print(a)
```
My output is:

I want to convert the output into a list of lists:

[['A Room with a View by E. M. Forster (37480)'], ['Middlemarch by George Eliot (34900)'], ['Little Women; Or, Meg, Jo, Beth, and Amy by Louisa May Alcott (31929)']]

then split each inner list into two parts:

[["A Room with a View by E. M. Forster", "37480"], ["Middlemarch by George Eliot", "34900"], ["Little Women; Or, Meg, Jo, Beth, and Amy by Louisa May Alcott", "31929"]]

and finally load the data into a DataFrame.
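As a minimal sketch of the steps described above, assuming the link texts have already been collected into a list (in the question's loop this means appending `link.get_text()` to a list instead of printing it). The sample strings are copied from the question; the names `link_texts`, `nested` and `split_rows` are illustrative only. It also assumes every text ends with ` (number)`; otherwise `rsplit` returns a single element and the unpacking raises `ValueError`.

```python
# Sample link texts, copied from the desired output in the question
link_texts = [
    "A Room with a View by E. M. Forster (37480)",
    "Middlemarch by George Eliot (34900)",
    "Little Women; Or, Meg, Jo, Beth, and Amy by Louisa May Alcott (31929)",
]

# Step 1: a list of single-element lists, one per link
nested = [[text] for text in link_texts]

# Step 2: split each entry once, at the last " (", into [title, count]
split_rows = []
for item in nested:
    title, count = item[0].rsplit(" (", 1)
    split_rows.append([title, count.rstrip(")")])

print(split_rows)
```

`rsplit(" (", 1)` splits only at the last ` (`, so a title that itself contains ` (` stays intact.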

Reply from a uj5u.com user:
You can do this in one step with a short regular expression and `str.extract`:
```python
# soup comes from the question's code; \d+ matches the one-or-more-digit count
df = (pd.Series([e.text for e in soup.select('ol a')])
        .str.extract(r'(.*) \((\d+)\)$')
        .set_axis(['Ebooks', 'Code'], axis=1)
     )
```
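The pattern works because the greedy `(.*)` captures everything up to the final ` (digits)` group, `\d+` requires one or more digits there, and `$` anchors that group to the end of the string, so parentheses earlier in a title do not break the split. A small stdlib check using `re` with the same pattern; the second sample title is made up for illustration. A non-matching string gives `None` here, whereas `str.extract` would put NaN in that row.

```python
import re

# Same pattern the answer passes to str.extract:
# group 1 = everything before the final " (", group 2 = the trailing digits
pattern = re.compile(r"(.*) \((\d+)\)$")

samples = [
    "Middlemarch by George Eliot (34900)",
    "An Example Title (Volume 2) by Someone (1234)",  # hypothetical title with parentheses
    "No count here",                                   # does not match at all
]

results = []
for text in samples:
    m = pattern.search(text)
    results.append(m.groups() if m else None)

print(results)
```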
If you need the intermediate list of lists:
```python
import re

# The := (walrus) operator requires Python 3.8+
L = [list(m.groups()) for e in soup.select('ol a')
     if (m := re.search(r'(.*) \((\d+)\)$', e.text))]
df = pd.DataFrame(L, columns=['Ebooks', 'Code'])
```
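In both versions the regex captures are text, so `Code` holds strings rather than numbers. If numeric sorting or arithmetic is needed, one option is `pd.to_numeric`. A minimal sketch, using a few rows copied from the output shown in this answer as stand-in data instead of a live request (`rows` is an illustrative name):

```python
import pandas as pd

# Stand-in rows copied from the answer's output (no network request here)
rows = [
    ["A Room with a View by E. M. Forster", "37480"],
    ["Middlemarch by George Eliot", "34900"],
    ["The Enchanted April by Elizabeth Von Arnim", "31648"],
]
df = pd.DataFrame(rows, columns=["Ebooks", "Code"])

# The captured counts are strings; convert so sorting and arithmetic are numeric
df["Code"] = pd.to_numeric(df["Code"])

print(df.sort_values("Code"))
```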
Output:
```
                                                 Ebooks   Code
0                   A Room with a View by E. M. Forster  37480
1                           Middlemarch by George Eliot  34900
2    Little Women; Or, Meg, Jo, Beth, and Amy by Lo...  31929
3            The Enchanted April by Elizabeth Von Arnim  31648
4          The Blue Castle: a novel by L. M. Montgomery  30646
..                                                  ...    ...
395                            Hapgood, Isabel Florence  12240
396                                   Mill, John Stuart  12223
397                                Marlowe, Christopher  11760
398                                      Wharton, Edith  11728
399                            Burnett, Frances Hodgson  11630

[400 rows x 2 columns]
```
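The last rows are author names rather than book titles because `soup.select('ol a')` matches links inside every `<ol>` on the page. If only one ranking is wanted, one option is to pick a single list first, for example with `select_one('ol')`. A sketch on a tiny hand-written HTML snippet whose structure is an assumption, not a copy of the live page; which list comes first on the real page, and how many lists there are, is not verified here.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified markup: an ebook list followed by an author list
html = """
<ol>
  <li><a href="#">Middlemarch by George Eliot (34900)</a></li>
  <li><a href="#">The Enchanted April by Elizabeth Von Arnim (31648)</a></li>
</ol>
<ol>
  <li><a href="#">Wharton, Edith (11728)</a></li>
</ol>
"""
soup = BeautifulSoup(html, "html.parser")

# Every <a> inside any <ol>: books and authors mixed together
all_links = [a.get_text() for a in soup.select("ol a")]

# Only the <a> elements inside the first <ol>
first_list = soup.select_one("ol")
book_links = [a.get_text() for a in first_list.select("a")]

print(len(all_links), len(book_links))
```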
Reply from a uj5u.com user:
Simplify the code and select more specific elements at the same time:
```python
data = []  # one dict per link, later turned into a DataFrame
for e in soup.select('ol a'):
    data.append({
        'Ebook': e.text.split('(')[0].strip(),
        'Code': e.text.split('(')[-1].strip(')')
    })
```
Example
```python
import requests
import pandas as pd
from bs4 import BeautifulSoup

r = requests.get("https://www.gutenberg.org/browse/scores/top")
soup = BeautifulSoup(r.content, "lxml")

data = []
for e in soup.select('ol a'):
    data.append({
        'Ebook': e.text.split('(')[0].strip(),
        'Code': e.text.split('(')[-1].strip(')')
    })

pd.DataFrame(data)
```
Output:
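|   | Ebook | Code |
|---|---|---|
| 0 | A Room with a View by E. M. Forster | 37480 |
| 1 | Middlemarch by George Eliot | 34900 |
| 2 | Little Women; Or, Meg, Jo, Beth, and Amy by Louisa May Alcott | 31929 |
| 3 | The Enchanted April by Elizabeth Von Arnim | 31648 |
| 4 | The Blue Castle: a novel by L. M. Montgomery | 30646 |
| 5 | Moby Dick; Or, The Whale by Herman Melville | 30426 |
| 6 | The Complete Works of William Shakespeare by William Shakespeare | 30266 |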
...
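One caveat with `split('(')`: `[0]` keeps only the text before the first `(`, so a title that itself contains parentheses gets cut short (the `[-1]` part still returns the correct count). A stdlib sketch comparing it with `str.rpartition(' (')`, which splits at the last occurrence instead; the second sample title is made up for illustration.

```python
samples = [
    "Middlemarch by George Eliot (34900)",
    "An Example Title (Volume 2) by Someone (1234)",  # hypothetical title with parentheses
]

rows = []
for text in samples:
    naive = text.split("(")[0].strip()        # title as computed in the answer above
    title, _, count = text.rpartition(" (")    # split at the last " (" instead
    rows.append((naive, title, count.rstrip(")")))

for row in rows:
    print(row)
```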
Please credit the source when reposting. Article link: https://www.uj5u.com/caozuo/526500.html
