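Desired output
I want to put the list of authors for each book into a single cell. I can already get that, but not every author has a role listed on the site, so I want each author who does have a role to appear together with that role. My desired output is shown above; see the link. This is tricky for me, but someone may be able to solve it. I look forward to an answer and would be grateful for any help. Thank you.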
from selenium import webdriver
import pandas as pd

driver = webdriver.Chrome()
site = 'https://www.goodreads.com/search?q=chughtai&qid=WzdWh5nG8z'
driver.get(site)
driver.maximize_window()
roles = []
authors = []
main = driver.find_elements_by_tag_name('tr')
for i in main:
    role = []
    author = []
    con = i.find_elements_by_xpath('.//div[@]')
    try:
        for n in con:
            auth = n.find_element_by_xpath('.//a[@]/span').text
            rol = n.find_element_by_xpath('.//span[@]').text
            author.append(auth)
            if rol:
                role.append(rol)
                one = ', '.join(role)
                roles.append(auth + ' ' + rol)
            else:
                continue
        one_cell = ', '.join(author)
        authors.append(one_cell)
    except:
        pass
a = {'Author Name': authors, 'Role': roles}
df = pd.DataFrame.from_dict(a, orient='index')
df = df.transpose()
df.to_csv("only_roles.csv", index=False)
print(df)
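Answer from a uj5u.com user:
For some reason I could not get all the books with your code, so I modified it. Please take the useful parts from my version and bring them into yours. My explanations are in the code comments.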
from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException
import pandas as pd

driver = webdriver.Chrome('...')
site = 'https://www.goodreads.com/search?q=chughtai&qid=WzdWh5nG8z'
driver.get(site)
driver.maximize_window()
data = []  # pandas can convert a list of dictionaries to a dataframe. Dictionary keys are column names.
for tr in driver.find_elements_by_tag_name('tr'):
    # one tr for one book
    # I chose the following as check for a book because it worked for the webpage
    if tr.get_attribute('itemtype') != 'http://schema.org/Book':
        continue  # Not a book
    temp = {'Author Names': [], 'Role': []}
    for con in tr.find_elements_by_class_name('authorName__container'):
        # one container for one author
        try:
            authorName = con.find_element_by_class_name('authorName').find_element_by_tag_name('span').text
            temp['Author Names'].append(authorName)
            # raises NoSuchElementException when the author has no role listed
            authorRole = con.find_element_by_class_name('role').text
            temp['Role'].append(f'{authorName} {authorRole}')
        except NoSuchElementException:
            pass  # ignore this one
        except Exception as e:
            print(e)  # print this one for inspection
    # convert lists to strings
    data.append({k: ','.join(v) for k, v in temp.items()})
df = pd.DataFrame(data)
print(df)
    Author Names                                     Role
0   Ismat Chughtai,M. Asaduddin                      M. Asaduddin (Translator)
1   Ismat Chughtai
2   Muhammad Umar Memon,M. Asaduddin,Ismat Chughtai  Muhammad Umar Memon (Translator),M. Asaduddin ...
3   Ismat Chughtai,Tahira Naqvi                      Tahira Naqvi (Translator)
4   Ismat Chughtai,Amar Shahid                       Amar Shahid (Compiler)
5   Ismat Chughtai,Tahira Naqvi,Syeda S. Hameed      Tahira Naqvi (Translator),Syeda S. Hameed (Tra...
6   Ismat Chughtai
7   Hephaestus Books
8   Ismat Chughtai,Tahira Naqvi                      Tahira Naqvi (Translator)
9   Rakhshanda Jalil                                 Rakhshanda Jalil (Editor)
10  Ismat Chughtai
11  Ismat Chughtai
12  Ismat Chughtai
13  Azeem Baig Chughtai
14  Ismat Chughtai
15  Ismat Chughtai
16  Ismat Chughtai
17  Ismat Chughtai
18  Ismat Chughtai,Tahira Naqvi                      Tahira Naqvi (Translator)
19  Hephaestus Books
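The per-book grouping in the answer can be tried without a browser. The following is only a minimal sketch of that logic in plain Python: the helper names `build_rows` and `one_cell`, and the representation of a book as a list of `(name, role)` pairs with an empty string for a missing role, are assumptions made for illustration and are not part of the original code. The sample data is copied from rows 0 and 1 of the output above.

```python
def build_rows(books):
    """Build one row per book.

    `books` is a list of books; each book is a list of (author_name, role)
    pairs, where role is '' when the page lists no role for that author.
    (Hypothetical input shape, standing in for the scraped elements.)
    """
    rows = []
    for book in books:
        # every author goes into the 'Author Names' cell
        names = [name for name, _ in book]
        # only authors that actually have a role go into the 'Role' cell
        with_roles = [f"{name} {role}" for name, role in book if role]
        rows.append({
            "Author Names": ",".join(names),
            "Role": ",".join(with_roles),
        })
    return rows


def one_cell(book):
    """Alternative layout: all authors in one cell, role attached where present."""
    return ",".join(f"{name} {role}".strip() for name, role in book)


sample_books = [
    [("Ismat Chughtai", ""), ("M. Asaduddin", "(Translator)")],
    [("Ismat Chughtai", "")],
]

rows = build_rows(sample_books)
for row in rows:
    print(row)
print(one_cell(sample_books[0]))
```

If a single combined cell is what is wanted, something like `one_cell` may be closer to the goal than two separate columns. The list of dictionaries returned by `build_rows` has the same shape as `data` in the answer, so it could be passed to `pd.DataFrame` in the same way.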
Tags: Python, python-3.x, pandas, selenium, web-scraping
