I'm trying to get values from a table on several static web pages. The table contains conjugation data for Korean verbs.
Here is the code:
import requests
from bs4 import BeautifulSoup
import pandas as pd
import csv

# create csv file
outfile = open("scrape.csv", "w", newline='')
writer = csv.writer(outfile)

## define first URL to grab conjugation names
url1 = 'https://koreanverb.app/?search=하다'
# define dataframe columns
df = pd.DataFrame(columns=['conjugation name'])
# get URL content
response = requests.get(url1)
soup = BeautifulSoup(response.content, 'html.parser')
# get table with all verb conjugations
results = soup.find("div", class_="table-responsive")

##### GET CONJUGATIONS AND APPEND TO CSV
# define URLs
urls = ['https://koreanverb.app/?search=하다',
        'https://koreanverb.app/?search=먹다',
        'https://koreanverb.app/?search=마시다']
# loop to get data
for url in urls:
    response = requests.get(url)
    soup2 = BeautifulSoup(response.content, 'html.parser')
    # get table with all verb conjugations
    results2 = soup2.find("div", class_="table-responsive")
    # get dictionary form of verb/adjective
    verb_results = soup2.find('dl', class_='dl-horizontal')
    verb_title = verb_results.find('dd')
    verb_title_text = verb_title.text
    job_elements = results2.find_all("tr", class_="conjugation-row")
    for job_element in job_elements:
        conjugation_name = job_element.find("td", class_="conjugation-name")
        conjugation_korean = conjugation_name.find_next_sibling("td")
        conjugation_name_text = conjugation_name.text
        conjugation_korean_text = conjugation_korean.text
        data_column = pd.DataFrame({'conjugation name': [conjugation_name_text],
                                    verb_title_text: [conjugation_korean_text],
                                    })
        #data_column = pd.DataFrame({verb_title_text: [conjugation_korean_text]})
        # DataFrame.append was removed in pandas 2.0; pd.concat does the same job
        df = pd.concat([df, data_column], ignore_index=True)

# save to csv
df.to_csv('scrape.csv')
outfile.close()
print('Verb Conjugations Collected and Appended to CSV, one per column')
Answer:
Get all the job_elements using find_all(), because find() only returns the first match, and then iterate over them in a for loop as follows:
job_elements = results.find_all("tr", class_="conjugation-row")
for job_element in job_elements:
    conjugation_name = job_element.find("td", class_="conjugation-name")
    conjugation_korean = conjugation_name.find_next_sibling("td")
    conjugation_name_text = conjugation_name.text
    conjugation_korean_text = conjugation_korean.text
    # append element to data
    df2 = pd.DataFrame([[conjugation_name_text, conjugation_korean_text]],
                       columns=['conjugation_name', 'conjugation_korean'])
    # DataFrame.append was removed in pandas 2.0; use pd.concat instead
    df = pd.concat([df, df2])
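The error comes from calling find() on a variable that is a list: find_all() returns a ResultSet, which is a subclass of list, so find() has to be called on each element rather than on the result as a whole.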
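To make the difference concrete, here is a minimal offline sketch. The HTML fragment is hypothetical and only reuses the class names from the question; it is not the real koreanverb.app markup. find() gives back a single Tag (or None), while find_all() gives back a ResultSet, which is a list, so .find() works on each element but not on the ResultSet itself.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment: it only reuses the class names from the question
html = """
<table>
  <tr class="conjugation-row"><td class="conjugation-name">a</td><td>x</td></tr>
  <tr class="conjugation-row"><td class="conjugation-name">b</td><td>y</td></tr>
</table>
"""
soup = BeautifulSoup(html, "html.parser")

# find() returns only the first matching Tag (or None if nothing matches)
first_row = soup.find("tr", class_="conjugation-row")
# find_all() returns a ResultSet, a list subclass holding every match
all_rows = soup.find_all("tr", class_="conjugation-row")

# Calling .find() on the ResultSet itself raises AttributeError
raised = False
try:
    all_rows.find("td", class_="conjugation-name")
except AttributeError:
    raised = True

# Calling .find() on each element of the ResultSet works
names = [row.find("td", class_="conjugation-name").text for row in all_rows]

print(type(first_row).__name__)  # Tag
print(len(all_rows), raised)     # 2 True
print(names)                     # ['a', 'b']
```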
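Since your script is getting bigger, I made some changes, such as moving the scraping into a get_conjugations() function and using some easier-to-understand names. First, conjugation_names and conjugation_korean_names are added as columns of the pandas DataFrame, and then the remaining columns (korean0, korean1, ...) are added one per additional URL.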
import requests
from bs4 import BeautifulSoup
import pandas as pd

# function to parse the html data & get conjugations
def get_conjugations(url):
    # set return lists
    conjugation_names = []
    conjugation_korean_names = []
    # get html text
    html = requests.get(url).text
    # parse the html text
    soup = BeautifulSoup(html, 'html.parser')
    # get table
    table = soup.find("div", class_="table-responsive")
    table_rows = table.find_all("tr", class_="conjugation-row")
    for row in table_rows:
        conjugation_name = row.find("td", class_="conjugation-name")
        conjugation_korean = conjugation_name.find_next_sibling("td")
        conjugation_names.append(conjugation_name.text)
        conjugation_korean_names.append(conjugation_korean.text)
    # return both lists
    return conjugation_names, conjugation_korean_names

# create csv file
outfile = open("scrape.csv", "w", newline='')
urls = ['https://koreanverb.app/?search=하다',
        'https://koreanverb.app/?search=먹다',
        'https://koreanverb.app/?search=마시다']
# define dataframe columns
df = pd.DataFrame(columns=['conjugation_name', 'conjugation_korean', 'korean0', 'korean1'])
conjugation_names, conjugation_korean_names = get_conjugations(urls[0])
df['conjugation_name'] = conjugation_names
df['conjugation_korean'] = conjugation_korean_names
for index, url in enumerate(urls[1:]):
    conjugation_names, conjugation_korean_names = get_conjugations(url)
    # set column name: korean0, korean1, ...
    column_name = 'korean' + str(index)
    df[column_name] = conjugation_korean_names
# save to csv
df.to_csv('scrape.csv')
outfile.close()
# Print DONE
print('Export to CSV Complete')
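One possible refinement, sketched under assumptions: split fetching from parsing so the parsing logic can be checked without network access, and collect rows in a plain list before building the DataFrame once. parse_conjugations() is a hypothetical helper, and the sample HTML only mimics the class names the answer relies on; the live page may contain more markup.

```python
import pandas as pd
from bs4 import BeautifulSoup


def parse_conjugations(html):
    """Return (conjugation name, conjugated form) pairs from one page's HTML.

    Hypothetical helper: it assumes the same structure the answer relies on
    (div.table-responsive > tr.conjugation-row > td.conjugation-name + td).
    """
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("div", class_="table-responsive")
    if table is None:  # layout changed or the search returned no table
        return []
    pairs = []
    for row in table.find_all("tr", class_="conjugation-row"):
        name_cell = row.find("td", class_="conjugation-name")
        value_cell = name_cell.find_next_sibling("td")
        pairs.append((name_cell.get_text(strip=True), value_cell.get_text(strip=True)))
    return pairs


# Hypothetical sample markup so the parser can be checked offline
sample_html = """
<div class="table-responsive"><table>
  <tr class="conjugation-row"><td class="conjugation-name">declarative present informal low</td><td>해</td></tr>
  <tr class="conjugation-row"><td class="conjugation-name">declarative present informal high</td><td>해요</td></tr>
</table></div>
"""

pairs = parse_conjugations(sample_html)
# Build the DataFrame once from the collected rows instead of growing it row by row
df = pd.DataFrame(pairs, columns=["conjugation_name", "conjugation_korean"])
print(df)
```

Against the real site this would be called as parse_conjugations(requests.get(url).text). get_text(strip=True) also trims surrounding whitespace, which .text keeps.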
Output:
,conjugation_name,conjugation_korean,korean0,korean1
0,declarative present informal low,해,먹어,마셔
1,declarative present informal high,해요,먹어요,마셔요
2,declarative present formal low,한다,먹는다,마신다
3,declarative present formal high,합니다,먹습니다,마십니다
...
Note: this assumes the conjugation rows appear in the same order on every URL's page.
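If that assumption might not hold, one way to drop it is to index each verb's results by conjugation name and let pandas align the columns. The sketch below uses hypothetical, hard-coded pairs (with the second verb deliberately out of order) in place of scraped data, and uses each verb's dictionary form as its column name, which is what the question's code was aiming for with verb_title_text.

```python
import pandas as pd

# Hypothetical parsed results: one list of (conjugation name, form) pairs per verb.
# In practice these would come from scraping each URL; the second verb's rows
# are deliberately in a different order.
scraped = {
    "하다": [("informal low", "해"), ("informal high", "해요")],
    "먹다": [("informal high", "먹어요"), ("informal low", "먹어")],
}

# One Series per verb, indexed by conjugation name and named after the verb
columns = [pd.Series(dict(pairs), name=verb) for verb, pairs in scraped.items()]

# concat(axis=1) aligns rows on the index (the conjugation name), not on position;
# a name missing from one page becomes NaN in that verb's column
df = pd.concat(columns, axis=1)
df.index.name = "conjugation_name"
print(df)
```

With the answer's get_conjugations(), each verb's pairs can be formed as zip(conjugation_names, conjugation_korean_names). Note that dict() keeps only the last value if a conjugation name ever appears twice on one page.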
