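This is surely a gap in my knowledge, since I'm fairly new to this. What I'm trying to do with this code is scrape all the data from the web page I'm working on. The problem is that, before the loop continues, I want pandas to write the current position_text variable to the ["Position"] column. I confirmed with a print statement that position_text holds exactly what I want to write to the new ["Position"] column, but only the last value, "C", ends up in ["Position"].
Link: https://www.fantasypros.com/daily-fantasy/nba/fanduel-defense-vs-position.php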
from time import sleep

import pandas as pd
from bs4 import BeautifulSoup
from selenium.webdriver.common.by import By

# `driver` is an already-initialised Selenium WebDriver with the page loaded.
df_results = pd.DataFrame()

follow_loop = list(range(1, 7))
for i in follow_loop:
    # Build the XPath of the i-th position tab and click it
    xpath = '//*[@id="main-container"]/div/div/div/div[4]/div[1]/ul/li['
    xpath += str(i)
    xpath += "]"
    driver.find_element(By.XPATH, xpath).click()
    sleep(2)
    driver.execute_script("window.scrollTo(1,1200)")
    sleep(2)
    driver.execute_script("window.scrollTo(1,-1200)")

    html = driver.page_source

    soup = BeautifulSoup(html, 'html.parser')

    stats_table = soup.find(id="data-table")
    # Read the label of the same tab (ALL, PG, SG, ...)
    position = '//*[@id="main-container"]/div/div/div/div[4]/div[1]/ul/li['
    position += str(i)
    position += "]"
    position_text = driver.find_element(By.XPATH, position).text
    df_results = df_results.append(pd.read_html(str(stats_table)))
    df_results["Position"] = position_text
    print(position_text)
    sleep(2)
ALL
PG
SG
SF
PF
C
Answer:
Here is one way to collect the data from all the tables into one large DataFrame:
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
import time as t
import pandas as pd
pd.set_option('display.max_columns', None)
pd.set_option('display.max_colwidth', None)
big_df = pd.DataFrame()
chrome_options = Options()
chrome_options.add_argument("--no-sandbox")
chrome_options.add_argument('disable-notifications')
chrome_options.add_argument("window-size=1280,720")
webdriver_service = Service("chromedriver/chromedriver") ## path to where you saved chromedriver binary
driver = webdriver.Chrome(service=webdriver_service, options=chrome_options)
wait = WebDriverWait(driver, 20)
url = "https://www.fantasypros.com/daily-fantasy/nba/fanduel-defense-vs-position.php"
driver.get(url)
tables_list = wait.until(EC.presence_of_all_elements_located((By.XPATH, '//ul[@]/li')))
for x in tables_list:
    x.click()
    print('selected', x.text)
    t.sleep(2)
    table = wait.until(EC.element_to_be_clickable((By.XPATH, '//table[@id="data-table"]')))
    df = pd.read_html(table.get_attribute('outerHTML'))[0]
    df['Category'] = x.text.strip()
    big_df = pd.concat([big_df, df], axis=0, ignore_index=True)
    print('done, moving to next table')
print(big_df)
big_df.to_csv('fanduel.csv')
This saves the data to a CSV file and prints the following in the terminal:
Team PTS REB AST 3PM STL BLK TO FD PTS Category
0 HOUHouston Rockets 23.54 9.10 5.10 2.54 1.88 1.15 2.65 48.55 ALL
1 OKCOklahoma City Thunder 22.22 9.61 5.19 2.70 1.67 1.18 2.52 47.57 ALL
2 PORPortland Trail Blazers 22.96 8.92 5.31 2.74 1.63 0.99 2.65 46.84 ALL
3 SACSacramento Kings 23.00 9.10 5.03 2.58 1.61 0.95 2.50 46.65 ALL
4 ORLOrlando Magic 22.35 9.39 4.94 2.62 1.57 1.04 2.50 46.36 ALL
... ... ... ... ... ... ... ... ... ... ...
175 DENDenver Nuggets 22.96 12.91 3.68 0.96 1.21 1.76 2.62 50.26 C
176 PHIPhiladelphia 76ers 21.95 13.35 3.01 1.15 1.14 1.94 2.07 49.66 C
177 BOSBoston Celtics 19.52 14.46 3.58 0.61 1.40 1.82 2.80 49.10 C
178 NYKNew York Knicks 19.31 14.48 3.02 1.07 1.02 1.98 2.26 47.96 C
179 MIAMiami Heat 19.00 14.44 2.95 0.64 1.24 1.55 2.71 46.41 C
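As for why the loop in the question ends up with "C" in every row: assigning a scalar to a DataFrame column sets that value on all rows of the frame, including rows appended in earlier iterations, so each pass overwrites the previous labels. Below is a minimal sketch of the difference, not the scraper itself. The two small tables are made-up stand-ins for the scraped data, and pd.concat is used in place of DataFrame.append, which was removed in pandas 2.0.

```python
import pandas as pd

# Hypothetical stand-ins for the tables scraped from two position tabs.
tables = {
    "PG": pd.DataFrame({"Team": ["HOU", "OKC"], "PTS": [23.5, 22.2]}),
    "C": pd.DataFrame({"Team": ["DEN", "PHI"], "PTS": [23.0, 21.9]}),
}

# Pattern from the question: grow the frame, then label the WHOLE frame.
overwritten = None
for position, table in tables.items():
    if overwritten is None:
        overwritten = table.copy()
    else:
        overwritten = pd.concat([overwritten, table], ignore_index=True)
    # A scalar assignment is broadcast to every row, old and new alike.
    overwritten["Position"] = position

# Pattern from the answer: label each table first, then combine.
labelled = [table.assign(Position=position) for position, table in tables.items()]
combined = pd.concat(labelled, ignore_index=True)

print(overwritten["Position"].tolist())  # ['C', 'C', 'C', 'C']
print(combined["Position"].tolist())     # ['PG', 'PG', 'C', 'C']
```

Collecting the labelled tables in a list and calling pd.concat once at the end also avoids re-copying the growing frame on every iteration.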
