I scraped some data from a website. Here is my script:
import warnings
warnings.filterwarnings("ignore")
import re
import time
import requests
from requests import get
from bs4 import BeautifulSoup
import os
import pandas as pd
import numpy as np
import shutil
from selenium import webdriver
from selenium.webdriver.support.select import Select
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.keys import Keys
headers = {
'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:78.0) Gecko/20100101 Firefox/78.0',
'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
'Accept-Language': 'fr,fr-FR;q=0.8,en-US;q=0.5,en;q=0.3',
'Referer': 'https://www.espncricinfo.com/',
'Upgrade-Insecure-Requests': '1',
'Connection': 'keep-alive',
'Pragma': 'no-cache',
'Cache-Control': 'no-cache',
}
PATH = r"driver\chromedriver.exe"
options = webdriver.ChromeOptions()
options.add_argument("--disable-gpu")
#options.add_argument('enable-logging')
options.add_argument("start-maximized")
#options.add_experimental_option("excludeSwitches", ["enable-automation"])
options.add_experimental_option('useAutomationExtension', False)
options.add_experimental_option('excludeSwitches', ['enable-logging'])
driver = webdriver.Chrome(options=options, executable_path=PATH)
driver.execute_script("Object.defineProperty(navigator, 'webdriver', {get: () => undefined})")
driver.execute_cdp_cmd('Network.setUserAgentOverride', {"userAgent": 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.53 Safari/537.36'})
url = 'https://www.boursorama.com/'
driver.get(url)
cookie = WebDriverWait(driver, 50).until(EC.element_to_be_clickable((By.XPATH, '//*[@id="didomi-notice-agree-button"]')))
try:
    cookie.click()
except:
    pass
df = pd.read_excel('liste.xlsx')
df2 = pd.DataFrame(df)
df3 = df2['Entreprises'].values.tolist()
currencies = []
for i in df3:
    try :
        print(i)
        searchbar = WebDriverWait(driver, 50).until(EC.element_to_be_clickable((By.XPATH, 'html/body/div[6]/div[3]/div[2]/ol/li[1]/button')))
        searchbar.click()
        searchbar2 = WebDriverWait(driver, 50).until(EC.element_to_be_clickable((By.XPATH, '/html/body/div[6]/div[1]/div[2]/form/div/input')))
        searchbar2.click()
        searchbar2.send_keys(i + '\n')
        time.sleep(2)
        links = driver.find_elements_by_xpath('//*[@id="main-content"]/div/div/div[4]/div[1]/div[3]/div/div/div[2]/div[1]/div/div[3]/div/div[1]/div/table/tbody/tr[1]/td[1]/div/div[2]/a')
        for k in links:
            data = k.get_attribute("href")
            results = requests.get(data)
            soup = BeautifulSoup(results.text, "html.parser")
            currency = soup.find('span', class_= 'c-instrument c-instrument--last').text
            currencies.append(currency)
    except :
        print(i)
        searchbar = WebDriverWait(driver, 50).until(EC.element_to_be_clickable((By.XPATH, 'html/body/div[6]/div[3]/div[2]/ol/li[1]/button')))
        searchbar.click()
        searchbar2 = WebDriverWait(driver, 50).until(EC.element_to_be_clickable((By.XPATH, '/html/body/div[6]/div[1]/div[2]/form/div/input')))
        searchbar2.click()
        searchbar2.send_keys(i + '\n')
        time.sleep(2)
        url2 = driver.current_url
        results = requests.get(url2)
        soup = BeautifulSoup(results.text, "html.parser")
        currency = soup.find('span', class_= 'c-instrument c-instrument--last').text
        currencies.append(currency)
print(currencies)
liste.xlsx is just an Excel file containing the company names that my loop iterates over:
[image: list]
This is my output:
TotalEnergies
TotalEnergies
Engie
Engie
BNP
BNP
['45.59', '11.07', '49.03']
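I don't understand: it looks as if my script runs both the try and the except. I get 3 results as expected, but each company is printed twice. My goal is: run the try if needed, otherwise run the except.
Can I improve my code so that it executes only one of them, the one that is needed?
Sometimes when you search for a company you need to be more specific, and the site offers you some alternatives, hence this code: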
try :
    print(i)
    searchbar = WebDriverWait(driver, 50).until(EC.element_to_be_clickable((By.XPATH, 'html/body/div[6]/div[3]/div[2]/ol/li[1]/button')))
    searchbar.click()
    searchbar2 = WebDriverWait(driver, 50).until(EC.element_to_be_clickable((By.XPATH, '/html/body/div[6]/div[1]/div[2]/form/div/input')))
    searchbar2.click()
    searchbar2.send_keys(i + '\n')
    time.sleep(2)
    links = driver.find_elements_by_xpath('//*[@id="main-content"]/div/div/div[4]/div[1]/div[3]/div/div/div[2]/div[1]/div/div[3]/div/div[1]/div/table/tbody/tr[1]/td[1]/div/div[2]/a')
    for k in links:
        data = k.get_attribute("href")
        results = requests.get(data)
        soup = BeautifulSoup(results.text, "html.parser")
        currency = soup.find('span', class_= 'c-instrument c-instrument--last').text
        currencies.append(currency)
Other times you type the exact name into the search bar and the site goes straight to the desired page, hence this code:
except :
    print(i)
    searchbar = WebDriverWait(driver, 50).until(EC.element_to_be_clickable((By.XPATH, 'html/body/div[6]/div[3]/div[2]/ol/li[1]/button')))
    searchbar.click()
    searchbar2 = WebDriverWait(driver, 50).until(EC.element_to_be_clickable((By.XPATH, '/html/body/div[6]/div[1]/div[2]/form/div/input')))
    searchbar2.click()
    searchbar2.send_keys(i + '\n')
    time.sleep(2)
    url2 = driver.current_url
    results = requests.get(url2)
    soup = BeautifulSoup(results.text, "html.parser")
    currency = soup.find('span', class_= 'c-instrument c-instrument--last').text
    currencies.append(currency)
But how do I make the script check both cases yet execute only the one that is needed? And how can I improve the running time?
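Answer from a uj5u.com user:
"My goal is: run the try if needed, otherwise run the except."
That is exactly what it is doing. I suggest looking into how to debug code. You will be able to run it line by line and follow the logic, and you will see what is happening.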
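Short of a full debugger, one lightweight first step is to stop swallowing the error. The sketch below is not the original script: `fetch_price` and `describe_failure` are hypothetical stand-ins for the scraping steps, used only to show how naming the exception reveals why a `try` block was abandoned.

```python
import traceback


def fetch_price(page_html):
    """Hypothetical stand-in for the scraping step (not from the original script)."""
    # Mimics soup.find(...) returning None when the expected tag is absent.
    tag = page_html if "c-instrument" in page_html else None
    return tag.strip()  # raises AttributeError when tag is None


def describe_failure(page_html):
    try:
        return fetch_price(page_html)
    except Exception as exc:  # a bare "except:" would hide this information
        traceback.print_exc()  # prints the stack trace, including the failing line
        return f"{type(exc).__name__}: {exc}"


message = describe_failure("<html>no price here</html>")
print(message)
```

With the exception type and the failing line in hand, it becomes clear which statement in the `try` block triggered the jump to `except`.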
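When you use try/except, Python "tries" to execute the code in the try block. If that succeeds, the except block is skipped. If it fails at some point inside the try block, execution continues in the except block.
The reason it appears to run both is that, technically, as described above, it does run both. You see the name printed twice because of where your print() statements are placed.
It enters the try block, and the print(i) at the top prints i. At some point in the try block after that print(i), an error/exception is raised, so it moves to the except block, where the print(i) at the top of that block prints i again.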
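The flow described above can be reproduced without Selenium. This is a minimal sketch in which the failure is simulated with a deliberately raised `ValueError`; the `run` function and the `events` list are illustrative names, not part of the original script.

```python
def run(company, events):
    try:
        events.append(f"try: {company}")     # plays the role of the first print(i)
        raise ValueError("simulated failure partway through the try block")
        events.append("never reached")       # skipped once the exception is raised
    except ValueError:
        events.append(f"except: {company}")  # plays the role of the second print(i)


events = []
for company in ["TotalEnergies", "Engie"]:
    run(company, events)

print(events)
```

Each company shows up once from the `try` block and once from the `except` block, which matches the doubled names in the question's output.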
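If you want it to look for a condition and execute only the branch you need, then you have to check that condition with an if block rather than with try/except.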
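As a rough sketch of that idea, the decision can be isolated in a small function. `choose_quote_url` is a hypothetical helper, and the URLs in the usage lines are placeholders; the assumption is that the list of result links is empty when the site has already redirected to the company page.

```python
def choose_quote_url(result_links, current_url):
    """Pick the page to scrape (hypothetical helper, not from the original script).

    result_links: hrefs found in the search-results table (may be empty).
    current_url:  the page the browser is on after submitting the search.
    """
    if result_links:
        # The site listed alternatives: take the first hit.
        return result_links[0]
    # No results table: assume the site went straight to the company page.
    return current_url


# With Selenium, result_links could be built from driver.find_elements(...),
# which returns an empty list (rather than raising) when nothing matches.
print(choose_quote_url(["https://example.invalid/quote/A"], "https://example.invalid/search"))
print(choose_quote_url([], "https://example.invalid/quote/B"))
```

Only one branch runs per company, so the search is submitted once instead of twice.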
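That said, fetching the data from the source is far more efficient than rendering the page with Selenium. You also get more data. I don't know exactly what you want from the response, but this is what you would get: click here
Code: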
import requests
from bs4 import BeautifulSoup

df3 = ['TotalEnergies','Engie','BNP']
currencies = []
for i in df3:
    # Query the site's search endpoint directly instead of driving a browser.
    url = f'https://www.boursorama.com/recherche/ajax?query={i}&searchId='
    response = requests.get(url)
    soup = BeautifulSoup(response.text, 'html.parser')
    # The symbol is the second-to-last path segment of the first result link.
    symbol = soup.find('a', {'class':'search__list-link'})['href'].split('/')[-2]

    # Fetch the quote data for that symbol as JSON.
    url = 'https://www.boursorama.com/bourse/action/graph/ws/GetTicksEOD'
    payload = {
        'symbol': symbol,
        'length': '1',
        'period': '0',
        'guid': ''}
    jsonData = requests.get(url, params=payload).json()
    data = jsonData['d']
    name = data['Name']
    qd = data['qd']['c']
    currencies.append(qd)
    print(f'{name}: {qd}')

print(currencies)
Output:
TOTALENERGIES: 45.59
ENGIE: 11.07
BNP PARIBAS: 49.03
[45.59, 11.07, 49.03]
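One possible refinement, if the list ever contains names with spaces or accents, is to URL-encode the query instead of interpolating it into the f-string. The sketch below assumes the search endpoint accepts standard form encoding, which has not been verified here; `build_search_url` is a hypothetical helper.

```python
from urllib.parse import urlencode

# Endpoint taken from the code above.
SEARCH_ENDPOINT = "https://www.boursorama.com/recherche/ajax"


def build_search_url(company):
    """Build the search URL with the company name safely encoded."""
    query = urlencode({"query": company, "searchId": ""})
    return f"{SEARCH_ENDPOINT}?{query}"


print(build_search_url("BNP Paribas"))
```

Passing the same dictionary as `params=` to `requests.get` lets requests do the encoding itself, as the code above already does for the second request.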
Tags: python-3.x selenium beautifulsoup
