I have tried many approaches, but none of them solved the problem. They all give me this error:
All arrays must be of the same length
from bs4 import BeautifulSoup
import requests
import pandas as pd

review = []
ratings = []
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:66.0) Gecko/20100101 Firefox/66.0",
    "Accept-Encoding": "gzip, deflate",
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "DNT": "1",
    "Connection": "close",
    "Upgrade-Insecure-Requests": "1",
}
for page in range(1, 5):
    r = requests.get(
        "https://www.amazon.com/s?k=redmi&page=2&qid=1631528810&ref=sr_pg_={page}".format(
            page=page
        ),
        headers=headers,
    )
    soup = BeautifulSoup(r.content, "lxml")
    for d in soup.findAll("div", attrs={"class": "s-result-item"}):
        rating = d.find("span", attrs={"class": "a-icon-alt"})
        if rating is not None:
            ratings.append(rating.text)
        reviews = d.find("span", class_="a-size-base")
        if reviews is not None:
            review.append(reviews.text)
df = pd.DataFrame({"rating": ratings, "reports": review})
df.to_csv("products.csv", index=False, encoding="utf-8")
Reply from a uj5u.com user:
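The ratings and review lists end up with different lengths: the loop scrapes the wrong containers, and each list is appended to only when its element is found. I have made the necessary changes, and it should work now: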
from bs4 import BeautifulSoup
import requests
import pandas as pd

review = []
ratings = []
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:66.0) Gecko/20100101 Firefox/66.0",
    "Accept-Encoding": "gzip, deflate",
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "DNT": "1",
    "Connection": "close",
    "Upgrade-Insecure-Requests": "1",
}
for page in range(1, 5):
    cookies = {'session': '17ab96bd8ffbe8ca58a78657a918558'}
    # The URL is kept as posted; note that it hard-codes page=2 and only varies ref=.
    r = requests.get(
        "https://www.amazon.com/s?k=redmi&page=2&qid=1631528810&ref=sr_pg_={page}".format(
            page=page
        ),
        headers=headers,
        cookies=cookies,
    )
    soup = BeautifulSoup(r.content, "lxml")
    # Select only the actual search-result containers.
    for d in soup.select(".s-result-item[data-component-type='s-search-result']"):
        rating = d.find("span", attrs={"class": "a-icon-alt"})
        # Append exactly one value per result so both lists stay the same length.
        if rating is not None:
            ratings.append(rating.text)
        else:
            ratings.append("-")
        reviews = d.find("span", class_="a-size-base")
        if reviews is not None and rating is not None:
            review.append(reviews.text)
        else:
            review.append("-")
df = pd.DataFrame({"rating": ratings, "reports": review})
df.to_csv("products.csv", index=False, encoding="utf-8")
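To see why the conditional appends cause the error, here is a minimal sketch that needs no network access. The scraped list is a made-up stand-in for what the loop finds in each result container (None where an element is missing), and the variable names mismatch_error and rows are only illustrative. It first reproduces the length mismatch, then shows one possible alternative: building one dict per result, so the columns cannot get out of step.

```python
import pandas as pd

# Hypothetical stand-in data: (rating text, review-count text) per result
# container, with None where the scraper would not find the element.
scraped = [
    ("4.5 out of 5 stars", "1,234"),
    (None, "56"),
    ("3.9 out of 5 stars", None),
    ("4.1 out of 5 stars", None),
]

# Appending only when a value exists lets the two lists drift apart:
# here ratings gets 3 entries and review gets 2.
ratings = [r for r, _ in scraped if r is not None]
review = [v for _, v in scraped if v is not None]

try:
    pd.DataFrame({"rating": ratings, "reports": review})
    mismatch_error = None
except ValueError as exc:
    # pandas refuses to build a frame from columns of unequal length.
    mismatch_error = str(exc)

# One record per container, with "-" as a placeholder for missing values,
# always yields columns of equal length.
rows = [
    {
        "rating": r if r is not None else "-",
        "reports": v if v is not None else "-",
    }
    for r, v in scraped
]
df = pd.DataFrame(rows)
```

Unlike the answer's code, this sketch keeps a review count even when the rating is missing; which behavior is preferable depends on what the CSV is used for.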
