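I'm writing a Python script with BeautifulSoup to scrape the questions page on Stack Overflow. I only want to get every question title and its vote count from the page.
For each question, I get 3 div elements that share the same class (votes, answers, views). It looks like this: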
<div class="question">
  <div class="stats">
    <span class="item-number">0</span>
    <span class="item-unit">votes</span>
  </div>
  <div class="stats">
    <span class="item-number">10</span>
    <span class="item-unit">answer</span>
  </div>
  <div class="stats">
    <span class="item-number">15</span>
    <span class="item-unit">views</span>
  </div>
</div>
My Python code looks like this:
import requests
from bs4 import BeautifulSoup

url = "https://stackoverflow.com/questions"
response = requests.get(url)
soup = BeautifulSoup(response.text, "html.parser")

# Each match is one question summary on the listing page
questions = soup.select(".s-post-summary.js-post-summary")
for question in questions:
    print(question.select_one(".question").getText())
    # Need help selecting the votes
    print(question.select_one(".item-number").getText())
Since votes, answers and views all share the same class, what is the best and least fragile way to get only the votes?
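One way to avoid depending on the position of the blocks is to read each count together with its unit label and pick the one labelled "votes". This is a minimal sketch, assuming the simplified markup from the question; the helper name `stats_by_unit` is hypothetical, and the live page uses different class names (for example `.s-post-summary--stats-item-number`).

```python
from bs4 import BeautifulSoup

# Sample markup from the question: one question with three .stats blocks.
HTML = """
<div class="question">
  <div class="stats"><span class="item-number">0</span><span class="item-unit">votes</span></div>
  <div class="stats"><span class="item-number">10</span><span class="item-unit">answer</span></div>
  <div class="stats"><span class="item-number">15</span><span class="item-unit">views</span></div>
</div>
"""


def stats_by_unit(question):
    """Return {unit label: number text} for every .stats block in one question."""
    result = {}
    for stat in question.select(".stats"):
        number = stat.select_one(".item-number")
        unit = stat.select_one(".item-unit")
        if number is None or unit is None:
            continue  # skip blocks missing either span
        # Lower-case and drop trailing "s" so "vote" and "votes" share one key.
        label = unit.get_text(strip=True).lower().rstrip("s")
        # Keep the raw text; view counts may be abbreviated on real pages.
        result[label] = number.get_text(strip=True)
    return result


soup = BeautifulSoup(HTML, "html.parser")
for question in soup.select(".question"):
    votes = int(stats_by_unit(question)["vote"])
    print(votes)  # 0
```

Matching on the label keeps working if the blocks are reordered; it only breaks if the unit text itself changes.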
Answer:
This works:
import requests
from bs4 import BeautifulSoup
url = "https://stackoverflow.com/questions"
response = requests.get(url)
soup = BeautifulSoup(response.text, "html.parser")
title = [x.get_text(strip=True) for x in soup.select('[] > a')]
print(title)
votes = [x.get_text(strip=True) for x in soup.select('div[] > span:nth-child(1)')]
print(votes)
Output:
['React Native - expo/vector-icons typescript type definition for icon name', 'React 25 5 Clock is working but fails all tests', 'Add weekly tasks, monthly tasks in google spreadsheet', 'Count number of change in values in Pandas column', "React-Select: How do I update the selected option dropdown's defaultValue on selected value onChange?", 'Block execution over a variable (TTS Use-Case), other than log statements (spooky)', "'npm install firebase' hangs in wsl. runs fine in windows", 'Kubernetes Dns service sometimes not working', 'Neo4j similarity of single node with entire graph', 'What is this error message? ORA-00932: inconsistent datatypes: expected DATE got NUMBER', 'Why getChildrenQueryBuilder of NestedTreeRepository say Too few parameters: the query defines 2 parameters but you only bound 0', 'Is is a security issue that Paypal uses dynamic certificate to verify webhook notification?', 'MessageBox to autoclose after a function done', 'Can someone clearly explain how this function is working?', 'Free open-sourced tools for obfuscating iOS app?', "GitHub page is not showing background image, FF console shows couldn't load images", 'Is possible to build a MLP model with the tidymodels framework?', 'How do I embed an interactive Tableau visual into an R Markdown script/notebook on Kaggle?', 'Dimensionality reduction methods for data including categorical variables', 'Reaching localhost api from hosted static site', 'Finding the zeros of a two term exponential function with python', 'optimizing synapse delta lake table not reducing the number of files', '(GAS) Email Spreadsheet range based on date input in the cell', 'EXCEL Formula to find and copy cell based on criteria', 'how to write function reduce_dimensionality?', 'Semi-Radial type Volume Slider in WPF C#', 'tippy.js tool tips stop working after "window.reload()"', 'is there some slice indices must be integers on FFT opencv python? because i think my coding is correct', 'NoParameterFoundException', 'How to get the two Input control elements look exactly same in terms of background and border?', 'My code is wrong because it requests more data than necessary, how can i solve it?', 'Express Session Not Saving', 'Which value should I search for when changing the date by dragging in FullCalendar?', 'Non-constant expression specified where only constant expressions are allowed', 'Cocoapods not updating even after latest version is installed', 'Ruby "Each with Index" starting at 1?', 'Converting images to Pytorch tensors loses label data', 'itemview in Adapter for recyclerview not getting id from xml', 'Use Margin Auto & Flex to Align Text', '(C ) URLDownloadToFile Function corrupting downloaded EXE', 'Search plugin for Woocommerce website (Free)', 'Create new folder when save image in Python Plotly', "What's the difference between avfilter_graph_parse_ptr() and avfilter_link()?", 'Inputs to toString (java) on a resultset from MySQL [duplicate]', 'Which language i learn in This time for better future? python or javaScript? [closed]', 'Hi everyone. I want to write a function in python for attached data frame. I can not figure out how can I do it', 'is there a way in R to mutate a cumulative subtraction to update the same mutated var?', 'making a simple reccommendation system in JavaScript', 'Amchart4 cursor does not match mouse position in screen with zoom', 'Bash curl command works in terminal, but not with Python os.system()']
['0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '-2', '0', '1', '0', '0', '0']
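The same positional idea (take the first stats block, then its first number span) can be checked offline. This sketch assumes markup shaped like the question's sample; the second question with -2 votes is made-up test data. Because it relies on votes always being the first block, it is more fragile than matching on the unit label.

```python
from bs4 import BeautifulSoup

# Two questions in the question's sample format; votes is the first .stats block.
HTML = """
<div class="question">
  <div class="stats"><span class="item-number">0</span><span class="item-unit">votes</span></div>
  <div class="stats"><span class="item-number">10</span><span class="item-unit">answer</span></div>
  <div class="stats"><span class="item-number">15</span><span class="item-unit">views</span></div>
</div>
<div class="question">
  <div class="stats"><span class="item-number">-2</span><span class="item-unit">votes</span></div>
  <div class="stats"><span class="item-number">1</span><span class="item-unit">answer</span></div>
  <div class="stats"><span class="item-number">7</span><span class="item-unit">views</span></div>
</div>
"""

soup = BeautifulSoup(HTML, "html.parser")
# .stats:nth-child(1) is the first element child of each .question (the votes block);
# "> .item-number" then takes the number span inside it.
votes = [
    x.get_text(strip=True)
    for x in soup.select(".question > .stats:nth-child(1) > .item-number")
]
print(votes)  # ['0', '-2']
```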
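Answer:
You can use the find_all() function, which returns a list. Since the votes come first, you can access them at index 0. Note that find_all() matches on tag names and attributes, not CSS selectors, so pass the class through class_:
print(question.find_all(class_="item-number")[0].getText())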
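A quick offline check, assuming the question's sample markup, shows why the class has to go through `class_` (or through `select()`): a string passed as the first argument of `find_all()` is treated as a tag name.

```python
from bs4 import BeautifulSoup

HTML = """
<div class="question">
  <div class="stats"><span class="item-number">0</span><span class="item-unit">votes</span></div>
  <div class="stats"><span class="item-number">10</span><span class="item-unit">answer</span></div>
  <div class="stats"><span class="item-number">15</span><span class="item-unit">views</span></div>
</div>
"""

question = BeautifulSoup(HTML, "html.parser").select_one(".question")

# The first argument is a tag name, so no <.item-number> tag is ever found.
by_name = question.find_all(".item-number")
# Matching on the class attribute, or using a CSS selector, finds all three spans.
by_class = question.find_all(class_="item-number")
by_css = question.select(".item-number")

print(len(by_name), len(by_class), len(by_css))  # 0 3 3
print(by_class[0].get_text())  # 0
```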
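Answer:
You can select the shared parent element of each question in the list, then use select_one() with CSS class selectors to get the vote count and the question title. Wrap it in a dict comprehension to pair the results.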
import requests
from bs4 import BeautifulSoup as bs
r = requests.get("https://stackoverflow.com/questions")
soup = bs(r.content, "lxml")
data = {
i.select_one(".s-post-summary--content-title").text.strip(): int(
i.select_one(".s-post-summary--stats-item-number").text
)
for i in soup.select(".s-post-summary")
}
print(data)
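The code above needs network access and lxml. To check the comprehension offline, it can be run against a small fixture; the HTML below is a hypothetical, heavily simplified imitation that reproduces only the class names this answer uses, not the real page. The sketch also collects (title, votes) tuples instead of a dict, because a dict keyed by title keeps only one entry when two questions share a title.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in markup: only the class names from the answer are real.
HTML = """
<div class="s-post-summary">
  <div class="s-post-summary--stats-item"><span class="s-post-summary--stats-item-number">0</span> votes</div>
  <div class="s-post-summary--stats-item"><span class="s-post-summary--stats-item-number">3</span> answers</div>
  <h3 class="s-post-summary--content-title"><a href="#">First question</a></h3>
</div>
<div class="s-post-summary">
  <div class="s-post-summary--stats-item"><span class="s-post-summary--stats-item-number">-2</span> votes</div>
  <div class="s-post-summary--stats-item"><span class="s-post-summary--stats-item-number">1</span> answers</div>
  <h3 class="s-post-summary--content-title"><a href="#">Second question</a></h3>
</div>
"""

soup = BeautifulSoup(HTML, "html.parser")  # html.parser avoids the lxml dependency

pairs = [
    (
        summary.select_one(".s-post-summary--content-title").get_text(strip=True),
        # select_one returns the first number in the summary, which is the vote count
        int(summary.select_one(".s-post-summary--stats-item-number").get_text(strip=True)),
    )
    for summary in soup.select(".s-post-summary")
]
print(pairs)  # [('First question', 0), ('Second question', -2)]
```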
