I don't know how to scrape this text:
Telefon Mobil Apple iPhone 13, Super Retina XDR OLED 6.1", 256GB Flash, Camera Duala 12 12 MP, Wi-Fi, 5G, iOS (Negru)
<div class="npi_name">
<h2>
<a href="/solutii-mobile-telefoane-mobile/apple-telefon-mobil-apple-iphone-13-super-retina-xdr-oled-6.1-256gb-flash-camera-duala-12-12-mp-wi-fi-5g-ios-negru-3824456.html">
<span style="color:red">Stoc limitat!</span>
Telefon Mobil Apple iPhone 13, Super Retina XDR OLED 6.1", 256GB Flash, Camera Duala 12 12 MP, Wi-Fi, 5G, iOS (Negru)
</a>
</h2>
</div>
What I tried:
for n in j.find_all("div", "npi_name"):
    n2 = n.find("a", href=True, text=True)
    try:
        n1 = n2['href']
    except:
        n2 = n.find("a")
        n1 = n2['href']
    n3 = n2.string
    print(n3)
Output:
None
Answer from a helpful uj5u.com user:
Try:
from bs4 import BeautifulSoup
html_doc = """
<div class="npi_name">
<h2>
<a href="/solutii-mobile-telefoane-mobile/apple-telefon-mobil-apple-iphone-13-super-retina-xdr-oled-6.1-256gb-flash-camera-duala-12-12-mp-wi-fi-5g-ios-negru-3824456.html">
<span style="color:red">Stoc limitat!</span>
Telefon Mobil Apple iPhone 13, Super Retina XDR OLED 6.1", 256GB Flash, Camera Duala 12 12 MP, Wi-Fi, 5G, iOS (Negru)
</a>
</h2>
</div>
"""
soup = BeautifulSoup(html_doc, "html.parser")
t = "".join(soup.select_one(".npi_name a").find_all(text=True, recursive=False))
print(t.strip())
Prints:
Telefon Mobil Apple iPhone 13, Super Retina XDR OLED 6.1", 256GB Flash, Camera Duala 12 12 MP, Wi-Fi, 5G, iOS (Negru)
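A short sketch of why the original attempt printed None and why this approach works. The markup is a shortened stand-in for the snippet in the question (the href and the shortened label are placeholders, not the real values). It uses `string=True`, the newer spelling of `text=True` in Beautiful Soup 4.4.0 and later.

```python
from bs4 import BeautifulSoup

# Shortened, hypothetical stand-in for the markup in the question.
html_doc = """
<div class="npi_name">
<h2>
<a href="/iphone-13.html">
<span style="color:red">Stoc limitat!</span>
Telefon Mobil Apple iPhone 13
</a>
</h2>
</div>
"""

soup = BeautifulSoup(html_doc, "html.parser")
link = soup.select_one(".npi_name a")

# .string is only set when a tag has exactly one child. This <a> has three
# (a newline, the <span>, and the trailing text), so .string is None.
print(link.string)

# Collect only the text nodes that are direct children of <a>;
# the text inside <span> is a grandchild and is therefore skipped.
direct_text = link.find_all(string=True, recursive=False)
title = "".join(direct_text).strip()
print(title)
```

Joining the direct text nodes avoids depending on the position of the `<span>` inside the link.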
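Answer from a helpful uj5u.com user:
I made some assumptions, but something like this should work:
for n in j.find_all("div", {"class": "npi_name"}):
    print(n.find("a").contents[2].strip())
This is how I arrived at the answer (the HTML you provided was saved to a.html):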
from bs4 import BeautifulSoup
def main():
    with open("a.html", "r") as file:
        html = file.read()
    soup = BeautifulSoup(html, "html.parser")
    divs = soup.find_all("div", {"class": "npi_name"})
    for div in divs:
        a = div.find("a").contents[2].strip()
        # Testing
        print(a)

if __name__ == "__main__":
    main()
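To see where index 2 comes from, it may help to list the children of the link. This is only an illustration with shortened placeholder markup (the hrefs and the second label are made up). It also shows the assumption this answer relies on: the link must contain a newline, then the `<span>`, then the text.

```python
from bs4 import BeautifulSoup

# Placeholder markup with the same child layout as the link in the question.
html = '<a href="/x.html">\n<span>Stoc limitat!</span>\nTelefon Mobil Apple iPhone 13\n</a>'
a = BeautifulSoup(html, "html.parser").find("a")

# Print every direct child with its position and type.
for index, child in enumerate(a.contents):
    print(index, type(child).__name__, repr(str(child)))

kinds = [type(child).__name__ for child in a.contents]
label = a.contents[2].strip()

# A link without the <span> has a single child, so contents[2] would
# raise IndexError there.
plain = BeautifulSoup('<a href="/y.html">Telefon fara eticheta</a>', "html.parser").find("a")
plain_children = len(plain.contents)
```

If some products have no "Stoc limitat!" badge, indexing from the end or checking for the `<span>` first is likely more robust than a fixed index.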
Answer from a helpful uj5u.com user:
texts = []
for a in soup.select("div.npi_name a[href]"):
    texts.append(a.contents[-1].strip())
Or, more explicitly:
texts = []
for a in soup.select("div.npi_name a[href]"):
    if a.span:
        text = a.span.next_sibling
    else:
        text = a.string
    texts.append(text.strip())
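The explicit variant can be wrapped in a small helper that also copes with links where no text follows the `<span>`. This is a sketch, not part of the original answer: `link_label` is a hypothetical name, and the two-product markup is invented for the demonstration.

```python
from bs4 import BeautifulSoup
from bs4.element import NavigableString


def link_label(a):
    """Return the label of an <a> tag, skipping an optional <span> badge."""
    span = a.find("span")
    # With a badge, the label is the node right after it; otherwise the
    # link's only child is the label itself.
    node = span.next_sibling if span is not None else a.string
    if isinstance(node, NavigableString):
        return node.strip()
    return None  # nothing usable (no sibling, or the sibling is another tag)


# Invented markup: one product with a badge, one without.
html = """
<div class="npi_name"><h2><a href="/a.html"><span>Stoc limitat!</span> Telefon A</a></h2></div>
<div class="npi_name"><h2><a href="/b.html">Telefon B</a></h2></div>
"""
soup = BeautifulSoup(html, "html.parser")
labels = [link_label(a) for a in soup.select("div.npi_name a[href]")]
print(labels)
```

Returning None instead of calling `.strip()` unconditionally avoids an exception when a link has an unexpected layout.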
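Answer from a helpful uj5u.com user:
Select your elements more specifically, e.g. with CSS selectors, and use stripped_strings to get the text, assuming it is always the last node in the element:
for e in soup.select('div.npi_name a[href]'):
    text = list(e.stripped_strings)[-1]
    print(text)
This way you can also process other information if needed, such as the href, the span text, ...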
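For reference, here is roughly what `stripped_strings` yields for one link. The markup is a shortened placeholder version of the snippet in the question, and treating the first string as the stock badge is an assumption that only holds when the `<span>` is present.

```python
from bs4 import BeautifulSoup

# Shortened, hypothetical stand-in for the markup in the question.
html = """
<div class="npi_name">
<h2>
<a href="/iphone-13.html">
<span style="color:red">Stoc limitat!</span>
Telefon Mobil Apple iPhone 13
</a>
</h2>
</div>
"""
soup = BeautifulSoup(html, "html.parser")
a = soup.select_one("div.npi_name a[href]")

# stripped_strings walks all nested text nodes, strips whitespace,
# and skips the ones that end up empty.
strings = list(a.stripped_strings)
print(strings)

label = strings[-1]                              # last node: the product name
stock = strings[0] if len(strings) > 1 else None  # badge text, if there is one
```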
Example
Select multiple items, store the information in a list of dicts, and convert it into a DataFrame:
from bs4 import BeautifulSoup
import pandas as pd
html = '''
<div class="npi_name">
<h2>
<a href="/solutii-mobile-telefoane-mobile/apple-telefon-mobil-apple-iphone-13-super-retina-xdr-oled-6.1-256gb-flash-camera-duala-12-12-mp-wi-fi-5g-ios-negru-3824456.html">
<span style="color:red">Stoc limitat!</span>
Telefon Mobil Apple iPhone 13, Super Retina XDR OLED 6.1", 256GB Flash, Camera Duala 12 12 MP, Wi-Fi, 5G, iOS (Negru)
</a>
</h2>
</div>
'''
soup = BeautifulSoup(html, "html.parser")
data = []
for e in soup.select('div.npi_name a[href]'):
    data.append({
        'url': e['href'],
        'stock': s.text if (s := e.span) else None,
        'label': list(e.stripped_strings)[-1]
    })
pd.DataFrame(data)
Output
| url | stock | label |
|---|---|---|
| /solutii-mobile-telefoane-mobile/apple-telefon-mobil-apple-iphone-13-super-retina-xdr-oled-6.1-256gb-flash-camera-duala-12-12-mp-wi-fi-5g-ios-negru-3824456.html | Stoc limitat! | Telefon Mobil Apple iPhone 13, Super Retina XDR OLED 6.1", 256GB Flash, Camera Duala 12 12 MP, Wi-Fi, 5G, iOS (Negru) |
Tags: Python python-3.x web-scraping beautifulsoup
