Scrape a line of text from a website inside a div

I don't know how to scrape this text:

Telefon Mobil Apple iPhone 13, Super Retina XDR OLED 6.1", 256GB Flash, Camera Duala 12 12 MP, Wi-Fi, 5G, iOS (Negru)

 <div class="npi_name">
        <h2>
            <a href="/solutii-mobile-telefoane-mobile/apple-telefon-mobil-apple-iphone-13-super-retina-xdr-oled-6.1-256gb-flash-camera-duala-12-12-mp-wi-fi-5g-ios-negru-3824456.html"> 
                <span style="color:red">Stoc limitat!</span>  
                Telefon Mobil Apple iPhone 13, Super Retina XDR OLED 6.1", 256GB Flash, Camera Duala 12   12 MP, Wi-Fi, 5G, iOS (Negru)
            </a>        
        </h2>
    </div>

What I tried:

for n in j.find_all("div","npi_name"):
   n2=n.find("a", href=True, text=True)
   try:
       n1=n2['href']
   except:
       n2=n.find("a")
       n1=n2['href']
   n3=n2.string
   print(n3)

Output:

None
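Why None? In BeautifulSoup, .string only returns text when a tag has a single child. Here the <a> has three children (a whitespace string, the <span> and the product text), so .string is None. According to the BeautifulSoup docs, combining text=True with a tag name matches tags whose .string matches, so the first find() in the attempt above also comes back empty and the except branch runs. A minimal sketch, assuming bs4 with the built-in html.parser; the markup is a trimmed-down, hypothetical version of the page:

```python
from bs4 import BeautifulSoup

# Trimmed-down, hypothetical markup with the same shape as the question's HTML
html = """
<div class="npi_name"><h2>
  <a href="/item.html"> <span style="color:red">Stoc limitat!</span> Telefon Mobil Apple iPhone 13</a>
</h2></div>
"""

soup = BeautifulSoup(html, "html.parser")
a = soup.select_one("div.npi_name a")

# <a> has three children: a whitespace string, the <span>, and the product text
print(len(a.contents))  # 3
# With more than one child, .string cannot choose one and returns None
print(a.string)  # None

# The <span> has exactly one text child, so its .string is defined
print(a.span.string)  # Stoc limitat!
```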

Answer:

Try:

from bs4 import BeautifulSoup

html_doc = """
 <div class="npi_name">
        <h2>
            <a href="/solutii-mobile-telefoane-mobile/apple-telefon-mobil-apple-iphone-13-super-retina-xdr-oled-6.1-256gb-flash-camera-duala-12-12-mp-wi-fi-5g-ios-negru-3824456.html"> 
                <span style="color:red">Stoc limitat!</span>  
                Telefon Mobil Apple iPhone 13, Super Retina XDR OLED 6.1", 256GB Flash, Camera Duala 12   12 MP, Wi-Fi, 5G, iOS (Negru)
            </a>        
        </h2>
    </div>
"""

soup = BeautifulSoup(html_doc, "html.parser")

t = "".join(soup.select_one(".npi_name a").find_all(string=True, recursive=False))
print(t.strip())

Prints:

Telefon Mobil Apple iPhone 13, Super Retina XDR OLED 6.1", 256GB Flash, Camera Duala 12   12 MP, Wi-Fi, 5G, iOS (Negru)
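The recursive=False argument is what excludes the "Stoc limitat!" label: by default find_all(string=True) walks every descendant, including the text inside the <span>, while recursive=False keeps only the text nodes that are direct children of <a>. A sketch comparing both, using a hypothetical one-line version of the markup:

```python
from bs4 import BeautifulSoup

# Hypothetical one-line version of the product link
html = '<div class="npi_name"><a href="/item.html"> <span>Stoc limitat!</span> Telefon Mobil Apple iPhone 13 </a></div>'
soup = BeautifulSoup(html, "html.parser")
a = soup.select_one(".npi_name a")

# Default: searches all descendants, so the <span>'s text is included
all_text = "".join(a.find_all(string=True)).strip()
# recursive=False: only the text nodes directly inside <a>
own_text = "".join(a.find_all(string=True, recursive=False)).strip()

print(all_text)  # Stoc limitat! Telefon Mobil Apple iPhone 13
print(own_text)  # Telefon Mobil Apple iPhone 13
```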

Answer:

I made a few assumptions, but something like this should work:

for n in j.find_all("div", {"class": "npi_name"}):
    print(n.find("a").contents[2].strip())

This is how I arrived at the answer (with the HTML you provided saved to a.html):

from bs4 import BeautifulSoup


def main():
    # Read the HTML that was saved to a.html and parse it
    with open("a.html", "r", encoding="utf-8") as file:
        html = file.read()
        soup = BeautifulSoup(html, "html.parser")

        divs = soup.find_all("div", {"class": "npi_name"})
        for div in divs:
            # contents[2] is the text node that follows the <span> inside <a>
            a = div.find("a").contents[2].strip()

            # Testing
            print(a)


if __name__ == "__main__":
    main()
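Note that contents[2] relies on the exact whitespace in the page: index 0 is the whitespace before the <span>, index 1 is the <span> and index 2 is the product text. If that leading whitespace were missing, the product text would move to index 1, whereas indexing from the end with contents[-1] finds it either way. A sketch comparing the two layouts on hypothetical markup:

```python
from bs4 import BeautifulSoup

# Same link with and without whitespace before the <span> (hypothetical markup)
spaced = '<a href="/item.html"> <span>Stoc limitat!</span> Telefon Mobil Apple iPhone 13 </a>'
compact = '<a href="/item.html"><span>Stoc limitat!</span> Telefon Mobil Apple iPhone 13 </a>'

results = []
for html in (spaced, compact):
    a = BeautifulSoup(html, "html.parser").a
    # Type of each child node of <a>
    kinds = [type(c).__name__ for c in a.contents]
    # The last child is the product text in both layouts
    text = a.contents[-1].strip()
    print(kinds, text)
    results.append((kinds, text))
```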

Answer:

texts = []
for a in soup.select("div.npi_name a[href]"):
    texts.append(a.contents[-1].strip())

Or, more explicitly:

texts = []
for a in soup.select("div.npi_name a[href]"):
    if a.span:
        text = a.span.next_sibling
    else:
        text = a.string

    texts.append(text.strip())
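The explicit version covers links both with and without the stock <span>. A minimal sketch that wraps it in a hypothetical helper, link_text, and runs it on two made-up product links, one of each kind:

```python
from bs4 import BeautifulSoup

# Two hypothetical products: one flagged with a stock <span>, one without
html = """
<div class="npi_name"><h2><a href="/a.html"> <span>Stoc limitat!</span> Telefon A </a></h2></div>
<div class="npi_name"><h2><a href="/b.html"> Telefon B </a></h2></div>
"""


def link_text(a):
    """Return the product text of an <a>, skipping an optional <span> label."""
    if a.span:
        # The product text is the node right after the <span>
        text = a.span.next_sibling
    else:
        # Only one text child, so .string is defined
        text = a.string
    return text.strip()


soup = BeautifulSoup(html, "html.parser")
texts = [link_text(a) for a in soup.select("div.npi_name a[href]")]
print(texts)  # ['Telefon A', 'Telefon B']
```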

Answer:

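Select your elements more specifically, for example with CSS selectors, and use stripped_strings to get the text, assuming it is always the last node in the element: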

for e in soup.select('div.npi_name a[href]'):
    text = list(e.stripped_strings)[-1]
    print(text)

This way you can also process other information if needed, such as the href, the span text, ...
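Why [-1] picks the product text: stripped_strings yields every text node in document order with surrounding whitespace removed, and it skips nodes that are only whitespace. A sketch on hypothetical markup with the same shape as the question's link:

```python
from bs4 import BeautifulSoup

# Hypothetical markup with the same shape as the question's link
html = '<div class="npi_name"><a href="/item.html">\n  <span>Stoc limitat!</span>\n  Telefon Mobil Apple iPhone 13\n</a></div>'
soup = BeautifulSoup(html, "html.parser")
e = soup.select_one("div.npi_name a[href]")

# Whitespace-only nodes are dropped; the rest are stripped, in document order
parts = list(e.stripped_strings)
print(parts)  # ['Stoc limitat!', 'Telefon Mobil Apple iPhone 13']
print(parts[-1])  # Telefon Mobil Apple iPhone 13
```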

Example

Select multiple items, store the information in a list of dicts and convert it into a DataFrame:

from bs4 import BeautifulSoup
import pandas as pd

html = '''
<div class="npi_name">
    <h2>
        <a href="/solutii-mobile-telefoane-mobile/apple-telefon-mobil-apple-iphone-13-super-retina-xdr-oled-6.1-256gb-flash-camera-duala-12-12-mp-wi-fi-5g-ios-negru-3824456.html"> 
            <span style="color:red">Stoc limitat!</span>  
                Telefon Mobil Apple iPhone 13, Super Retina XDR OLED 6.1", 256GB Flash, Camera Duala 12   12 MP, Wi-Fi, 5G, iOS (Negru)
        </a>
    </h2>
</div>
'''

soup = BeautifulSoup(html, "html.parser")

data = []

for e in soup.select('div.npi_name a[href]'):
    data.append({
        'url' : e['href'],
        'stock': s.text if (s := e.span) else None,
        'label' :list(e.stripped_strings)[-1]
    })

pd.DataFrame(data)

Output

url	stock	label
/solutii-mobile-telefoane-mobile/apple-telefon-mobil-apple-iphone-13-super-retina-xdr-oled-6.1-256gb-flash-camera-duala-12-12-mp-wi-fi-5g-ios-negru-3824456.html	Stoc limitat!	Telefon Mobil Apple iPhone 13, Super Retina XDR OLED 6.1", 256GB Flash, Camera Duala 12   12 MP, Wi-Fi, 5G, iOS (Negru)
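The sample HTML above contains a single product. A sketch of the same loop over two hypothetical products, one without a stock <span>, shows that the stock column then holds a missing value for that row (assumes pandas is installed and Python 3.8+ for the := operator):

```python
from bs4 import BeautifulSoup
import pandas as pd

# Two made-up products; only the first carries a stock <span>
html = """
<div class="npi_name"><h2><a href="/a.html"> <span style="color:red">Stoc limitat!</span> Telefon A </a></h2></div>
<div class="npi_name"><h2><a href="/b.html"> Telefon B </a></h2></div>
"""

soup = BeautifulSoup(html, "html.parser")

data = []
for e in soup.select("div.npi_name a[href]"):
    data.append({
        "url": e["href"],
        # Text of the optional <span>, or None when the product has none
        "stock": s.text if (s := e.span) else None,
        # Last non-blank text node is the product label
        "label": list(e.stripped_strings)[-1],
    })

df = pd.DataFrame(data)
print(df)
```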

Source: https://www.uj5u.com/caozuo/448287.html

Tags: Python python-3.x web-scraping beautifulsoup
