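I'm a complete beginner, but I have managed to use Python to scrape EAN numbers from a list of links built by the earlier part of my code. However, my output file contains all the scraped numbers as one continuous line instead of one EAN per line.
Here is my code. What is wrong with it? (The scraped URLs have been redacted.)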
import requests
from bs4 import BeautifulSoup
import urllib.request
import os

subpage = 1
while subpage <= 2:
    URL = "https://..." + str(subpage)
    page = requests.get(URL)
    soup = BeautifulSoup(page.content, "html.parser")

    # Collect all links found under the h2 tags into a list.
    links = []
    h2s = soup.find_all("h2")
    for h2 in h2s:
        links.append("http://www.xxxxxxxxxxx.com" + h2.a['href'])

    # Open each link from the list and extract the EAN number from the underlying page.
    with open("temp.txt", "a") as output:
        for link in links:
            urllib.request.urlopen(link)
            page_2 = requests.get(link)
            soup_2 = BeautifulSoup(page_2.content, "html.parser")
            if "EAN:" in soup_2.text:
                span = soup_2.find(class_="articleData_ean")
                EAN = span.a.text
                output.write(EAN)
    subpage += 1

os.replace('temp.txt', 'EANs.txt')
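Answer from a uj5u.com user:
output.write(EAN) writes each EAN with nothing between them. write() does not add a separator or a newline automatically. You can add a newline after each EAN with output.write('\n'), or use a comma or another delimiter to separate them.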
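The fix can be sketched in isolation, without any scraping. This is only an illustration under some assumptions: the helper name `write_eans`, the sample values, and the temporary output path are all made up here and are not part of the original code.

```python
import os
import tempfile


def write_eans(eans, path):
    """Write each EAN on its own line (hypothetical helper, not from the question)."""
    with open(path, "w") as output:
        for ean in eans:
            # strip() drops stray whitespace that scraped text may carry;
            # the explicit "\n" is what puts each EAN on its own line.
            output.write(ean.strip() + "\n")


# Placeholder values for illustration, not real scraped data.
sample_eans = ["4006381333931", " 5901234123457 ", "9780201379624"]
out_path = os.path.join(tempfile.mkdtemp(), "EANs.txt")
write_eans(sample_eans, out_path)

# Read the file back to confirm there is one EAN per line.
with open(out_path) as f:
    lines = f.read().splitlines()
print(lines)
```

In the question's loop, the equivalent change would be output.write(EAN + '\n'). Another option is print(EAN, file=output), since print() appends a newline by default.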
