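Hello, lovely people! I am completely new to Python. I am trying to scrape a few URLs and have run into a problem with printing.
I am trying to print the "shipment status" and also write it to a file. I have two URLs, so ideally I would get two results.
Here is my code: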
from bs4 import BeautifulSoup
import re
import urllib.request
import urllib.error
import urllib
# read urls of websites from text file
list_open = open("c:/Users/***/Downloads/web list.txt")
read_list = list_open.read()
line_in_list = read_list.split("\n")
for url in line_in_list:
    soup = BeautifulSoup(urllib.request.urlopen(url).read(), 'html')
    # parse something special in the file
    shipment = soup.find_all('span')
    Preparation=shipment[0]
    Sent=shipment[1]
    InTransit=shipment[2]
    Delivered=shipment[3]
    for p in shipment:
        # extract information
        print (url,';',"Preparation",Preparation.getText(),";","Sent",Sent.getText(),";","InTransit",InTransit.getText(),";","Delivered",Delivered.getText())
import sys
file_path = 'randomfile.txt'
sys.stdout = open(file_path, "w")
print(url,';',"Preparation",Preparation.getText(),";","Sent",Sent.getText(),";","InTransit",InTransit.getText(),";","Delivered",Delivered.getText())
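I have two problems here:
- Problem 1: I only have two URLs, but when I print the results, each "span" result is repeated 4 times (because there are four "span" elements). The output looks like this:
(I removed the sample output for privacy.)
- Problem 2: I tried to write the printed output to a text file, but only one line appears in the file:
(I removed the sample output for privacy.)
I would like to know what is wrong with the code. I only want to print the results for the 2 URLs.
Thank you very much for your help! Thanks in advance!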
Answer from a uj5u.com user:
The first issue is caused by looping over shipment a second time. Just delete the inner for loop and indent print() correctly:
for url in line_in_list:
    soup = BeautifulSoup(urllib.request.urlopen(url).read(), 'html')
    # parse something special in the file
    shipment = soup.find_all('span')
    Preparation=shipment[0]
    Sent=shipment[1]
    InTransit=shipment[2]
    Delivered=shipment[3]
    print (url,';',"Preparation",Preparation.getText(),";","Sent",Sent.getText(),";","InTransit",InTransit.getText(),";","Delivered",Delivered.getText())
The second problem occurs because you call the write outside the loop and not in append mode. You would end up with this as your loop:
#open file in append mode
with open('somefile.txt', 'a') as f:
    #start iterating your urls
    for url in line_in_list:
        soup = BeautifulSoup(urllib.request.urlopen(url).read(), 'html')
        # parse something special in the file
        shipment = soup.find_all('span')
        Preparation=shipment[0]
        Sent=shipment[1]
        InTransit=shipment[2]
        Delivered=shipment[3]
        #create output text
        line = f'{url};Preparation{Preparation.getText()};Sent{Sent.getText()};InTransit{InTransit.getText()};Delivered{Delivered.getText()}'
        #print output text
        print (line)
        #append output text to file
        f.write(line + '\n')
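To make the point about append mode concrete, here is a minimal sketch that needs no network access. The helper names (write_lines, read_back) and the sample result strings are made up for illustration; they are not part of the original code. It reopens a file once per line, first with "w" and then with "a", and compares what survives.

```python
import os
import tempfile


def write_lines(path, lines, mode):
    """Reopen `path` once per line with the given mode and write that line."""
    for line in lines:
        with open(path, mode, encoding="utf-8") as f:
            f.write(line + "\n")


def read_back(path):
    """Return the file content as a list of lines without newlines."""
    with open(path, encoding="utf-8") as f:
        return f.read().splitlines()


# Hypothetical result lines standing in for the two scraped URLs.
results = ["url-1;Preparation done", "url-2;Preparation pending"]

with tempfile.TemporaryDirectory() as tmp:
    w_path = os.path.join(tmp, "w_mode.txt")
    a_path = os.path.join(tmp, "a_mode.txt")
    write_lines(w_path, results, "w")  # every open in "w" mode truncates the file
    write_lines(a_path, results, "a")  # every open in "a" mode adds to the end
    w_content = read_back(w_path)
    a_content = read_back(a_path)

print(w_content)  # only the last line is left
print(a_content)  # both lines are kept
```

Opening the file once before the loop, as in the answer's code, avoids the repeated opens altogether. Note that "a" also keeps lines from earlier runs of the script, so the file grows each time unless you delete it first.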
You can delete:
import sys
file_path = 'randomfile.txt'
sys.stdout = open(file_path, "w")
print(url,';',"Preparation",Preparation.getText(),";","Sent",Sent.getText(),";","InTransit",InTransit.getText(),";","Delivered",Delivered.getText())
A slightly optimized code example:
from bs4 import BeautifulSoup
import urllib.request
import urllib.error
import urllib
# read urls of websites from text file
list_open = open("c:/Users/***/Downloads/web list.txt")
read_list = list_open.read()
line_in_list = read_list.split("\n")
file_path = "randomfile.txt"
with open('somefile.txt', 'a', encoding='utf-8') as f:
    for url in line_in_list:
        soup = BeautifulSoup(urllib.request.urlopen(url).read(), 'html')
        # parse something special in the file
        shipment = list(soup.select_one('#progress').stripped_strings)
        line = f"{url},{';'.join([':'.join(x) for x in list(zip(shipment[::2], shipment[1::2]))])}"
        print (line)
        f.write(line + '\n')
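The zip expression in that one-liner is dense, so here is a rough sketch of what it does, using a plain list instead of a parsed page. The sample strings and the example.com URL are assumptions for illustration, since the thread does not show the real page content, and pair_up and format_line are hypothetical helper names.

```python
def pair_up(strings):
    """Pair each item at an even index (a label) with the item after it (a value)."""
    return list(zip(strings[::2], strings[1::2]))


def format_line(url, strings):
    """Build 'url,label:value;label:value;...' from alternating labels and values."""
    pairs = pair_up(strings)
    return f"{url},{';'.join(':'.join(pair) for pair in pairs)}"


# Hypothetical output of stripped_strings: labels and values alternate.
sample = ["Preparation", "done", "Sent", "done",
          "InTransit", "pending", "Delivered", "pending"]

line = format_line("https://example.com/track/1", sample)
print(line)
# https://example.com/track/1,Preparation:done;Sent:done;InTransit:pending;Delivered:pending
```

This only works if the strings really do alternate between label and value. With an odd number of strings, zip silently drops the last one, so it is worth checking the result against the page.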
Answer from a uj5u.com user:
import sys
from urllib.request import urlopen
from bs4 import BeautifulSoup
list_open = open("c:/Users/***/Downloads/web list.txt")
read_list = list_open.read()
line_in_list = read_list.split("\n")
file_path = 'randomfile.txt'
sys.stdout = open(file_path, "w")
There are actually four spans, so try this:
for url in line_in_list:
    soup = BeautifulSoup(urlopen(url).read(), 'html')
    # parse something special in the file
    shipments = soup.find_all("span")  # there are actually four spans
    sys.stdout.write('Url ' + url + '; Preparation' + shipments[0].getText() + '; Sent' + shipments[1].getText() + '; InTransit' + shipments[2].getText() + '; Delivered' + shipments[3].getText())
    # start a new line
    sys.stdout.write("\n")
Answer from a uj5u.com user:
First problem
You have two nested loops:
for url in line_in_list:
    for p in shipment:
        print(...)
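The print is nested inside the second loop. If each URL has 4 shipment spans, each URL will be printed 4 times.
Since you don't use p from for p in shipment, you can get rid of the second loop entirely and move print one indentation level to the left, like this: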
for url in line_in_list:
    soup = BeautifulSoup(urllib.request.urlopen(url).read(), 'html')
    # parse something special in the file
    shipment = soup.find_all('span')
    Preparation=shipment[0]
    Sent=shipment[1]
    InTransit=shipment[2]
    Delivered=shipment[3]
    print (url,';',"Preparation",Preparation.getText(),";","Sent",Sent.getText(),";","InTransit",InTransit.getText(),";","Delivered",Delivered.getText())
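The duplication is easy to reproduce without any scraping. In this small sketch the URLs and span texts are made-up placeholders, and both function names are hypothetical; it just counts how many lines each loop structure produces.

```python
def lines_with_inner_loop(urls, spans):
    """Mimic the original structure: the output line is built inside `for p in spans`."""
    out = []
    for url in urls:
        for p in spans:  # p is never used, so the same line is produced each time
            out.append(f"{url};{';'.join(spans)}")
    return out


def lines_without_inner_loop(urls, spans):
    """Mimic the fix: one output line per URL."""
    out = []
    for url in urls:
        out.append(f"{url};{';'.join(spans)}")
    return out


# Placeholder data: two URLs, four span texts each.
urls = ["url-1", "url-2"]
spans = ["Preparation", "Sent", "InTransit", "Delivered"]

nested = lines_with_inner_loop(urls, spans)
flat = lines_without_inner_loop(urls, spans)
print(len(nested))  # 8: each URL appears four times
print(len(flat))    # 2: each URL appears once
```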
Second problem
sys.stdout = open(file_path, "w")
print(url,';',"Preparation",Preparation.getText(),";","Sent",Sent.getText(),";","InTransit",InTransit.getText(),";","Delivered",Delivered.getText())
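With no keyword arguments, print writes to sys.stdout, which by default is your terminal output. There is only one print after sys.stdout = ..., so only one line is written to the file.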
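Here is a small sketch of that behaviour. It uses an io.StringIO buffer as a stand-in for the opened file (an assumption made so the example leaves nothing on disk), and demo is a hypothetical name. Only what is printed after the reassignment ends up in the buffer.

```python
import io
import sys


def demo():
    """Print once before and once after reassigning sys.stdout; return what was captured."""
    buffer = io.StringIO()  # stands in for open(file_path, "w")
    original = sys.stdout
    print("line 1")  # still goes to the original stdout (the terminal)
    sys.stdout = buffer
    try:
        print("line 2")  # goes to the buffer instead
    finally:
        sys.stdout = original  # always restore, or later prints vanish from the terminal
    return buffer.getvalue()


captured = demo()
print(repr(captured))  # 'line 2\n'
```

If you do want temporary redirection, the standard library's contextlib.redirect_stdout does the restore step for you.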
There is another way to print to a file:
with open('demo.txt', 'a') as f:
    print('Hello world', file = f)
The with keyword ensures the file is closed even if an exception is raised.
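A quick way to convince yourself of this is the sketch below. The function name write_then_fail and the simulated RuntimeError are made up for the demonstration, and a temporary directory is used so nothing is left behind.

```python
import os
import tempfile


def write_then_fail(path):
    """Write one line inside a with block, raise on purpose, and report whether the file got closed."""
    handle = None
    try:
        with open(path, "a", encoding="utf-8") as f:
            handle = f
            print("Hello world", file=f)
            raise RuntimeError("simulated failure")  # leave the block via an exception
    except RuntimeError:
        pass
    return handle.closed


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.txt")
    closed = write_then_fail(path)
    with open(path, encoding="utf-8") as f:
        content = f.read()

print(closed)         # True: the with block closed the file despite the exception
print(repr(content))  # 'Hello world\n': the line written before the error was saved
```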
Combining the two
As I understand it, you want to print two lines to the file. Here is a solution:
from bs4 import BeautifulSoup
import urllib.request
import urllib.error
import urllib
# read urls of websites from text file
list_open = open("c:/Users/***/Downloads/web list.txt")
read_list = list_open.read()
line_in_list = read_list.split("\n")
file_path = "randomfile.txt"
for url in line_in_list:
    soup = BeautifulSoup(urllib.request.urlopen(url).read(), "html")
    # parse something special in the file
    shipment = soup.find_all("span")
    Preparation = shipment[0]
    Sent = shipment[1]
    InTransit = shipment[2]
    Delivered = shipment[3]
    # append one line per url; the trailing newline keeps each result on its own line
    with open(file_path, "a") as f:
        f.write(
            f"{url} ; Preparation {Preparation.getText()}; Sent {Sent.getText()}; InTransit {InTransit.getText()}; Delivered {Delivered.getText()}\n"
        )
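One more thing that may be worth checking, although the question does not mention it: if the URL list file ends with a newline or contains blank lines, read_list.split("\n") yields empty strings, and passing an empty string to urlopen raises a ValueError. A possible guard, sketched here with a hypothetical helper name (parse_url_list) and example.com placeholder URLs, is to drop blank lines before looping.

```python
def parse_url_list(text):
    """Return the non-empty lines of `text`, with surrounding whitespace removed."""
    return [line.strip() for line in text.splitlines() if line.strip()]


# Placeholder file content with a trailing newline and a blank line.
raw = "https://example.com/a\nhttps://example.com/b\n\n"

naive = raw.split("\n")
urls = parse_url_list(raw)
print(naive)  # ['https://example.com/a', 'https://example.com/b', '', '']
print(urls)   # ['https://example.com/a', 'https://example.com/b']
```

In the solution above, this would mean building line_in_list with parse_url_list(read_list) instead of read_list.split("\n").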
