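I've been working on a Python scraper. I want to save the information I collect in separate files: the URLs must go in one file and the titles in another.
The URLs are written without any problem, but when I try to save the names of the blogs I'm scraping, I get the following result: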
w
a
t
a
s
h
i
n
o
s
e
k
a
i
s
w
o
r
l
d
v
-
a
-
p
-
o
-
r
-
s
-
m
-
u
-
t
b
l
a
c
k
e
n
e
d
d
e
a
t
h
e
y
e
5
h
i
n
y
8
l
a
z
e
2
o
m
b
i
e
p
o
r
y
g
o
n
-
d
i
g
i
t
a
l
v
a
p
o
r
w
a
v
e
b
o
m
b
s
u
b
t
l
e
a
n
i
m
e
v
a
p
o
r
w
a
v
e
c
o
r
p
f
i
r
m
i
m
a
g
e
I've narrowed the problem down and I think it has something to do with the "\n", but I haven't been able to find a solution.
Here is my code:
import requests
from bs4 import BeautifulSoup
search_term = "landscape/recent"
posts_scrape = requests.get(f"https://www.tumblr.com/search/{search_term}")
soup = BeautifulSoup(posts_scrape.text, "html.parser")
articles = soup.find_all("article", class_="FtjPK")
# Map each blog name to the list of its 500w image URLs
data = {}
for article in articles:
    try:
        source = article.find("div", class_="vGkyT").text
        for imgvar in article.find_all("img", alt="Image"):
            data.setdefault(source, []).extend(
                [
                    i.replace("500w", "").strip()
                    for i in imgvar["srcset"].split(",")
                    if "500w" in i
                ]
            )
    except AttributeError:
        continue
# Write the image URLs, one per line
archivo = open("Sites.txt", "w")
for source, image_urls in data.items():
    for url in image_urls:
        archivo.write(url + '\n')
archivo.close()
# Write the blog names (this is the loop that produces the output above)
archivo = open("Source.txt", "w")
for source, image_urls in data.items():
    for sources in source:
        archivo.write(sources + '\n')
archivo.close()
Answer from a uj5u.com user:
Change the last loop to:
archivo = open("Source.txt", "w")
for source in data:
    archivo.write(source + "\n")
archivo.close()
The contents of Source.txt will then be:
harshvardhan25
mikeahrens
amazinglybeautifulphotography
landscaperrosebay
danielapelli
sahrish-acrylic-painting
sweetd3lights
pensamentsisomnis
pics-bae
oneshotolive
scattopermestesso
huariqueje
Or use with:
with open("Source.txt", "w") as archivo:
    archivo.write("\n".join(data))
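For anyone wondering why the original loop wrote one letter per line: the "\n" is not the culprit. `source` is already a string, so `for sources in source` walks over its characters. Below is a minimal sketch of that behaviour and of writing both files with `with`. The sample `data` dict, the example.com URLs, the `write_lines` helper and the temporary output directory are all assumptions made for illustration and are not part of the original code.

```python
import os
import tempfile

# Hypothetical sample data in the same shape as `data` in the question:
# blog name -> list of image URLs (these URLs are made up).
data = {
    "harshvardhan25": ["https://example.com/a.jpg", "https://example.com/b.jpg"],
    "mikeahrens": ["https://example.com/c.jpg"],
}

# Iterating over a string yields its characters one by one, which is
# what the inner loop `for sources in source` was doing.
chars = [ch for ch in "mikeahrens"]

# Iterating over the dict yields each key (blog name) exactly once.
names = list(data)

# Flatten the per-blog URL lists into a single list, one entry per image.
urls = [url for image_urls in data.values() for url in image_urls]


def write_lines(path, lines):
    """Write each item on its own line (hypothetical helper)."""
    with open(path, "w", encoding="utf-8") as f:
        f.writelines(line + "\n" for line in lines)


# A temporary directory is used here so the sketch does not overwrite real files.
out_dir = tempfile.mkdtemp()
sites_path = os.path.join(out_dir, "Sites.txt")
source_path = os.path.join(out_dir, "Source.txt")
write_lines(sites_path, urls)
write_lines(source_path, names)
```

Unlike `"\n".join(data)`, this helper ends every line, including the last, with a newline. Which one you want depends on how the files are read back later.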
