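I'm a beginner. While scraping Qiushibaike, I found that the data is complete before I combine it with zip(), but after zipping, the last 5 records disappear. Did I write something wrong?
Any advice would be appreciated!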
# encoding: utf-8
import re
import requests


def parse_page(url):
    headers = {
        'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.119 Safari/537.36',
        'Referer': 'https://www.qiushibaike.com/'
    }
    response = requests.get(url, headers=headers)
    text = response.text
    # Extract user names, levels, and post bodies as three separate lists
    users = re.findall(r'<div class="author clearfix">.*?<h2>(.*?)</h2>', text, re.DOTALL)
    levels = re.findall(r'<div class="articleGender manIcon">(.*?)</div>', text, re.DOTALL)
    contents = re.findall(r'<div\sclass="content">.*?<span>(.*?)</span>', text, re.DOTALL)
    usersall = []
    for user in users:
        # Remove newlines inside the user name (r"\\n" would match a literal backslash followed by n)
        y = re.sub(r"\n", "", user)
        usersall.append(y.strip())
    contentall = []
    for content in contents:
        # Strip HTML tags from the post body
        x = re.sub(r'<.*?>', "", content)
        contentall.append(x.strip())
    poems = []
    # Combine the three lists into one record per post
    for value in zip(usersall, levels, contentall):
        user, level, content = value
        poem = {
            'user': user,
            "level": level,
            "content": content
        }
        poems.append(poem)
    for poem in poems:
        print(poem)


def main():
    url = 'https://www.qiushibaike.com/text/page/1/'
    # for x in range(1,6):
    #     url = 'https://www.qiushibaike.com/text/page/%s/' % x
    parse_page(url)


if __name__ == '__main__':
    main()
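uj5u.com helpful user replied:
Is the number of scraped records correct?
uj5u.com helpful user replied:
Before zipping, I deliberately printed everything, and every item printed in full. But after going through zip, the last five records inexplicably disappeared.
uj5u.com helpful user replied:
First check len() of the three lists usersall, levels, and contentall; their lengths are surely different.
uj5u.com helpful user replied: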
a = [1, 2]
b = []
for c, d in zip(a, b):
    print(c, d)
Try running this.
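The loop above prints nothing, because zip() stops as soon as its shortest argument is exhausted. Here is a minimal sketch of the same effect with lists shaped like the ones in the question. The sample values are made up for illustration, and itertools.zip_longest from the standard library is shown as one possible way to keep the unmatched items visible, not as the asker's method:

```python
from itertools import zip_longest

# Made-up sample data: three users, but only two levels were matched.
users = ["alice", "bob", "carol"]
levels = ["12", "30"]

# zip() stops at the shortest input, so "carol" is silently dropped.
truncated = list(zip(users, levels))
print(truncated)

# zip_longest() pads the shorter input with fillvalue instead.
padded = list(zip_longest(users, levels, fillvalue=None))
print(padded)
```

On Python 3.10 and later, zip(users, levels, strict=True) raises ValueError on a length mismatch, which turns the silent truncation into a visible error.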
uj5u.com helpful user replied:
print(len(users), len(levels), len(contents))  # gives 25 17 25, so only 17 records can be printed
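One way to keep the three fields aligned is to split the page into one chunk per post first, then look up each field inside its own chunk, so a post with no matching level gets None instead of shifting the lists. What follows is only a rough sketch: SAMPLE_HTML is a made-up, simplified fragment, first_group and parse_posts are hypothetical helper names, and the guess that some posts lack the manIcon div (for example because they use a different class) is an assumption that the thread does not confirm.

```python
import re

# Made-up, simplified HTML; the real page structure is an assumption.
SAMPLE_HTML = '''
<div class="author clearfix"><h2>
alice
</h2>
<div class="articleGender manIcon">12</div></div>
<div class="content"><span>first post<br/>line two</span></div>
<div class="author clearfix"><h2>
bob
</h2></div>
<div class="content"><span>second post</span></div>
'''


def first_group(pattern, text):
    """Return the first captured group of pattern in text, or None."""
    match = re.search(pattern, text, re.DOTALL)
    return match.group(1).strip() if match else None


def parse_posts(html):
    # Split just before each author block so every chunk holds one post.
    # Splitting on a zero-width lookahead needs Python 3.7 or later.
    chunks = re.split(r'(?=<div class="author clearfix">)', html)
    posts = []
    for chunk in chunks:
        user = first_group(r'<h2>(.*?)</h2>', chunk)
        if user is None:
            continue  # text before the first post
        level = first_group(r'<div class="articleGender manIcon">(.*?)</div>', chunk)
        content = first_group(r'<div\sclass="content">.*?<span>(.*?)</span>', chunk)
        if content is not None:
            # Strip HTML tags from the post body.
            content = re.sub(r'<.*?>', '', content).strip()
        posts.append({'user': user, 'level': level, 'content': content})
    return posts


posts = parse_posts(SAMPLE_HTML)
for post in posts:
    print(post)
```

With this layout every post yields exactly one record, and a missing field shows up as None in that record rather than dropping later posts.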
