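The code is in the attached image.
I taught myself to write a simple web scraper, and I'm stuck on the loop over the URLs.
With a single URL, I have verified the code and the data comes out correctly.
When I put several URLs in a list (I plan to add more URLs with the same page structure later), only the keys come out in the data. I don't know where the problem is. Any pointers from more experienced people would be appreciated. Thanks!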
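For reference, output that contains only the keys is what Python produces when a `for` loop runs directly over a dict. A minimal sketch, assuming a hypothetical record shaped like the one the scraper is meant to build (the values are made up for illustration):

```python
# Hypothetical record; the field names follow the scraper, the values are invented.
record = {"com_name": "Example Co", "com_tel": "000-0000"}

keys_only = []
for item in record:           # iterating a dict yields its keys, not its values
    keys_only.append(item)

whole_records = []
whole_records.append(record)  # appending the dict itself keeps keys and values together

print(keys_only)              # ['com_name', 'com_tel']
print(whole_records[0]["com_tel"])
```

Whether this is what happens in the scraper depends on what the parsing function returns, which only the code itself can show.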
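uj5u.com helpful user replied:
Could any expert help with this? Thanks.
uj5u.com helpful user replied:
Please post the code as text, not as an image...
uj5u.com helpful user replied: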
import requests
import re
import time
import xlwt

headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) XXXXebKit/537.36 (KHTML, like Gecko) ChXXme/70.0.35XX.110 Safari/537.36"
}


def get_page(url):
    # Download one page; on any error, print it and implicitly return None
    try:
        r = requests.get(url=url, headers=headers)
        r.raise_for_status()
        r.encoding = 'utf-8'
        return r.text
    except Exception as e:
        print(e)


def get_info(page):
    # Extract the company fields from one page and return them as a single dict
    try:
        com_name = re.findall('<h1>(.*?)</h1>', page, re.S)
        com_add = re.findall('q=(.*?)&hl=en" target="_blank" class="FLOAT_R">', page, re.S)
        com_tel = re.findall('<b>Phone:</b> (.*?)<br />', page, re.S)
        com_web = re.findall('CT=CCW&MemberID=0&ComID=.*?&URL=https%3a%2f%2f(.*?)&CK=', page, re.S)
        com_biz = re.findall('<meta name="keywords" content="(.*?)">', page, re.S)
        com_info = com_name + com_add + com_tel + list(set(com_web)) + com_biz
        for ele in com_info:
            data = {}
            data['com_name'] = com_info[0]
            data['com_add'] = com_info[1]
            data['com_tel'] = com_info[2]
            data['com_web'] = com_info[3]
            data['com_biz'] = com_info[4]
        return data
    except Exception as e:
        print(e)


urls = ['https://www.xxxxxxx.com/Buyers_xxxx/xxr_Technologies/c546',
        'https://www.xxxxxxx.com/Buyers_xxxx/Bxxr_AG/c1527', 'https://www.xxxxxxx.com/Buyers_xxxxx/Axxxx/c95']
DATA = []
for url in urls:
    page = get_page(url)
    datas = get_info(page)
    time.sleep(2)
    # get_info returns one dict, so this loop walks its keys, not records
    for data in datas:
        DATA.append(data)

f = xlwt.Workbook(encoding='utf-8')
sheet01 = f.add_sheet(u'sheet1', cell_overwrite_ok=True)
sheet01.write(0, 0, 'com_name')  # first row, first column
sheet01.write(0, 1, 'com_add')
sheet01.write(0, 2, 'com_tel')
sheet01.write(0, 3, 'com_web')
sheet01.write(0, 4, 'com_biz')
# Write the data rows
for i in range(len(DATA)):
    sheet01.write(i + 1, 0, DATA[i]['com_name'])
    sheet01.write(i + 1, 1, DATA[i]['com_add'])
    sheet01.write(i + 1, 2, DATA[i]['com_tel'])
    sheet01.write(i + 1, 3, DATA[i]['com_web'])
    sheet01.write(i + 1, 4, DATA[i]['com_biz'])
    print('p', end='')
f.save('D:\\test.xls')
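One likely cause of the keys-only result, judging from the code above: `get_info` returns a single dict, and `for data in datas` then iterates over that dict's keys, so `DATA` ends up holding strings such as 'com_name' instead of records. A second fragile spot is the positional indexing into `com_info`, which shifts whenever one of the patterns matches nothing or matches more than once. Below is a minimal sketch of a collection loop that appends each dict whole and skips failed pages. `collect_records` and `first_or_empty` are hypothetical helper names, not part of the original script, and the fetch and parse steps are passed in as callables so the sketch runs without network access.

```python
def first_or_empty(matches):
    """Return the first item of a re.findall result, or '' if nothing matched."""
    return matches[0] if matches else ""


def collect_records(urls, fetch, parse):
    """Fetch and parse each URL, keeping exactly one dict per successful page.

    fetch(url) should return the page text or None on failure;
    parse(page) should return a dict or None on failure.
    """
    records = []
    for url in urls:
        page = fetch(url)
        if page is None:        # download failed, nothing to parse
            continue
        record = parse(page)
        if record is None:      # parsing failed, skip this page
            continue
        records.append(record)  # append the dict itself, not its keys
    return records
```

With the script above, this could be wired up as `DATA = collect_records(urls, get_page, get_info)`, with the `time.sleep(2)` pause moved into the fetch step. Inside `get_info`, each field could be read as `first_or_empty(re.findall(...))` so that a missing match leaves that one field empty rather than shifting the others.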
uj5u.com helpful user replied:
Sorry, I'm a beginner. I have reposted this as a new thread; could a moderator please delete this one? Thanks.
Please credit the source when reposting. Link to this post: https://www.uj5u.com/qita/89444.html
