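Moderators, please help. The code is below. I wrote this crawler myself and it ran fine last year, but now that I want to refresh the data, every re-fetch fails. Is there any way in VFP to get past the server's blocking of this kind of crawler? Any advice is sincerely appreciated. Thank you!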
PUBLIC lncount,roothtml,imghtml,lccover,lccoverurl
lncount = 0
imghtml = ''
roothtml = '<html><head><link rel="stylesheet" type="text/css" href="https://bbs.csdn.net/css/main.css"><meta http-equiv="expires" content="60">'+'<meta http-equiv="content-type" content="text/html;charset=utf-8"><meta name="keywords" content="美女,性感,寫真,傭訓,大尺度,絲襪"><title>Beautiful Girl!!!</title></head><body><ul>'
* Create the output folders on first run.
IF !DIRECTORY('.\img')
    MKDIR('.\img')
ENDIF
IF !DIRECTORY('.\cover')
    MKDIR('.\cover')
ENDIF
IF !DIRECTORY('.\html')
    MKDIR('.\html')
ENDIF
* The full page range (for example 1 to 100) could be split into 10 equal parts here and
* compiled into 10 EXEs, so that several processes speed up the crawl.
FOR pg = 1 TO 2
    lnnum = 30
    IF pg < 2
        lcpgurl = 'https://www.nvshens.com/gallery/yazhou'
    ELSE
        lcpgurl = 'https://www.nvshens.com/gallery/yazhou/'+ALLTRIM(STR(pg))+'.html'
    ENDIF
    * Download the gallery index page.
    LOCAL oxhttp AS Microsoft.xmlhttp
    oxhttp=CREATEOBJECT("Microsoft.xmlhttp")
    oxhttp.OPEN("GET",lcpgurl,.F.)
    oxhttp.SEND()
    SourceCode=STRCONV(oxhttp.responseBody,11)
    RELEASE oxhttp
    FOR a = 1 TO lnnum
        lchtml = 'https://www.nvshens.com'+STREXTRACT(SourceCode,"'><a href='","' class='caption'>",a)
        lccoverurl = "https://t1.onvshen.com:85/gallery/" + STREXTRACT(SourceCode,"https://t1.onvshen.com:85/gallery/","'",a)
        * The original test (lchtml == "") could never be true because of the URL prefix;
        * compare against the bare prefix to detect that STREXTRACT() found nothing.
        IF !(lchtml == 'https://www.nvshens.com')
            * Download the album page and read its picture count and title.
            LOCAL oxhttp1 AS Microsoft.xmlhttp
            oxhttp1=CREATEOBJECT("Microsoft.xmlhttp")
            oxhttp1.OPEN("GET",lchtml,.F.)
            oxhttp1.SEND()
            SourceCode1=STRCONV(oxhttp1.responseBody,11)
            lnpicnum = VAL(STREXTRACT(SourceCode1,"#DB0909'>","張照片",1))
            lcfolder = ALLTRIM(STREXTRACT(SourceCode1,'<title>','</title>',1))
            * Strip site suffixes and characters that are invalid in folder names.
            lcfolder = STRTRAN(lcfolder,'_宅男女神','')
            lcfolder = STRTRAN(lcfolder,'圖片','')
            lcfolder = STRTRAN(lcfolder,'|','')
            lcfolder = STRTRAN(lcfolder,'<','')
            lcfolder = STRTRAN(lcfolder,'>','')
            lcfolder = STRTRAN(lcfolder,'/','')
            lcfolder = STRTRAN(lcfolder,'\','')
            lcfolder = STRTRAN(lcfolder,':','')
            lcfolder = STRTRAN(lcfolder,'"','')
            lcfolder = STRTRAN(lcfolder,'*','')
            lcfolder = STRTRAN(lcfolder,'?','')
            IF !("404" $ lcfolder)
                imghtml = imghtml + '<html><head><link rel="stylesheet" type="text/css" href="https://bbs.csdn.net/css/main.css"><meta http-equiv="expires" content="60"><meta http-equiv="content-type" content="text/html;charset=utf-8"><meta name="keywords" content="美女,性感,寫真,傭訓,大尺度,絲襪"><title>'+lcfolder+'</title></head><body>'
                lncount = lncount + 1
                IF !DIRECTORY('.\img\'+lcfolder)
                    MKDIR('.\img\'+lcfolder)
                ELSE
                    lcfolder = lcfolder + '_NEW'
                    MKDIR('.\img\'+lcfolder)
                ENDIF
                lcpicurlbase = STREXTRACT(SourceCode1,"<img src='","0.jpg",1)
                RELEASE oxhttp1
                **************** Fetch cover ******************
                LOCAL oxhttp1 AS Microsoft.xmlhttp
                oxhttp1=CREATEOBJECT("Microsoft.xmlhttp")
                oxhttp1.OPEN("GET",lccoverurl,.F.)
                oxhttp1.SEND()
                STRTOFILE(oxhttp1.responseBody,".\cover\" + lcfolder + ".jpg")
                RELEASE oxhttp1
                *******************************************
                * Picture 1 is 0.jpg; later pictures are numbered b-1, zero-padded to three digits.
                FOR b = 1 TO lnpicnum
                    IF b < 2
                        lcpicurl = lcpicurlbase + "0.jpg"
                    ELSE
                        DO CASE
                        CASE b < 11
                            lcpicurl = lcpicurlbase + '00' + ALLTRIM(STR(b-1)) + '.jpg'
                        CASE b < 101
                            lcpicurl = lcpicurlbase + '0' + ALLTRIM(STR(b-1)) + '.jpg'
                        OTHERWISE
                            lcpicurl = lcpicurlbase + ALLTRIM(STR(b-1)) + '.jpg'
                        ENDCASE
                    ENDIF
                    LOCAL oxhttp2 AS Microsoft.xmlhttp
                    oxhttp2=CREATEOBJECT("Microsoft.xmlhttp")
                    oxhttp2.OPEN("GET",lcpicurl,.F.)
                    oxhttp2.SEND()
                    IF !ISNULL(oxhttp2.responseBody)
                        STRTOFILE(oxhttp2.responseBody,'.\img\'+lcfolder+'\'+ALLTRIM(STR(b))+'.jpg')
                        imghtml = imghtml + '<p align="center"><img src="https://bbs.csdn.net/img/'+lcfolder+'\'+ALLTRIM(STR(b))+'.jpg"'+' alt="'+'jpg'+ALLTRIM(STR(b))+'"></p>'
                    ENDIF
                    RELEASE oxhttp2
                ENDFOR
                * Write the album page, then add its entry to the index page.
                imghtml = imghtml + '<p align="center"><a href="https://bbs.csdn.net/beauty.html">回傳</a></p></body></html>'
                STRTOFILE(STRCONV(imghtml,9),'.\html\'+ALLTRIM(STR(lncount))+'.html')
                imghtml = ''
                roothtml = roothtml + '<li class="pic"><div class="pic_div"><a class="pic_link" href="' + '.\html\'+ALLTRIM(STR(lncount))+'.html';
                    +'" target="view_window"><img alt="'+lcfolder+'" src="'+'.\cover\'+lcfolder+'.jpg'+'" title="'+lcfolder+'" /></a></div><div class="pic_title"><a href="';
                    +'.\html\'+ALLTRIM(STR(lncount))+'.html'+'" target="view_window" class="caption">'+lcfolder+'</a></div></li>'
            ENDIF
        ENDIF
    ENDFOR
ENDFOR
roothtml = roothtml + '</ul></body></html>'
STRTOFILE(STRCONV(roothtml,9),'.\beauty.html')
MESSAGEBOX("All done!!",48,"Tips")
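Reply from a uj5u.com user:
The program still runs without raising any error, but every image it saves is wrong.

Reply from a uj5u.com user:
First check what value lccoverurl holds.

Reply from a uj5u.com user:
The URLs are all fine. The server checks whether a request comes from a normal browser or from a crawler. It probably inspects the headers and cookies; whether it checks anything else, I don't know.

Reply from a uj5u.com user:
Is this how you discuss a problem with someone?

Reply from a uj5u.com user:
Is there something wrong with my reply above?

Reply from a uj5u.com user:
Don't you think so? The first image on https://www.nvshens.com/gallery/yazhou/ is
https://t1.onvshen.com:85/gallery/19864/25551/cover/0.jpg
I only asked you to verify whether the value was correct, so why be vague about it?
The site that hosts the images has simple hotlink protection.
You need to add a Referer field to the HTTP request headers, with the value https://www.nvshens.com/
You can search https://www.baidu.com/s?wd=%E7%BB%99+xmlhttp+%E5%8A%A0+REFERER&ie=UTF-8
for possible solutions.
A more workable approach is to fetch the images through the WebBrowser control.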
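
The Referer suggestion above can be sketched in VFP roughly as follows. This is only a minimal sketch, not the poster's code: `DownloadWithReferer` is a hypothetical helper name, the output file name is made up, and it swaps `Microsoft.xmlhttp` for the `WinHttp.WinHttpRequest.5.1` COM class on the assumption that the client-side XMLHTTP object may not pass a custom Referer through. Whether the site checks anything beyond Referer is not known from this thread.

```foxpro
* Usage: fetch the cover image named in the thread, sending the gallery site as Referer.
* The target file name is an arbitrary example; the .\cover folder must already exist.
llOk = DownloadWithReferer("https://t1.onvshen.com:85/gallery/19864/25551/cover/0.jpg", ;
    "https://www.nvshens.com/", ".\cover\test.jpg")
? IIF(llOk, "saved", "request failed or was rejected")

* Hypothetical helper (not from the original post): GET one URL with a Referer
* header and save the response body to a file. Returns .T. only on HTTP 200.
FUNCTION DownloadWithReferer(tcUrl, tcReferer, tcOutFile)
    LOCAL loHttp, llSaved
    llSaved = .F.
    TRY
        * Assumption: WinHttp.WinHttpRequest.5.1 is registered on this machine.
        loHttp = CREATEOBJECT("WinHttp.WinHttpRequest.5.1")
        loHttp.Open("GET", tcUrl, .F.)
        * The header must be set after Open() and before Send().
        loHttp.SetRequestHeader("Referer", tcReferer)
        loHttp.Send()
        IF loHttp.Status = 200
            STRTOFILE(loHttp.ResponseBody, tcOutFile)
            llSaved = .T.
        ENDIF
    CATCH
        * Network or COM failure: report it as not saved.
        llSaved = .F.
    ENDTRY
    RETURN llSaved
ENDFUNC
```

Checking `Status` before writing the file matters here, because a rejected request can still return a body (for example a placeholder image), which matches the "runs without errors but the images are wrong" symptom described earlier in the thread.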
