IP banned? Here's a trick - 有解無憂

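One possible solution
This approach is also based on ADSL dial-up. The difference is that it needs two servers that can both do ADSL dialing, and the crawler uses these two servers as its proxies.
Suppose A and B are two servers that can do ADSL dialing. The crawler runs on server C and reaches the internet through A as its proxy. If access is denied during crawling, the crawler immediately switches the proxy to B and then has A redial. If it is blocked again, it switches back to A and has B redial, and so on. See the figures below: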
[Figure: A serves as the proxy while B redials]
[Figure: B serves as the proxy while A redials]
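The A/B switching described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the original poster's code: the proxy URLs, the `BLOCKED_STATUS` codes, the `ProxySwitcher` class and the `redial` callback are all hypothetical names. How a server is actually told to redial is not given in the post, so it is left as a callback.

```python
# Hypothetical proxy endpoints for servers A and B; replace with real host:port values.
PROXY_A = "http://proxy-a.example:8080"
PROXY_B = "http://proxy-b.example:8080"

# Assumption: the target site signals a ban with one of these status codes.
BLOCKED_STATUS = {403, 429}


class ProxySwitcher:
    """Alternate between proxies and redial the one that was just blocked."""

    def __init__(self, proxies, redial):
        self.proxies = list(proxies)
        self.active = 0
        self.redial = redial  # callback that makes the given proxy server redial

    @property
    def current(self):
        return self.proxies[self.active]

    def switch(self):
        """Move to the next proxy, then ask the blocked one to redial."""
        blocked = self.current
        self.active = (self.active + 1) % len(self.proxies)
        self.redial(blocked)
        return self.current


def fetch(url, switcher, http_get=None, max_switches=4, timeout=10):
    """Fetch url through the active proxy, switching whenever access is denied."""
    if http_get is None:
        import requests  # imported lazily so the switching logic can be tested offline
        http_get = requests.get
    for _ in range(max_switches + 1):
        proxy = switcher.current
        try:
            resp = http_get(url, proxies={"http": proxy, "https": proxy}, timeout=timeout)
        except OSError:  # requests' exceptions derive from IOError/OSError
            switcher.switch()
            continue
        if resp.status_code in BLOCKED_STATUS:
            switcher.switch()
            continue
        return resp
    raise RuntimeError("still blocked after switching proxies")
```

A call might look like `fetch(url, ProxySwitcher([PROXY_A, PROXY_B], redial=my_redial))`, where `my_redial` is whatever mechanism you have for making a server hang up and dial again.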

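Crawler code (web):
import random

import requests

# Proxy IPs from the original post. No ports were given, so the proxy port defaults to 80.
pro = ['122.152.196.126', '114.215.174.227', '119.185.30.75']
head = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36'
}
url = 'http://www.whatismyip.com.tw/'
# requests expects a full proxy URL, so add the scheme explicitly.
proxy = 'http://' + random.choice(pro)
r = requests.get(url, proxies={'http': proxy}, headers=head, timeout=10)
r.encoding = r.apparent_encoding
print(r.status_code)
print(r.text)  # the page echoes the IP address the request came from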
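To confirm that a request really left through the chosen proxy, one option is to look for the proxy's address in the text returned by the IP echo page. The sketch below is an assumption about how that check could be done. `extract_ips` and `proxy_in_effect` are hypothetical helper names, and the check only works when the proxy's exit address is the same as the address you connect to.

```python
import ipaddress
import re

# Loose pattern for dotted quads; real validation is done by the ipaddress module.
IPV4_CANDIDATE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def extract_ips(text):
    """Return the distinct valid IPv4 addresses found in text, in order of appearance."""
    found = []
    for candidate in IPV4_CANDIDATE.findall(text):
        try:
            ipaddress.IPv4Address(candidate)
        except ValueError:
            continue  # e.g. 999.1.1.1 is not a real address
        if candidate not in found:
            found.append(candidate)
    return found


def proxy_in_effect(page_text, proxy_ip):
    """True if the echo page reports the proxy's IP as the caller's address."""
    return proxy_ip in extract_ips(page_text)
```

With the script above, `proxy_in_effect(r.text, chosen_ip)` would tell you whether the site saw the proxy's IP rather than your own, where `chosen_ip` is the bare address picked from `pro`.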

Another example:

# coding=utf-8
import requests
from lxml import etree


def getUrl():
    # Walk list pages 1 to 33.
    for i in range(33):
        url = 'http://task.zbj.com/t-ppsj/p{}s5.html'.format(i + 1)
        spiderPage(url)


def spiderPage(url):
    if not url:
        return None
    try:
        # Only plain-HTTP requests are routed through this proxy.
        proxies = {
            'http': 'http://221.202.248.52:80',
        }
        user_agent = 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/53.0.2785.104 Safari/537.36 Core/1.53.4295.400'
        headers = {'User-Agent': user_agent}
        htmlText = requests.get(url, headers=headers, proxies=proxies, timeout=10).text
        selector = etree.HTML(htmlText)
        tds = selector.xpath('//*[@class="tab-switch tab-progress"]/table/tr')
        for td in tds:
            price = td.xpath('./td/p/em/text()')
            href = td.xpath('./td/p/a/@href')
            title = td.xpath('./td/p/a/text()')
            subTitle = td.xpath('./td/p/text()')
            deadline = td.xpath('./td/span/text()')
            # xpath() returns a list; take the first match or fall back to ''.
            price = price[0] if len(price) > 0 else ''
            title = title[0] if len(title) > 0 else ''
            href = href[0] if len(href) > 0 else ''
            subTitle = subTitle[0] if len(subTitle) > 0 else ''
            deadline = deadline[0] if len(deadline) > 0 else ''
            print(price, title, href, subTitle, deadline)
            print('---------------------------------------------------------------------------------------')
            spiderDetail(href)
    except Exception as e:
        print('Error:', e)


def spiderDetail(url):
    # An empty href (no link found in the row) is skipped as well as None.
    if not url:
        return None
    try:
        # Note: as in the original post, this request does not go through the proxy.
        htmlText = requests.get(url, timeout=10).text
        selector = etree.HTML(htmlText)
        aboutHref = selector.xpath('//*[@id="utopia_widget_10"]/div[1]/div/div/div/p[1]/a/@href')
        price = selector.xpath('//*[@id="utopia_widget_10"]/div[1]/div/div/div/p[1]/text()')
        title = selector.xpath('//*[@id="utopia_widget_10"]/div[1]/div/div/h2/text()')
        contentDetail = selector.xpath('//*[@id="utopia_widget_10"]/div[2]/div/div[1]/div[1]/text()')
        publishDate = selector.xpath('//*[@id="utopia_widget_10"]/div[2]/div/div[1]/p/text()')
        # Python's conditional expression: value_if_true if condition else value_if_false
        aboutHref = aboutHref[0] if len(aboutHref) > 0 else ''
        price = price[0] if len(price) > 0 else ''
        title = title[0] if len(title) > 0 else ''
        contentDetail = contentDetail[0] if len(contentDetail) > 0 else ''
        publishDate = publishDate[0] if len(publishDate) > 0 else ''
        print(aboutHref, price, title, contentDetail, publishDate)
    except Exception as e:
        print('Error:', e)


if __name__ == '__main__':
    getUrl()
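The script above repeats the same "first match or empty string" expression for every field. One possible way to fold that into a helper is sketched below. `first_text` and `extract_fields` are hypothetical names, not part of lxml, and unlike the original code this version also strips surrounding whitespace.

```python
def first_text(matches, default=''):
    """Return the first XPath match as stripped text, or default if there is none."""
    if not matches:
        return default
    return str(matches[0]).strip()


def extract_fields(node, xpaths):
    """Map field names to first-match text.

    node is any object with an xpath() method, such as an lxml element;
    xpaths maps a field name to the XPath expression to evaluate on node.
    """
    return {name: first_text(node.xpath(path)) for name, path in xpaths.items()}
```

For a table row this could be called as `extract_fields(td, {'price': './td/p/em/text()', 'href': './td/p/a/@href'})`, which returns a dict with one string per field.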

Reply from a uj5u.com user:

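For this kind of need you should definitely use HTTP proxies and switch to another proxy the moment you get banned. Your method rests on the assumption that redialing always yields a different IP. What's more, if you are on a carrier-level LAN (behind the carrier's NAT), the server sees the same IP no matter how many times you redial.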
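Both caveats in this reply can be checked before relying on redialing. The sketch below is one possible way to do it, with hypothetical function names. It assumes you can read the WAN address assigned to the dial-up line and can observe the public exit IP before and after a redial. The range 100.64.0.0/10 is the shared address space that RFC 6598 reserves for carrier-grade NAT.

```python
import ipaddress

# Shared address space reserved for carrier-grade NAT (RFC 6598).
CGNAT_RANGE = ipaddress.ip_network("100.64.0.0/10")


def behind_carrier_nat(wan_ip):
    """True if the address assigned to the line is not a public one.

    A WAN address in the CGNAT range or in a private range means the carrier
    translates it, so the target server sees a shared public IP instead.
    """
    addr = ipaddress.ip_address(wan_ip)
    return addr in CGNAT_RANGE or addr.is_private


def redial_helped(exit_ip_before, exit_ip_after):
    """True if the public exit IP actually changed across a redial."""
    return ipaddress.ip_address(exit_ip_before) != ipaddress.ip_address(exit_ip_after)
```

If `behind_carrier_nat` returns True for the line, or `redial_helped` keeps returning False, switching between HTTP proxies as the reply suggests is likely the more dependable route.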

Reply from a uj5u.com user:

666666

Please credit the source when reposting. Link to this article: https://www.uj5u.com/qita/7496.html

Tags: network protocols and configuration
