Let's start writing the code:

1. Import the tools

import requests
import parsel

2. Mimic a browser environment

headers = {
    # "Cookie": "bcolor=; font=; size=; fontcolor=; width=; Hm_lvt_3806e321b1f2fd3d61de33e5c1302fa5=1596800365,1596800898; Hm_lpvt_3806e321b1f2fd3d61de33e5c1302fa5=1596802442",
    "Host": "www.shuquge.com",
    "Referer": "http://www.shuquge.com/txt/8659/index.html",
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.92 Safari/537.36",
}

3. Parse the site and scrape the novels

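def download_one_chapter(url_chapter, book):
    """Download one chapter and append it to the book's text file."""
    # The request headers were copied from the browser's developer tools
    response = requests.get(url_chapter, headers=headers)
    # apparent_encoding guesses the charset from the response body;
    # it is right for nearly every page (about 99% of the time)
    response.encoding = response.apparent_encoding
    # print(response.text)

    # Extract the data.
    # Parsing tools: bs4, parsel
    # Selector types: xpath, css, re
    # Turn the HTML into a Selector object. When a tag is repeated,
    # narrow the match by id or class and select again within the result.
    sel = parsel.Selector(response.text)
    h1 = sel.css('h1::text')
    # Fall back to an empty string so f.write() never receives None
    title = h1.get(default='')
    print(title)

    content = sel.css('#content ::text').getall()
    # print(content)
    # text = "".join(content)
    # print(text)

    # Write the data.
    # Mode 'a' appends each chapter to the book file; with mode 'w' every
    # chapter would overwrite the previous one.
    # with open(title + '.txt', mode='w', encoding='utf-8') as f:
    with open(book + '.txt', mode='a', encoding='utf-8') as f:
        f.write(title)
        f.write('\n')
        for line in content:
            f.write(line.strip())
            f.write('\n')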
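# Download a whole book, which has many chapters
# download_one_chapter('http://www.shuquge.com/txt/8659/2324752.html')
# download_one_chapter('http://www.shuquge.com/txt/8659/2324753.html')
def download_one_book(book_url):
    """Download every chapter listed on a book's index page."""
    response = requests.get(book_url, headers=headers)
    response.encoding = response.apparent_encoding
    html = response.text
    sel = parsel.Selector(html)
    # The book title; it becomes the output file name
    title = sel.css('h2::text').get()

    # Relative links to every chapter in the table of contents
    index_s = sel.css('body > div.listmain > dl > dd > a::attr(href)').getall()
    print(index_s)
    for index in index_s:
        # book_url ends in 'index.html' (10 characters); strip it and
        # append the chapter's relative link to get the chapter URL
        one_chapter_url = book_url[:-10] + index
        print(one_chapter_url)
        download_one_chapter(one_chapter_url, title)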
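
The slice `book_url[:-10]` works only because the index URL ends in the ten characters `index.html`. As a sketch of a less fragile alternative, the standard library's `urllib.parse.urljoin` resolves a relative link against the page it came from. The helper name `chapter_urls` is made up for illustration, and the chapter file names are taken from the commented-out test URLs above.

```python
from urllib.parse import urljoin


def chapter_urls(book_url, hrefs):
    """Resolve relative chapter links against the book's index URL."""
    return [urljoin(book_url, href) for href in hrefs]


book_url = 'http://www.shuquge.com/txt/8659/index.html'
urls = chapter_urls(book_url, ['2324752.html', '2324753.html'])
for url in urls:
    print(url)
```

`urljoin` also copes with links that start with `/` or that are already absolute, which plain string concatenation does not.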

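1. Exception handling: the code has no try/except, so a single failed request stops the whole run.

2. Retry on error: when a request fails, try it again, or record the URL and request it again later.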
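
Both points can be sketched together: wrap the request in try/except, retry a few times, and record the URL if every attempt fails. This is a minimal sketch rather than the author's code. The names `fetch_with_retry` and `failed_urls`, the retry count, and the delay are all assumptions, and `fetch` stands in for any function that takes a URL and raises on failure. The demo uses a stub in place of a real request.

```python
import time

failed_urls = []  # hypothetical log of URLs that failed every attempt


def fetch_with_retry(fetch, url, retries=3, delay=1.0):
    """Call fetch(url); retry on any exception, record the URL if all attempts fail."""
    for attempt in range(1, retries + 1):
        try:
            return fetch(url)
        except Exception as exc:
            print(f'attempt {attempt} failed for {url}: {exc}')
            if attempt < retries:
                time.sleep(delay)  # wait before the next attempt
    failed_urls.append(url)  # keep the URL so it can be requested again later
    return None


# Stub that fails twice and then succeeds, standing in for a flaky request
calls = {'n': 0}


def flaky(url):
    calls['n'] += 1
    if calls['n'] < 3:
        raise ConnectionError('temporary failure')
    return 'page body for ' + url


result = fetch_with_retry(flaky, 'http://www.shuquge.com/txt/8659/2324752.html', delay=0)
print(result)
```

With requests, `fetch` could be something like `lambda u: requests.get(u, headers=headers, timeout=10)`. Calling `raise_for_status()` on the response would turn HTTP error codes into exceptions as well.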

What is needed to download one novel:

download_one_book('http://www.shuquge.com/txt/8659/index.html')
download_one_book('http://www.shuquge.com/txt/122230/index.html')
download_one_book('http://www.shuquge.com/txt/117456/index.html')

Each chapter is downloaded from its chapter URL, and each novel is downloaded from its table-of-contents page.

Download every novel on the site -> download every category of novels -> download every page of novels under each category
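
This chain maps onto nested loops. The article does not show the category or listing pages, so the sketch below takes the three lookup steps as function arguments instead of guessing selectors. `crawl_site` and its parameter names are hypothetical, and the demo uses stub functions and placeholder names in place of real requests.

```python
def crawl_site(list_categories, list_pages, list_books, download_book):
    """Walk site -> categories -> listing pages -> books, downloading each book once."""
    seen = set()
    for category_url in list_categories():
        for page_url in list_pages(category_url):
            for book_url in list_books(page_url):
                if book_url in seen:  # a book may be listed more than once
                    continue
                seen.add(book_url)
                download_book(book_url)
    return seen


# Stub data standing in for real pages (illustrative only)
site = {
    'cat-a': {'cat-a/page-1': ['book-1', 'book-2'], 'cat-a/page-2': ['book-3']},
    'cat-b': {'cat-b/page-1': ['book-2']},
}
pages = {page: books for cat in site.values() for page, books in cat.items()}
downloaded = []
crawl_site(
    list_categories=lambda: list(site),
    list_pages=lambda cat: list(site[cat]),
    list_books=lambda page: pages[page],
    download_book=downloaded.append,
)
print(downloaded)
```

In the real script, `download_book` would be `download_one_book`, and the three lookup functions would each request a page and extract links from it.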

After the code runs, each novel is saved in the working directory as a text file named after its title.
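
Because every chapter of a book goes into the same file, the file mode decides what ends up on disk: `'w'` truncates the file each time it is opened, so only the last chapter written survives, while `'a'` appends. A small self-contained demonstration, using a temporary directory and made-up chapter text (`write_chapters` is a hypothetical helper):

```python
import os
import tempfile


def write_chapters(path, chapters, mode):
    """Open the file once per chapter, as the scraper does, using the given mode."""
    for title, body in chapters:
        with open(path, mode=mode, encoding='utf-8') as f:
            f.write(title + '\n' + body + '\n')


chapters = [('Chapter 1', 'first body'), ('Chapter 2', 'second body')]
results = {}
with tempfile.TemporaryDirectory() as tmp:
    for mode in ('w', 'a'):
        path = os.path.join(tmp, f'book_{mode}.txt')
        write_chapters(path, chapters, mode)
        with open(path, encoding='utf-8') as f:
            results[mode] = f.read()

print(repr(results['w']))  # only the last chapter remains
print(repr(results['a']))  # both chapters, in order
```

One consequence of append mode is that running the scraper twice on the same book adds every chapter a second time, so the old file should be deleted before a rerun.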

Please credit the source when reposting. Link to this article: https://www.uj5u.com/houduan/61870.html
