Automatic CSDN commenting with selenium

Preface

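My blog gets very few visits, and nearly all of the handful of comments it does receive come from people farming experience points. So I wrote a script that comments on other people's blogs, in the spirit of returning the favor.

The main steps are: collect blog links, log in, comment.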

Prerequisites

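The script relies on two libraries, selenium and BeautifulSoup. The code below also uses requests to fetch the page and lxml as the BeautifulSoup parser. Install them with:

pip install selenium
pip install beautifulsoup4
pip install requests lxml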

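You also need ChromeDriver, in a version that matches your installed Chrome. There are plenty of detailed guides online; after unzipping, put the .exe in your Python directory (or anywhere else on PATH).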

Detailed guide to installing and using selenium

Collecting blog links

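My guess is that the posts featured on https://blog.csdn.net are refreshed daily, so we can collect URLs from there.

Note that visits made through chromedriver or requests carry no cookies by default. So although the feed at https://blog.csdn.net changes every time you open it in your own browser, the page these tools receive is always the same.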

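To get the URLs of all the posts, we need to select a batch of tags that share a common pattern. Press F12, find the element we want in the inspector, right-click it, and choose Copy selector. This yields #feedlist_id > li:nth-child(1) > div > div > h2 > a, which matches only the first post. Dropping :nth-child(1) generalizes it to the whole batch: #feedlist_id > li > div > div > h2 > a. Calling soup.select('#feedlist_id > li > div > div > h2 > a') then returns all of the tags.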
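To see why dropping :nth-child(1) widens the match, here is a minimal sketch on a tiny hand-written page. The SAMPLE_HTML markup and the example.com links are assumptions made up for illustration (the real CSDN feed is more complex), and it uses the built-in "html.parser" so it runs without lxml.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified stand-in for the feed markup (not the real page).
SAMPLE_HTML = """
<ul id="feedlist_id">
  <li><div><div><h2><a href="https://example.com/post-1">First</a></h2></div></div></li>
  <li><div><div><h2><a href="https://example.com/post-2">Second</a></h2></div></div></li>
  <li><div><div><h2><a href="https://example.com/post-3">Third</a></h2></div></div></li>
</ul>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# The selector exactly as copied from the inspector: first list item only.
first_only = soup.select("#feedlist_id > li:nth-child(1) > div > div > h2 > a")

# Without :nth-child(1) the same path is matched under every list item.
all_links = soup.select("#feedlist_id > li > div > div > h2 > a")

first_hrefs = [a.get("href") for a in first_only]
all_hrefs = [a.get("href") for a in all_links]
print(first_hrefs)
print(all_hrefs)
```

The same idea applies to any selector copied from the inspector: remove the positional parts to go from one element to all of its siblings.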

The code is as follows:

import requests
from bs4 import BeautifulSoup

headers = {'User-Agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.80 Safari/537.36 Edg/86.0.622.48'}
req = requests.get(url = r'https://blog.csdn.net', headers = headers)  # tested: a browser-like User-Agent is required
html = req.text
soup = BeautifulSoup(html, features = "lxml")
hrefs = soup.select('#feedlist_id > li > div > div > h2 > a')  # select the post links by CSS selector
hrefs = [u.get('href') for u in hrefs]  # keep only the URLs

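Logging in

You must be logged in to comment. Simulating the login directly may require solving a CAPTCHA, so the simplest way around that is to carry the cookies of an already logged-in session.

First, write a separate program that captures the cookies of a logged-in session: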

import time, json
from selenium import webdriver

browser = webdriver.Chrome()
browser.get('https://passport.csdn.net/login?code=public')
time.sleep(60)  # log in by hand during this pause
cookies_list = browser.get_cookies()  # returns a list of cookie dicts
cookies_json = json.dumps(cookies_list)  # serialize to a string for saving
with open('csdn_cookies.txt', 'w') as fl:  # save the cookies
    fl.write(cookies_json)

print('cookies have been saved.')
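The fixed 60-second pause either wastes time or runs out before the manual login is finished. One alternative, sketched here under assumptions, is to poll until a condition holds. Both wait_until and has_cookie are hypothetical helpers, not part of selenium, and the cookie name 'UserName' in the usage comment is a guess at what a logged-in session contains; check your own saved cookies for a suitable name.

```python
import time


def wait_until(predicate, timeout=60.0, poll=1.0):
    """Call predicate() every `poll` seconds until it returns a truthy value.

    Returns that value, or None if `timeout` seconds elapse first.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = predicate()
        if result:
            return result
        if time.monotonic() >= deadline:
            return None
        time.sleep(poll)


def has_cookie(cookies, name):
    """Return True if a list of selenium-style cookie dicts contains `name`."""
    return any(c.get("name") == name for c in cookies)


# Possible use in place of time.sleep(60), assuming `browser` is the
# webdriver from the script and 'UserName' only exists after login:
#
#   logged_in = wait_until(
#       lambda: has_cookie(browser.get_cookies(), "UserName"),
#       timeout=120,
#   )
```

If wait_until returns None, the login did not complete in time and the cookies should not be saved.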

Then load the saved cookies and add them to the browser:

import json
from selenium import webdriver

browser = webdriver.Chrome()
browser.get(r'https://blog.csdn.net')  # open a csdn page first; cookies can only be added for the current domain

# adjust this path to wherever csdn_cookies.txt was saved
with open('CSDNremarker/csdn_cookies.txt', 'r') as fl:
    list_cookies = json.loads(fl.read())

for cookie in list_cookies:
    cookie_dict = {
        'domain': '.csdn.net',
        'name': cookie.get('name'),
        'value': cookie.get('value'),
    }
    browser.add_cookie(cookie_dict)

browser.refresh()  # reload the page so the cookies take effect
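Saved cookies go stale, and a malformed entry makes add_cookie fail partway through the loop. As a rough sketch, the loading step could filter the list before anything is handed to the browser. load_cookies is a hypothetical helper; it assumes the file has the shape produced by browser.get_cookies(), where non-session cookies carry an 'expiry' timestamp in seconds.

```python
import json
import time


def load_cookies(json_text, domain=".csdn.net", now=None):
    """Parse saved cookies and return minimal dicts for browser.add_cookie().

    Entries without a name or value, and entries whose 'expiry' timestamp
    has already passed, are skipped.
    """
    now = time.time() if now is None else now
    prepared = []
    for cookie in json.loads(json_text):
        name, value = cookie.get("name"), cookie.get("value")
        if not name or value is None:
            continue  # malformed entry
        expiry = cookie.get("expiry")
        if expiry is not None and expiry <= now:
            continue  # already expired
        prepared.append({"domain": domain, "name": name, "value": value})
    return prepared


# Possible use with the script's variables:
#
#   with open('csdn_cookies.txt') as fl:
#       for cookie_dict in load_cookies(fl.read()):
#           browser.add_cookie(cookie_dict)
```

If the returned list is empty, it is a sign that the login step needs to be repeated.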

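Commenting

from selenium.webdriver.common.by import By  # find_element_by_* was removed in Selenium 4

for href in hrefs[:10]:
    if 'marketing' not in href:  # skip ads
        browser.get(href)
        comment = browser.find_element(By.ID, 'comment_content')  # locate the comment box by id
        comment.send_keys('很牛的啊XD.')  # comment text ("Really impressive XD.")
        time.sleep(1)  # going too fast is not a good idea
        button = browser.find_element(By.XPATH, r'//*[@id="rightBox"]/a/input')  # locate the submit button by XPath
        button.click()
        time.sleep(2)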
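The loop above takes the first ten links and then skips the ads, so it can end up with fewer than ten posts, and a tag without an href would yield None and break the 'marketing' check. A small sketch of a stricter filter follows; pick_targets is a hypothetical helper, and unlike the original it filters first and applies the limit afterwards.

```python
def pick_targets(hrefs, limit=10, blocked=("marketing",)):
    """Return up to `limit` unique, non-empty links containing no blocked word."""
    seen = set()
    targets = []
    for href in hrefs:
        if not href:
            continue  # <a> tag without an href attribute
        if any(word in href for word in blocked):
            continue  # ad or promotion link
        if href in seen:
            continue  # the same post listed twice
        seen.add(href)
        targets.append(href)
        if len(targets) == limit:
            break
    return targets


# Possible use with the script's variables:
#
#   for href in pick_targets(hrefs):
#       browser.get(href)
#       ...
```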

Complete code

import time, requests, json
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.common.by import By

# Step 1: collect the post links from the CSDN front page
headers = {'User-Agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.80 Safari/537.36 Edg/86.0.622.48'}
req = requests.get(url = r'https://blog.csdn.net', headers = headers)
html = req.text
soup = BeautifulSoup(html, features = "lxml")
hrefs = soup.select('#feedlist_id > li > div > div > h2 > a')
hrefs = [u.get('href') for u in hrefs]

# Step 2: start the browser and log in with the saved cookies
options = webdriver.ChromeOptions()
options.add_argument("--disable-notifications")
browser = webdriver.Chrome(options = options)
browser.get(r'https://blog.csdn.net')

with open('CSDNremarker/csdn_cookies.txt', 'r') as fl:
    list_cookies = json.loads(fl.read())

for cookie in list_cookies:
    cookie_dict = {
        'domain': '.csdn.net',
        'name': cookie.get('name'),
        'value': cookie.get('value'),
    }
    browser.add_cookie(cookie_dict)

browser.refresh()

# Step 3: comment on the first ten links, skipping ads
for href in hrefs[:10]:
    if 'marketing' not in href:
        browser.get(href)
        comment = browser.find_element(By.ID, 'comment_content')
        comment.send_keys('很牛的啊XD.')
        time.sleep(1)
        button = browser.find_element(By.XPATH, r'//*[@id="rightBox"]/a/input')
        button.click()
        time.sleep(2)

Please credit the source when reposting. Link to this article: https://www.uj5u.com/houduan/261737.html

