寫在前面
鄙人不才,博客訪問數寥寥,僅有的幾個評論還幾乎都是來撈經驗的,于是乎我便寫了一個給別人博客評論的腳本,講究個禮尚往來,
主要的步驟分為:獲取博客鏈接 - 登錄 - 評論,
前置操作
需要用到 selenium 和 Beautifulsoup 兩個庫,通過以下命令安裝:
pip install selenium
pip install beautifulsoup4
另外還需要安裝 Chromedriver,網上已經說得很詳細了,解壓后把 .exe 放在 python 目錄下即可,
selenium的安裝和操作詳解
獲取博客鏈接
(我猜測)https://blog.csdn.net 推送的博文應該是每天更新的,于是我們可以從這里獲得 url,
注意,通過 chromedriver/requests 啟動的訪問默認是不帶 cookie 的,因此雖然在你的瀏覽器上每次訪問 https://blog.csdn.net 得到的推送不一樣,但是通過它們打開的網頁始終是一樣的,
想要獲得全部博文的 url,需要獲取一批具有某一特征的標簽,F12 找到我們需要的部分,右鍵,如圖,

選中賦值 selector 即可得到 #feedlist_id > li:nth-child(1) > div > div > h2 > a,因此這一批標簽可以用 selector 描述為:#feedlist_id > li > div > div > h2 > a,呼叫 soup.select('#feedlist_id > li > div > div > h2 > a') 即可獲取,
代碼如下:
headers = {'User-Agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.80 Safari/537.36 Edg/86.0.622.48'}
req = requests.get(url = r'https://blog.csdn.net', headers = headers) # 親測必須要偽裝
html = req.text
soup = BeautifulSoup(html, features = "lxml")
hrefs = soup.select('#feedlist_id > li > div > div > h2 > a') # 通過selector選中
hrefs = [u.get('href') for u in hrefs]
登錄
想要評論就必須要登錄,為了防止直接模擬登陸需要填寫驗證碼,最簡單的方法就是通過攜帶 cookie,
先另寫一個程式需要獲得登錄狀態的 cookie:
import time, json
from selenium import webdriver
browser = webdriver.Chrome()
browser.get('https://passport.csdn.net/login?code=public')
time.sleep(60) # 在這里手操登錄
cookiesDict = browser.get_cookies() # 獲取list的cookies
cookiesJSON = json.dumps(cookiesDict) # 轉換成字串保存
with open('csdn_cookies.txt', 'w') as fl: # 保存cookies
fl.write(cookiesJSON)
print('cookies have been saved.')
然后讀取、添加即可:
browser = webdriver.Chrome()
browser.get(r'https://blog.csdn.net') # 注意要先打開csdn再添加cookie
with open('CSDNremarker/csdn_cookies.txt', 'r') as fl:
list_cookies = json.loads(fl.read())
for cookie in list_cookies:
cookie_dict = {
'domain': '.csdn.net',
'name': cookie.get('name'),
'value': cookie.get('value'),
}
browser.add_cookie(cookie_dict)
browser.refresh() # 重繪網頁,cookie生效
評論
for href in hrefs[: 10]:
if not 'marketing' in href: # 過濾廣告
browser.get(href)
comment = browser.find_element_by_id('comment_content') # 通過id定位評論框框
comment.send_keys('很牛的啊XD.') # 內容
time.sleep(1) # 太快了不好...
botton = browser.find_element_by_xpath(r'//*[@id="rightBox"]/a/input') # 用xpath定位
botton.click()
time.sleep(2)
完整代碼
import time, requests, sys, json
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
headers = {'User-Agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.80 Safari/537.36 Edg/86.0.622.48'}
req = requests.get(url = r'https://blog.csdn.net', headers = headers)
html = req.text
soup = BeautifulSoup(html, features = "lxml")
hrefs = soup.select('#feedlist_id > li > div > div > h2 > a')
hrefs = [u.get('href') for u in hrefs]
options = webdriver.ChromeOptions()
options.add_argument("--disable-notifications")
browser = webdriver.Chrome(options = options)
browser.get(r'https://blog.csdn.net')
with open('CSDNremarker/csdn_cookies.txt', 'r') as fl:
list_cookies = json.loads(fl.read())
for cookie in list_cookies:
cookie_dict = {
'domain': '.csdn.net',
'name': cookie.get('name'),
'value': cookie.get('value'),
}
browser.add_cookie(cookie_dict)
browser.refresh()
for href in hrefs[: 10]:
if not 'marketing' in href:
browser.get(href)
comment = browser.find_element_by_id('comment_content')
comment.send_keys('很牛的啊XD.')
time.sleep(1)
botton = browser.find_element_by_xpath(r'//*[@id="rightBox"]/a/input')
botton.click()
time.sleep(2)
轉載請註明出處,本文鏈接:https://www.uj5u.com/houduan/261737.html
標籤:python
上一篇:Python大作業實驗一
