[Python] The beauties are calling: batch collection with Python

Preface

Hi there, everyone! 魔王 here!

Key points:

Capturing dynamically loaded data (packet capture)
Sending requests with requests
Parsing JSON data

Development environment:

Python 3.8: runs the code
PyCharm 2021.2: helps with writing the code
requests: pip install requests

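Approach

How to go about implementing a project like this:

This is a simple example that still covers a good number of basic concepts.

It batch-collects data from the internet.

Principle: simulate a browser/client sending network requests to a server.

Step one

Find the data source.

Writing the code:

Send the request
Get the data
Parse the data
Save the data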
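
The four steps above can be sketched as a small pipeline in which each step is an ordinary function. This is only an illustration of the structure, not the code used later in this article: `run_pipeline`, `fake_send_request` and `fake_parse` are hypothetical names, and the sample data is made up so the sketch runs without touching the network.

```python
def run_pipeline(send_request, parse, save):
    """Run the four steps in order; each step is passed in as a function."""
    response_data = send_request()   # steps 1-2: send the request, get the data
    records = parse(response_data)   # step 3: parse the data
    for record in records:
        save(record)                 # step 4: save the data
    return len(records)


# Stand-in steps (hypothetical) so the sketch runs offline.
def fake_send_request():
    return {"items": [{"title": "a", "url": "http://example.com/a.mp4"}]}


def fake_parse(data):
    return [(item["title"], item["url"]) for item in data["items"]]


saved = []
count = run_pipeline(fake_send_request, fake_parse, saved.append)
print(count, saved)
```

Swapping the stand-ins for a real request, a real parser and a real file writer keeps the same overall shape.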


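Code walkthrough

Import modules

import requests         # third-party module for sending requests
import re               # regular expressions, used to clean up the captions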

Request body

# The request body, like the headers, has to be a dictionary
json = {
    'operationName': "visionSearchPhoto",
    'query': "fragment photoContent on PhotoEntity {\n  id\n  duration\n  caption\n  originCaption\n  likeCount\n  viewCount\n  realLikeCount\n  coverUrl\n  photoUrl\n  photoH265Url\n  manifest\n  manifestH265\n  videoResource\n  coverUrls {\n    url\n    __typename\n  }\n  timestamp\n  expTag\n  animatedCoverUrl\n  distance\n  videoRatio\n  liked\n  stereoType\n  profileUserTopPhoto\n  musicBlocked\n  __typename\n}\n\nfragment feedContent on Feed {\n  type\n  author {\n    id\n    name\n    headerUrl\n    following\n    headerUrls {\n      url\n      __typename\n    }\n    __typename\n  }\n  photo {\n    ...photoContent\n    __typename\n  }\n  canAddComment\n  llsid\n  status\n  currentPcursor\n  tags {\n    type\n    name\n    __typename\n  }\n  __typename\n}\n\nquery visionSearchPhoto($keyword: String, $pcursor: String, $searchSessionId: String, $page: String, $webPageArea: String) {\n  visionSearchPhoto(keyword: $keyword, pcursor: $pcursor, searchSessionId: $searchSessionId, page: $page, webPageArea: $webPageArea) {\n    result\n    llsid\n    webPageArea\n    feeds {\n      ...feedContent\n      __typename\n    }\n    searchSessionId\n    pcursor\n    aladdinBanner {\n      imgUrl\n      link\n      __typename\n    }\n    __typename\n  }\n}\n",
    'variables': {'keyword': "換裝", 'pcursor': "", 'page': "search"}
}

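JSON and a dictionary are two different things.

JSON and dictionaries can be converted into each other.

So what exactly is JSON?

JSON is a data-interchange format

used for exchanging data between the front end and the back end.

Front end: the web page

Back end: data transfer

In Python, working with JSON really comes down to converting string content into a dictionary.

Think of the dictionary you use in everyday life:

you look up a character by its pinyin.

In the same way, you use what comes before the : to find what comes after it.

{"A": "123", "B": {"C": {"D": ""}}}["B"]["C"]["D"]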
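
To make the difference between a JSON string and a dictionary concrete, here is a minimal sketch using the standard-library `json` module. The sample text is made up for illustration (the innermost value is set to "found" so the lookup has something to show).

```python
import json

text = '{"A": "123", "B": {"C": {"D": "found"}}}'  # JSON: just a string
data = json.loads(text)                             # string -> dictionary
value = data["B"]["C"]["D"]                         # key by key, like the lookup above
back = json.dumps(data)                             # dictionary -> string

print(type(text).__name__, type(data).__name__, value)
```

`json.loads` and `json.dumps` are the two directions of the conversion; `response.json()` in requests does the string-to-dictionary step for you.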

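Send the request

# url and headers are not defined in this article; take both from the request captured in the browser's developer tools
response = requests.post(url=url, headers=headers, json=json)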

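Get the data

<Response [200]>: the request succeeded

.text: the response body as a string

.json(): the response body as dictionary data

.content: the response body as binary data (video/audio/images)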

json_dict = response.json()
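
The relationship between the three accessors can be imitated offline with the standard library alone. This is a rough sketch, not how requests is implemented: the bytes below are a made-up sample shaped like the keys this article reads, and the real library also works out the text encoding for you.

```python
import json

# Made-up sample body; a real response would come from the server.
raw = b'{"data": {"visionSearchPhoto": {"feeds": []}}}'  # like .content: bytes
text = raw.decode("utf-8")                                # like .text: a string
parsed = json.loads(text)                                 # like .json(): a dictionary

print(type(raw).__name__, type(text).__name__, type(parsed).__name__)
```

Use `.json()` when the server returns JSON, and `.content` when you want to write the body straight to a file, as with the videos here.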

Parse the data

feeds = json_dict['data']['visionSearchPhoto']['feeds']
# Loop over every item in feeds
for feed in feeds:
    photoUrl = feed['photo']['photoUrl']
    caption = feed['photo']['caption']
    # Replace characters that are not allowed in file names with underscores
    caption = re.sub(r'[\\/:*?"<>|\n]', '_', caption)
    print(caption, photoUrl)
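
If a key is missing from the response, the direct lookups above raise an error. A more defensive version of the same parsing step might look like the sketch below. `safe_filename` and `extract_videos` are hypothetical helper names, the 'untitled' fallback is an assumption, and the sample dictionary is made up to match the keys used in this article.

```python
import re


def safe_filename(caption):
    """Replace characters not allowed in Windows file names, plus newlines."""
    return re.sub(r'[\\/:*?"<>|\n]', '_', caption)


def extract_videos(json_dict):
    """Return (caption, photoUrl) pairs, skipping feeds without a video URL."""
    data = json_dict.get('data') or {}
    search = data.get('visionSearchPhoto') or {}
    feeds = search.get('feeds') or []
    videos = []
    for feed in feeds:
        photo = feed.get('photo') or {}
        url = photo.get('photoUrl')
        if url:
            caption = photo.get('caption') or 'untitled'  # assumed fallback name
            videos.append((safe_filename(caption), url))
    return videos


# Made-up sample in the same shape as the response parsed above.
sample = {'data': {'visionSearchPhoto': {'feeds': [
    {'photo': {'caption': 'a/b:c?\nd', 'photoUrl': 'http://example.com/1.mp4'}},
    {'photo': {'caption': 'no url here'}},
]}}}
videos = extract_videos(sample)
print(videos)
```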

Save the data

    # Download the video as bytes; the video/ folder must already exist
    video_data = requests.get(photoUrl).content
    with open(f'video/{caption}.mp4', mode='wb') as f:
        f.write(video_data)
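
The save step fails if the video folder does not exist. One possible way around that, sketched here with `pathlib`, is to create the folder before writing. `save_video` is a hypothetical helper, and the demo writes a few dummy bytes to a temporary folder instead of downloading anything.

```python
import tempfile
from pathlib import Path


def save_video(folder, caption, video_data):
    """Write bytes to <folder>/<caption>.mp4, creating the folder if needed."""
    target_dir = Path(folder)
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / f'{caption}.mp4'
    target.write_bytes(video_data)
    return target


# Demo with dummy bytes in a temporary folder (no download involved).
with tempfile.TemporaryDirectory() as tmp:
    saved_path = save_video(Path(tmp) / 'video', 'demo', b'\x00\x01\x02')
    saved_bytes = saved_path.read_bytes()
    print(saved_path.name, len(saved_bytes))
```

Inside the loop above, the call would be `save_video('video', caption, video_data)`.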