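Preface
The text and images in this article come from the internet and are intended for study and discussion only, not for any commercial use. If there is a problem, please contact us promptly so it can be dealt with.
The following article is from 青燈編程, written by 清風.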
Basic development environment
- Python 3.6
- PyCharm
Modules used
import time
import os
import re
import requests
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
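Target page analysis
How to obtain the video address
Xigua Video serves two kinds of video:
1. Watermarked video
2. Watermark-free video
Watermarked video
The page source contains this link: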
https://www.ixigua.com/embed?group_id=6817258591586615812
Opening this link leads to the video playback page.
The rendered front-end page already contains the real video address:
//v9-xg-web-s.ixigua.com/ac99e1bf75dd0faa6854d9e5367fac3f/5fe894d7/video/tos/cn/tos-cn-ve-4/626cf09c0830417da4b70982950cedd9/?a=1768&br=3891&bt=1297&cd=0%7C0%7C0&cr=0&cs=0&cv=1&dr=0&ds=3&er=0&l=20201227210214010204050203275E2F92&lr=default&mime_type=video_mp4&qs=0&rc=anQ3aWdzNjd2dDMzZjczM0ApPDQ2NjU8aGU3NzplMzZoNWdfMWguMmA0NWFfLS02LS9zczIwXjBfY2A2MmIvXjMyLjI6Yw%3D%3D&vl=&vr=
Requesting this address is all it takes to download and save the video.
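The address shown above starts with `//`, so it has no scheme and cannot be passed to `requests.get` as it is. A minimal sketch of one way to resolve it, assuming the page was loaded over HTTPS; the helper name `absolute_url` and the shortened sample address are hypothetical and only serve the illustration:

```python
from urllib.parse import urljoin


def absolute_url(src, page_url="https://www.ixigua.com/"):
    """Resolve a protocol-relative or relative address against the page URL."""
    # urljoin keeps the scheme of page_url when src starts with "//",
    # and returns src unchanged when it is already absolute.
    return urljoin(page_url, src)


# Shortened, hypothetical sample in the same shape as the address above
print(absolute_url("//v9-xg-web-s.ixigua.com/video/?a=1768"))
```

An address that already carries a scheme passes through untouched, so the helper can be applied to every `src` value without checking it first.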
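Watermark-free video
Downloading the watermark-free version is more involved, because the audio and the picture are delivered as separate streams.
The picture stream has no watermark, but it also has no sound.
How do we find the audio and video addresses?
Open the browser's developer tools; the corresponding links appear under the XHR filter.
Audio address: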
https://v9-xg-web-s.ixigua.com/79457295a8a89bf86bdcd157eb848175/5fe895f4/video/tos/cn/tos-cn-vd-0026/43771a1a38ea473d9cb5b8e7c0f651f3/media-audio-und-mp4a/?a=1768&br=0&bt=0&cd=0%7C0%7C0&cr=0&cs=0&cv=1&dr=0&ds=&er=0&l=20201227210659010028033025224FC377&lr=default&mime_type=video_mp4
Video (picture) address:
https://v9-xg-web-s.ixigua.com/9b4e18f3b29244557c83b8e88f13dd1b/5fe895f4/video/tos/cn/tos-cn-vd-0026/86a41ef8ebd3496585db455ae56b3ff3/media-video-avc1/?a=1768&br=12159&bt=4053&cd=0%7C0%7C0&cr=0&cs=0&cv=1&dr=0&ds=4&er=0&l=20201227210659010028033025224FC377&lr=default&mime_type=video_mp4
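In the two sample addresses, the audio path contains `media-audio-und-mp4a` and the picture path contains `media-video-avc1`. A small sketch that tells captured addresses apart on that basis follows. It assumes these path markers hold for other videos as well, which only these two samples support; `stream_kind` is a hypothetical helper name and the query strings are shortened.

```python
from urllib.parse import urlparse


def stream_kind(url):
    """Classify a captured address as 'audio', 'video' or 'unknown'."""
    path = urlparse(url).path
    # Assumption: the markers seen in the two sample addresses are stable
    if "/media-audio-" in path:
        return "audio"
    if "/media-video-" in path:
        return "video"
    return "unknown"


AUDIO = ("https://v9-xg-web-s.ixigua.com/79457295a8a89bf86bdcd157eb848175/5fe895f4/"
         "video/tos/cn/tos-cn-vd-0026/43771a1a38ea473d9cb5b8e7c0f651f3/"
         "media-audio-und-mp4a/?a=1768")
VIDEO = ("https://v9-xg-web-s.ixigua.com/9b4e18f3b29244557c83b8e88f13dd1b/5fe895f4/"
         "video/tos/cn/tos-cn-vd-0026/86a41ef8ebd3496585db455ae56b3ff3/"
         "media-video-avc1/?a=1768")

print(stream_kind(AUDIO), stream_kind(VIDEO))
```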
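So to scrape the watermark-free version of a Xigua video you have to download both the picture stream and the audio stream and then merge the two files, much like the earlier article on scraping Bilibili videos.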
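The article does not say which tool performs the merge. One common option is the `ffmpeg` command-line program, sketched below under the assumption that `ffmpeg` is installed and on the PATH; the function names and file names are hypothetical.

```python
import subprocess


def build_merge_command(video_path, audio_path, output_path):
    """Build an ffmpeg command that muxes one picture file and one audio file."""
    return [
        "ffmpeg",
        "-y",              # overwrite the output file if it already exists
        "-i", video_path,  # first input: picture stream without sound
        "-i", audio_path,  # second input: audio stream
        "-c", "copy",      # copy both streams without re-encoding
        output_path,
    ]


def merge(video_path, audio_path, output_path):
    """Run ffmpeg; raises CalledProcessError if ffmpeg reports a failure."""
    subprocess.run(build_merge_command(video_path, audio_path, output_path), check=True)


# Only prints the command; call merge(...) once both files have been downloaded
print(" ".join(build_merge_command("picture.mp4", "audio.mp4", "merged.mp4")))
```

Stream copy avoids re-encoding, so the merge is fast and does not change the quality of either stream.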
Downloading the watermarked version
1. Fetch the page source and extract the playback address and the title
def main(html_url):
    # Send a browser-like User-Agent with the request
    headers = {
        'user-agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.138 Safari/537.36'
    }
    response = requests.get(url=html_url, headers=headers)
    response.encoding = response.apparent_encoding
    # The playback (embed) address and the title are both present in the page source
    play_url = re.findall('"embedUrl":"(.*?)"', response.text)[0]
    title = re.findall('<title data-react-helmet="true">(.*?)</title>', response.text)[0].replace(' - 西瓜視頻', '')
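Indexing `re.findall(...)[0]` raises `IndexError` as soon as the page layout changes or the request is blocked. The sketch below uses the same two patterns on a made-up HTML sample and returns `None` instead. It also decodes JSON escapes in the captured address, on the assumption that the embedded JSON may escape slashes; `SAMPLE_HTML`, `extract_first` and `parse_page` are hypothetical names.

```python
import json
import re

# Made-up page fragment for illustration; the real page source is much larger
SAMPLE_HTML = (
    '<title data-react-helmet="true">Demo clip</title>'
    '<script>{"embedUrl":"https:\\u002F\\u002Fwww.ixigua.com\\u002Fembed?group_id=123"}</script>'
)


def extract_first(pattern, text):
    """Return the first captured group, or None when the pattern is absent."""
    match = re.search(pattern, text)
    return match.group(1) if match else None


def parse_page(html):
    """Return (play_url, title); either item is None when it cannot be found."""
    raw_url = extract_first(r'"embedUrl":"(.*?)"', html)
    title = extract_first(r'<title data-react-helmet="true">(.*?)</title>', html)
    # Assumption: the value is a JSON string body, so json.loads can unescape it
    play_url = json.loads('"' + raw_url + '"') if raw_url else None
    return play_url, title


print(parse_page(SAMPLE_HTML))
```

A caller can then report a clear message such as a missing cookie or a changed page, instead of failing with a bare index error.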
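2. Get the real download address of the video
selenium is used here mainly because the parameters in the link change on every request and their pattern is hard to analyse. The rendered front-end page, however, does show the real video address, so selenium can extract it directly.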
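def get_video_url(html_url):
    """Take the playback address and return the video download address."""
    chrome_options = Options()
    chrome_options.add_argument('--headless')
    # Windows only: kill chromedriver processes left over from earlier runs
    os.system("taskkill /f /im chromedriver.exe")
    # executable_path and find_element_by_css_selector are Selenium 3 era APIs;
    # newer Selenium 4 releases removed them
    driver = webdriver.Chrome(executable_path='chromedriver.exe', options=chrome_options)
    try:
        driver.implicitly_wait(10)  # wait up to 10 s for elements to appear
        driver.get(html_url)
        video_url = driver.find_element_by_css_selector('#player_default video').get_attribute('src')
    finally:
        driver.quit()  # close the browser and stop chromedriver, even on errors
    return video_url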
3. Download and save the video
Method 1: plain save
def save(video_url, video_title):
    os.makedirs('video', exist_ok=True)  # make sure the output folder exists
    filename = os.path.join('video', video_title + '.mp4')
    video_headers = {
        'user-agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.138 Safari/537.36'
    }
    # Read the whole response body into memory, then write it to disk
    video_response = requests.get(url=video_url, headers=video_headers).content
    with open(filename, mode='wb') as f:
        f.write(video_response)
    print('Downloaded and saved:', video_title)
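The title is used directly as the file name, so a title containing characters such as `?`, `:` or `/` makes `open` fail or write to an unexpected path. A minimal sketch of a sanitising step, with `safe_filename` as a hypothetical helper; the character set is the one Windows forbids in file names, which is stricter than most other systems:

```python
import re


def safe_filename(title, replacement="_"):
    """Replace characters that are not allowed in Windows file names."""
    # Forbidden on Windows: \ / : * ? " < > | and the control characters
    cleaned = re.sub(r'[\\/:*?"<>|\x00-\x1f]', replacement, title)
    # Windows also rejects names that end with a space or a dot
    cleaned = cleaned.strip(" .")
    return cleaned or "untitled"


print(safe_filename('What? A "demo" | part 1'))
```

Passing `safe_filename(video_title)` instead of `video_title` when building the path keeps the rest of the function unchanged.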
Method 2: download with a progress bar
def progressbar(video_url, video_title):
    start = time.time()  # download start time
    size = 0  # bytes downloaded so far
    chunk_size = 1024  # bytes read per iteration
    try:
        response = requests.get(video_url, stream=True)  # stream=True is required for chunked reading
        if response.status_code == 200:  # only proceed on a successful response
            content_size = int(response.headers['content-length'])  # total file size in bytes
            print('Start download,[File size]:{size:.2f} MB'.format(
                size=content_size / chunk_size / 1024))  # show the file size before downloading
            os.makedirs('video', exist_ok=True)  # make sure the output folder exists
            filepath = os.path.join('video', video_title + '.mp4')  # the extension is required
            with open(filepath, 'wb') as file:
                for data in response.iter_content(chunk_size=chunk_size):
                    file.write(data)
                    size += len(data)
                    # redraw the progress bar on the same line
                    print('\r[Progress]: %s%.2f%%' % ('▇' * int(size * 50 / content_size), size / content_size * 100),
                          end='')
            end = time.time()  # download end time
            print('\nDownload completed!,times: %.2f s' % (end - start))  # elapsed time
            print(f'Video [ {video_title} ] has been saved')
    except Exception as error:
        print('Error:', error)
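The progress line depends on the `content-length` header, which a server may omit, for example when it uses chunked transfer encoding. The sketch below moves the formatting into a pure function that falls back to a byte counter when the total is unknown; `render_progress` is a hypothetical name and the bar width of 50 mirrors the code above.

```python
def render_progress(done, total, width=50):
    """Format one progress line; total <= 0 means the size is unknown."""
    if total <= 0:
        return '[Progress]: %d bytes' % done
    # Clamp so an inaccurate content-length never overflows the bar
    fraction = min(done / total, 1.0)
    return '[Progress]: %s %.2f%%' % ('▇' * int(fraction * width), fraction * 100)


print(render_progress(50, 100, width=10))
print(render_progress(10, 0))
```

Inside the download loop, `int(response.headers.get('content-length', 0))` would supply `total` without raising `KeyError` when the header is absent.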
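Entering the video ID is all that is needed to download a video. The script could later be wrapped in a simple desktop GUI application; earlier articles have covered similar projects.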
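Users often paste a full link rather than a bare ID. A small sketch that accepts either form follows, assuming the ID is the all-digit last path segment of a `https://www.ixigua.com/<id>` link or the `group_id` query parameter of an embed link. Whether those two numbers are always the same ID is not confirmed by the article, and `extract_video_id` is a hypothetical helper.

```python
from urllib.parse import parse_qs, urlparse


def extract_video_id(text):
    """Return the numeric ID from a bare ID or a pasted link, else None."""
    text = text.strip()
    if text.isdigit():
        return text
    parsed = urlparse(text)
    # Embed links carry the number in the group_id query parameter
    group_id = parse_qs(parsed.query).get("group_id", [""])[0]
    if group_id.isdigit():
        return group_id
    # Otherwise look at the last path segment of the link
    last_segment = parsed.path.rstrip("/").rsplit("/", 1)[-1]
    return last_segment if last_segment.isdigit() else None


print(extract_video_id("https://www.ixigua.com/6817258591586615812"))
```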
Full code
import time
import os
import re
import requests
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
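def get_video_url(html_url):
    """Take the playback address and return the video download address."""
    chrome_options = Options()
    chrome_options.add_argument('--headless')
    # Windows only: kill chromedriver processes left over from earlier runs
    os.system("taskkill /f /im chromedriver.exe")
    # executable_path and find_element_by_css_selector are Selenium 3 era APIs;
    # newer Selenium 4 releases removed them
    driver = webdriver.Chrome(executable_path='chromedriver.exe', options=chrome_options)
    try:
        driver.implicitly_wait(10)  # wait up to 10 s for elements to appear
        driver.get(html_url)
        video_url = driver.find_element_by_css_selector('#player_default video').get_attribute('src')
    finally:
        driver.quit()  # close the browser and stop chromedriver, even on errors
    return video_url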
# def save(video_url, video_title):
#     os.makedirs('video', exist_ok=True)
#     filename = os.path.join('video', video_title + '.mp4')
#     video_headers = {
#         'user-agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.138 Safari/537.36'
#     }
#     video_response = requests.get(url=video_url, headers=video_headers).content
#     with open(filename, mode='wb') as f:
#         f.write(video_response)
#     print('Downloaded and saved:', video_title)
def progressbar(video_url, video_title):
    start = time.time()  # download start time
    size = 0  # bytes downloaded so far
    chunk_size = 1024  # bytes read per iteration
    try:
        response = requests.get(video_url, stream=True)  # stream=True is required for chunked reading
        if response.status_code == 200:  # only proceed on a successful response
            content_size = int(response.headers['content-length'])  # total file size in bytes
            print('Start download,[File size]:{size:.2f} MB'.format(
                size=content_size / chunk_size / 1024))  # show the file size before downloading
            os.makedirs('video', exist_ok=True)  # make sure the output folder exists
            filepath = os.path.join('video', video_title + '.mp4')  # the extension is required
            with open(filepath, 'wb') as file:
                for data in response.iter_content(chunk_size=chunk_size):
                    file.write(data)
                    size += len(data)
                    # redraw the progress bar on the same line
                    print('\r[Progress]: %s%.2f%%' % ('▇' * int(size * 50 / content_size), size / content_size * 100),
                          end='')
            end = time.time()  # download end time
            print('\nDownload completed!,times: %.2f s' % (end - start))  # elapsed time
            print(f'Video [ {video_title} ] has been saved')
    except Exception as error:
        print('Error:', error)
def main(html_url):
    headers = {
        'cookie': 'enter your own cookie here',
        'referer': 'https://www.ixigua.com/?wid_try=1',
        'user-agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.138 Safari/537.36'
    }
    response = requests.get(url=html_url, headers=headers)
    response.encoding = response.apparent_encoding
    # The playback (embed) address and the title are both present in the page source
    play_url = re.findall('"embedUrl":"(.*?)"', response.text)[0]
    title = re.findall('<title data-react-helmet="true">(.*?)</title>', response.text)[0].replace(' - 西瓜視頻', '')
    video_url = get_video_url(play_url)
    progressbar(video_url, title)
if __name__ == '__main__':
    video_id = input('Enter the ID of the video to download: ')
    url = f'https://www.ixigua.com/{video_id}'
    main(url)
Please credit the source when reposting. Link to this article: https://www.uj5u.com/houduan/241728.html
