I want to download a video from a website.
This is my code. Every time I run it, it comes back empty. Here is the live code: https://colab.research.google.com/drive/19NDLYHI2n9rG6KeBCiv9vKXdwb5JL9Nb?usp=sharing
from bs4 import BeautifulSoup
import requests
url = requests.get("https://www.mxtakatak.com/xt0.3a7ed6f84ded3c0f678638602b48bb1b840bea7edb3700d62cebcf7a400d4279/video/20000kCCF0")
page = url.content
soup = BeautifulSoup(page, "html.parser")
#print(soup.prettify())
result = soup.find_all('video', class_="video-player")
print(result)
Answer:
Use a regular expression:
import requests
import re

response = requests.get("....../video/20000kCCF0")
videoId = '20000kCCF0'
# match any quoted https URL that contains the video ID and ends in "mp4"
videos = re.findall(r'https://[^"]+' + videoId + r'[^"]+mp4', response.text)
print(videos)
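To see what that pattern actually matches without hitting the site, here is a small offline sketch. The HTML fragment and the cdn.example.com URLs are invented stand-ins for response.text. It also uses a slightly loosened variant of the pattern: re.escape() protects against regex metacharacters in the ID, and [^"]* (instead of [^"]+) allows URLs where ".mp4" follows the ID directly.

```python
import re

# Hypothetical HTML fragment standing in for response.text; these URLs are made up.
sample_html = (
    '<script>{"url":"https://cdn.example.com/v/20000kCCF0_720.mp4",'
    '"thumb":"https://cdn.example.com/t/20000kCCF0.jpg",'
    '"other":"https://cdn.example.com/v/99999zzzz_720.mp4"}</script>'
)

video_id = "20000kCCF0"

# [^"] keeps each match inside one quoted string; re.escape guards the ID.
pattern = r'https://[^"]+' + re.escape(video_id) + r'[^"]*\.mp4'
videos = re.findall(pattern, sample_html)
print(videos)
```

Only the first URL qualifies: the thumbnail contains the ID but is not an .mp4, and the third .mp4 belongs to a different ID.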
Answer:
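You always get an empty result because soup.find_all() doesn't find anything. You should inspect the url.content you receive by hand, and then decide what to search for with find_all().
Edit: after some digging, I found out how to get content_url_orig: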
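One way to do that inspection programmatically is sketched below with the standard library's html.parser: it counts <video> tags in the raw HTML and reports which <script> tags contain a marker string. The helper names (TagCounter, inspect_html), the sample HTML, and the default marker are assumptions for illustration; the marker is borrowed from the edited answer, which found window._state on this page. A count of zero <video> tags means the player is most likely built by JavaScript in the browser, which requests never runs.

```python
from html.parser import HTMLParser


class TagCounter(HTMLParser):
    """Counts <video> tags and records which <script> bodies mention a marker string."""

    def __init__(self, marker):
        super().__init__()
        self.marker = marker
        self.video_tags = 0
        self.script_index = -1      # index of the most recent <script> start tag
        self.in_script = False
        self.matching_scripts = []  # indices of <script> tags containing the marker

    def handle_starttag(self, tag, attrs):
        if tag == "video":
            self.video_tags += 1
        elif tag == "script":
            self.script_index += 1
            self.in_script = True

    def handle_endtag(self, tag):
        if tag == "script":
            self.in_script = False

    def handle_data(self, data):
        # Script bodies arrive here as raw text; note the tag index if the marker appears.
        if self.in_script and self.marker in data and self.script_index not in self.matching_scripts:
            self.matching_scripts.append(self.script_index)


def inspect_html(html, marker="window._state"):
    """Hypothetical helper: return (number of <video> tags, indices of matching <script> tags)."""
    parser = TagCounter(marker)
    parser.feed(html)
    parser.close()
    return parser.video_tags, parser.matching_scripts


# Invented page skeleton; on the real page you would pass url.text instead.
page_html = (
    '<html><head><script src="x.js"></script>'
    '<script>window._state = {"a": 1}\n</script></head>'
    '<body><div id="root"></div></body></html>'
)
print(inspect_html(page_html))
```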
from bs4 import BeautifulSoup
import requests
import json

url = requests.get("https://www.mxtakatak.com/xt0.3a7ed6f84ded3c0f678638602b48bb1b840bea7edb3700d62cebcf7a400d4279/video/20000kCCF0")
page = url.content
soup = BeautifulSoup(page, "html.parser")
result = str(soup.find_all('script')[1])  # the <script> tag holding the page state (index 1 on this page)
# separate the JSON from the rest of the script string (found by digging through the page source)
result = result.split('window._state = ')[1].split('</script>')[0].split('\n')[0]
result = json.loads(result)
# navigate the JSON to get the video URL of the first entity
entity = list(result['entities'].items())[0][1]
download_url = entity['content_url_orig']
print(download_url)
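The script above stops at printing download_url; the question's actual goal is a file on disk. A minimal sketch for that step follows. It swaps in the standard library's urllib.request (so it needs no extra package) and streams the response into the file in chunks with shutil.copyfileobj. The function name download_video, the output file name, and the browser-like User-Agent header are assumptions, not something the answer confirms the server requires.

```python
import shutil
import urllib.request


def download_video(download_url, out_path, chunk_size=1 << 16):
    """Hypothetical helper: stream download_url into out_path and return the number of bytes written."""
    # Assumption: some CDNs reject Python's default User-Agent, so send a browser-like one.
    req = urllib.request.Request(download_url, headers={"User-Agent": "Mozilla/5.0"})
    with urllib.request.urlopen(req, timeout=30) as resp, open(out_path, "wb") as fh:
        # Copy in chunks so a large video is never held in memory all at once.
        shutil.copyfileobj(resp, fh, chunk_size)
        return fh.tell()
```

Usage: download_video(download_url, "20000kCCF0.mp4"). With requests, the equivalent is requests.get(download_url, stream=True) and writing each chunk from response.iter_content(chunk_size=...) to the open file.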
Fun side note: if I'm reading the JSON correctly, it lists every video the creator has uploaded, each with its download URL :)
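Following up on that side note, here is a sketch that collects content_url_orig from every entry in result['entities'] instead of only the first. It assumes, without confirmation from the page, that entities is a dict keyed by ID and that non-video entries simply lack the content_url_orig key; sample_state is an invented, heavily trimmed stand-in for the parsed window._state JSON.

```python
def all_download_urls(state):
    """Return {entity_id: content_url_orig} for every entity that has that key."""
    urls = {}
    for entity_id, entity in state.get("entities", {}).items():
        # Assumption: only video entities carry content_url_orig; skip everything else.
        if isinstance(entity, dict) and "content_url_orig" in entity:
            urls[entity_id] = entity["content_url_orig"]
    return urls


# Hypothetical stand-in for the parsed JSON (the `result` dict in the answer above).
sample_state = {
    "entities": {
        "20000kCCF0": {"content_url_orig": "https://cdn.example.com/20000kCCF0.mp4"},
        "20000kABCD": {"content_url_orig": "https://cdn.example.com/20000kABCD.mp4"},
        "creator-1": {"name": "some creator"},
    }
}
print(all_download_urls(sample_state))
```

On the real page you would call all_download_urls(result) right after json.loads(result).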
