I want to extract the links from the response.
Request:
import requests
headers = {
'authority': 'www.xxxxxx.net',
'sec-ch-ua': '"Google Chrome";v="95", "Chromium";v="95", ";Not A Brand";v="99"',
'accept': 'text/javascript, application/javascript, application/ecmascript, application/x-ecmascript, */*; q=0.01',
'x-requested-with': 'XMLHttpRequest',
'sec-ch-ua-mobile': '?0',
'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/95.0.4638.54 Safari/537.36',
'sec-ch-ua-platform': '"Windows"',
'sec-fetch-site': 'same-origin',
'sec-fetch-mode': 'cors',
'sec-fetch-dest': 'empty',
'referer': 'https://www.xxxxxx.net/',
'accept-language': 'en-GB,en-US;q=0.9,en;q=0.8',
'cookie': 'bnState={"impressions":1,"delayStarted":0}; pnState={"impressions":2,"delayStarted":1635254046187}',
}
params = (
('alt', 'json-in-script'),
('max-results', '12'),
('start-index', '13'),
('callback', 'jQuery22404432064732296963_1635254045161'),
('_', '1635254045166'),
)
url='https://www.xxxxxx.net/feeds/posts/default?alt=json-in-script&max-results=12&start-index=1&callback=jQuery22404432064732296963_1635254045161&_=1635254045166'
response = requests.get('https://www.xxxxxx.net/feeds/posts/default', params=params)
print(response.text)
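Because the request sets `alt=json-in-script` and a `callback` parameter, the body is most likely JSONP: JSON wrapped in a JavaScript function call, with each post's HTML stored as an escaped string. Below is a minimal sketch of unwrapping it with the standard `json` module. It assumes the body has the form `callback({...});` and that post content sits under `feed.entry[i].content.$t` (the usual Blogger feed layout, but treat that as an assumption for this site). The helper name `unwrap_jsonp`, the sample payload and the URL are hypothetical.

```python
import json


def unwrap_jsonp(body: str) -> dict:
    """Strip a JSONP wrapper such as 'callback({...});' and parse the JSON inside.

    Assumption: the payload is the text between the first '(' and the last ')',
    which holds for bodies shaped like 'cb({...});', optionally preceded by a
    comment line that contains no parentheses.
    """
    start = body.index("(") + 1
    end = body.rindex(")")
    return json.loads(body[start:end])


# Hypothetical sample in the same shape as an alt=json-in-script response.
sample = (
    'jQuery123_456({"feed": {"entry": [{"content": '
    '{"$t": "\\u003Ca href=\\"https://example.com/a.jpg\\"\\u003E"}}]}});'
)
data = unwrap_jsonp(sample)

# Assumption: Blogger-style layout feed -> entry -> content -> $t
html = data["feed"]["entry"][0]["content"]["$t"]
print(html)  # <a href="https://example.com/a.jpg">
```

`json.loads` also decodes escapes such as `\u003C` into `<`, so `html` can go straight into an HTML parser. If the site also accepts `alt=json`, it may return plain JSON and `response.json()` would skip the unwrapping step. That is an untested assumption here.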
Response:
googleusercontent.com/img/a/AVvXsEgPYR-GZ9U9-k1L7h3J4MSR1bWYocsC5PbI-bCPjF2Yb3d73n-rGbJQwZqzRqCfPrlzPdPrtzjWWTPvrYFBGl-gcO6cPiccSygST8yR23o6z4Tq8ptl4vVaeduWYfxAsRdh6gvVsjCpIfiWod9Qd_--wU"/\u003E\u003C/a\u003E\u003C/div\u003E\u003Cdiv class="separator" style="clear: both;"\u003E\u003Ca href="https://blogger.googleusercontent.com/img/a/AVvXsEjEV6skKy5be_5LoMzHD-AeZWFV80c7KXV4BVpS7KTKkNTzl0U5-itDje-DbDgE0KHuoGI3ePDmfn_0AQMP1BjXPx2nn4mB1jUI9Rb7u9NQNMURGSAmk4aQK7h8qqiGH_lafBcHeNupHrm" style="display: block; padding: 1em 0; text-align: center;"\u003E\u003Cimg alt="I am trying to do web scraping with Python and made the request shown below and got a response, but I don't know how to process it" border="0" data-original-height="2048" data-original-width="1367" src="https://blogger.googleusercontent.com/img/a/AVvXsEjEV6skKy5be_5LoMzHD-AeZWFV80cQNKXV4BVpS7KTKkNTzl0U5-itDje-DbDgIS8A18QP7aVvME1wzZMb53ePDmfn_0AQMP1BjXRGSAmk4aQK7h8qqiGH_lafBcHeNupHrm"/\u003E\u003C/a\u003E\u003C/div\u003E\u003Cdiv class="separator" style="clear: both;"\u003E\u003Ca href="https://blogger.googleusercontent.com/img/a/AVvXsEhvK-fVZGPmgnkif5OWAMDk-d22Y73FDLYRSXQQe4AYOazvk25-0DQ-o4XX35meuORitAk7WoN1vKSLdtH_P1wTa91B99GhloyFoYEZZGPmgnkif5WAMDk-d22Y73FDLYRSXQQe4AYOazvk25-0DQ-o4XX35meuORitAk7WoN1vKSLdtH_P1wTa91B99GhloyFoYEz padding: 1em 0; text-align: center;com/img/a/AVvXsEhvK-fVZGPmgnkif5OWAMDk-d22Y73FDLYRSXQQe4AYOazvk25-0DQ-o4XX35meuORitAk7WoN1vKSLdtH_P1wTa91B94vAI4ZGhlho0yoLuX_display:block padding: 1em 0; text-align: center;com/img/a/AVvXsEhvK-fVZGPmgnkif5OWAMDk-d22Y73FDLYRSXQQe4AYOazvk25-0DQ-o4XX35meuORitAk7WoN1vKSLdtH_P1wTa91B94vAI4ZGhlho0yoLuX-display:block padding: 1em 0; text-align: center;
Note: please tell me how to process this response. Also note that I changed the URL for privacy reasons.
Thanks in advance for your help.
Answer from a uj5u.com user:
If you are doing web scraping, I strongly recommend using the BeautifulSoup library to parse your response. Initialize it as follows:
from bs4 import BeautifulSoup
response = ""  # your response text, e.g. response.text
soup = BeautifulSoup(response, "html.parser")  # Parse the response and save it into a variable
Get all hrefs:
hrefs = soup.find_all(href=True)
links = [i['href'] for i in hrefs]  # A list of all your links
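If you would rather avoid a third-party dependency, the standard library's `html.parser.HTMLParser` can collect the same links. This swaps in a different tool and is not what the answer above uses. Below is a minimal sketch. It assumes the input is already-decoded HTML: the escaped `\u003C...` form in the question has no real tags, so it must be decoded first. The sketch collects `src` as well as `href`, because the image URLs in the response sit in `<img src=...>`. `LinkCollector` and `extract_links` are hypothetical names.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect href and src attribute values from every start tag."""

    def __init__(self) -> None:
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.links.append(value)


def extract_links(html: str) -> list:
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links


# Hypothetical decoded fragment shaped like the question's response
sample_html = (
    '<div class="separator" style="clear: both;">'
    '<a href="https://example.com/full.jpg" style="display: block;">'
    '<img border="0" src="https://example.com/thumb.jpg"/></a></div>'
)
print(extract_links(sample_html))
```

Self-closing tags such as `<img .../>` also reach `handle_starttag`, because the default `handle_startendtag` calls it.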
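Answer from a uj5u.com user:
This is an ASCII string containing escaped Unicode sequences (for example, \u003C for < and \u003E for >). You need to convert it to a regular Unicode string first. Try this: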
html_content = bytes(response.text, "ascii").decode("unicode-escape")
After that, you will have an ordinary string containing HTML/XML, which you can parse with BeautifulSoup4 to extract what you need.
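Below is a minimal sketch of that decoding step, run on a fragment shaped like the question's response. It is followed by a quick regex used in place of BeautifulSoup. The regex is a swapped-in shortcut and is more fragile than a real parser. The fragment and URLs are hypothetical. Note that `bytes(..., "ascii")` raises `UnicodeEncodeError` if the text contains any non-ASCII character.

```python
import re

# Hypothetical fragment shaped like the escaped HTML in the question's response
# (the \u003C / \u003E sequences are literal backslash text, not real < and >).
raw = (
    '\\u003Cdiv class="separator" style="clear: both;"\\u003E'
    '\\u003Ca href="https://example.com/full.jpg"\\u003E'
    '\\u003Cimg src="https://example.com/thumb.jpg"/\\u003E\\u003C/a\\u003E\\u003C/div\\u003E'
)

# Same decoding step as in the answer: turn \u003C / \u003E escapes into < and >
html_content = bytes(raw, "ascii").decode("unicode-escape")

# Quick regex for double-quoted href/src values; fine for a one-off,
# but a real HTML parser (BeautifulSoup, html.parser) is more robust.
links = re.findall(r'(?:href|src)="([^"]+)"', html_content)
print(html_content)
print(links)
```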
Please credit the source when reposting. Link to this article: https://www.uj5u.com/ruanti/338004.html
