Bypassing the EU consent request

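I have been trying to scrape data from Google Search, but I can't get past the "Before you continue to Google Search" consent form.

I looked for a workaround and saw others suggest sending the parameter CONSENT=PENDING+999, or something like CONSENT=YES+HU.hu+V10+B+256, with the GET request. Unfortunately, I couldn't get the former to work, and for the latter I'm not entirely sure what the last three elements should be replaced with.

I've attached the example code below, taken from here.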

import requests
import bs4

headers = {'User-Agent':'Chrome 83 (Toshiba; Intel(R) Core(TM) i3-2367M CPU @ 1.40 GHz)'\
           'Windows 7 Home Premium',
           'Accept':'text/html,application/xhtml+xml,application/xml;'\
           'q=0.9,image/webp,*/*;q=0.8',
           #'cookie': 'CONSENT=YES+HU.hu+V10+B+256' # what are the last three elements?
           'cookie':'CONSENT=PENDING+999'
           }

text= "geeksforgeeks"
url = 'https://google.com/search?q=' + text
  
request_result=requests.get( url , headers = headers) # here's where the trouble happens 

soup = bs4.BeautifulSoup(request_result.text, "html.parser")

print(soup) # not what one would expect

heading_object=soup.find_all( 'h3' ) 
  
for info in heading_object:
    print(info.getText())
    print("------")

Any help would be greatly appreciated.

Answer from a uj5u.com user:

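Yes, Google does use the CONSENT cookie to decide whether the consent popup is shown. I experimented with the cookie by adjusting its value, and I can conclude that, at the time of writing, setting the CONSENT cookie to YES+ is enough to prevent the consent window from appearing.

In your code, you try to pass the cookie through the headers argument. I suggest using the cookies argument instead.

Adjust your code as follows (and remove the cookie from the headers):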
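To see what the cookies argument actually puts on the wire without contacting Google, the request can be prepared offline and inspected. This is only a minimal sketch: the helper name build_search_request and the placeholder User-Agent are hypothetical, and the default CONSENT value is an assumption that may stop working if Google changes its consent handling.

```python
import requests


def build_search_request(query, consent_value="YES+"):
    """Prepare (but do not send) a Google search request with a CONSENT cookie.

    The helper name and the default cookie value are illustrative assumptions.
    """
    request = requests.Request(
        "GET",
        "https://google.com/search",
        params={"q": query},                    # requests URL-encodes the query
        headers={"User-Agent": "Mozilla/5.0"},  # placeholder User-Agent
        cookies={"CONSENT": consent_value},     # becomes the Cookie header
    )
    return request.prepare()


prepared = build_search_request("geeksforgeeks")
print(prepared.url)                # full URL including the encoded query
print(prepared.headers["Cookie"])  # Cookie header built from the cookies dict
```

A prepared request like this could then be sent with requests.Session().send(prepared). Passing the query through params also avoids building the URL by hand.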

request_result = requests.get( url, headers = headers, cookies = {'CONSENT' : 'YES+'} )

Output after running the code with my solution:

GeeksforGeeks
------
GeeksforGeeks - YouTube
------
GeeksforGeeks | LinkedIn
------
GeeksforGeeks (@geeks_for_geeks) • Instagram photos and videos
------
GeeksforGeeks - Twitter
------
GeeksforGeeks - Home | Facebook
------
Geeks for Geeks - Crunchbase Company Profile & Funding
------
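Because the cookie behaviour is only reported to hold at the time of writing, it may be worth checking whether a response is still the consent form rather than search results. The sketch below is one hedged way to do that. The helper name looks_like_consent_page is hypothetical, the host consent.google.com is an assumption not stated in this thread, and the phrase check only matches the English wording of the form.

```python
from urllib.parse import urlparse

CONSENT_HOST = "consent.google.com"     # assumption, not taken from this thread
CONSENT_PHRASE = "Before you continue"  # English wording of the consent form


def looks_like_consent_page(final_url, html):
    """Heuristically decide whether a response is the consent form.

    final_url is the URL after redirects; html is the response body.
    """
    host = urlparse(final_url).hostname or ""
    return host == CONSENT_HOST or CONSENT_PHRASE in html


# Offline examples with made-up inputs
print(looks_like_consent_page("https://consent.google.com/m?continue=x", "<html></html>"))
print(looks_like_consent_page("https://google.com/search?q=geeksforgeeks",
                              "<h3>GeeksforGeeks</h3>"))
```

With a real response this could be called as looks_like_consent_page(request_result.url, request_result.text). If it returns True, no h3 result headings should be expected in the page.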

Please credit the source when reposting. Link to this article: https://www.uj5u.com/gongcheng/401839.html

Tags: Python, web scraping, cookies, python-requests