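I want to scrape and parse multiple URLs as quickly as possible, but a for loop isn't fast enough for me. Is there a way to do this with asynchronous code, multiprocessing, or multithreading?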
import grequests
from bs4 import BeautifulSoup

links1 = []  # multiple links

while True:
    try:
        reqs = (grequests.get(link) for link in links1)
        resp = grequests.imap(reqs, size=25, stream=False)
        for r in resp:  # I want this for loop to run as fast as possible. Is that possible?
            soup = BeautifulSoup(r.text, 'lxml')
            parse = soup.find('div', class_='txt')
    # (the post is cut off here; the matching except clause is not shown)
Reply from a uj5u.com user:
An example of how to use multiprocessing with requests/BeautifulSoup:
import requests
from tqdm import tqdm  # for a pretty progress bar
from bs4 import BeautifulSoup
from multiprocessing import Pool

# some 1000 links to analyze (4 URLs repeated 250 times)
links1 = [
    "https://en.wikipedia.org/wiki/2021_Moroccan_general_election",
    "https://en.wikipedia.org/wiki/Tangerang_prison_fire",
    "https://en.wikipedia.org/wiki/COVID-19_pandemic",
    "https://en.wikipedia.org/wiki/Yolanda_Fernández_de_Cofiño",
] * 250


def parse(url):
    # download the page and return the text of its first <h1>
    soup = BeautifulSoup(requests.get(url).content, "html.parser")
    return soup.select_one("h1").get_text(strip=True)


if __name__ == "__main__":
    with Pool() as p:  # one worker process per CPU core by default
        out = []
        for r in tqdm(p.imap(parse, links1), total=len(links1)):
            out.append(r)

    print(len(out))
With my network connection and CPU (Ryzen 3700x), I was able to get results for all 1000 links in 30 seconds:
100%|██████████| 1000/1000 [00:30<00:00, 33.12it/s]
1000
All of my CPU cores were utilized (as observed in htop).
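The question also asks about multithreading. As a minimal sketch (not part of the answer above, and not benchmarked here): downloading is mostly network-bound, so a thread pool from the standard library's `concurrent.futures` may be worth trying as well. The helper name `run_in_threads` and the default of 25 workers (borrowed from `size=25` in the question) are assumptions for illustration; `worker` stands for any function that takes a URL, such as `parse` from the answer.

```python
from concurrent.futures import ThreadPoolExecutor


def run_in_threads(worker, urls, max_workers=25):
    """Call worker(url) for every URL on a thread pool.

    Hypothetical helper: returns a list of (url, result, error) tuples in
    the same order as `urls`. A failing URL gets result=None and its
    exception in `error`, so one bad page does not abort the whole run.
    """

    def safe(url):
        try:
            return url, worker(url), None
        except Exception as exc:  # e.g. a timeout, or a page without <h1>
            return url, None, exc

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in input order, even if they finish out of order
        return list(pool.map(safe, urls))
```

For example, `run_in_threads(parse, links1)` would return one `(url, title, error)` tuple per link. Parsing with BeautifulSoup is CPU-bound, so threads alone may not match the process pool when parsing dominates the run time.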
Please credit the source when reposting. Link to this article: https://www.uj5u.com/caozuo/308336.html