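I'm trying to use the waybackpy library to fetch a list of URLs from the Wayback Machine. The problem is that it is very slow, and I think multithreading could speed it up.
I know why my code doesn't work (each thread iterates over the same list inside the function), but I don't know how to make it work. Here is my code: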
import concurrent.futures
import waybackpy

user_agent = "Mozilla/5.0 (Windows NT 5.1; rv:40.0) Gecko/20100101 Firefox/40.0"
url_list = ["https://www.google.com", "https://www.facebook.com", "https://www.wikipedia.com", "https://www.walmart.com/", "https://www.ebay.com/", "https://www.amazon.com"]
archive_url_list = []

def get_archive_url(threads):
    counter = 1
    # The whole list is walked inside this one call, so only one thread does the work.
    for url in url_list:
        try:
            target_url = waybackpy.Url(url, user_agent)
            newest_archive = target_url.newest()
            archive_url_list.append(newest_archive)
            counter = counter + 1
        except Exception:
            return "Error Retrieving URL from Archive.org"

with concurrent.futures.ThreadPoolExecutor() as executor:
    f1 = executor.submit(get_archive_url, 2)
    print(f1.result())
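I can't figure out how to split up the list so that it can be handed out to different threads. I have searched and tried many of the top answers here, but I couldn't get any of them to work.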
Answer:
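You're right. I have always found concurrent.futures a little hard to follow too, but as you said, you are looping in the wrong place, so the whole loop runs in a single thread. Submit one task per URL instead and let the executor distribute them. You could try something like this: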
import concurrent.futures
import waybackpy

CONNECTIONS = 2  # Increase this number to run more jobs at the same time.
user_agent = "Mozilla/5.0 (Windows NT 5.1; rv:40.0) Gecko/20100101 Firefox/40.0"
url_list = ["https://www.google.com", "https://www.facebook.com", "https://www.wikipedia.com", "https://www.walmart.com/", "https://www.ebay.com/", "https://www.amazon.com"]
archive_url_list = []

def get_archive_url(url):
    # Runs in a worker thread: look up the newest archive for a single URL.
    target_url = waybackpy.Url(url, user_agent)
    newest_archive = target_url.newest()
    return newest_archive

def concurrent_calls():
    with concurrent.futures.ThreadPoolExecutor(max_workers=CONNECTIONS) as executor:
        # Submit one task per URL so the pool can spread them across threads.
        f1 = (executor.submit(get_archive_url, url) for url in url_list)
        # as_completed yields futures as they finish, not in submission order.
        for future in concurrent.futures.as_completed(f1):
            try:
                data = future.result().archive_url
            except Exception as e:
                data = ('error', e)
            finally:
                archive_url_list.append(data)
                print(data)

if __name__ == '__main__':
    concurrent_calls()
    print(archive_url_list)
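If you need the results in the same order as url_list, executor.map may be a simpler fit than submit plus as_completed. Below is a minimal sketch of that idea, not a drop-in replacement. To keep it runnable without network access, fake_lookup is a hypothetical stand-in for the waybackpy call, and safe_lookup and lookup_all are names made up for this illustration.

```python
import concurrent.futures
import time

def fake_lookup(url):
    """Hypothetical stand-in for the waybackpy call; no network access."""
    if "bad" in url:
        raise ValueError(f"no archive for {url}")
    time.sleep(0.05)  # simulate a slow request
    return f"archived:{url}"

def safe_lookup(url):
    # Catch errors inside the worker so one failure does not stop the rest.
    try:
        return fake_lookup(url)
    except Exception as e:
        return ("error", str(e))

def lookup_all(urls, max_workers=2):
    with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as executor:
        # map() schedules one task per URL and yields results in input order.
        return list(executor.map(safe_lookup, urls))

if __name__ == "__main__":
    print(lookup_all(["https://www.google.com", "https://bad.example", "https://www.ebay.com/"]))
```

The trade-off is that map makes you wait for the earlier items before you see the later ones, whereas as_completed hands back whichever lookup finishes first.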
