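I am using concurrent.futures to speed up an IO-bound process: retrieving the H1 heading from each URL in a list of pages found on the Wayback Machine. The code works, but it returns the results in arbitrary order. I am looking for a way to return them in the same order as the original URL list.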
import concurrent.futures
from urllib.request import urlopen

import waybackpy
from bs4 import BeautifulSoup

archive_url_list = [
    'https://web.archive.org/web/20171220002410/http://www.manueldrivingschool.co.uk:80/areas-covered-for-driving-lessons',
    'https://web.archive.org/web/20210301102140/https://www.manueldrivingschool.co.uk/contact.php',
    'https://web.archive.org/web/20210301102140/https://www.manueldrivingschool.co.uk/contact.php',
    'https://web.archive.org/web/20171220002415/http://www.manueldrivingschool.co.uk:80/contact',
    'https://web.archive.org/web/20160520140505/http://www.manueldrivingschool.co.uk:80/about.php',
    'https://web.archive.org/web/20180102123922/http://www.manueldrivingschool.co.uk:80/about',
]

CONNECTIONS = 8  # assumed value; the original snippet does not define it
archive_h1_list = []

def get_archive_h1(h1_url):
    # Fetch the archived page and return the text of its first <h1>.
    html = urlopen(h1_url)
    bsh = BeautifulSoup(html.read(), 'lxml')
    return bsh.h1.text.strip()

def concurrent_calls():
    with concurrent.futures.ThreadPoolExecutor(max_workers=CONNECTIONS) as executor:
        f1 = (executor.submit(get_archive_h1, h1_url) for h1_url in archive_url_list)
        # as_completed yields futures in completion order, not submission order.
        for future in concurrent.futures.as_completed(f1):
            try:
                data = future.result()
                archive_h1_list.append(data)
            except Exception:
                archive_h1_list.append("No data received!")

if __name__ == '__main__':
    concurrent_calls()
    print(archive_h1_list)
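I tried creating a second list and appending the original URLs to it while the code ran, hoping to tie the results back to their URLs afterwards, but all I got was an empty list. I am new to concurrent.futures and hope there is a standard way to do this.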
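The idea of tying each result back to its URL can be made to work by recording every future's input position at submission time. The following is a minimal sketch under stated assumptions, not the asker's code: `results_in_input_order` and `fake_h1` are hypothetical names, and `fake_h1` stands in for `get_archive_h1` so the example runs without network access.

```python
import concurrent.futures
import time

def results_in_input_order(func, items, max_workers=8, fallback="No data received!"):
    """Process futures as they complete, but store each result at its input position."""
    results = [None] * len(items)
    with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as executor:
        # Remember the input position of every future at submission time.
        future_to_index = {
            executor.submit(func, item): index for index, item in enumerate(items)
        }
        for future in concurrent.futures.as_completed(future_to_index):
            index = future_to_index[future]
            try:
                results[index] = future.result()
            except Exception:
                results[index] = fallback
    return results

def fake_h1(name):
    # Hypothetical stand-in for get_archive_h1: shorter names take longer,
    # so the calls finish in roughly the reverse of their input order.
    if not name:
        raise ValueError("simulated failure")
    time.sleep(0.3 / len(name))
    return name.upper()

if __name__ == "__main__":
    print(results_in_input_order(fake_h1, ["a", "bb", "", "cccc"]))
    # ['A', 'BB', 'No data received!', 'CCCC']
```

With the real worker, the call would look like `results_in_input_order(get_archive_h1, archive_url_list)`.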
Answer from a uj5u.com user:
Instead of a generator of ThreadPoolExecutor.submit calls, use ThreadPoolExecutor.map, which returns results in the same order as the input iterable:
def concurrent_calls():
    with concurrent.futures.ThreadPoolExecutor(max_workers=CONNECTIONS) as executor:
        f1 = executor.map(get_archive_h1, archive_url_list)
        ...
This is also more concise, since there is no need to collect and track individual futures.
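One caveat with `map`: an exception raised in a worker is re-raised when its result is pulled from the iterator, which ends the iteration. A minimal sketch of one way to keep the per-URL fallback is to catch errors inside the worker call. The names `safe_call`, `ordered_results`, and `fake_fetch` are hypothetical, and `fake_fetch` stands in for `get_archive_h1` so the example runs offline.

```python
import concurrent.futures
import time

def safe_call(func, item, fallback="No data received!"):
    """Run func(item); return fallback instead of raising (hypothetical helper)."""
    try:
        return func(item)
    except Exception:
        return fallback

def ordered_results(func, items, max_workers=8):
    """Apply func to every item in worker threads; results follow input order."""
    with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as executor:
        # executor.map yields results in the order of `items`,
        # regardless of which call finishes first.
        return list(executor.map(lambda item: safe_call(func, item), items))

def fake_fetch(delay):
    # Hypothetical stand-in for get_archive_h1: later inputs finish sooner.
    if delay < 0:
        raise ValueError("simulated failure")
    time.sleep(delay)
    return f"slept {delay}"

if __name__ == "__main__":
    print(ordered_results(fake_fetch, [0.3, 0.2, -1, 0.1]))
    # ['slept 0.3', 'slept 0.2', 'No data received!', 'slept 0.1']
```

With the real worker, `ordered_results(get_archive_h1, archive_url_list)` would return one heading (or the fallback string) per URL, in list order.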
