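I want to use ThreadPoolExecutor to read several pages from the link below, one per number, and save the corresponding number as a new column in a DataFrame.
https://booking.snav.it/api/v1/rates/1030/2019-02-25/1042/2019-02-25?lang=1
The number changes as shown in the code below: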
from concurrent.futures import ThreadPoolExecutor, as_completed
from time import sleep
from pandas import json_normalize
import pandas as pd
import requests

def download_file(url):
    # Fetch one URL and return the decoded JSON body.
    url_info = requests.get(url, stream=True)
    jdata = url_info.json()
    return jdata

nums = [1030, 1031, 1040, 1050, 1020, 1021, 1010, 1023]
urls = [f"https://booking.snav.it/api/v1/rates/{i}/2019-02-25/1042/2019-02-25?lang=1" for i in nums]

processes = []
df = pd.DataFrame()
with ThreadPoolExecutor(max_workers=14) as executor:
    for url in urls:
        sleep(0.1)
        processes.append(executor.submit(download_file, url))
    for index, task in enumerate(as_completed(processes)):
        jdata = task.result()
        tmp = json_normalize(jdata)
        # Problem: index counts completions, so it does not match the position in nums.
        tmp["num"] = nums[index]
        # DataFrame.append was removed in pandas 2.0, so pd.concat is used instead.
        df = pd.concat([df, tmp])
print(df.head())
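In the code above, I try to read the data with multiple threads and add the number that belongs to each JSON response as a new column of the DataFrame df. However, this code does not work: because the requests run in multiple threads, the responses come back in a different order from the numbers in nums. What should I do?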
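The mismatch can be reproduced without any network access. The following is a minimal sketch, assuming a hypothetical `fake_download` function that sleeps for a made-up delay in place of the real request. It illustrates that `as_completed` yields futures in the order they finish, not the order they were submitted.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from time import sleep

# Hypothetical stand-in for download_file: the delays are invented
# so that the first submission is the slowest one.
def fake_download(num, delay):
    sleep(delay)
    return num

nums = [1030, 1031, 1040]
delays = [0.6, 0.3, 0.0]  # assumed values, for illustration only

with ThreadPoolExecutor(max_workers=3) as executor:
    futures = [executor.submit(fake_download, n, d) for n, d in zip(nums, delays)]
    # Collect results in completion order.
    finished = [f.result() for f in as_completed(futures)]

# finished follows completion order, so nums[index] would label rows wrongly.
print("submitted:", nums)
print("finished: ", finished)
```

With these delays the slowest task (1030) is submitted first but completes last, so pairing results with `nums[index]` attaches the wrong number to it.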
Answer from a uj5u.com user:
Try this:
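from concurrent.futures import ThreadPoolExecutor
...
frames = []
with ThreadPoolExecutor(max_workers=14) as executor:
    # executor.map returns results in the same order as urls,
    # so index lines up with nums.
    rv = executor.map(download_file, urls)
    for index, jdata in enumerate(rv):
        tmp = json_normalize(jdata)
        tmp["num"] = nums[index]
        frames.append(tmp)
# Build the final DataFrame once; the original df.append(tmp) discarded its result.
df = pd.concat(frames, ignore_index=True)
print(df.head())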
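If each response should be processed as soon as it arrives, another common pattern is to keep `as_completed` and map every future back to its number with a dictionary. This is a minimal sketch, assuming a hypothetical `fetch` stand-in that returns an invented record instead of calling the real API; in practice `download_file` from the question would take its place.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
import pandas as pd

# Hypothetical stand-in for download_file; the returned record is invented.
def fetch(url):
    return {"url": url}

nums = [1030, 1031, 1040, 1050]
urls = [f"https://booking.snav.it/api/v1/rates/{i}/2019-02-25/1042/2019-02-25?lang=1" for i in nums]

frames = []
with ThreadPoolExecutor(max_workers=4) as executor:
    # Remember which number each future belongs to.
    future_to_num = {executor.submit(fetch, url): num for num, url in zip(nums, urls)}
    for future in as_completed(future_to_num):
        tmp = pd.json_normalize(future.result())
        tmp["num"] = future_to_num[future]  # look up by future, not by position
        frames.append(tmp)

df = pd.concat(frames, ignore_index=True)
print(df.sort_values("num").head())
```

The rows end up in completion order, but each one carries the right `num`, so sorting by that column restores the original order if it is needed.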
