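I have a web-scraping function that fetches data from 190 URLs. To make it finish quickly, I used concurrent.futures.ThreadPoolExecutor. I save the data to a SQL Server database. From 9 AM to 4 PM, I have to repeat this whole process every 3 minutes. But when I put it inside a while loop or a scheduler, the concurrent futures part does not work: there is no error and no output.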
# required libraries
import concurrent.futures
import time

import requests

urls = []  # list of the 190 URLs to scrape

def data_fetched(url):
    # data fetching
    # operations on data
    # data saving to SQL server
    return ''

while True:
    with concurrent.futures.ThreadPoolExecutor() as executor:
        executor.map(data_fetched, urls)
    time.sleep(60)
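One likely reason for "no error and no output": `executor.map` returns a lazy iterator. An exception raised inside a worker thread is stored in its future and only re-raised when that result is read, so if the results are never iterated, the exception is silently discarded. The sketch below illustrates this with a hypothetical `flaky_fetch` (a stand-in for `data_fetched`, not from the question) that fails for one URL; the example.com URLs are placeholders.

```python
import concurrent.futures


def flaky_fetch(url):
    # Hypothetical stand-in for data_fetched: fails for one URL.
    if "bad" in url:
        raise ValueError(f"could not fetch {url}")
    return f"ok: {url}"


urls = ["https://example.com/a", "https://example.com/bad", "https://example.com/c"]

# 1) Results never consumed: the ValueError stays inside the unread futures,
#    so nothing is reported.
with concurrent.futures.ThreadPoolExecutor() as executor:
    executor.map(flaky_fetch, urls)
print("first run finished without any error")

# 2) Consuming the iterator re-raises the worker's exception in this thread.
silent_error = None
try:
    with concurrent.futures.ThreadPoolExecutor() as executor:
        _ = list(executor.map(flaky_fetch, urls))
except ValueError as exc:
    silent_error = exc
print("second run raised:", silent_error)
```

So when the loop appears to do nothing, wrapping the call in `list(...)` (or iterating over the results) is a quick way to see what the worker threads are actually failing on.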
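I want to repeat all of this every 3 minutes. Please explain the code flow and help me schedule it. This is my current scheduling loop: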
import time
from datetime import datetime as dt, timedelta

start = dt.strptime("09:15", "%H:%M")
end = dt.strptime("15:30", "%H:%M")
# min_gap: minutes between runs
min_gap = 3
# compute the list of run times as "HH:MM" strings ("09:15", "09:18", ...)
arr = [(start + timedelta(minutes=min_gap * i)).strftime("%H:%M")
       for i in range(int((end - start).total_seconds() / 60.0 / min_gap))]
while True:
    now = dt.now()                         # gets current datetime
    weekno = now.weekday()                 # 0 = Mon ... 5 = Sat, 6 = Sun
    current_time = now.strftime("%H:%M")   # zero-padded, same format as arr
    # checks if it is a weekday and the current minute is in the run list
    if weekno < 5 and current_time in arr:
        print('data_loaded')
    else:  # not a run minute, or 5 Sat / 6 Sun
        pass
    time.sleep(60)
So, inside this while loop, I want to call that function using concurrent.futures.
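A minimal sketch of how the question's time-slot loop and the thread pool could be combined, assuming the same 09:15 to 15:30 window and 3-minute gap. `run_batch`, `run_times` and `main` are hypothetical helper names, and `data_fetched` here is only a placeholder for the real fetch/process/save function. Using `submit` plus `as_completed` (instead of a bare `map`) lets one failing URL be recorded without stopping the rest of the batch.

```python
import concurrent.futures
import time
from datetime import datetime, timedelta


def data_fetched(url):
    # Placeholder for the question's function: fetch, process, save to SQL Server.
    return f"saved {url}"


def run_batch(urls, fetch=data_fetched, max_workers=20):
    """Fetch every URL in parallel; return (results, errors) keyed by URL."""
    results, errors = {}, {}
    with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as executor:
        future_to_url = {executor.submit(fetch, url): url for url in urls}
        for future in concurrent.futures.as_completed(future_to_url):
            url = future_to_url[future]
            try:
                results[url] = future.result()
            except Exception as exc:  # record the failure, keep the other URLs
                errors[url] = exc
    return results, errors


def run_times(start="09:15", end="15:30", gap=3):
    """Same slots as `arr` in the question, as a set of "HH:MM" strings."""
    t0 = datetime.strptime(start, "%H:%M")
    t1 = datetime.strptime(end, "%H:%M")
    n = int((t1 - t0).total_seconds() // 60 // gap)
    return {(t0 + timedelta(minutes=gap * i)).strftime("%H:%M") for i in range(n)}


def main(urls):
    """Check once per minute and run a batch on each weekday slot."""
    slots = run_times()
    while True:
        now = datetime.now()
        if now.weekday() < 5 and now.strftime("%H:%M") in slots:
            results, errors = run_batch(urls)
            print(f"{now:%H:%M} loaded {len(results)} URLs, {len(errors)} failed")
        # Sleep until the next minute boundary so each minute is checked once.
        time.sleep(60 - datetime.now().second)
```

Start it with `main(urls)`. If one batch of 190 URLs takes longer than 3 minutes, the slots that pass during the batch are skipped rather than queued, which is usually what a polling scraper wants.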
Answer from a uj5u.com user:
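You can create a separate function and schedule it to run data_fetched() on every URL. Make sure your urls variable actually contains the list of URLs rather than an empty list.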
import concurrent.futures
import time

import requests
from schedule import every, repeat, run_pending

urls = []  # must contain the URLs to scrape

def data_fetched(url):
    # data fetching
    # operations on data
    # data saving to SQL server
    return ''

@repeat(every(3).minutes)
def execute_script():
    with concurrent.futures.ThreadPoolExecutor() as executor:
        # list() consumes the results, so exceptions from worker threads are raised here
        list(executor.map(data_fetched, urls))

while True:
    run_pending()
    time.sleep(1)
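Note that `every(3).minutes` fires around the clock, and its 3-minute cadence is counted from when the job is registered, not aligned to 09:15. If the runs must stay on the weekday 09:15 to 15:30 grid from the question, one standard-library-only alternative is to compute the next slot and sleep until it. This is a sketch; `next_slot` and `run_forever` are hypothetical names, and the window and gap are taken from the question's code.

```python
import time
from datetime import datetime, timedelta

START_MIN = 9 * 60 + 15   # 09:15, in minutes after midnight (from the question)
END_MIN = 15 * 60 + 30    # 15:30, exclusive, like range(...) in the question
GAP = 3                   # minutes between runs


def next_slot(now: datetime) -> datetime:
    """Return the first weekday run time strictly after `now`."""
    day = now.replace(hour=0, minute=0, second=0, microsecond=0)
    while True:
        if day.weekday() < 5:  # Mon-Fri only
            for m in range(START_MIN, END_MIN, GAP):
                candidate = day + timedelta(minutes=m)
                if candidate > now:
                    return candidate
        day += timedelta(days=1)  # no slot left today: try the next day


def run_forever(job):
    """Sleep until each slot, then call job (e.g. execute_script)."""
    while True:
        target = next_slot(datetime.now())
        time.sleep(max(0.0, (target - datetime.now()).total_seconds()))
        job()
```

Because the next slot is recomputed after every run, a job that overruns simply skips the slots it missed instead of piling up, and Friday afternoon jumps straight to Monday 09:15.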
Please credit the source when reposting. Original link: https://www.uj5u.com/qita/524470.html
