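I'd like to know a good way to use concurrent.futures to iterate over a large list of stocks in the New Program below.
In my previous program I tried using concurrent.futures, but the printed data was inconsistent. For example, when run on a large list of stocks, it gives different information each time (as you can see in Output 1 and Output 2 of the previous program). I'm including my previous program so you can see what I did wrong when implementing concurrent.futures.
Thanks!
New Program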
import time
import pandas as pd
import yfinance as yf

tickers = ["A","AA","AAC","AACG","AACIU","AADI","AAIC","AAIN","AAL","AAMC","AAME","AAN","AAOI","AAON","AAP","AAPL"]
start = time.time()

def create_df(tickers):
    all_info = []
    for ticker in tickers:
        # One blocking network request per ticker, executed one after another
        all_info.append(yf.Ticker(ticker).info)
    df = pd.DataFrame.from_records(all_info)
    df = df[['symbol','ebitda', 'enterpriseValue', 'trailingPE', 'sector']]
    df.dropna(inplace=True)
    # This is where you can add calculations and other columns not in Yfinance Library
    df['EV/Ratio'] = df['enterpriseValue'] / df['ebitda']
    return df

df = create_df(tickers)
print(df)
print('It took', time.time()-start, 'seconds.')
Output
    symbol         ebitda  enterpriseValue  trailingPE              sector   EV/Ratio
 0       A   1.762000e+09     5.311271e+10   60.754720          Healthcare  30.143422
 9    AAMC  -2.015600e+07     1.971329e+08    1.013164  Financial Services  -9.780359
10    AAME   2.305600e+07     1.175756e+08    7.652329  Financial Services   5.099566
11     AAN   8.132960e+08     1.228469e+09    9.329710   Consumer Cyclical   1.510483
13    AAON   1.178790e+08     3.501286e+09   55.615944         Industrials  29.702376
14     AAP   1.239876e+09     1.609877e+10   25.986680   Consumer Cyclical  12.984181
15    AAPL   1.109350e+11     2.489513e+12   33.715443          Technology  22.441190
It took 101.81006002426147 seconds.
Previous Program (for reference)
import concurrent.futures
import time
import pandas as pd
import yfinance as yf

tickers = ["A","AA","AAC","AACG","AACIU","AADI","AAIC","AAIN","AAL","AAMC","AAME","AAN","AAOI","AAON","AAP","AAPL"]
start = time.time()

col_a = []
col_b = []
col_c = []
col_d = []

print('Loading Data... Please wait for results')

def do_something(tickers):
    print('---', tickers, '---')
    all_info = yf.Ticker(tickers).info
    try:
        a = all_info.get('ebitda')
        b = all_info.get('enterpriseValue')
        c = all_info.get('trailingPE')
        d = all_info.get('sector')
    except:
        None
    col_a.append(a)
    col_b.append(b)
    col_c.append(c)
    col_d.append(d)
    return

with concurrent.futures.ThreadPoolExecutor() as executer:
    executer.map(do_something, tickers)

# Dataframe Set Up
pd.set_option("display.max_rows", None)
df = pd.DataFrame({
    'Ticker': tickers,
    'Ebitda': col_a,
    'EnterpriseValue': col_b,
    'PE Ratio': col_c,
    'Sector': col_d,
})
print(df.dropna())
print(len('Total Companies with Information'))
print('It took', time.time()-start, 'seconds.')
Output 1 of the Previous Program
    Ticker         Ebitda  EnterpriseValue   PE Ratio              Sector
 1      AA   1.651000e+09     5.031802e+10  49.183292          Healthcare
 3    AACG   2.216000e+09     1.168140e+10  11.711775     Basic Materials
 5    AADI   1.928800e+07     1.108360e+08   6.954397  Financial Services
 7    AAIN   1.128370e+08     3.960835e+09  57.706764         Industrials
 8     AAL   8.303301e+08     1.103969e+09   9.111819   Consumer Cyclical
10    AAME   1.202330e+11     2.534678e+12  26.737967          Technology
12    AAOI  -1.848400e+07     1.277540e+08   0.355233  Financial Services
14     AAP   1.224954e+09     1.770882e+10  26.059464   Consumer Cyclical
32
It took 4.2548089027404785 seconds.
Output 2 of the Previous Program
    Ticker         Ebitda  EnterpriseValue   PE Ratio              Sector
 0       A  -1.848400e+07     1.277540e+08   0.355233  Financial Services
 4   AACIU   1.202330e+11     2.534678e+12  26.737967          Technology
 5    AADI   1.651000e+09     5.031802e+10  49.183292          Healthcare
 7    AAIN   1.128370e+08     3.960835e+09  57.706764         Industrials
 9    AAMC   8.303301e+08     1.103969e+09   9.111819   Consumer Cyclical
10    AAME   2.216000e+09     1.168140e+10  11.711775     Basic Materials
13    AAON   1.224954e+09     1.770882e+10  26.059464   Consumer Cyclical
14     AAP   1.928800e+07     1.108360e+08   6.954397  Financial Services
32
It took 4.003742933273315 seconds.
Answer from a uj5u.com user:
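Your program is multithreaded. ThreadPoolExecutor.map starts several threads that run concurrently. Each call to do_something() runs on one of those threads, and you have no control over the order in which the calls execute or finish. The problem arises because do_something appends its results (a, b, c, d) to the lists col_a, col_b, and so on. Those lists are global, so the data is appended to them in a more or less random order. A thread switch can even happen in the middle of the four append() calls. The order of the data is therefore random, and the values belonging to one row can end up at different positions in different lists.
The list of tickers, on the other hand, is added to the DataFrame in the main thread, in its original order. So the ticker list and the data are out of sync, which is exactly what you observed.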
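The mismatch can be reproduced without any network access. The sketch below is only an illustration: `slow_lookup` and the `delays` table are made-up stand-ins for `yf.Ticker(...).info`, with sleep times chosen so that the first ticker is the slowest to finish.

```python
import time
from concurrent.futures import ThreadPoolExecutor

tickers = ["A", "AA", "AAC", "AACG"]
# Hypothetical delays: the first ticker takes longest to "download".
delays = {"A": 0.4, "AA": 0.3, "AAC": 0.2, "AACG": 0.1}

completed = []  # shared list, filled from the worker threads


def slow_lookup(ticker):
    """Stand-in for a network call; appends to a shared list like the original code."""
    time.sleep(delays[ticker])
    completed.append(ticker)


with ThreadPoolExecutor(max_workers=4) as executor:
    # Consume the iterator so every call has finished before we continue.
    list(executor.map(slow_lookup, tickers))

print("submitted:", tickers)
print("completed:", completed)

# Pairing the two lists by position, as the DataFrame does, mislabels rows.
mislabeled = [(t, c) for t, c in zip(tickers, completed) if t != c]
print("mislabeled pairs:", mislabeled)
```

The shared list reflects completion order, not submission order, so zipping it against `tickers` attaches data to the wrong symbol.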
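The simplest solution is to build all the data structures in the main thread. This is easy to do because map() returns an iterator whose results are guaranteed to come back in the same order as the inputs. The iterator yields the values returned by do_something(). So instead of updating col_a, col_b, etc. inside that function, return the values a, b, c, d as a tuple. Back in the main thread, unpack those values and append them to the columns.
The order in which the threads execute is not under your control, but map() sorts that out for you: it yields the results in input order, waiting for each one if it is not ready yet.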
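The ordering guarantee of map() can be checked with a small offline sketch. As before, `slow_lookup` and `delays` are hypothetical stand-ins for the real download, arranged so that the calls finish in the reverse of the order they were submitted.

```python
import time
from concurrent.futures import ThreadPoolExecutor

tickers = ["A", "AA", "AAC", "AACG"]
# Hypothetical delays: later tickers finish earlier.
delays = {"A": 0.2, "AA": 0.15, "AAC": 0.1, "AACG": 0.05}


def slow_lookup(ticker):
    """Stand-in for a network call; returns its result instead of storing it."""
    time.sleep(delays[ticker])
    return ticker, len(ticker)


with ThreadPoolExecutor(max_workers=4) as executor:
    # map() yields results in the order of the inputs,
    # regardless of which call finished first.
    results = list(executor.map(slow_lookup, tickers))

returned_order = [ticker for ticker, _ in results]
print(returned_order)
```

Because the worker only returns values and the main thread does all the collecting, no shared state is touched from more than one thread.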
Change this part of the program; everything else can stay the same.
def do_something(tickers):
    print('---', tickers, '---')
    try:
        # The network request is the step that can fail, so it goes inside the try
        all_info = yf.Ticker(tickers).info
        a = all_info.get('ebitda')
        b = all_info.get('enterpriseValue')
        c = all_info.get('trailingPE')
        d = all_info.get('sector')
    except Exception:
        return None, None, None, None  # must return a 4-tuple
    return a, b, c, d

with concurrent.futures.ThreadPoolExecutor() as executer:
    # map() yields the results in the same order as tickers
    for a, b, c, d in executer.map(do_something, tickers):
        col_a.append(a)
        col_b.append(b)
        col_c.append(c)
        col_d.append(d)
Answer from a uj5u.com user:
Here is the answer provided by @iudeen on how to make the New Program's function multithreaded:
import pandas as pd
import yfinance as yf
from concurrent.futures import ThreadPoolExecutor
import time
from stocks import tickers  # presumably a local module that defines the ticker list

start = time.time()
print('Loading Data... Please wait for results')

all_info = []

def create_df(ticker):
    # Each record carries its own 'symbol', so the append order does not matter
    all_info.append(yf.Ticker(ticker).info)

with ThreadPoolExecutor(max_workers=10) as executor:
    futures = [executor.submit(create_df, x) for x in tickers]
# Leaving the with block waits for all submitted calls to finish

df = pd.DataFrame.from_records(all_info)
df = df[['symbol','ebitda', 'enterpriseValue', 'trailingPE', 'sector']]
df.dropna(inplace=True)
df['EV/Ratio'] = df['enterpriseValue'] / df['ebitda']
print(df)
print('It took', time.time()-start, 'seconds.')
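One thing the submit() version does not show is what happens when a download fails: an exception raised inside a worker stays stored in its future until result() is called. The following sketch shows one possible way to keep each future tied to its ticker and to collect failures. `fetch_info` is a hypothetical stand-in for `yf.Ticker(ticker).info`, and "BAD" is an invented ticker used only to trigger an error.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def fetch_info(ticker):
    """Hypothetical stand-in for yf.Ticker(ticker).info."""
    if ticker == "BAD":
        raise ValueError(f"no data for {ticker}")
    return {"symbol": ticker, "ebitda": 100.0 * len(ticker)}


tickers = ["A", "AA", "BAD", "AAPL"]
records = {}  # ticker -> info dict
failed = {}   # ticker -> error message

with ThreadPoolExecutor(max_workers=4) as executor:
    # Remember which ticker each future belongs to.
    future_to_ticker = {executor.submit(fetch_info, t): t for t in tickers}
    for future in as_completed(future_to_ticker):
        ticker = future_to_ticker[future]
        try:
            # result() re-raises any exception from the worker thread.
            records[ticker] = future.result()
        except Exception as exc:
            failed[ticker] = str(exc)

# Restore the input order before building a table from the records.
ordered = [records[t] for t in tickers if t in records]
print(ordered)
print(failed)
```

The list `ordered` could then be passed to `pd.DataFrame.from_records` in the same way as `all_info` above.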
