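Using the sample data and code below, I am trying to group by year-month and, among all columns whose names end with _values, find the top K columns with the smallest standard deviation: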
import pandas as pd
import numpy as np
from statistics import stdev

np.random.seed(2021)
dates = pd.date_range('20130226', periods=90)
df = pd.DataFrame(np.random.uniform(0, 10, size=(90, 6)), index=dates, columns=['A_values', 'B_values', 'C_values', 'D_values', 'E_values', 'target'])
k = 3  # number of columns to keep
value_cols = df.columns[df.columns.str.endswith('_values')]

def find_topK_smallest_std(group):
    std = stdev(group[value_cols])
    cols = std.nsmallest(k).index
    out_cols = [f'std_{i + 1}' for i in range(k)]
    rv = group.loc[:, cols]
    rv.columns = out_cols
    return rv

df.groupby(pd.Grouper(freq='M'), dropna=False).apply(find_topK_smallest_std)
But it raises a TypeError. How can I fix this? Many thanks in advance.
Out:
TypeError: can't convert type 'str' to numerator/denominator
Reference link:
Groupby year-month and find top N smallest values columns in Python
Answer from a uj5u.com user:
In your solution, stdev is passed the whole DataFrame. Use DataFrame.apply to call stdev on each column instead; if you need it per row, add axis=1:
def find_topK_smallest_std(group):
    # processing per column
    std = group[value_cols].apply(stdev)
    cols = std.nsmallest(k).index
    out_cols = [f'std_{i + 1}' for i in range(k)]
    rv = group.loc[:, cols]
    rv.columns = out_cols
    return rv

df = df.groupby(pd.Grouper(freq='M'), dropna=False).apply(find_topK_smallest_std)
print(df)
               std_1     std_2     std_3
2013-02-26  7.333694  3.126731  1.389472
2013-02-27  7.529254  7.843101  6.621605
2013-02-28  6.165574  5.612724  0.866300
2013-03-01  5.693051  3.711608  4.521452
2013-03-02  7.322250  4.763135  5.178144
...              ...       ...       ...
2013-05-22  8.795736  3.864723  6.316478
2013-05-23  7.959282  5.140268  1.839659
2013-05-24  5.412016  5.890717  9.081583
2013-05-25  1.088414  1.610210  9.016004
2013-05-26  4.930571  6.893207  2.338785
[90 rows x 3 columns]
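
A note on why the original call fails, as a rough sketch rather than a definitive explanation: iterating over a DataFrame yields its column labels, so statistics.stdev receives strings instead of numbers. The snippet below reproduces that on the same sample data and also checks that the pandas method DataFrame.std (default ddof=1) gives the same per-column result as apply(stdev). The last part is my own assumption about what may be useful here: it lists, for each month, the names of the k least variable columns. The names labels, error_message, monthly_std and topk_names are made up for illustration, and grouping by df.index.to_period('M') is swapped in for pd.Grouper(freq='M').

```python
from statistics import stdev

import numpy as np
import pandas as pd

np.random.seed(2021)
dates = pd.date_range('20130226', periods=90)
df = pd.DataFrame(
    np.random.uniform(0, 10, size=(90, 6)),
    index=dates,
    columns=['A_values', 'B_values', 'C_values', 'D_values', 'E_values', 'target'],
)
k = 3
value_cols = df.columns[df.columns.str.endswith('_values')]

# Iterating over a DataFrame yields its column labels, not its numbers,
# so statistics.stdev ends up being handed strings.
labels = list(df[value_cols])
print(labels)

try:
    stdev(df[value_cols])
    error_message = None
except TypeError as exc:
    error_message = str(exc)
print(error_message)

# DataFrame.std defaults to ddof=1 (sample standard deviation), the same
# quantity statistics.stdev computes, so the two agree column by column.
std_via_apply = df[value_cols].apply(stdev)
std_builtin = df[value_cols].std()
print(np.allclose(std_via_apply.values, std_builtin.values))

# Hypothetical extension: per month, the names of the k columns with the
# smallest standard deviation, ordered from smallest to largest.
monthly_std = df[value_cols].groupby(df.index.to_period('M')).std()
out_cols = [f'std_{i + 1}' for i in range(k)]
topk_names = monthly_std.apply(
    lambda row: pd.Series(row.nsmallest(k).index.tolist(), index=out_cols),
    axis=1,
)
print(topk_names)
```

If only the per-column standard deviation is needed, group[value_cols].std() inside find_topK_smallest_std should behave the same as group[value_cols].apply(stdev) and avoids the statistics import.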
