The DataFrame looks like this:
decade  rain  snow
1910    0.2   0.2
1910    0.3   0.4
2000    0.4   0.5
2010    0.1   0.1
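I would like some help running a function in Python that compares every pair of decades for a given column. The function works well, except that it does not accept an input column such as rain or snow.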
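import pandas as pd
from itertools import combinations
from scipy import stats as st

def ttest_run(c1, c2):
    # NOTE: cat1 and cat2 are not defined anywhere in the code as posted;
    # presumably they hold one column's values for decades c1 and c2.
    results = st.ttest_ind(cat1, cat2, nan_policy='omit')
    df = pd.DataFrame({'dec1': c1,
                       'dec2': c2,
                       'tstat': results.statistic,
                       'pvalue': results.pvalue},
                      index=[0])
    return df

# Run the test once for every pair of distinct decades.
df_list = [ttest_run(i, j) for i, j in combinations(data['decade'].unique().tolist(), 2)]
final_df = pd.concat(df_list, ignore_index=True)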
A uj5u.com user replied:
I think you want something like this:
import pandas as pd
from itertools import combinations
from scipy import stats as st

d = {'decade': ['1910', '1910', '2000', '2010', '1990', '1990', '1990', '1990'],
     'rain': [0.2, 0.3, 0.3, 0.1, 0.1, 0.2, 0.3, 0.4],
     'snow': [0.2, 0.4, 0.5, 0.1, 0.1, 0.2, 0.3, 0.4]}
df = pd.DataFrame(data=d)

def all_pairwise(df, compare_col = 'decade'):
    decade_pairs = [(i,j) for i, j in combinations(df[compare_col].unique().tolist(), 2)]
    # or add a list of colnames to function signature
    cols = list(df.columns)
    cols.remove(compare_col)
    list_of_dfs = []
    for pair in decade_pairs:
        for col in cols:
            c1 = df[df[compare_col] == pair[0]][col]
            c2 = df[df[compare_col] == pair[1]][col]
            results = st.ttest_ind(c1, c2, nan_policy='omit')
            tmp = pd.DataFrame({'dec1': pair[0],
                                'dec2': pair[1],
                                'tstat': results.statistic,
                                'pvalue': results.pvalue}, index = [col])
            list_of_dfs.append(tmp)
    df_stats = pd.concat(list_of_dfs)
    return df_stats

df_stats = all_pairwise(df)
df_stats
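Now, if you run that code, you will get runtime warnings. Some decades have too few data points, so computing the t statistic divides by zero, and that is what produces the NaNs in the output: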
>>> df_stats
      dec1  dec2     tstat    pvalue
rain  1910  2000       NaN       NaN
snow  1910  2000       NaN       NaN
rain  1910  2010       NaN       NaN
snow  1910  2010       NaN       NaN
rain  1910  1990  0.000000  1.000000
snow  1910  1990  0.436436  0.685044
rain  2000  2010       NaN       NaN
...
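To see where the NaNs come from, here is a minimal sketch that reproduces only the variance step, using NumPy directly rather than `ttest_ind`. The variable names `single` and `sample_var` are illustrative and not part of the answer's code. The sample variance divides by n - 1, so a decade with a single record gives 0/0.

```python
import warnings
import numpy as np

# A decade with only one record, like 2000 or 2010 in the example data.
single = np.array([0.3])

with warnings.catch_warnings():
    # Silence the RuntimeWarning so only the result is shown.
    warnings.simplefilter("ignore", RuntimeWarning)
    # ddof=1 requests the sample variance, whose denominator is n - 1 = 0 here.
    sample_var = np.var(single, ddof=1)

print(np.isnan(sample_var))  # True
```

Any t statistic built from that variance is NaN as well, which matches the rows above involving 2000 or 2010.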
If you do not want every column, but only a specified set, change the function signature/definition line to:
def all_pairwise(df, cols, compare_col = 'decade'):
where cols should be an iterable of column-name strings (a list works fine). You also need to delete these two lines:
cols = list(df.columns)
cols.remove(compare_col)
from the function body; otherwise the function works unchanged.
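Put together, the modified function might look like the sketch below. It applies the two changes just described to the answer's `all_pairwise` and reuses the same sample data. The use of `.loc` for row selection and the names `g1`, `g2`, `frames` and `out` are choices made for this illustration, not part of the original answer.

```python
import pandas as pd
from itertools import combinations
from scipy import stats as st

d = {'decade': ['1910', '1910', '2000', '2010', '1990', '1990', '1990', '1990'],
     'rain': [0.2, 0.3, 0.3, 0.1, 0.1, 0.2, 0.3, 0.4],
     'snow': [0.2, 0.4, 0.5, 0.1, 0.1, 0.2, 0.3, 0.4]}
df = pd.DataFrame(data=d)

def all_pairwise(df, cols, compare_col='decade'):
    """Run an independent t-test for every pair of groups and every column in cols."""
    frames = []
    for g1, g2 in combinations(df[compare_col].unique().tolist(), 2):
        for col in cols:
            # Values of this column for each of the two groups being compared.
            c1 = df.loc[df[compare_col] == g1, col]
            c2 = df.loc[df[compare_col] == g2, col]
            results = st.ttest_ind(c1, c2, nan_policy='omit')
            frames.append(pd.DataFrame({'dec1': g1,
                                        'dec2': g2,
                                        'tstat': results.statistic,
                                        'pvalue': results.pvalue}, index=[col]))
    return pd.concat(frames)

# Compare decades on the rain column only.
out = all_pairwise(df, cols=['rain'])
print(out)
```

Passing `cols=['rain', 'snow']` should give the same rows as the original version that used every non-decade column.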
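Unless you filter out the decades with too few records before passing the data to the function, you will always get the runtime warnings.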
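One possible way to do that filtering, sketched with pandas `groupby(...).filter(...)` on the same sample data. The threshold `min_records = 2` is an assumption: two is the smallest group size for which a sample variance is defined, and you may prefer a larger value.

```python
import pandas as pd

d = {'decade': ['1910', '1910', '2000', '2010', '1990', '1990', '1990', '1990'],
     'rain': [0.2, 0.3, 0.3, 0.1, 0.1, 0.2, 0.3, 0.4],
     'snow': [0.2, 0.4, 0.5, 0.1, 0.1, 0.2, 0.3, 0.4]}
df = pd.DataFrame(data=d)

min_records = 2  # assumed threshold, not from the original answer

# Keep only the rows whose decade has at least min_records rows.
filtered = df.groupby('decade').filter(lambda g: len(g) >= min_records)

print(sorted(filtered['decade'].unique()))  # ['1910', '1990']
```

The filtered frame can then be passed to `all_pairwise` in place of `df`, leaving only the 1910 versus 1990 comparison for this sample.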
Here is an example call to the version that accepts a list of columns as an argument, showing the runtime warnings:
>>> all_pairwise(df, cols=['rain'])
/usr/local/lib/python3.8/site-packages/numpy/core/fromnumeric.py:3723: RuntimeWarning: Degrees of freedom <= 0 for slice
  return _methods._var(a, axis=axis, dtype=dtype, out=out, ddof=ddof,
/usr/local/lib/python3.8/site-packages/numpy/core/_methods.py:254: RuntimeWarning: invalid value encountered in double_scalars
  ret = ret.dtype.type(ret / rcount)
      dec1  dec2  tstat  pvalue
rain  1910  2000    NaN     NaN
rain  1910  2010    NaN     NaN
rain  1910  1990    0.0     1.0
rain  2000  2010    NaN     NaN
rain  2000  1990    NaN     NaN
rain  2010  1990    NaN     NaN
