pandas: taking the mean of two string values in a single cell

The example below contains some bid/ask prices. What is a good way to compute the mean (the mid) of each cell across the whole df?

import pandas as pd

#---sample df
prices = pd.DataFrame({
    'tenor':['5Y', '10Y', '15Y', '20Y', '30Y'],
    '1M':['0.67/0.62', '1.10/1.05', '1.23/1.18', '1.38/1.33', '1.55/1.50'],
    '3M':['0.79/0.74', '1.19/1.14', '1.32/1.27', '1.49/1.44', '1.65/1.60'],
    '6M':['0.89/0.84', '1.29/1.24', '1.42/1.37', '1.60/1.55', '1.76/1.71'],
    '12M':['1.14/1.07', '1.47/1.40', '1.61/1.54', '1.80/1.72', '1.95/1.87']
    })

For example, the cell below should return 0.645.

prices.iat[0,1]
Out[112]: '0.67/0.62'
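
For a single cell, the arithmetic is just a split on / followed by a mean of the parts. A minimal plain-Python sketch of that step (the variable names are illustrative and not from the question):

```python
# One cell from the sample frame, as returned by prices.iat[0, 1]
cell = '0.67/0.62'

# Split on '/', convert each part to float, then average the parts
parts = [float(p) for p in cell.split('/')]
mid = sum(parts) / len(parts)

print(round(mid, 3))
```

The answers below apply this same operation to every cell of the frame at once.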

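A uj5u.com user replied:

While applymap is nice and simple, it is unfortunately slow.

A more efficient vectorized solution is to split and explode the strings first, then groupby + mean: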

(prices.set_index('tenor')
       .apply(lambda c: c.str.split('/').explode())
       .astype(float)
       .groupby(level=0, sort=False).mean()
)

Output:

          1M     3M     6M    12M
tenor                            
5Y     0.645  0.765  0.865  1.105
10Y    1.075  1.165  1.265  1.435
15Y    1.205  1.295  1.395  1.575
20Y    1.355  1.465  1.575  1.760
30Y    1.525  1.625  1.735  1.910

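On 50k rows, this is roughly 8 times faster than applymap.

NB. If there are more columns than rows, the logic can be reversed to work along the other axis.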
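
The answer does not show the reversed version. One possible reading, sketched here as an assumption rather than the answerer's own code, is to transpose first so that apply iterates over the tenors instead of the price columns, then transpose back:

```python
import pandas as pd

prices = pd.DataFrame({
    'tenor': ['5Y', '10Y', '15Y', '20Y', '30Y'],
    '1M': ['0.67/0.62', '1.10/1.05', '1.23/1.18', '1.38/1.33', '1.55/1.50'],
    '3M': ['0.79/0.74', '1.19/1.14', '1.32/1.27', '1.49/1.44', '1.65/1.60'],
    '6M': ['0.89/0.84', '1.29/1.24', '1.42/1.37', '1.60/1.55', '1.76/1.71'],
    '12M': ['1.14/1.07', '1.47/1.40', '1.61/1.54', '1.80/1.72', '1.95/1.87']
    })

wide = prices.set_index('tenor')

# Transpose so each tenor becomes a column, explode along the former
# column labels, average per label, then transpose back.
result = (wide.T
              .apply(lambda c: c.str.split('/').explode())
              .astype(float)
              .groupby(level=0, sort=False).mean()
              .T)

print(result)
```

Whether this actually helps depends on the shape of the data, so it is worth timing both orientations.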

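A uj5u.com user replied:

You can split them all on / and then take the mean. Setting the non-numeric column as the index first lets applymap process the rest of the df in one pass.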

import numpy as np
import pandas as pd
prices = pd.DataFrame({
    'tenor':['5Y', '10Y', '15Y', '20Y', '30Y'],
    '1M':['0.67/0.62', '1.10/1.05', '1.23/1.18', '1.38/1.33', '1.55/1.50'],
    '3M':['0.79/0.74', '1.19/1.14', '1.32/1.27', '1.49/1.44', '1.65/1.60'],
    '6M':['0.89/0.84', '1.29/1.24', '1.42/1.37', '1.60/1.55', '1.76/1.71'],
    '12M':['1.14/1.07', '1.47/1.40', '1.61/1.54', '1.80/1.72', '1.95/1.87']
    })

prices = prices.set_index('tenor').applymap(lambda x: np.mean(list(map(float,x.split('/'))))).reset_index()

Output:

  tenor     1M     3M     6M    12M
0    5Y  0.645  0.765  0.865  1.105
1   10Y  1.075  1.165  1.265  1.435
2   15Y  1.205  1.295  1.395  1.575
3   20Y  1.355  1.465  1.575  1.760
4   30Y  1.525  1.625  1.735  1.910
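
Note that pandas 2.1 and later deprecate DataFrame.applymap in favor of DataFrame.map. A hedged sketch that uses whichever method the installed version provides; the helper name cell_mean is hypothetical and the frame is a shortened copy of the sample:

```python
import pandas as pd

prices = pd.DataFrame({
    'tenor': ['5Y', '10Y', '15Y'],
    '1M': ['0.67/0.62', '1.10/1.05', '1.23/1.18'],
    '3M': ['0.79/0.74', '1.19/1.14', '1.32/1.27'],
    })

def cell_mean(cell):
    """Average the '/'-separated numbers in one cell (hypothetical helper)."""
    parts = [float(p) for p in cell.split('/')]
    return sum(parts) / len(parts)

indexed = prices.set_index('tenor')

# DataFrame.map exists from pandas 2.1; older versions only have applymap
elementwise = indexed.map if hasattr(indexed, 'map') else indexed.applymap
result = elementwise(cell_mean).reset_index()

print(result)
```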

A uj5u.com user replied:

For each column, you can split the strings on / and run a lambda to get the mean:

prices["1M"].str.split('/').apply(lambda x : (float(x[0]) + float(x[1]))/2)

0    0.645
1    1.075
2    1.205
3    1.355
4    1.525
Name: 1M, dtype: float64
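
The snippet above handles only the 1M column. To cover every price column, the same per-column logic could be looped over all columns except tenor. This is a sketch; the helper name midpoint and the shortened frame are illustrative:

```python
import pandas as pd

prices = pd.DataFrame({
    'tenor': ['5Y', '10Y', '15Y'],
    '1M': ['0.67/0.62', '1.10/1.05', '1.23/1.18'],
    '12M': ['1.14/1.07', '1.47/1.40', '1.61/1.54'],
    })

def midpoint(parts):
    """Mean of a two-element list of numeric strings (hypothetical helper)."""
    return (float(parts[0]) + float(parts[1])) / 2

result = prices.copy()
# Apply the same split-then-average step to each price column in turn
for col in result.columns.drop('tenor'):
    result[col] = result[col].str.split('/').apply(midpoint)

print(result)
```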

A uj5u.com user replied:

Here is another solution:

x = prices.iloc[:,1:].unstack().swaplevel(1,0).str.split('/').explode().astype(float)
temp1 = x.groupby(x.index).mean().reindex(pd.MultiIndex.from_tuples(x.index.drop_duplicates()))
prices.iloc[:,1:] = temp1.unstack()[prices.iloc[:,1:].columns]

Output:

  tenor     1M     3M     6M    12M
0    5Y  0.645  0.765  0.865  1.105
1   10Y  1.075  1.165  1.265  1.435
2   15Y  1.205  1.295  1.395  1.575
3   20Y  1.355  1.465  1.575   1.76
4   30Y  1.525  1.625  1.735   1.91

A uj5u.com user replied:

Another option avoids exploding the data, which may help performance:

temp = prices.set_index('tenor').transform(lambda df: df.str.split('/'))
A = temp.transform(lambda df: pd.to_numeric(df.str[0])) 
B = temp.transform(lambda df: pd.to_numeric(df.str[-1]))

A.add(B).div(2)

         1M     3M     6M    12M
tenor
5Y     0.645  0.765  0.865  1.105
10Y    1.075  1.165  1.265  1.435
15Y    1.205  1.295  1.395  1.575
20Y    1.355  1.465  1.575  1.760
30Y    1.525  1.625  1.735  1.910

Of course, if you have more entries per cell, explode is the better option.
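
To illustrate why, here is a small sketch with made-up cells that hold a varying number of values. Taking only the first and last element would ignore the middle value, whereas explode averages all of them:

```python
import pandas as pd

# Hypothetical cells holding a varying number of '/'-separated values
s = pd.Series(['1.0/2.0/3.0', '4.0/5.0'])

# explode repeats the row label for every part, so grouping on the
# index averages all the parts that came from the same cell
means = (s.str.split('/')
          .explode()
          .astype(float)
          .groupby(level=0).mean())

print(means.tolist())  # [2.0, 4.5]
```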

Another option that should scale well is to do the string work in vanilla Python before the final processing in Pandas. We will rely on Pandas' MultiIndexing to get the final output:

reshaped = pd.concat({key : pd.DataFrame(string.split('/') 
                                          for string in ent)  
                       for key, ent 
                       in prices.drop(columns='tenor').items()}, 
                       axis = 1)

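(reshaped
  .astype(float)
   # average the sub-columns under each top-level label;
   # transposing avoids groupby(axis=1), deprecated in pandas 2.1
  .T.groupby(level=0, sort=False).mean().T
  .assign(tenor = prices.tenor)
   # you can ignore the line below,
   # if column order is not important
  .loc[:, [*prices]]
)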

  tenor     1M     3M     6M    12M
0    5Y  0.645  0.765  0.865  1.105
1   10Y  1.075  1.165  1.265  1.435
2   15Y  1.205  1.295  1.395  1.575
3   20Y  1.355  1.465  1.575  1.760
4   30Y  1.525  1.625  1.735  1.910

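Again, the goal here is to avoid exploding the DataFrame, in the hope of gaining performance. You should get better performance by doing the string reshaping in Python (Pandas str methods are built on top of Python's string methods). As always, only testing can tell you about performance.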
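
A rough way to run such a test is the standard-library timeit module. The sketch below compares the explode approach with the first/last-element approach on a made-up 10k-row frame; the function names and frame size are assumptions, and timings will vary with machine and pandas version:

```python
import timeit

import numpy as np
import pandas as pd

small = pd.DataFrame({
    '1M': ['0.67/0.62', '1.10/1.05'],
    '3M': ['0.79/0.74', '1.19/1.14'],
    })
# Hypothetical benchmark frame: the small sample repeated to 10k rows
big = pd.concat([small] * 5_000, ignore_index=True)

def mid_explode(df):
    """Split, explode, then average per original row label."""
    return (df.apply(lambda c: c.str.split('/').explode())
              .astype(float)
              .groupby(level=0, sort=False).mean())

def mid_no_explode(df):
    """Split, then average the first and last element of each cell."""
    parts = df.apply(lambda c: c.str.split('/'))
    first = parts.apply(lambda c: pd.to_numeric(c.str[0]))
    last = parts.apply(lambda c: pd.to_numeric(c.str[-1]))
    return first.add(last).div(2)

# Check that both approaches agree before timing them
same = np.allclose(mid_explode(big).to_numpy(), mid_no_explode(big).to_numpy())

t_explode = timeit.timeit(lambda: mid_explode(big), number=3)
t_no_explode = timeit.timeit(lambda: mid_no_explode(big), number=3)
print(f"agree={same} explode={t_explode:.3f}s no_explode={t_no_explode:.3f}s")
```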

