Repetitive column-wise multiplication and division in Python

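I am trying to do some column-wise multiplication and division by group and then join the results together. I would like to do this at scale, but at the moment it is rather repetitive: the columns whose names contain the string A are divided by the columns whose names contain D and multiplied by the WC column, and then a similar procedure is repeated for groups B and C against the D columns and the WC column. Finally, I merge everything into one DataFrame. How can I make this process more efficient?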

Input:

import numpy as np
import pandas as pd

df = pd.DataFrame({"cid" : {0 : "cd1", 1 : "cd2", 2 : "cd3"},
                   "A1970" : {0 : 3.2, 1 : 3.5, 2 : .4},
                   "A1980" : {0 : 3.1, 1 : 3.6, 2 : .5},
                   "B1970" : {0 : 2.5, 1 : 1.2, 2 : .7},
                   "B1980" : {0 : 3.2, 1 : 1.3, 2 : .1},
                   "C1970" : {0 : 3.2, 1 : 3.3, 2 : .3},
                   "C1980" : {0 : 3.3, 1 : 3.4, 2 : .3},
                   "D1970" : {0 : 2.4, 1 : 1.3, 2 : .7},
                   "D1980" : {0 : 3.2, 1 : 1.3, 2 : .2},
                   "WC" : {0 :0.5, 1 : 0.3, 2 : .1}
                  }).set_index(['cid'])

#      A1970  A1980  B1970  B1980  C1970  C1980  D1970  D1980   WC
# cid                                                             
# cd1    3.2    3.1    2.5    3.2    3.2    3.3    2.4    3.2  0.5
# cd2    3.5    3.6    1.2    1.3    3.3    3.4    1.3    1.3  0.3
# cd3    0.4    0.5    0.7    0.1    0.3    0.3    0.7    0.2  0.1

Processing:

df_a = (df.filter(regex='A')
        .div(df.filter(regex='D').values)
        .multiply(df["WC"], axis="index")
        .add_suffix("_rt"))

#      A1970_rt  A1980_rt
# cid                    
# cd1  0.666667  0.484375
# cd2  0.807692  0.830769
# cd3  0.057143  0.250000

df_b = (df.filter(regex='B')
        .div(df.filter(regex='D').values)
        .multiply(df["WC"], axis="index")
        .add_suffix("_rt"))

df_c = (df.filter(regex='^C')  # anchored: a bare 'C' would also match the "WC" column
        .div(df.filter(regex='D').values)
        .multiply(df["WC"], axis="index")
        .add_suffix("_rt"))
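
For reference, the three blocks above differ only in the group letter, so the same steps can be written once in a loop and joined with `pd.concat`. The following is a minimal sketch of that baseline, not a definitive solution; the names `d_values`, `parts` and `result` are illustrative and do not appear in the original post. The patterns are anchored with `^` because `DataFrame.filter(regex=...)` uses a search, so an unanchored `'C'` would also pick up the `WC` column.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "cid": ["cd1", "cd2", "cd3"],
        "A1970": [3.2, 3.5, 0.4],
        "A1980": [3.1, 3.6, 0.5],
        "B1970": [2.5, 1.2, 0.7],
        "B1980": [3.2, 1.3, 0.1],
        "C1970": [3.2, 3.3, 0.3],
        "C1980": [3.3, 3.4, 0.3],
        "D1970": [2.4, 1.3, 0.7],
        "D1980": [3.2, 1.3, 0.2],
        "WC": [0.5, 0.3, 0.1],
    }
).set_index("cid")

# Denominator block as a plain array, so division is positional (shape 3 x 2).
d_values = df.filter(regex="^D").to_numpy()

# One frame per group letter: divide by D, weight by WC, rename.
parts = [
    df.filter(regex=f"^{prefix}")
    .div(d_values)
    .mul(df["WC"], axis="index")
    .add_suffix("_rt")
    for prefix in ["A", "B", "C"]
]

# Join the per-group frames side by side.
result = pd.concat(parts, axis=1)
print(result)
```

This still builds one intermediate frame per group; the answers below avoid that by handling all groups in a single expression.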

Answer:

A solution that matches columns by year, using a MultiIndex over the A, B, C and D groups:

df1 = df.set_index('WC', append=True)
df1.columns = pd.MultiIndex.from_frame(df1.columns.str.extract(r'(\D+)(\d+)', expand=True))

wc = df1.index.get_level_values('WC')
df1 = df1.loc[:, ['A','B','C']].div(df1.xs('D', axis=1)).mul(wc, axis='index')
df1.columns = df1.columns.map('_'.join)
df1 = df1.add_suffix("_rt").reset_index()
print (df1)
   cid   WC  A_1970_rt  A_1980_rt  B_1970_rt  B_1980_rt  C_1970_rt  C_1980_rt
0  cd1  0.5   0.666667   0.484375   0.520833       0.50   0.666667   0.515625
1  cd2  0.3   0.807692   0.830769   0.276923       0.30   0.761538   0.784615
2  cd3  0.1   0.057143   0.250000   0.100000       0.05   0.042857   0.150000
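
The key step in this answer is splitting each column name into a group letter and a year, so that division lines up columns by year label instead of by position. A small sketch of that idea on a single row follows; the level names `group` and `year` and the variables `cols`, `mi`, `small` and `ratio` are assumptions made for illustration only.

```python
import pandas as pd

cols = pd.Index(["A1970", "A1980", "D1970", "D1980"])

# Split "A1970" into ("A", "1970"): non-digits first, then digits.
parts = cols.str.extract(r"(\D+)(\d+)", expand=True)
parts.columns = ["group", "year"]  # hypothetical level names
mi = pd.MultiIndex.from_frame(parts)
print(mi.tolist())

# One row of the sample data under the two-level column index.
small = pd.DataFrame([[3.2, 3.1, 2.4, 3.2]], columns=mi)

# Selecting a group drops the first level, leaving columns keyed by year,
# so the division pairs A1970 with D1970 and A1980 with D1980 by label.
ratio = small["A"] / small["D"]
print(ratio)
```

Because the pairing is done by label, this approach does not depend on the years appearing in the same order within every group.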

Answer:

Using the underlying NumPy data:

Creating a new DataFrame:

regex = r'^[ABC]'

N = 3 # number of column groups to divide (A, B, C)
# or if needed to calculate it programmatically
# N = df.filter(regex=regex).shape[1] // df.filter(regex='^D').shape[1]

(df.filter(regex=regex)   # get A/B/C columns
   .mul(df['WC'], axis=0) # multiply by WC
   # divide by D (tiled to match the number of A/B/C)
   .div(np.tile(df.filter(regex='D').values, (1,N)))
   .add_suffix('_rt')     # rename columns
)

Output:

     A1970_rt  A1980_rt  B1970_rt  B1980_rt  C1970_rt  C1980_rt
cid                                                            
cd1  0.666667  0.484375  0.520833      0.50  0.666667  0.515625
cd2  0.807692  0.830769  0.276923      0.30  0.761538  0.784615
cd3  0.057143  0.250000  0.100000      0.05  0.042857  0.150000
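
To see why `np.tile` works here, it may help to look at what it does to the D block on its own. The sketch below uses the D values from the sample data; the variable names `d` and `tiled` are illustrative. Note that this relies on column position, so it assumes every group lists its years in the same order as the D columns.

```python
import numpy as np

# The D1970 / D1980 columns of the sample data, one row per cid.
d = np.array([[2.4, 3.2],
              [1.3, 1.3],
              [0.7, 0.2]])

# Repeat the whole block 3 times along the column axis (once per group),
# giving the layout D1970, D1980, D1970, D1980, D1970, D1980.
tiled = np.tile(d, (1, 3))
print(tiled.shape)
print(tiled)
```

The tiled array has the same shape as the A/B/C block, so dividing the two is a plain element-wise operation.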

Updating the original DataFrame in place:

df.update( # update the dataframe with the output from:
 df.filter(regex=regex)   # get A/B/C columns
   .mul(df['WC'], axis=0) # multiply by WC
   # divide by D (tiled to match the number of A/B/C)
   .div(np.tile(df.filter(regex='D').values, (1,N))) 
)

Output:

        A1970     A1980     B1970  B1980     C1970     C1980  D1970  D1980   WC
cid                                                                            
cd1  0.666667  0.484375  0.520833   0.50  0.666667  0.515625    2.4    3.2  0.5
cd2  0.807692  0.830769  0.276923   0.30  0.761538  0.784615    1.3    1.3  0.3
cd3  0.057143  0.250000  0.100000   0.05  0.042857  0.150000    0.7    0.2  0.1

Answer:

You can convert df.filter(regex='D') to a NumPy array, repeat it with np.tile, then divide the "A" to "C" columns by it and multiply by WC:

col_msk = ~df.columns.str.contains('D|WC')
out = (df.loc[:, col_msk]
         # repeat the D block once per group; 2 is the number of D columns
         .div(np.tile(df.filter(regex='D').to_numpy(), (1, sum(col_msk)//2)))
         .multiply(df["WC"], axis="index")
         .add_suffix("_rt"))

Output:

     A1970_rt  A1980_rt  B1970_rt  B1980_rt  C1970_rt  C1980_rt
cid                                                            
cd1  0.666667  0.484375  0.520833      0.50  0.666667  0.515625
cd2  0.807692  0.830769  0.276923      0.30  0.761538  0.784615
cd3  0.057143  0.250000  0.100000      0.05  0.042857  0.150000
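
The tiling approach can also be wrapped in a small function that works out the repeat count from the data instead of hard-coding it. This is a sketch under stated assumptions, not part of any answer above: the function name `weighted_ratio` and its parameters are hypothetical, and it assumes the numerator groups are laid out in the same column order as the denominator columns.

```python
import numpy as np
import pandas as pd


def weighted_ratio(df, numer_regex=r"^[ABC]", denom_regex=r"^D",
                   weight="WC", suffix="_rt"):
    """Divide numerator columns by tiled denominator columns, then weight rows."""
    numer = df.filter(regex=numer_regex)
    denom = df.filter(regex=denom_regex).to_numpy()
    # Number of times the denominator block must repeat to cover the numerators.
    reps, remainder = divmod(numer.shape[1], denom.shape[1])
    if remainder:
        raise ValueError("numerator columns are not a multiple of denominator columns")
    return (numer.div(np.tile(denom, (1, reps)))
                 .mul(df[weight], axis="index")
                 .add_suffix(suffix))


df = pd.DataFrame(
    {
        "cid": ["cd1", "cd2", "cd3"],
        "A1970": [3.2, 3.5, 0.4], "A1980": [3.1, 3.6, 0.5],
        "B1970": [2.5, 1.2, 0.7], "B1980": [3.2, 1.3, 0.1],
        "C1970": [3.2, 3.3, 0.3], "C1980": [3.3, 3.4, 0.3],
        "D1970": [2.4, 1.3, 0.7], "D1980": [3.2, 1.3, 0.2],
        "WC": [0.5, 0.3, 0.1],
    }
).set_index("cid")

out = weighted_ratio(df)
print(out)
```

The explicit check on the remainder turns a silent shape mismatch into a clear error if a group is missing a year.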


Tags: Python, pandas, numpy
