Subtracting the values in another DataFrame from every row of a pandas DataFrame

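I have two DataFrames, each with 20 rows and 4 columns. The column names and value types are the same in both. One column is title; the other 3 columns hold values.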

df1
title  col1 col2 col3
apple    a    d    g
pear     b    e    h
grape    c    f    i

df2
title  col1 col2 col3
carrot   q    t    w
pumpkin  r    u    x
sprouts  s    v    y

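Now I want to create 3 separate tables/lists, one for each of the differences df1.col1 - df2.col1, df1.col2 - df2.col2, and df1.col3 - df2.col3, computed for every pair of rows. For df1.col1 - df2.col1, I would like the output to look like the following rows: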

df1.title  df2.title score
apple      carrot    (a - q)
apple      pumpkin    (a - r)
apple      sprouts   (a - s)
pear       carrot    (b - t)
pear       pumpkin   (b - u)
pear       sprouts   (b - v)
grape      carrot    (c - w)
grape      pumpkin   (c - x)
grape      sprouts   (c - y)

I tried writing a for loop with the following code:

for i in df1.iterrows():
    score_col1 = df1.col1[[i]] - df2.col2[[j]]
    score_col2 = df1.col2[[i]] - df2.col2[[j]]
    score_col3 = df1.col3[[i]] - df2.col3[[j]]
    score_total = score_col1 + score_col2 + score_col3
    i = i + 1

In return, I got output for score_col1 that looks like this:

df1.title  df2.title score
apple      carrot    (a - q)
pear       carrot    (b - t)
grape      carrot    (c - w)

Can someone help me get the expected output?

A uj5u.com user replied:

import pandas as pd

a1 = ['apple','pear', 'banana']
b1 = [56,32,23]
c1 = [12,34,90]
d1 = [87,65,23]

a2 = ['carrot','pumpkin','sprouts']
b2 = [16,12,93]
c2 = [12,32,70]
d2 = [81,55,21]

df1 = pd.DataFrame({'title':a1, 'col1':b1, 'col2':c1, 'col3':d1})
df2 = pd.DataFrame({'title':a2, 'col1':b2, 'col2':c2, 'col3':d2})

cols = ['col1','col2','col3']

# Build one result table per value column
for c in cols:
    rows = []
    # Pair every row of df1 with every row of df2
    for _, r1 in df1.iterrows():
        for _, r2 in df2.iterrows():
            rows.append({'title_df1': r1['title'],
                         'title_df2': r2['title'],
                         'score': r1[c] - r2[c]})
    # DataFrame.append was removed in pandas 2.0, so collect the rows and build the frame once
    res_df = pd.DataFrame(rows)
    print(res_df)
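
The nested iterrows loop above can get slow on larger frames. As a rough sketch of the same all-pairs, same-column subtraction without Python-level row loops, one option is a cross join. This assumes pandas 1.2 or later (where merge accepts how='cross'); the names pairs and tables are introduced here for illustration only.

```python
import pandas as pd

# Same sample data as in the answer above
df1 = pd.DataFrame({'title': ['apple', 'pear', 'banana'],
                    'col1': [56, 32, 23],
                    'col2': [12, 34, 90],
                    'col3': [87, 65, 23]})
df2 = pd.DataFrame({'title': ['carrot', 'pumpkin', 'sprouts'],
                    'col1': [16, 12, 93],
                    'col2': [12, 32, 70],
                    'col3': [81, 55, 21]})

# Cross join: every row of df1 is paired with every row of df2.
# All column names overlap, so they receive the suffixes _df1 / _df2.
pairs = df1.merge(df2, how='cross', suffixes=('_df1', '_df2'))

# One result table per value column, keyed by the column name
tables = {}
for c in ['col1', 'col2', 'col3']:
    tbl = pairs[['title_df1', 'title_df2']].copy()
    tbl['score'] = pairs[f'{c}_df1'] - pairs[f'{c}_df2']
    tables[c] = tbl

print(tables['col1'])
```

Each entry of tables has one row per (df1 title, df2 title) pair, so tables['col2'] and tables['col3'] can be printed or saved in the same way.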

A uj5u.com user replied:

Since you need 3 separate DataFrames, we can use a loop (if you wanted a single DataFrame, we could do something similar, but slightly differently).

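We can unstack df2 and iteratively subtract it from the repeated rows of df1. Because the stacked frame lists df2's col1 values first, then col2, then col3, this reproduces the pairing in the question's expected output: the first row of df1 is set against df2.col1, the second against df2.col2, and the third against df2.col3.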

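out = []
# Stack df2's value columns into one long 'score' column (col1 values, then col2, then col3)
df2_stacked = df2.set_index('title').unstack().droplevel(0).reset_index(name='score')
for col in df1.filter(like='col'):
    # Repeat each df1 row len(df2) times, then join with the stacked df2 by position
    tmp = (df1[['title', col]]
           .loc[df1.index.repeat(len(df2))]
           .reset_index(drop=True)
           .join(df2_stacked, lsuffix='_df1', rsuffix='_df2'))
    tmp['score'] = tmp[col] - tmp['score']
    # Keep only the two title columns and the score
    out.append(tmp.drop(columns=col))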

Let's test it on a numeric example:

df1:

   title  col1  col2  col3
0  apple  1000   100    10
1   pear  2000   200    20
2  grape  3000   300    30

df2:

     title  col1  col2  col3
0   carrot     1     4     7
1  pumpkin     2     5     8
2  sprouts     3     6     9

Then, if you run the code above and print out, it contains the following three DataFrames:

      title_df1 title_df2  score
    0     apple    carrot    999
    1     apple   pumpkin    998
    2     apple   sprouts    997
    3      pear    carrot   1996
    4      pear   pumpkin   1995
    5      pear   sprouts   1994
    6     grape    carrot   2993
    7     grape   pumpkin   2992
    8     grape   sprouts   2991


      title_df1 title_df2  score
    0     apple    carrot     99
    1     apple   pumpkin     98
    2     apple   sprouts     97
    3      pear    carrot    196
    4      pear   pumpkin    195
    5      pear   sprouts    194
    6     grape    carrot    293
    7     grape   pumpkin    292
    8     grape   sprouts    291



      title_df1 title_df2  score
    0     apple    carrot      9
    1     apple   pumpkin      8
    2     apple   sprouts      7
    3      pear    carrot     16
    4      pear   pumpkin     15
    5      pear   sprouts     14
    6     grape    carrot     23
    7     grape   pumpkin     22
    8     grape   sprouts     21
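
The pattern in these tables (for example, pear/carrot in the first table is 2000 - 4 rather than 2000 - 1) follows from how df2_stacked is laid out. Below is a small sketch, using the same numeric frames, that prints the intermediate pieces for col1; the variable name repeated is introduced here for illustration only.

```python
import pandas as pd

df1 = pd.DataFrame({'title': ['apple', 'pear', 'grape'],
                    'col1': [1000, 2000, 3000],
                    'col2': [100, 200, 300],
                    'col3': [10, 20, 30]})
df2 = pd.DataFrame({'title': ['carrot', 'pumpkin', 'sprouts'],
                    'col1': [1, 2, 3],
                    'col2': [4, 5, 6],
                    'col3': [7, 8, 9]})

# unstack() walks df2 column by column: col1 values, then col2, then col3,
# giving a 9-row frame with the columns 'title' and 'score'
df2_stacked = (df2.set_index('title')
                  .unstack()
                  .droplevel(0)
                  .reset_index(name='score'))
print(df2_stacked)

# Each df1 row repeated len(df2) times, with a fresh 0..8 index
repeated = (df1.loc[df1.index.repeat(len(df2)), ['title', 'col1']]
               .reset_index(drop=True))
print(repeated)

# The join in the answer lines these two frames up by position,
# so row k of `repeated` is paired with row k of `df2_stacked`
print(repeated['col1'] - df2_stacked['score'])
```

Rows 0 to 2 of df2_stacked hold df2.col1, rows 3 to 5 hold df2.col2, and rows 6 to 8 hold df2.col3, which is why apple, pear, and grape are each subtracted against a different df2 column.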

Please credit the source when reposting. Link to this article: https://www.uj5u.com/qiye/445060.html

Tags: Python, pandas DataFrame, for loop
