I have a function that takes a DataFrame and returns a new DataFrame that is identical except for some added columns. For example:
def arbitrary_function_that_adds_columns(df):
    # In this trivial example I am adding only 1 column, but this function may add an arbitrary number of columns.
    df['new column'] = df['A'] + df['B'] / 8 + df['A']**3
    return df
Applying this function to the whole DataFrame is easy:
import pandas
df = pandas.DataFrame({'A': [1,2,3,4], 'B': [2,3,4,5]})
df = arbitrary_function_that_adds_columns(df)
print(df)
How can I apply arbitrary_function_that_adds_columns to a subset of the rows? This is what I am trying:
import pandas
df = pandas.DataFrame({'A': [1,2,3,4], 'B': [2,3,4,5]})
rows = df['A'].isin({1,3})
df.loc[rows] = arbitrary_function_that_adds_columns(df.loc[rows])
print(df)
But I get the original DataFrame back. The result I expected is:
   A  B  new column
0  1  2         NaN
1  2  3      10.375
2  3  4         NaN
3  4  5      68.625
Answer from a uj5u.com user:
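Use DataFrame.combine_first.
Note that, judging by the expected output, what you want is rows = [1, 3], not rows = df['A'].isin({1,3}). The latter selects every row whose 'A' value is 1 or 3, which here means the rows with index labels 0 and 2.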
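To make that difference concrete, here is a small sketch using the sample data from the question. It compares the rows picked by the boolean mask with the rows picked by the label list; the variable names rows_by_value and rows_by_label are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, 4], 'B': [2, 3, 4, 5]})

# Boolean mask: True where the *value* in column 'A' is 1 or 3
mask = df['A'].isin({1, 3})
rows_by_value = df.loc[mask].index.tolist()

# Label list: selects the rows whose *index label* is 1 or 3
rows_by_label = df.loc[[1, 3]].index.tolist()

print(rows_by_value)  # index labels 0 and 2
print(rows_by_label)  # index labels 1 and 3
```

The two selections only coincide when the values in 'A' happen to equal the index labels, which is not the case here.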
import pandas as pd
def arbitrary_function_that_adds_columns(df):
    # make sure that the function doesn't mutate the original DataFrame
    # Otherwise, you will get a SettingWithCopyWarning
    df = df.copy()
    df['new column'] = df['A'] + df['B'] / 8 + df['A']**3
    return df
df = pd.DataFrame({'A': [1,2,3,4], 'B': [2,3,4,5]})
rows = [1, 3]
# the function is applied to a copy of a DataFrame slice
>>> sub_df = arbitrary_function_that_adds_columns(df.loc[rows])
>>> sub_df
   A  B  new column
1  2  3      10.375
3  4  5      68.625
# Add the new information to the original df
>>> df = df.combine_first(sub_df)
>>> df
   A  B  new column
0  1  2         NaN
1  2  3      10.375
2  3  4         NaN
3  4  5      68.625
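If you only want to attach the newly created columns and leave the existing ones untouched, an index-based join is one possible alternative to combine_first. The following is a minimal sketch and not part of the original answer; the names new_cols and result are made up for illustration.

```python
import pandas as pd

def arbitrary_function_that_adds_columns(df):
    df = df.copy()  # work on a copy so the caller's frame is not mutated
    df['new column'] = df['A'] + df['B'] / 8 + df['A']**3
    return df

df = pd.DataFrame({'A': [1, 2, 3, 4], 'B': [2, 3, 4, 5]})
rows = [1, 3]

sub_df = arbitrary_function_that_adds_columns(df.loc[rows])

# Columns that the function added (hypothetical helper variable)
new_cols = sub_df.columns.difference(df.columns)

# Left join on the index: rows missing from sub_df get NaN in the new columns
result = df.join(sub_df[new_cols])
print(result)
```

Note that Index.difference returns the names sorted, so if the function adds several columns their order in result may differ from the order in which they were created.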
Answer from a uj5u.com user:
Using the example you gave:
df['A+B'] = df.loc[df['A'].isin({1,3})].sum(axis=1)
Or:
import numpy as np
df['A+B'] = np.nan
df.loc[df['A'].isin({1,3}), ['A+B']] = sum_AB(df)
More generally:
df.loc[ [row mask], [column mask] ] = [returned df of same shape]
# optionally, use fillna/bfill/ffill as appropriate
For anything more complex, look at DataFrame.transform and DataFrame.apply; combining them with df.loc and an appropriate boolean mask will do what you need.
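As a rough illustration of that last suggestion, the sketch below combines DataFrame.apply with df.loc and a boolean mask. It assumes the rows are chosen by index label (1 and 3, as in the expected output), and the helper name compute_new_value is hypothetical.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, 4], 'B': [2, 3, 4, 5]})

# Boolean mask over the index labels (assumption: we want rows 1 and 3)
mask = df.index.isin([1, 3])

def compute_new_value(row):
    # Hypothetical per-row version of the formula from the question
    return row['A'] + row['B'] / 8 + row['A']**3

# Assigning to a not-yet-existing column through .loc creates it;
# rows outside the mask are left as NaN
df.loc[mask, 'new column'] = df.loc[mask].apply(compute_new_value, axis=1)
print(df)
```

For a formula this simple, a vectorized expression on df.loc[mask] would usually be faster than a row-wise apply; apply is shown here only because the answer mentions it.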
Please credit the source when reposting. Link to this article: https://www.uj5u.com/qiye/353845.html
