我有一些資料幀(df、tmp_df 和 final_df),我想將兩列 tmp_df 作為串列型別輸入到 final_df 的兩個不同單元格中。我的代碼和資料框如下所示。但是,回圈部分無法正常作業。stackoverflow 或其他網站中的其他問題,如果程式開頭的串列有可用的字典,請回答此問題。但在這里,tmp_df 資料幀在 for 回圈和每次迭代中都會發生變化,計算出合適的價格。此外,最相關的資料被建立,它們必須定位為 final_df 的相關單元格。
import pandas as pd
df = pd.read_csv('myfile.csv')
tmp_df = pd.DataFrame()
final_df = pd.DataFrame()
tmp_df = df[df['Type'] == True]
cnt = 0
for c in tmp_df['Category']:
#################
# Apply some calculations and call different methods to do some changes on Price column of tmp_df.
#################
final_df.at[cnt,'Data'] = list(set(tmp_sub['Data']))
final_df ['Category'], final_df['Acceptable'], final_df['Rank'],final_df['Price'] = \
tmp_df['Rank'], list(tmp_sub['Price'])
cnt =1
df:
| Data | Category | Acceptable | Rank | Price |
| ------- | -------- | ---------- | ---- | ----- |
| 30275 | A | Yes | 1 | 52787 |
| 35881 | C | No | 2 | 14804 |
| 28129 | C | Yes | 3 | 180543|
| 30274 | D | No | 2 | 8066 |
| 30351 | D | Yes | 3 | 273478|
| 35886 | A | Yes | 2 | 10808 |
| 39900 | C | Yes | 1 | 21893 |
| 35887 | A | No | 2 | 2244 |
| 35883 | A | Yes | 1 | 10066 |
| 35856 | D | Yes | 3 | 19011 |
| 35986 | C | No | 2 | 6895 |
| 30350 | D | No | 3 | 5243 |
| 28129 | C | Yes | 1 | 112859|
| 31571 | C | Yes | 1 | 20701 |
tmp_df:
| Data | Category | Acceptable | Rank | Price |
| ------- | -------- | ---------- | ---- | ----- |
| 30275 | A | Yes | 1 | 52787 |
| 38129 | C | Yes | 3 | 180543|
| 30351 | D | Yes | 3 | 273478|
| 35886 | A | Yes | 2 | 10808 |
| 39900 | C | Yes | 1 | 21893 |
| 35883 | A | Yes | 1 | 10066 |
| 35856 | D | Yes | 3 | 19011 |
| 28129 | C | Yes | 1 | 112859|
| 31571 | C | Yes | 1 | 20701 |
The prices in the final dataframe (final_df) are changed because of the calculations over the tmp_df. Now, what should I do if I want the following result?
final_df:
| Data | Category | Acceptable | Rank | Price |
| ------- | -------- | ---------- | ---- | ----- |
| [30275,35886,35883] | A | Yes | [1,2]| 195543|
| [28129,39900,38129,31571] | C | Yes | [1,3]| 210089|
| [30351,35856] | D | Yes | 3 | 113859|
uj5u.com熱心網友回復:
您可以聚合list和Price另一個聚合函式,例如sum,mean...:
#generate custom function for Price
def func(x):
return x.sum()
d = {'Data':list,'Rank':lambda x: list(set(x)), 'Price':func}
final_df = (tmp_df.groupby(['Category','Acceptable'],as_index=False)
.agg(d)
.reindex(tmp_df.columns, axis=1))
d = {'Data':list,'Rank':lambda x: list(set(x)), 'Price':'max'}
final_df = (tmp_df.groupby(['Category','Acceptable'],as_index=False)
.agg(d)
.reindex(tmp_df.columns, axis=1))
print (final_df)
Data Category Acceptable Rank Price
0 [30275, 35886, 35883] A Yes [1, 2] 52787
1 [38129, 39900, 28129, 31571] C Yes [1, 3] 180543
2 [30351, 35856] D Yes [3] 273478
使用自定義函式的解決方案:
def func1(x):
return x.sum()
def f(x):
a = list(x['Data'])
b = list(set(x['Rank']))
c = func1(x['Price'])
return pd.Series({'Data':a,'Rank':b,'Price':c})
final_df = (tmp_df.groupby(['Category','Acceptable'])
.apply(f)
.reset_index()
.reindex(tmp_df.columns, axis=1))
轉載請註明出處,本文鏈接:https://www.uj5u.com/gongcheng/353102.html
上一篇:檢查字典值中是否存在串列元素
