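I am trying to avoid using iterrows() in pandas and implement a more performant solution. This is the code I have: I loop over a DataFrame and, for each record, I need to add three records to a list: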
import pandas as pd

fruit_data = pd.DataFrame({
    'fruit': ['apple', 'orange', 'pear', 'orange'],
    'color': ['red', 'orange', 'green', 'green'],
    'weight': [5, 6, 3, 4]
})

array = []
for index, row in fruit_data.iterrows():
    # first record for this fruit, with sequence 0
    row2 = {'fruit_2': row['fruit'], 'sequence': 0}
    array.append(row2)
    # two more records, with sequence 1 and 2
    for i in range(2):
        row2 = {'fruit_2': row['fruit'], 'sequence': i + 1}
        array.append(row2)

print(array)
My real DataFrame has millions of records. Is there a way to optimize this code without using iterrows() or a for loop?
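Answer from a uj5u.com user:
You can use repeat to repeat each fruit 3 times, then groupby with cumcount to assign the sequence numbers, and finally to_dict to produce the final output: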
tmp = fruit_data['fruit'].repeat(3).reset_index(name='fruit_2')
tmp['sequence'] = tmp.groupby('index').cumcount()
out = tmp.drop(columns='index').to_dict('records')
Output:
[{'fruit_2': 'apple', 'sequence': 0},
{'fruit_2': 'apple', 'sequence': 1},
{'fruit_2': 'apple', 'sequence': 2},
{'fruit_2': 'orange', 'sequence': 0},
{'fruit_2': 'orange', 'sequence': 1},
{'fruit_2': 'orange', 'sequence': 2},
{'fruit_2': 'pear', 'sequence': 0},
{'fruit_2': 'pear', 'sequence': 1},
{'fruit_2': 'pear', 'sequence': 2},
{'fruit_2': 'orange', 'sequence': 0},
{'fruit_2': 'orange', 'sequence': 1},
{'fruit_2': 'orange', 'sequence': 2}]
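As a quick sanity check, the repeat/cumcount approach can be compared against the original loop on the sample data. This is only a sketch: the names expected and out are chosen here for illustration, and it assumes the DataFrame has a default, unnamed index so that reset_index produces a column called 'index'.

```python
import pandas as pd

fruit_data = pd.DataFrame({
    'fruit': ['apple', 'orange', 'pear', 'orange'],
    'color': ['red', 'orange', 'green', 'green'],
    'weight': [5, 6, 3, 4],
})

# Reference result: the row-by-row loop from the question.
expected = []
for _, row in fruit_data.iterrows():
    for seq in range(3):
        expected.append({'fruit_2': row['fruit'], 'sequence': seq})

# Vectorized result: repeat each fruit 3 times, then number the
# copies within each original row using groupby + cumcount.
tmp = fruit_data['fruit'].repeat(3).reset_index(name='fruit_2')
tmp['sequence'] = tmp.groupby('index').cumcount()
out = tmp.drop(columns='index').to_dict('records')

# Both approaches should produce the same list of records.
print(out == expected)
```

Grouping by the original index rather than by fruit matters here: 'orange' appears twice in the data, and each occurrence needs its own 0, 1, 2 sequence.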
Answer from a uj5u.com user:
Try this:
import numpy as np

array = (
    fruit_data['fruit']
    .repeat(3)  # repeat each fruit three times
    .to_frame(name='fruit_2')
    # index becomes 0, 1, 2 repeated once per original row
    .set_index(np.tile(np.arange(3), len(fruit_data['fruit'])))
    .reset_index()
    .rename({'index': 'sequence'}, axis=1)
    [['fruit_2', 'sequence']]
    .to_dict('records')
)
Output:
>>> array
[{'fruit_2': 'apple', 'sequence': 0},
{'fruit_2': 'apple', 'sequence': 1},
{'fruit_2': 'apple', 'sequence': 2},
{'fruit_2': 'orange', 'sequence': 0},
{'fruit_2': 'orange', 'sequence': 1},
{'fruit_2': 'orange', 'sequence': 2},
{'fruit_2': 'pear', 'sequence': 0},
{'fruit_2': 'pear', 'sequence': 1},
{'fruit_2': 'pear', 'sequence': 2},
{'fruit_2': 'orange', 'sequence': 0},
{'fruit_2': 'orange', 'sequence': 1},
{'fruit_2': 'orange', 'sequence': 2}]
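The same idea can also be written without the index round trip, by building both columns directly with NumPy. This is a sketch rather than part of either answer; n_copies, result and records are names introduced here for illustration.

```python
import numpy as np
import pandas as pd

fruit_data = pd.DataFrame({
    'fruit': ['apple', 'orange', 'pear', 'orange'],
    'color': ['red', 'orange', 'green', 'green'],
    'weight': [5, 6, 3, 4],
})

n_copies = 3  # illustrative name: how many records to emit per input row

result = pd.DataFrame({
    # each fruit repeated n_copies times in a row: a, a, a, b, b, b, ...
    'fruit_2': np.repeat(fruit_data['fruit'].to_numpy(), n_copies),
    # the pattern 0, 1, 2 tiled once per input row
    'sequence': np.tile(np.arange(n_copies), len(fruit_data)),
})

records = result.to_dict('records')
print(records[:3])
```

With millions of rows, converting to a list of dicts with to_dict('records') is likely to be the slow step, so it may be worth keeping the data in result as a DataFrame if the downstream code allows it.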
