Pandas: sum a column until >= sum(values), then split the remaining items of df into dfNew

I have a df with mixed data types, for example:

df:

name	value
1st	1
2nd	5
3rd	3.5
4th	8

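When df['value'].sum() >= 10, split into dfNew. The sum of the first three values is 9.5, so the remaining items need to be split off into dfNew; in my example, that is the last row.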

dfNew:

name	value
4th	8

I think I could do this by iterating (itertuples/iteritems), summing the items, then getting the index and splitting, but what is the more "pandas" way?

uj5u.com user reply:

Use cumsum, then bin the running total, then group by the bins, for example:

import math

df['cumsum_val'] = df['value'].cumsum()

# Bin edges every 10 units of the running total, plus one final edge past the last value
binning = list(range(0, math.ceil(df['cumsum_val'].iloc[-1]) + 10, 10))

df['binning'] = pd.cut(df['cumsum_val'], bins=binning)

After that, you can group by the bins:

grouped_df = df.groupby('binning')

Finally, collect the split DataFrames:

df_new_lists = [grouped_df.get_group(x) for x in grouped_df.groups]

df_new_lists is a list of DataFrames.
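The steps in this answer can be sketched end to end as follows. This is only a minimal illustration: the sample data is modeled on the question, the fourth value (8) is an assumption, and `observed=True` is an addition that skips any bin containing no rows.

```python
import math

import pandas as pd

# Sample data modeled on the question; the fourth value (8) is an assumption.
df = pd.DataFrame({'name': ['1st', '2nd', '3rd', '4th'],
                   'value': [1, 5, 3.5, 8]})

df['cumsum_val'] = df['value'].cumsum()  # running total: 1.0, 6.0, 9.5, 17.5

# Edges at 0, 10, 20: one bin per 10 units of the running total.
binning = list(range(0, math.ceil(df['cumsum_val'].iloc[-1]) + 10, 10))
df['binning'] = pd.cut(df['cumsum_val'], bins=binning)

# observed=True skips bins that contain no rows.
df_new_lists = [g for _, g in df.groupby('binning', observed=True)]

print(len(df_new_lists))
print(df_new_lists[1]['name'].tolist())
```

Note that pd.cut closes bins on the right by default, so a running total of exactly 10 would stay in the first group. If the split should happen as soon as the total is >= 10, passing `right=False` to pd.cut is one option.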

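uj5u.com user reply:

You can use a combination of:

.cumsum to get the cumulative sum;
cut to compare the cumulative sum against your threshold;
.groupby to group the rows that fall into the same bin.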

import pandas as pd

df = pd.DataFrame({'name': ['1st', '2nd', '3rd', '4th', '5th', '6th', '7th'], 'value': [1, 5, 3.5, 8, 2, 9, 2]})
threshold = 10

df['cums'] = df['value'].cumsum()

# Bin edges at every multiple of threshold, up to one edge past the final total
bins = range(0, int(df['cums'].iloc[-1]) + threshold + 1, threshold)
df['group'] = pd.cut(df['cums'], bins=bins)

df_list = [pd.DataFrame(g) for _, g in df.groupby('group')]

print(df)
#   name  value  cums     group
# 0  1st    1.0   1.0   (0, 10]
# 1  2nd    5.0   6.0   (0, 10]
# 2  3rd    3.5   9.5   (0, 10]
# 3  4th    8.0  17.5  (10, 20]
# 4  5th    2.0  19.5  (10, 20]
# 5  6th    9.0  28.5  (20, 30]
# 6  7th    2.0  30.5  (30, 40]

print(df_list)
# [
#   name  value  cums    group
# 0  1st    1.0   1.0  (0, 10]
# 1  2nd    5.0   6.0  (0, 10]
# 2  3rd    3.5   9.5  (0, 10],
#   name  value  cums     group
# 3  4th    8.0  17.5  (10, 20]
# 4  5th    2.0  19.5  (10, 20],
#   name  value  cums     group
# 5  6th    9.0  28.5  (20, 30],
#   name  value  cums     group
# 6  7th    2.0  30.5  (30, 40]
# ]

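Note that this code assumes the values are non-negative, so the cumulative sum is non-negative and non-decreasing. If that is not the case, you need a more robust definition of bins, such as:

bins = range(int(min(df['cums'])), int(max(df['cums'])) + threshold + 1, threshold)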
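As an alternative when values can be negative, one option is to skip pd.cut and floor-divide the running total by the threshold. This is a different technique from the one in this answer, shown only as a sketch; the sample data is hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical data with negative values, so the running total is not monotonic.
df = pd.DataFrame({'name': ['a', 'b', 'c', 'd', 'e'],
                   'value': [4, -6, 9, 5, -3]})
threshold = 10

df['cums'] = df['value'].cumsum()  # running total: 4, -2, 7, 12, 9

# Floor-divide the running total by the threshold: totals in [0, 10) get key 0,
# totals in [10, 20) get key 1, totals in [-10, 0) get key -1, and so on.
df['group'] = np.floor(df['cums'] / threshold).astype(int)

df_list = [g for _, g in df.groupby('group')]
print(df['group'].tolist())
```

The bands here are closed on the left, so a running total of exactly 10 lands in the second band. No bin edges need to be computed, which avoids edge cases at the minimum of the running total.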

uj5u.com user reply:

Use cumsum and idxmax, then slice the DataFrame ([idx:]):

dfNew = df.iloc[df['value'].cumsum().ge(10).idxmax():]

Output:

>>> dfNew
  name  value
3  4th    8.0
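The one-liner relies on the default integer index, because idxmax returns an index label while iloc expects a position. It also returns the first label when the threshold is never reached, which would put every row in dfNew. A position-based variant, offered as a sketch, handles both cases; the names `threshold`, `pos`, and `df_head` and the non-default index are assumptions made for illustration.

```python
import pandas as pd

# Sample data modeled on the question; the string index is hypothetical and is
# there only to show that the split does not depend on the index labels.
df = pd.DataFrame({'name': ['1st', '2nd', '3rd', '4th'],
                   'value': [1, 5, 3.5, 8]},
                  index=['w', 'x', 'y', 'z'])
threshold = 10

# Boolean array marking rows where the running total has reached the threshold.
reached = df['value'].cumsum().ge(threshold).to_numpy()

# argmax gives the position of the first True; if the threshold is never
# reached, fall back to len(df) so that the tail is empty.
pos = int(reached.argmax()) if reached.any() else len(df)

df_head = df.iloc[:pos]  # rows before the threshold is reached
dfNew = df.iloc[pos:]    # the row that reaches it, and everything after

print(dfNew)
```

Here `df_head` holds the first three rows (sum 9.5) and `dfNew` holds the remaining row.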

Please credit the source when reposting. Link to this article: https://www.uj5u.com/yidong/310851.html