I have a dataframe of the following form:
df = [["john","2019","30.2"] , ["john","2019","40"] , ["john","2020","50.3"] ,
["amy","2019","60"] , ["amy","2019","20"] , ["amy","2020","40.1"]]
The result I want is a list in which the last element is summed across all rows whose first two elements are equal:
> [["john", "2019", "70.2"] , ["john","2020","50.3"] , ["amy","2019","80"] , ["amy","2020","40.1"]]
What I tried is a for loop that checks each condition for equality and, if both hold, sums the last element. This is some kind of pseudocode:
for i in df[i]:
    if df[i][0] == df[i + 1][0] and df[i][1] == df[i + 1][1]: #if both conditions are true
        sum1 = sum(float(df[i][2]))
        lst = []
        lst.append(df[i][0])
        lst.append(df[i][1])
        lst.append(str(sum1))
Edit: I would prefer a solution that does not use any packages.
Reply from a uj5u.com user:
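The following code does not use any packages. Since Python 3.7, all dictionaries preserve insertion order, and the code below relies on this so that the final result keeps the elements in their original order of appearance. If for some reason your Python is older than 3.7, let me know and I will modify the code to sort explicitly instead of relying on this language feature.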
df = [["john","2019","30.2"], ["john","2019","40"], ["john","2020","50.3"],
      ["amy","2019","60"], ["amy","2019","20"], ["amy","2020","40.1"]]
r = {}
# Split each row into the key fields (a) and the value to sum (b)
for *a, b in df:
    a = tuple(a)
    if a not in r:
        r[a] = 0
    r[a] += float(b)
# Rebuild the rows, converting each total back to a string
r = [list(k) + [str(v)] for k, v in r.items()]
print(r)
Output:
[['john', '2019', '70.2'], ['john', '2020', '50.3'], ['amy', '2019', '80.0'], ['amy', '2020', '40.1']]
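For Python versions older than 3.7, where dictionary ordering is not guaranteed, one possible variant keeps a separate list of keys in first-seen order. This is only a sketch of that idea, not the answerer's promised revision; the names `sum_by_key`, `totals` and `order` are made up for illustration.

```python
def sum_by_key(rows):
    """Sum the last field of each row, grouped by all preceding fields."""
    totals = {}   # key tuple -> running float total
    order = []    # key tuples in first-seen order (no reliance on dict ordering)
    for row in rows:
        key = tuple(row[:-1])
        if key not in totals:
            totals[key] = 0.0
            order.append(key)
        totals[key] += float(row[-1])
    # Rebuild rows in first-seen order, with totals converted back to strings
    return [list(key) + [str(totals[key])] for key in order]


df = [["john", "2019", "30.2"], ["john", "2019", "40"], ["john", "2020", "50.3"],
      ["amy", "2019", "60"], ["amy", "2019", "20"], ["amy", "2020", "40.1"]]
result = sum_by_key(df)
print(result)
```

The explicit `order` list costs one extra append per new key and makes the output order independent of the interpreter version.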
Reply from a uj5u.com user:
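Since you are using the variable name df, I assume you are familiar with Pandas.
You can do this easily in pandas. Just convert your list to a df,
then group by the columns whose values you want to be unique and select the last row: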
df.groupby(['col_a', 'col_b'], as_index=False).last()
If you have any custom logic, you can sort the df before calling groupby.
Reply from a uj5u.com user:
Here is an approach using defaultdict:
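from collections import defaultdict
# Outer key: name, inner key: year; missing inner values start at 0.0
sums = defaultdict(lambda: defaultdict(float))
for item in df:
    sums[item[0]][item[1]] += float(item[2])
# Flatten the nested dictionaries back into a list of lists
lst = [[key, inner_key, value] for key in sums for inner_key, value in sums[key].items()]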
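The snippet above assumes `df` is already defined and leaves the totals as floats. A self-contained variation, assuming the output should contain strings as in the question, might look like the following. Using a single `defaultdict(float)` keyed by `(name, year)` tuples is a simplification of mine, not part of the answer.

```python
from collections import defaultdict

df = [["john", "2019", "30.2"], ["john", "2019", "40"], ["john", "2020", "50.3"],
      ["amy", "2019", "60"], ["amy", "2019", "20"], ["amy", "2020", "40.1"]]

# defaultdict(float) supplies 0.0 for keys that have not been seen yet
sums = defaultdict(float)
for name, year, value in df:
    sums[(name, year)] += float(value)

# Convert back to the list-of-lists layout, with the totals as strings
lst = [[name, year, str(total)] for (name, year), total in sums.items()]
print(lst)
```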
Reply from a uj5u.com user:
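Dictionaries have a convenient setdefault method, which checks whether its first argument is a key of the dictionary and returns either the corresponding value or the default value.
In our case, since we want to sum numeric values, the default must of course be 0.
We use a temporary dictionary indexed by the tuple (name, year). Once the summation is done, we unpack the dictionary data into a list of lists, following the layout shown in the pseudocode of your question.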
In [15]: data = [["john","2019","30.2"] , ["john","2019","40"] , ["john","2020","50.3"] ,
    ...:         ["amy","2019","60"] , ["amy","2019","20"] , ["amy","2020","40.1"]]
    ...: d_temp = {}
    ...: for n, y, v in data:
    ...:     d_temp[(n,y)] = d_temp.setdefault((n,y),0) + float(v)
    ...: lol = [list(k) + [v] for k, v in d_temp.items()]
    ...: lol
Out[15]:
[['john', '2019', 70.2],
['john', '2020', 50.3],
['amy', '2019', 80.0],
['amy', '2020', 40.1]]
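To make the behaviour of `setdefault` concrete, here is a small illustration. Besides returning the default, `setdefault` also stores it under the missing key, which `dict.get` does not do. The dictionary `d` and its keys are invented for the example.

```python
d = {("john", "2019"): 30.2}

# Key present: setdefault returns the stored value and changes nothing
existing = d.setdefault(("john", "2019"), 0)

# Key missing: setdefault inserts the default and returns it
missing = d.setdefault(("amy", "2019"), 0)

# dict.get also returns the default for a missing key, but does not insert it
peeked = d.get(("amy", "2020"), 0)

print(existing, missing, peeked)
print(("amy", "2019") in d, ("amy", "2020") in d)
```

For a running sum either method works, since the key is assigned right afterwards anyway.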
Reply from a uj5u.com user:
One option, using tools from the standard library:
from itertools import groupby
from decimal import Decimal
from operator import itemgetter
# itertools' groupby requires the data to be sorted
key_func = itemgetter(0,1)
df = sorted(df, key = key_func)
# compute values within the groupby
[[*key, str(sum(Decimal(e) for *_, e in ent))]
for key, ent
in groupby(df, key = key_func)]
[['amy', '2019', '80'],
['amy', '2020', '40.1'],
['john', '2019', '70.2'],
['john', '2020', '50.3']]
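The choice of `Decimal` over `float` in this answer affects the output: decimal strings are summed exactly, and whole-number totals print without a trailing `.0`. A quick comparison, using values chosen purely for illustration:

```python
from decimal import Decimal

values = ["60", "20"]
float_total = str(sum(float(v) for v in values))      # floats keep a trailing .0
decimal_total = str(sum(Decimal(v) for v in values))  # Decimal preserves the input precision

# Binary floats cannot represent 0.1 or 0.2 exactly; Decimal can
float_sum = 0.1 + 0.2
decimal_sum = Decimal("0.1") + Decimal("0.2")

print(float_total, decimal_total)
print(float_sum == 0.3, decimal_sum == Decimal("0.3"))
```

Also note that because `groupby` needs sorted input, the result comes out in sorted key order (amy before john) rather than in the original order of appearance.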
