Find the minimum value in one column for each value in another column - Pandas

I am working with a DataFrame in the pandas library. In the data below, each team has points for every month (1-6) of every year (2020, 2019, 2018).

month  team  points2020  points2019  points2018
1      team1   50           10         5
2      team1   20           40         2
3      team1   12           14        17
4      team1   8            9          3
5      team1   2            3          1 
6      team1   30           18         60
1      team2   8            9          10
2      team2   40           70         30
3      team2   25           19         34
4      team2   88           70          1
5      team2   23           45          5
6      team2   55           77          90
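To make the answers below easy to reproduce, the sample table can be built as a DataFrame. This is only a sketch: the variable name `df` is an assumption that matches what the answers use, and the values are copied from the table above.

```python
import pandas as pd

# Sample data copied from the question's table; `df` is the name the answers assume.
df = pd.DataFrame({
    "month": [1, 2, 3, 4, 5, 6] * 2,
    "team": ["team1"] * 6 + ["team2"] * 6,
    "points2020": [50, 20, 12, 8, 2, 30, 8, 40, 25, 88, 23, 55],
    "points2019": [10, 40, 14, 9, 3, 18, 9, 70, 19, 70, 45, 77],
    "points2018": [5, 2, 17, 3, 1, 60, 10, 30, 34, 1, 5, 90],
})

print(df)
```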

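What I want to show, for each month, is only the team with the lowest points in each year.

So, for example, based on the data above, for month "1" of "points2020" I only want to return team2 in the "team" column, because team2 has the lowest "points2020" value.

For month "1" of points2019 I only want to return team2 in the team column, because team2 has the lowest "points2019" value, and so on.

How can I achieve this?

Example of the desired output: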

month  year  team   points
1      2020  team2   8
2      2020  team1   20
3      2020  team1   12
4      2020  team1   8
5      2020  team1   2
6      2020  team1   30
1      2019  team2   9
2      2019  team1   40
3      2019  team1   14
4      2019  team1   9
5      2019  team1   3
6      2019  team1   18

uj5u.com user reply:

Set the team column as the index, group by month, then use idxmin to extract the team (the index label) with the lowest points:

out = df.set_index('team').groupby('month', as_index=False).idxmin()
print(out)

# Output
   month points2020 points2019 points2018
0      1      team2      team2      team1
1      2      team1      team1      team1
2      3      team1      team1      team1
3      4      team1      team1      team2
4      5      team1      team1      team1
5      6      team1      team1      team1
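The result above is still in wide format, while the question asks for one row per month and year. One possible way to get there, sketched under the assumption that `df` holds the question's sample data, is to reshape both the `idxmin` result and the matching `min` result with `melt` and merge them. The names `winners`, `lowest`, and `out` are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    "month": [1, 2, 3, 4, 5, 6] * 2,
    "team": ["team1"] * 6 + ["team2"] * 6,
    "points2020": [50, 20, 12, 8, 2, 30, 8, 40, 25, 88, 23, 55],
    "points2019": [10, 40, 14, 9, 3, 18, 9, 70, 19, 70, 45, 77],
    "points2018": [5, 2, 17, 3, 1, 60, 10, 30, 34, 1, 5, 90],
})

by_team = df.set_index("team")
winners = by_team.groupby("month").idxmin()  # team label with the lowest points
lowest = by_team.groupby("month").min()      # the lowest points themselves

# Reshape both wide results to long form and line them up on (month, year).
teams_long = winners.reset_index().melt(id_vars="month", var_name="year", value_name="team")
points_long = lowest.reset_index().melt(id_vars="month", var_name="year", value_name="points")
out = teams_long.merge(points_long, on=["month", "year"])

# Turn "points2020" into the integer 2020, then order like the desired output.
out["year"] = out["year"].str.replace("points", "", regex=False).astype(int)
out = (
    out.sort_values(["year", "month"], ascending=[False, True])
    .reset_index(drop=True)[["month", "year", "team", "points"]]
)
print(out)
```

Unlike the desired output in the question, this also keeps the 2018 rows; filtering on `year` would drop them.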

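uj5u.com user reply:

Using df.melt to turn the columns into rows, and then finding the row with the minimum value after a groupby, worked for me:

First, convert the points columns into rows (this creates the "year" and "points" columns)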

>> df = df.melt(id_vars=["month", "team"], var_name="year", value_name="points")
>> print(df.head())
   month   team        year  points
0      1  team1  points2020      50
1      2  team1  points2020      20
2      3  team1  points2020      12
3      4  team1  points2020       8
4      5  team1  points2020       2

For each month and year, find the row with the lowest points

>> df = df.loc[df.groupby(["month", "year"]).points.idxmin()]

Sort the values so they match the expected output

>> print(df.sort_values(["year", "month"]))
    month   team        year  points
24      1  team1  points2018       5
25      2  team1  points2018       2
26      3  team1  points2018      17
33      4  team2  points2018       1
28      5  team1  points2018       1
29      6  team1  points2018      60
18      1  team2  points2019       9
13      2  team1  points2019      40
14      3  team1  points2019      14
15      4  team1  points2019       9
16      5  team1  points2019       3
17      6  team1  points2019      18
6       1  team2  points2020       8
1       2  team1  points2020      20
2       3  team1  points2020      12
3       4  team1  points2020       8
4       5  team1  points2020       2
5       6  team1  points2020      30
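The year column above still holds labels such as "points2020", and the rows run from 2018 to 2020. A minimal sketch of the remaining cleanup, assuming the question's sample data, strips the prefix and sorts the years in descending order. The names `long_df` and `lowest` are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    "month": [1, 2, 3, 4, 5, 6] * 2,
    "team": ["team1"] * 6 + ["team2"] * 6,
    "points2020": [50, 20, 12, 8, 2, 30, 8, 40, 25, 88, 23, 55],
    "points2019": [10, 40, 14, 9, 3, 18, 9, 70, 19, 70, 45, 77],
    "points2018": [5, 2, 17, 3, 1, 60, 10, 30, 34, 1, 5, 90],
})

long_df = df.melt(id_vars=["month", "team"], var_name="year", value_name="points")

# idxmin returns the row label of the first minimum in each (month, year) group.
lowest = long_df.loc[long_df.groupby(["month", "year"])["points"].idxmin()]

lowest = (
    lowest.assign(year=lowest["year"].str.replace("points", "", regex=False).astype(int))
    .sort_values(["year", "month"], ascending=[False, True])
    .reset_index(drop=True)[["month", "year", "team", "points"]]
)
print(lowest)
```

Because `idxmin` returns only the first minimum, a month in which two teams tie for the lowest points would show just one of them.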

uj5u.com user reply:

Try this:

s = df.set_index(['month','team']).stack().rename_axis(['month','team','year'])

(s.loc[s.groupby(level=[0,2]).idxmin()]
 .sort_index(level=[2,0],ascending=[0,1])
 .reset_index(name='points')
 .assign(year = lambda x: x['year'].str.extract(r'(\d+)',expand=False)))

Output:

    month   team  year  points
0       1  team2  2020       8
1       2  team1  2020      20
2       3  team1  2020      12
3       4  team1  2020       8
4       5  team1  2020       2
5       6  team1  2020      30
6       1  team2  2019       9
7       2  team1  2019      40
8       3  team1  2019      14
9       4  team1  2019       9
10      5  team1  2019       3
11      6  team1  2019      18

uj5u.com user reply:

Before computing the groupby aggregation, you need to reshape from wide to long:

(
pd.wide_to_long(df, stubnames="points", i=["month", "team"], j="year")
.reset_index()
.groupby(["month", "year"], as_index=False, sort=False)
.agg(points=("points", "min"))
)

    month  year  points
0       1  2020       8
1       1  2019       9
2       1  2018       5
3       2  2020      20
4       2  2019      40
5       2  2018       2
6       3  2020      12
7       3  2019      14
8       3  2018      17
9       4  2020       8
10      4  2019       9
11      4  2018       1
12      5  2020       2
13      5  2019       3
14      5  2018       1
15      6  2020      30
16      6  2019      18
17      6  2018      60
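The aggregation above returns the minimum points but drops the team column the question asks for. One way to keep it, sketched here under the assumption that `df` holds the question's sample data, is to select whole rows with `idxmin` after the same `wide_to_long` reshape. The names `long_df`, `rows`, and `out` are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    "month": [1, 2, 3, 4, 5, 6] * 2,
    "team": ["team1"] * 6 + ["team2"] * 6,
    "points2020": [50, 20, 12, 8, 2, 30, 8, 40, 25, 88, 23, 55],
    "points2019": [10, 40, 14, 9, 3, 18, 9, 70, 19, 70, 45, 77],
    "points2018": [5, 2, 17, 3, 1, 60, 10, 30, 34, 1, 5, 90],
})

# wide_to_long splits "points2020" into the stub "points" and an integer "year".
long_df = pd.wide_to_long(
    df, stubnames="points", i=["month", "team"], j="year"
).reset_index()

# Row labels of the lowest points per (month, year); selecting whole rows keeps "team".
rows = long_df.groupby(["month", "year"])["points"].idxmin()
out = (
    long_df.loc[rows]
    .sort_values(["year", "month"], ascending=[False, True])
    .reset_index(drop=True)[["month", "year", "team", "points"]]
)
print(out)
```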

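Another option is to do the groupby first and then convert to long format (fewer rows are reshaped that way). Note that min() is applied to each column independently, so the team column in this result is simply the smallest team name in each month, not necessarily the team that scored the minimum: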

temp = df.groupby("month").min()
temp = temp.set_index('team', append = True)
temp.columns = temp.columns.str.split(r"(\d+)", expand = True).droplevel(-1)
temp.columns.names = [None, 'year']
temp.stack().reset_index()

    month   team  year  points
0       1  team1  2018       5
1       1  team1  2019       9
2       1  team1  2020       8
3       2  team1  2018       2
4       2  team1  2019      40
5       2  team1  2020      20
6       3  team1  2018      17
7       3  team1  2019      14
8       3  team1  2020      12
9       4  team1  2018       1
10      4  team1  2019       9
11      4  team1  2020       8
12      5  team1  2018       1
13      5  team1  2019       3
14      5  team1  2020       2
15      6  team1  2018      60
16      6  team1  2019      18
17      6  team1  2020      30
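The team column in this output is team1 in every row, including month 1 of 2020, where the lowest score (8) belongs to team2 in the sample data. A short check, assuming the question's sample data, shows where that comes from: `groupby("month").min()` reduces every column on its own, so team is the minimum of the team names. The names `name_from_min` and `actual_team` are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    "month": [1, 2, 3, 4, 5, 6] * 2,
    "team": ["team1"] * 6 + ["team2"] * 6,
    "points2020": [50, 20, 12, 8, 2, 30, 8, 40, 25, 88, 23, 55],
    "points2019": [10, 40, 14, 9, 3, 18, 9, 70, 19, 70, 45, 77],
    "points2018": [5, 2, 17, 3, 1, 60, 10, 30, 34, 1, 5, 90],
})

temp = df.groupby("month").min()

# Team name reported by min() for month 1: the smallest string, unrelated to points.
name_from_min = temp.loc[1, "team"]

# Team that actually holds the lowest points2020 value in month 1.
month1 = df[df["month"] == 1]
actual_team = month1.loc[month1["points2020"].idxmin(), "team"]

print(name_from_min, actual_team)  # team1 team2
```

If the team matters, selecting rows with `idxmin` (as in the other answers) avoids this mismatch.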

The steps above can be abstracted with pivot_longer from pyjanitor:

# pip install pyjanitor
import pandas as pd
import janitor

(df
.groupby("month", as_index=False)
.min()
.pivot_longer(index = ["month", "team"],
              names_to = (".value", "year"),
              names_pattern = r"(\D+)(\d+)")
)

    month   team  year  points
0       1  team1  2020       8
1       2  team1  2020      20
2       3  team1  2020      12
3       4  team1  2020       8
4       5  team1  2020       2
5       6  team1  2020      30
6       1  team1  2019       9
7       2  team1  2019      40
8       3  team1  2019      14
9       4  team1  2019       9
10      5  team1  2019       3
11      6  team1  2019      18
12      1  team1  2018       5
13      2  team1  2018       2
14      3  team1  2018      17
15      4  team1  2018       1
16      5  team1  2018       1
17      6  team1  2018      60
