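I have an unbalanced panel data set in which the observations are already sorted in ascending order by ID and student_period.
Basically, I want to create a new column in which every row for a given ID records the first type of scholarship the student received, that is, the one held in the period when the student entered the panel.
For example: if I have observations for ID xxxx in periods 4, 5, 6 and 7, I want this new column to record, for all of that ID's periods, the scholarship type received in period 4.
In R, I can do this with dplyr::first: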
df = df %>%
  dplyr::group_by(ID) %>%  # group first so that first() is evaluated per ID
  dplyr::mutate(scholarship_first = dplyr::first(scholarship)) %>%
  dplyr::ungroup()
Input:
ID student_period scholarship
4567 1 scholarship_level_1
4567 2 scholarship_level_2
4567 3 scholarship_level_2
4567 4 scholarship_level_3
5478 4 scholarship_level_3
5478 5 scholarship_level_3
6758 7 scholarship_level_1
6758 8 scholarship_level_2
6758 9 scholarship_level_2
Output:
ID student_period scholarship scholarship_first
4567 1 scholarship_level_1 scholarship_level_1
4567 2 scholarship_level_2 scholarship_level_1
4567 3 scholarship_level_2 scholarship_level_1
4567 4 scholarship_level_3 scholarship_level_1
5478 4 scholarship_level_3 scholarship_level_3
5478 5 scholarship_level_3 scholarship_level_3
6758 7 scholarship_level_1 scholarship_level_1
6758 8 scholarship_level_2 scholarship_level_1
6758 9 scholarship_level_2 scholarship_level_1
Since I have only just started learning Python, I don't yet know how to do this in that language. Can anyone help me?
Answer:
IIUC
# Assumes ID is the index of df, e.g. after df = df.set_index("ID")
df["scholarship_first"] = df.groupby(level=0)["scholarship"].first()
student_period scholarship scholarship_first
ID
4567 1 scholarship_level_1 scholarship_level_1
4567 2 scholarship_level_2 scholarship_level_1
4567 3 scholarship_level_2 scholarship_level_1
4567 4 scholarship_level_3 scholarship_level_1
5478 4 scholarship_level_3 scholarship_level_3
5478 5 scholarship_level_3 scholarship_level_3
6758 7 scholarship_level_1 scholarship_level_1
6758 8 scholarship_level_2 scholarship_level_1
6758 9 scholarship_level_2 scholarship_level_1
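pandas.DataFrame.groupby lets you group by ID if you specify level = 0 (this requires ID to be the index). You can then get the first occurrence in each group with pandas.core.groupby.GroupBy.first.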
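If ID is an ordinary column rather than the index, a variant of the same idea is to group by the column name and broadcast the per-group result back to every row with transform. The following is only a minimal sketch: the sample frame is rebuilt from the question's input, and the explicit sort is an assumption added so that "first" means the earliest student_period for each ID.

```python
import pandas as pd

# Sample data copied from the question's input table
df = pd.DataFrame({
    "ID": [4567, 4567, 4567, 4567, 5478, 5478, 6758, 6758, 6758],
    "student_period": [1, 2, 3, 4, 4, 5, 7, 8, 9],
    "scholarship": [
        "scholarship_level_1", "scholarship_level_2", "scholarship_level_2",
        "scholarship_level_3", "scholarship_level_3", "scholarship_level_3",
        "scholarship_level_1", "scholarship_level_2", "scholarship_level_2",
    ],
})

# Assumption: sort explicitly so the first row of each group is the earliest period
df = df.sort_values(["ID", "student_period"]).reset_index(drop=True)

# transform returns a result aligned with the original rows,
# so the first scholarship of each ID is repeated on all of that ID's rows
df["scholarship_first"] = df.groupby("ID")["scholarship"].transform("first")

print(df)
```

Note that "first" in a pandas groupby picks the first non-null value in each group, so if the scholarship column can contain missing values, the result may differ from simply taking the first row.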
Answer:
With datar, a re-imagining of the pandas API, you can do it as simply as in R:
>>> from datar.all import f, tribble, first, mutate, group_by
>>> df = tribble(
... f.ID, f.student_period, f.scholarship,
... 4567, 1, "scholarship_level_1",
... 4567, 2, "scholarship_level_2",
... 4567, 3, "scholarship_level_2",
... 4567, 4, "scholarship_level_3",
... 5478, 4, "scholarship_level_3",
... 5478, 5, "scholarship_level_3",
... 6758, 7, "scholarship_level_1",
... 6758, 8, "scholarship_level_2",
... 6758, 9, "scholarship_level_2",
... )
>>>
>>> df >> group_by(f.ID) >> mutate(scholarship_first=first(f.scholarship))
ID student_period scholarship scholarship_first
<int64> <int64> <object> <object>
0 4567 1 scholarship_level_1 scholarship_level_1
1 4567 2 scholarship_level_2 scholarship_level_1
2 4567 3 scholarship_level_2 scholarship_level_1
3 4567 4 scholarship_level_3 scholarship_level_1
4 5478 4 scholarship_level_3 scholarship_level_3
5 5478 5 scholarship_level_3 scholarship_level_3
6 6758 7 scholarship_level_1 scholarship_level_1
7 6758 8 scholarship_level_2 scholarship_level_1
8 6758 9 scholarship_level_2 scholarship_level_1
[TibbleGrouped: ID (n=3)]
Tags: pandas
