I have the following pandas DataFrame.
import numpy as np
import pandas as pd
d = {'id1': ['85643', '85644','8564312','8564314','85645','8564316','85646','8564318','85647','85648','85649','85655'],'ID': ['G-00001', 'G-00001','G-00002','G-00002','G-00001','G-00002','G-00001','G-00002','G-00001','G-00001','G-00001','G-00001'],'col1': [1, 2,3,4,5,60,0,0,6,3,2,4],'Goal': [np.nan, 56,np.nan,89,73,np.nan ,np.nan ,np.nan, np.nan, np.nan, 34,np.nan ], 'col2': [3, 4,32,43,55,610,0,0,16,23,72,48],'col3': [1, 22,33,44,55,60,1,5,6,3,2,4],'Name': ['a1asd', 'a2asd','aabsd','aabsd','a3asd','aabsd','aasd','aabsd','aasd','aasd','aasd','aasd'],'Date': ['2021-06-13', '2021-06-13','2021-06-13','2021-06-14','2021-06-15','2021-06-15','2021-06-13','2021-06-16','2021-06-13','2021-06-13','2021-06-13','2021-06-16']}
dff = pd.DataFrame(data=d)
dff
id1 ID col1 Goal col2 col3 Name Date
0 85643 G-00001 1 NaN 3 1 a1asd 2021-06-13
1 85644 G-00001 2 56.0 4 22 a2asd 2021-06-13
2 8564312 G-00002 3 NaN 32 33 aabsd 2021-06-13
3 8564314 G-00002 4 89.0 43 44 aabsd 2021-06-14
4 85645 G-00001 5 73.0 55 55 a3asd 2021-06-15
5 8564316 G-00002 60 NaN 610 60 aabsd 2021-06-15
6 85646 G-00001 0 NaN 0 1 aasd 2021-06-13
7 8564318 G-00002 0 NaN 0 5 aabsd 2021-06-16
8 85647 G-00001 6 NaN 16 6 aasd 2021-06-13
9 85648 G-00001 3 NaN 23 3 aasd 2021-06-13
10 85649 G-00001 2 34.0 72 2 aasd 2021-06-13
11 85655 G-00001 4 NaN 48 4 aasd 2021-06-16
I also have some slices of the 'id1' column.
b65 = ['85643','85645', '85655','85646']
b66 = ['85643','85645','85647','85648','85649','85644']
b67 = ['8564312','8564314','8564316','8564318']
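Based on these 'id1' slices, I want to change the 'ID' column and then add the changed rows back to the same DataFrame.
For example, consider the b65 slice.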
b65 = ['85643','85645', '85655','85646']
I want a DataFrame like the one below.
id1 ID col1 Goal col2 col3 Name Date
0 85643 G-00001 1 NaN 3 1 a1asd 2021-06-13
1 85644 G-00001 2 56.0 4 22 a2asd 2021-06-13
2 8564312 G-00002 3 NaN 32 33 aabsd 2021-06-13
3 8564314 G-00002 4 89.0 43 44 aabsd 2021-06-14
4 85645 G-00001 5 73.0 55 55 a3asd 2021-06-15
5 8564316 G-00002 60 NaN 610 60 aabsd 2021-06-15
6 85646 G-00001 0 NaN 0 1 aasd 2021-06-13
7 8564318 G-00002 0 NaN 0 5 aabsd 2021-06-16
8 85647 G-00001 6 NaN 16 6 aasd 2021-06-13
9 85648 G-00001 3 NaN 23 3 aasd 2021-06-13
10 85649 G-00001 2 34.0 72 2 aasd 2021-06-13
11 85655 G-00001 4 NaN 48 4 aasd 2021-06-16
12 85643 b-65 1 NaN 3 1 a1asd 2021-06-13
13 85645 b-65 5 73.0 55 55 a3asd 2021-06-15
14 85646 b-65 0 NaN 0 1 aasd 2021-06-13
15 85655 b-65 4 NaN 48 4 aasd 2021-06-16
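I want to do the same for the remaining slices (b66, b67) and add them back to the same DataFrame. Is this possible? Any suggestions? Thanks in advance.
Answer:
You can use a dictionary of slices, a list comprehension, and pandas.concat: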
slices = {'b-65': ['85643','85645', '85655','85646'],
          'b-66': ['85643','85645','85647','85648','85649','85644'],
          'b-67': ['8564312','8564314','8564316','8564318'],
          }
# Original frame plus one relabelled copy of the matching rows per slice
pd.concat([dff] +
          [dff[dff['id1'].isin(v)].assign(ID=k) for k,v in slices.items()],
          ignore_index=True)
Output:
id1 ID col1 Goal col2 col3 Name Date
0 85643 G-00001 1 NaN 3 1 a1asd 2021-06-13
1 85644 G-00001 2 56.0 4 22 a2asd 2021-06-13
2 8564312 G-00002 3 NaN 32 33 aabsd 2021-06-13
3 8564314 G-00002 4 89.0 43 44 aabsd 2021-06-14
4 85645 G-00001 5 73.0 55 55 a3asd 2021-06-15
5 8564316 G-00002 60 NaN 610 60 aabsd 2021-06-15
6 85646 G-00001 0 NaN 0 1 aasd 2021-06-13
7 8564318 G-00002 0 NaN 0 5 aabsd 2021-06-16
8 85647 G-00001 6 NaN 16 6 aasd 2021-06-13
9 85648 G-00001 3 NaN 23 3 aasd 2021-06-13
10 85649 G-00001 2 34.0 72 2 aasd 2021-06-13
11 85655 G-00001 4 NaN 48 4 aasd 2021-06-16
12 85643 b-65 1 NaN 3 1 a1asd 2021-06-13
13 85645 b-65 5 73.0 55 55 a3asd 2021-06-15
14 85646 b-65 0 NaN 0 1 aasd 2021-06-13
15 85655 b-65 4 NaN 48 4 aasd 2021-06-16
16 85643 b-66 1 NaN 3 1 a1asd 2021-06-13
17 85644 b-66 2 56.0 4 22 a2asd 2021-06-13
18 85645 b-66 5 73.0 55 55 a3asd 2021-06-15
19 85647 b-66 6 NaN 16 6 aasd 2021-06-13
20 85648 b-66 3 NaN 23 3 aasd 2021-06-13
21 85649 b-66 2 34.0 72 2 aasd 2021-06-13
22 8564312 b-67 3 NaN 32 33 aabsd 2021-06-13
23 8564314 b-67 4 89.0 43 44 aabsd 2021-06-14
24 8564316 b-67 60 NaN 610 60 aabsd 2021-06-15
25 8564318 b-67 0 NaN 0 5 aabsd 2021-06-16
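The same pattern can be wrapped in a small helper, which is handy if the slices change often. This is only a sketch: the function name `append_relabelled` and its `key`/`target` parameters are made up for illustration, and the frame is a trimmed copy of the question's data. Note that `isin` silently skips ids that are not in the frame and returns rows in the frame's order, not the list's order.

```python
import pandas as pd

def append_relabelled(df, slices, key='id1', target='ID'):
    """Return df plus one relabelled copy of the matching rows per slice."""
    parts = [df]
    for label, ids in slices.items():
        # assign() returns a new frame, so the original rows are left untouched
        parts.append(df[df[key].isin(ids)].assign(**{target: label}))
    return pd.concat(parts, ignore_index=True)

# Trimmed version of the question's frame (hypothetical subset of rows/columns)
dff = pd.DataFrame({
    'id1': ['85643', '85644', '8564312', '85645'],
    'ID': ['G-00001', 'G-00001', 'G-00002', 'G-00001'],
    'col1': [1, 2, 3, 5],
})
slices = {'b-65': ['85643', '85645', '99999'],  # '99999' is not in the frame
          'b-67': ['8564312']}

out = append_relabelled(dff, slices)
print(out)
```

If unknown ids should raise an error instead of being ignored, a check such as `set(ids) - set(df[key])` could be added inside the loop.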
Answer:
Building on @mozway's solution, I tried using your lists directly instead of the dict (slices) that he created. The change is quite simple:
df2 = pd.concat([dff] +
                [dff[dff['id1'].isin([val])].assign(ID='b-65') for val in b65] +
                [dff[dff['id1'].isin([val])].assign(ID='b-66') for val in b66] +
                [dff[dff['id1'].isin([val])].assign(ID='b-67') for val in b67])
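But you have to add the three lists one by one. If I try to do this with a single list comprehension, I cannot access the name of the current list inside the loop.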
# assuming lst = [b65, b66, b67]
df2 = pd.concat([dff] + [dff[dff['id1'].isin([val])]
                .assign(ID="name_of_the_list") for sublst in lst for val in sublst])
Is there a way to assign the name of the list to the ID, or do you really have to create a dictionary with b65/b66/b67 as keys?
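One possible way around this, as a rough sketch: a Python list does not know which variable name it is bound to, so the label has to be supplied somewhere. Pairing hand-written labels with the lists via `zip` avoids building a dict explicitly. The names `labels`, `lists`, and `df2` are illustrative, and the frame is a trimmed copy of the question's data.

```python
import pandas as pd

# Trimmed version of the question's frame (hypothetical subset of rows/columns)
dff = pd.DataFrame({
    'id1': ['85643', '85644', '8564312', '85645', '85646', '85655'],
    'ID': ['G-00001', 'G-00001', 'G-00002', 'G-00001', 'G-00001', 'G-00001'],
    'col1': [1, 2, 3, 5, 0, 4],
})

b65 = ['85643', '85645', '85655', '85646']
b67 = ['8564312']

# Labels are written out by hand, in the same order as the lists
labels = ['b-65', 'b-67']
lists = [b65, b67]

df2 = pd.concat(
    [dff] + [dff[dff['id1'].isin(ids)].assign(ID=label)
             for label, ids in zip(labels, lists)],
    ignore_index=True,
)
print(df2)
```

This is effectively the dict approach with the keys and values kept in two parallel lists, so a dict is usually the tidier choice when the labels are known up front.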
