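If I have the following DataFrame:
Index  Col1  Col2  Col3
1      10    x     40
2            y     50
3            z     60
4      20    a     30
I want to merge each row that has a blank Col1 into the nearest preceding row whose Col1 is not blank.
Expected output:
Index  Col1  Col2   Col3
1      10    x,y,z  40,50,60
4      20    a      30
Is this possible?
Thanks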
A helpful uj5u.com user replied:
Very much so. What you need is a unique group value that increments at every non-null value of Col1.
In one go:
df.drop('Col1', axis=1).groupby(df['Col1'].notna().cumsum()).agg(list)
# P.S. if you really want strings, cast to str before grouping:
# df.drop('Col1', axis=1).astype(str).groupby(
#     df['Col1'].notna().cumsum()).agg(','.join)
           Col2          Col3
Col1
1     [x, y, z]  [40, 50, 60]
2           [a]          [30]
The key is this condition:
df[['Col1']].assign(con=df['Col1'].notna())
   Col1    con   # con = the condition
0  10.0   True   <-- first group
1   NaN  False
2   NaN  False
3  20.0   True   <-- second group
Now, taking the cumulative sum of that condition gives you the grouper (the group key).
df[['Col1']].assign(con=df['Col1'].notna().cumsum())
   Col1  con
0  10.0    1
1   NaN    1
2   NaN    1
3  20.0    2
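Putting the steps above together, here is a minimal end-to-end sketch. It assumes the blank Col1 cells are NaN (as in the output shown above) and that you want to keep the first Index and Col1 value of each group; the names `key` and `merged` are only illustrative.

```python
import numpy as np
import pandas as pd

# Sample frame from the question; blank Col1 cells are assumed to be NaN.
df = pd.DataFrame({
    "Index": [1, 2, 3, 4],
    "Col1": [10, np.nan, np.nan, 20],
    "Col2": ["x", "y", "z", "a"],
    "Col3": [40, 50, 60, 30],
})

# Group key: increases by 1 at every row where Col1 is present.
key = df["Col1"].notna().cumsum()

merged = (
    # Cast the columns to be joined to str, because ','.join needs strings.
    df.assign(Col2=df["Col2"].astype(str), Col3=df["Col3"].astype(str))
      .groupby(key.to_numpy())
      .agg({"Index": "first", "Col1": "first", "Col2": ",".join, "Col3": ",".join})
      .reset_index(drop=True)
)
print(merged)
```

Passing the key as a plain array keeps Col1 available as an ordinary column, so its original value (10, 20) can be carried through with "first" instead of being replaced by the group number.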
A helpful uj5u.com user replied:
We can do:
out = df.drop(labels = 'Col1',axis = 1).astype(str).groupby(df['Col1'].mask(df['Col1']=='').ffill()).agg(','.join).reset_index()
Out[85]:
   Col1   Col2      Col3
0  10.0  x,y,z  40,50,60
1  20.0      a        30
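For reference, a self-contained sketch of the same forward-fill idea. It assumes the blank Col1 cells are empty strings rather than NaN (which is what the `mask(... == '')` call targets); the sample frame and the name `filled` are illustrative.

```python
import pandas as pd

# Sample frame with blank Col1 cells stored as empty strings (an assumption).
df = pd.DataFrame({
    "Index": [1, 2, 3, 4],
    "Col1": ["10", "", "", "20"],
    "Col2": ["x", "y", "z", "a"],
    "Col3": [40, 50, 60, 30],
})

# Blank strings become NaN, then each gap inherits the last non-blank Col1 value.
filled = df["Col1"].mask(df["Col1"] == "").ffill()

out = (
    df[["Col2", "Col3"]].astype(str)   # ','.join needs strings
      .groupby(filled, sort=False)     # keep groups in order of appearance
      .agg(",".join)
      .reset_index()
)
print(out)
```

One design difference worth keeping in mind: because the group key here is the Col1 value itself, two separate blocks that happen to share the same Col1 value end up in one group, whereas the cumulative-sum key keeps them apart.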
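A helpful uj5u.com user replied:
The posted answers solved the problem for my "dummy" dataset, but I could not get them to work on my real-world dataset. Before this, I had posted another question about solving the problem while extracting the data, rather than manipulating it all at once afterwards, and an answer took shape from the responses to that question.
It is here
The answer is: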
last_valid = None
check_cols = ['Col1']  # to check only a subset of columns for validity, list them here
df = df.astype(str)  # convert every column to strings, since numbers must be combined in one cell
df = df.replace('nan', '')  # turn the 'nan' strings created by the cast back into blanks
for i, s in df.iterrows():  # this is slow, but probably necessary in this case
    # If every checked column in the row is valid, keep the row as a
    # reference in case the following rows are not.
    if all(s[check_cols] != ''):
        # store the index and the Series so the row can be replaced later
        lvi, last_valid = i, s
        continue
    else:  # here is the critical part
        extra_vals = s[s != '']  # cells in this row that hold actual values
        for col in extra_vals.index:
            # Join with a comma (or any separator you wish); collecting
            # the values into a list was causing issues.
            last_valid[col] = last_valid[col] + "," + extra_vals[col]
        # replace that row in the dataframe
        # (iloc with lvi assumes the default RangeIndex, where label == position)
        df.iloc[lvi, :] = last_valid
# drop the extra rows:
df = df[df['Col1'] != ''].reset_index(drop=True)
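If the loop proves too slow, the same result can probably be reached without `iterrows`. The following is only a sketch under stated assumptions: every cell is already a string, blanks are empty strings, the first row has a non-blank Col1, and blank cells should simply be skipped when joining. The sample data and the helper `join_non_blank` are hypothetical, not part of the original answer.

```python
import pandas as pd

# Hypothetical data: blanks are empty strings and may also occur outside Col1.
df = pd.DataFrame({
    "Col1": ["10", "", "", "20"],
    "Col2": ["x", "y", "", "a"],
    "Col3": ["40", "", "60", "30"],
})

def join_non_blank(values: pd.Series) -> str:
    """Join the non-blank cells of one group with commas."""
    return ",".join(v for v in values if v != "")

# Group key: increases by 1 at every row where Col1 is non-blank.
key = (df["Col1"] != "").cumsum()

merged = (
    df.groupby(key.to_numpy())
      .agg(join_non_blank)
      .reset_index(drop=True)
)
print(merged)
```

Because blank cells are filtered out inside the helper, Col1 collapses to its single non-blank value per group, and the other columns only collect the cells that actually hold data.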
