How to merge DataFrame rows based on empty cells in a column

Suppose I have the following DataFrame:

Index Col1 Col2 Col3
1     10   x    40
2          y    50
3          z    60
4     20   a    30

I want to merge every row whose Col1 is blank into the nearest preceding row whose Col1 is not blank.

Expected output:

Index Col1 Col2  Col3
1     10   x,y,z 40,50,60
4     20   a     30

Is this possible?

Thanks

Reply from a uj5u.com user:

Quite possible. What you need is a unique group value that increments at every non-blank Col1 value.

In one go:

df.drop('Col1', axis=1).groupby((df['Col1'].isna()==False).cumsum()).agg(list)
# P.S. if you really want comma-joined strings, convert to str *before* grouping
# (a GroupBy object has no astype method):
# df.drop('Col1', axis=1).astype(str).groupby((df['Col1'].isna()==False
#                       ).cumsum()).agg(','.join)


           Col2          Col3
Col1
1     [x, y, z]  [40, 50, 60]
2           [a]          [30]
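For anyone who wants to run this answer end to end, here is a self-contained sketch. It assumes the blank Col1 cells are NaN (the question does not say whether they are NaN or empty strings) and rebuilds the example frame by hand.

```python
import numpy as np
import pandas as pd

# Rebuild the question's frame, assuming blank Col1 cells are NaN
df = pd.DataFrame(
    {
        "Col1": [10, np.nan, np.nan, 20],
        "Col2": ["x", "y", "z", "a"],
        "Col3": [40, 50, 60, 30],
    },
    index=[1, 2, 3, 4],
)

# Group key: increments by one at every non-blank Col1, so each blank row
# shares the key of the last non-blank row above it -> [1, 1, 1, 2]
key = df["Col1"].notna().cumsum()

# Collect the remaining columns of each group into lists
as_lists = df.drop("Col1", axis=1).groupby(key).agg(list)

# Or comma-join them as strings (convert to str before grouping)
as_strings = df.drop("Col1", axis=1).astype(str).groupby(key).agg(",".join)
print(as_strings)
```

`notna()` is the same test as `isna()==False`, just shorter.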

The key here is the condition:

df[['Col1']].assign(con=df['Col1'].isna()==False)

   Col1    con #for condition
0  10.0   True <-- first group
1   NaN  False
2   NaN  False
3  20.0   True <-- second group

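Now, taking the cumulative sum of that condition gives you the grouper: every True starts a new number, and each blank row inherits the number of the last non-blank row above it.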

df[['Col1']].assign(con=(df['Col1'].isna()==False).cumsum())


   Col1  con
0  10.0    1
1   NaN    1
2   NaN    1
3  20.0    2
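The grouper above drops Col1 and the original index. A sketch that keeps both, so the result matches the expected output in the question, could take the first value of each group for Index and Col1 and comma-join the other columns. Blank cells are again assumed to be NaN, and `join` is a hypothetical helper name.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "Col1": [10, np.nan, np.nan, 20],
        "Col2": ["x", "y", "z", "a"],
        "Col3": [40, 50, 60, 30],
    },
    index=pd.Index([1, 2, 3, 4], name="Index"),
)

# Same cumulative-sum grouper as above: [1, 1, 1, 2]
key = df["Col1"].notna().cumsum()


def join(values: pd.Series) -> str:
    # Hypothetical helper: comma-join a group's values as strings
    return ",".join(map(str, values))


# reset_index() turns Index into a column; group positionally with the array
out = (
    df.reset_index()
    .groupby(key.to_numpy())
    .agg({"Index": "first", "Col1": "first", "Col2": join, "Col3": join})
    .set_index("Index")
)
# "first" skips NaN, so Col1 is 10.0 / 20.0; cast back to int for display
out["Col1"] = out["Col1"].astype(int)
print(out)
```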

Reply from a uj5u.com user:

We can do:

out = df.drop(labels = 'Col1',axis = 1).astype(str).groupby(df['Col1'].mask(df['Col1']=='').ffill()).agg(','.join).reset_index()
Out[85]: 
   Col1   Col2      Col3
0  10.0  x,y,z  40,50,60
1  20.0      a        30
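The two answers differ in one way worth knowing: the forward-fill key groups by the Col1 *value*, so two separate blocks that happen to share the same Col1 value are merged together, while the cumulative-sum key keeps every block separate. The sketch below uses made-up data to show this; blanks are assumed to be NaN, so the `mask` step is left out.

```python
import numpy as np
import pandas as pd

# Made-up data in which the Col1 value 10 starts two separate blocks
df = pd.DataFrame(
    {
        "Col1": [10, np.nan, 20, 10, np.nan],
        "Col2": ["x", "y", "a", "b", "c"],
    }
)

# Forward-fill key [10, 10, 20, 10, 10]: both "10" blocks collapse into one group
by_value = df.groupby(df["Col1"].ffill())["Col2"].agg(",".join)

# Cumulative-sum key [1, 1, 2, 3, 3]: every non-blank row starts a new group
by_block = df.groupby(df["Col1"].notna().cumsum())["Col2"].agg(",".join)

print(by_value)
print(by_block)
```

If Col1 values are guaranteed to be unique per block, both keys give the same result.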

Reply from a uj5u.com user:

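The posted answers solve my "toy" dataset problem, but I couldn't get them to work on my real-world dataset. Before this, I had posted another question about solving the problem while extracting the data, rather than manipulating the data all at once afterwards, and an answer came out of the replies to that question.

Here it is.

The answer is: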

last_valid = None  # last row whose check_cols are all filled
lvi = None  # index label of that row

check_cols = ['Col1']  # if you only need to check a subset of columns for validity, list them here

df = df.astype(str)  # convert all columns to strings, since numbers get combined into the same cell
df = df.replace('nan', '')  # turn the 'nan' strings created by astype back into blank strings

for i, s in df.iterrows():  # this is slow, but probably necessary in this case
    # If all checked cells in the row are filled, keep it as the reference
    # in case the following rows are not.
    if all(s[check_cols] != ''):
        lvi, last_valid = i, s  # store the index and the Series so we can go back and replace the row
        continue

    # Critical part: append this row's non-blank cells to the last valid row.
    extra_vals = s[s != '']  # cells in this row that hold actual values
    for col in extra_vals.index:
        # Values are appended as strings since I don't know exactly how yours
        # need to be handled; separate by whatever you wish (lists were causing issues).
        last_valid[col] = last_valid[col] + ',' + extra_vals[col]

    # Write the merged row back into the DataFrame (lvi is an index label, so use .loc).
    df.loc[lvi] = last_valid

# Drop the rows that were merged into their predecessor.
df = df[df['Col1'] != ''].reset_index(drop=True)
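If the loop turns out to be too slow on a large dataset, the same merge can probably be expressed with the cumulative-sum grouper from the first answer. This is a sketch, assuming the frame has already been converted to strings with '' for blanks (as in the loop above) and that Col1 is the only column in check_cols; `join_nonblank` is a hypothetical helper. Like the loop, it skips blank cells in the continuation rows.

```python
import pandas as pd

# Hypothetical string frame, as it looks after astype(str) and replace('nan', '')
df = pd.DataFrame(
    {
        "Col1": ["10", "", "", "20"],
        "Col2": ["x", "y", "z", "a"],
        "Col3": ["40", "50", "", "30"],
    }
)

# A new group starts at every row whose Col1 is filled: [1, 1, 1, 2]
key = (df["Col1"] != "").cumsum().rename("group")


def join_nonblank(values: pd.Series) -> str:
    # Comma-join only the non-blank cells, mirroring extra_vals = s[s != ''] in the loop
    return ",".join(v for v in values if v != "")


merged = df.groupby(key).agg(join_nonblank)
# Mirror the loop's final filter: drop any group that has no filled Col1
merged = merged[merged["Col1"] != ""].reset_index(drop=True)
print(merged)
```

One behavioral difference: rows before the first filled Col1 form a group with an empty Col1 here and are dropped by the filter, whereas the loop would fail on them because last_valid is still None.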

Please credit the source when reposting. Article link: https://www.uj5u.com/net/349886.html

Tags: Python pandas dataframe
