How do I handle a Pandas column whose elements are lists?

I have loaded some JSON API data into a Pandas DataFrame, and as a result some columns contain lists. I also have some NaN values.

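First, I want to replace NaN with a word such as "empty", but the rest of the data is already in list form. Ultimately I want to create a new column that operates on this list structure and essentially converts it to a string, because I will later use these strings for mapping logic.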

Here is some sample data and logic:

import pandas as pd
import numpy as np

df_test = pd.DataFrame(data={'id': [1,2,3,4],
                             'name': [['amanda','jen','edward','ralph'],
                                      np.nan,
                                      ['megan','roger','greg','donald'],
                                      ['teddy','ellie','greg','jamie']]
                            })

# issue is here trying to coerce the element of data to a list.
# it takes in the elements of the string and creates a list of characters for the one I replace NaNs on
df_test['name'] = df_test['name'].fillna('empty').apply(list)

# here I take the lists and sort and rearrange them into a string so I can later use this format as a dictionary key. 
# Maybe there is a smarter way to do this
df_test['name_str'] = df_test['name'].apply(lambda x: ", ".join(sorted(x)).lower())
print(df_test.head())

   id                          name                    name_str
0   1  [amanda, jen, edward, ralph]  amanda, edward, jen, ralph
1   2               [e, m, p, t, y]               e, m, p, t, y
2   3  [megan, roger, greg, donald]  donald, greg, megan, roger
3   4   [teddy, ellie, greg, jamie]   ellie, greg, jamie, teddy

Any ideas on how to handle the NaN values so that they remain "list-like"? I cannot run my lambda function on the column as-is, because NaN is treated as a float.
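The character-by-character result in row 1 comes from how list() treats a string. A minimal plain-Python sketch of that behaviour (the variable names are illustrative only, not from the original post):

```python
# list() iterates over its argument, so a string is split into characters.
chars = list('empty')

# A list literal keeps the whole string as a single element.
wrapped = ['empty']

# NaN is a float, which is not iterable, so sorted() raises TypeError.
nan_value = float('nan')
try:
    sorted(nan_value)
    error_name = None
except TypeError as exc:
    error_name = type(exc).__name__

print(chars)
print(wrapped)
print(error_name)
```

This is why fillna('empty').apply(list) yields [e, m, p, t, y], and why the lambda fails when it reaches an unfilled NaN.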

Edit: @SimonHawe provided the solution in the comments. The fix is to handle the NaN case with an if/else inside the lambda function and not use fillna at all.

Solution:

import pandas as pd
import numpy as np

df_test = pd.DataFrame(data={'id': [1,2,3,4],
                             'name': [['amanda','jen','edward','ralph'],
                                      np.nan,
                                      ['megan','roger','greg','donald'],
                                      ['teddy','ellie','greg','jamie']]
                            })


# here I take the lists and sort and rearrange them into a string so I can later use this format as a dictionary key. 
# Maybe there is a smarter way to do this
df_test['name_str'] = df_test['name'].apply(lambda x: ", ".join(sorted(x)).lower() if isinstance(x,list) else 'empty')
print(df_test.head())

   id                          name                    name_str
0   1  [amanda, jen, edward, ralph]  amanda, edward, jen, ralph
1   2                           NaN                       empty
2   3  [megan, roger, greg, donald]  donald, greg, megan, roger
3   4   [teddy, ellie, greg, jamie]   ellie, greg, jamie, teddy
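If the name column itself should stay list-like, one possible variation, offered as a sketch and not as the commenter's method, is to replace each NaN with a one-element list before building the string. The helper names to_name_list and to_key and the shortened sample data are hypothetical:

```python
import numpy as np
import pandas as pd

def to_name_list(value):
    """Return value unchanged if it is a list, otherwise the placeholder ['empty']."""
    return value if isinstance(value, list) else ['empty']

def to_key(names):
    """Sort the names and join them into one lowercase string."""
    return ", ".join(sorted(names)).lower()

# Shortened, made-up sample in the same shape as df_test.
df = pd.DataFrame({'id': [1, 2, 3],
                   'name': [['amanda', 'jen'], np.nan, ['megan', 'greg']]})

df['name'] = df['name'].apply(to_name_list)   # NaN becomes ['empty']
df['name_str'] = df['name'].apply(to_key)     # every cell is now a list
print(df)
```

With this, every cell in name is a list, so later list operations need no special case for NaN. If the names could differ in case, lowercasing each name before sorting would make the order case-insensitive.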

A uj5u.com community member replied:

IIUC, you can select all the rows that are NaN, fill them with the string "['empty']", and then pass them through the eval function:

m = df_test['name'].isna()
df_test.loc[m, 'name'] = df_test.loc[m, 'name'].fillna("['empty']").apply(eval)
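Since eval executes any expression it is given, a more cautious variant could swap in ast.literal_eval from the standard library, which only parses Python literals. This is a sketch with shortened sample data, not the answerer's code, and it rewrites the whole column instead of assigning through the mask:

```python
import ast

import numpy as np
import pandas as pd

# Shortened, made-up sample in the same shape as df_test.
df = pd.DataFrame({'id': [1, 2],
                   'name': [['amanda', 'jen'], np.nan]})

# Fill NaN with the text of a list literal, then parse only those strings.
filled = df['name'].fillna("['empty']")
df['name'] = filled.apply(
    lambda v: ast.literal_eval(v) if isinstance(v, str) else v
)
print(df)
```

Cells that already hold lists pass through untouched; only the filled-in string is parsed into ['empty'].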
