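I have loaded some JSON API data into a Pandas DataFrame, so some columns come through as lists. I also have some NaN values.
First, I want to replace NaN with a word such as "empty", but the rest of the data is already in list form. Ultimately I want to create a new column that operates on this list structure and converts it to a string, because I will later use those strings for mapping logic.
Here is some sample data and logic: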
import pandas as pd
import numpy as np
df_test = pd.DataFrame(data={'id': [1,2,3,4],
'name': [['amanda','jen','edward','ralph'],
np.nan,
['megan','roger','greg','donald'],
['teddy','ellie','greg','jamie']]
})
# issue is here trying to coerce the element of data to a list.
# it takes in the elements of the string and creates a list of characters for the one I replace NaNs on
df_test['name'] = df_test['name'].fillna('empty').apply(list)
# here I take the lists and sort and rearrange them into a string so I can later use this format as a dictionary key.
# Maybe there is a smarter way to do this
df_test['name_str'] = df_test['name'].apply(lambda x: ", ".join(sorted(x)).lower())
print(df_test.head())
   id                          name                    name_str
0   1  [amanda, jen, edward, ralph]  amanda, edward, jen, ralph
1   2               [e, m, p, t, y]               e, m, p, t, y
2   3  [megan, roger, greg, donald]  donald, greg, megan, roger
3   4   [teddy, ellie, greg, jamie]   ellie, greg, jamie, teddy
Any ideas on how to handle NaN so that it is still "list-like"? I can't run my lambda function on the column as-is, because NaN is treated as a float.
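To illustrate the two symptoms described above, here is a minimal sketch in plain Python (the variable names are only for illustration). It shows why `apply(list)` splits the filler string into characters, and why NaN cannot be sorted or joined like a list.

```python
import math

# list() iterates over its argument, and iterating a string yields characters.
chars = list('empty')

# A list literal keeps the whole string as a single element instead.
wrapped = ['empty']

# NaN is an ordinary float, so it has no elements to sort or join.
missing = float('nan')
is_float = isinstance(missing, float)
is_nan = math.isnan(missing)

print(chars)
print(wrapped)
print(is_float, is_nan)
```

So the fill value would have to be the list `['empty']` rather than the string `'empty'` for `list`-style handling to work.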
Edit: @SimonHawe provided the solution in the comments. Instead of using fillna at all, handle the NaN case with an if/else inside the lambda function.
Solution:
import pandas as pd
import numpy as np
df_test = pd.DataFrame(data={'id': [1,2,3,4],
'name': [['amanda','jen','edward','ralph'],
np.nan,
['megan','roger','greg','donald'],
['teddy','ellie','greg','jamie']]
})
# here I take the lists and sort and rearrange them into a string so I can later use this format as a dictionary key.
# Maybe there is a smarter way to do this
df_test['name_str'] = df_test['name'].apply(lambda x: ", ".join(sorted(x)).lower() if isinstance(x,list) else 'empty')
print(df_test.head())
   id                          name                    name_str
0   1  [amanda, jen, edward, ralph]  amanda, edward, jen, ralph
1   2                           NaN                       empty
2   3  [megan, roger, greg, donald]  donald, greg, megan, roger
3   4   [teddy, ellie, greg, jamie]   ellie, greg, jamie, teddy
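Since the question mentions using these strings as dictionary keys later, the following is a rough sketch of how that step might look. The helper `names_to_key`, the lookup table `group_by_key`, and its values are hypothetical and not part of the original post. The helper applies the same `isinstance` check as the lambda above.

```python
import numpy as np
import pandas as pd


def names_to_key(value):
    """Hypothetical helper: sorted, lower-cased key for a list, 'empty' otherwise."""
    if isinstance(value, list):
        return ", ".join(sorted(value)).lower()
    return "empty"


df = pd.DataFrame({
    "id": [1, 2],
    "name": [["amanda", "jen", "edward", "ralph"], np.nan],
})
df["name_str"] = df["name"].apply(names_to_key)

# Hypothetical lookup table keyed by the generated strings.
group_by_key = {
    "amanda, edward, jen, ralph": "team_a",
    "empty": "unassigned",
}
df["group"] = df["name_str"].map(group_by_key)
print(df)
```

A named function is easier to test on its own than an inline lambda, which may help if the key format changes later.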
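A uj5u.com user replied:
IIUC, you can select all the rows that are NaN, fill them with the string "['empty']", and then pass them through the eval function to turn that string into a list: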
m = df_test['name'].isna()
df_test.loc[m, 'name'] = df_test.loc[m, 'name'].fillna("['empty']").apply(eval)
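As an alternative that avoids `eval`, one possible sketch is to replace every non-list value with the list `['empty']` directly. This is not from the original answer. It assumes that every non-missing entry in the column is already a Python list.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "id": [1, 2],
    "name": [["megan", "roger", "greg", "donald"], np.nan],
})

# Keep real lists untouched; swap anything else (here, NaN) for ['empty'].
df["name"] = df["name"].apply(lambda x: x if isinstance(x, list) else ["empty"])

# Every row is now a list, so the original sort-and-join lambda works unchanged.
df["name_str"] = df["name"].apply(lambda x: ", ".join(sorted(x)).lower())
print(df)
```

This keeps the column "list-like" throughout, so no string has to be parsed back into a list.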
