I have a DataFrame that looks like this:
| characters | result |
|:----------:|:------:|
| b | TP |
| a | TP |
| t | FN |
| NaN | None |
| c | TN |
| o | FP |
| p | TP |
I previously exploded it from the words "bat" and "cop". The words are separated by a NaN row. I want to bring them back into a DataFrame like this:
| characters | result | word |
|:----------:|:------:|:----:|
| b | TP | bat |
| a | TP | bat |
| t | FN | bat |
| NaN | None | None |
| c | TN | cop |
| o | FP | cop |
| p | TP | cop |
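Edit: please ignore the result column. Only characters and word matter here. The original DataFrame consisted of a word column, and pandas explode() was applied to it to obtain the characters column.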
Answer from a uj5u.com user:
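You can build a custom group key that identifies each run of consecutive non-NaN values, join the characters within each group, and map the result back onto the original DataFrame: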
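# True on the NaN separator rows
m = df['characters'].isna()
# number each run of consecutive rows, then blank out the separator rows
group = (m != m.shift()).cumsum().mask(m)
# join the characters of each run into a single word
to_map = df.groupby(group)['characters'].apply(lambda g: ''.join(g))
# look up each row's run number to get its word (separator rows stay NaN)
df['word'] = group.map(to_map)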
Output:
characters result word
0 b TP bat
1 a TP bat
2 t FN bat
3 NaN None NaN
4 c TN cop
5 o FP cop
6 p TP cop
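For anyone who wants to try this end to end, here is a minimal self-contained sketch of the same grouping idea. The sample frame is rebuilt by hand from the table in the question, and add_word_column is a hypothetical helper name introduced only for illustration; it is not part of pandas or of the original answer.

```python
import numpy as np
import pandas as pd


def add_word_column(df, col="characters"):
    """Rebuild a 'word' column by joining runs of non-NaN values in `col`.

    Hypothetical helper for illustration; returns a copy of `df`.
    """
    out = df.copy()
    # True on the NaN rows that separate one word from the next
    is_sep = out[col].isna()
    # A new run starts wherever is_sep changes value; cumsum numbers the runs,
    # and mask() sets the run number of the separator rows to NaN
    run_id = (is_sep != is_sep.shift()).cumsum().mask(is_sep)
    # groupby drops NaN keys by default, so separator rows are not joined
    words = out.groupby(run_id)[col].apply(lambda s: "".join(s))
    # Map each row's run number to its word; separator rows remain NaN
    out["word"] = run_id.map(words)
    return out


# Sample data rebuilt by hand from the question's table
df = pd.DataFrame({
    "characters": ["b", "a", "t", np.nan, "c", "o", "p"],
    "result": ["TP", "TP", "FN", None, "TN", "FP", "TP"],
})

out = add_word_column(df)
print(out)
```

Note that the separator row gets NaN in the word column rather than None, because unmatched keys in Series.map produce NaN.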
