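Suppose I have a DataFrame like this:
             Col1  Col2
0  AAA_BBB_123_DD   123
1  AAA_123_BBB_DD   123
2  123_AAA_BBB_DD   123
3  123_AAA_BB_DDD   NaN
4  456_AAA_BBB_DD   456
5  AAA_BBB_456_DD   456
6  AAA_789_BBB_DD   NaN
7  AAA_BBB_789_DD   789
8  AAA_000_BBB_DD   NaN
For the NaN values in Col2, I want to check the string in Col1, split it on "_", and, if one of the pieces matches a value I am looking for, put that piece into Col2.
Without a DataFrame, if I had a plain string such as 123_AAA_BB_DDD, I would do this: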
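s = "123_AAA_BB_DDD"  # renamed from str to avoid shadowing the built-in
values = ['123', '456', '789']
col2_value = 'Not Found'  # default if no piece matches
for piece in s.split("_"):
    # keep the first piece that contains one of the values, then stop
    if any(value in piece for value in values):
        col2_value = piece
        break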
The output I want looks like this:
             Col1       Col2
0  AAA_BBB_123_DD        123
1  AAA_123_BBB_DD        123
2  123_AAA_BBB_DD        123
3  123_AAA_BB_DDD        123
4  456_AAA_BBB_DD        456
5  AAA_BBB_456_DD        456
6  AAA_789_BBB_DD        789
7  AAA_BBB_789_DD        789
8  AAA_000_BBB_DD  Not Found
Edit:
The solution works when a value in the list exactly matches a piece of the Col1 string (for example, 123 in the list and 123 in the Col1 string). However, if I have something like AAA_PORT123_BBB_DD, the solution puts "Not Found" in Col2. So let's say I have a df like this:
                 Col1     Col2
0  AAA_BBB_PORT123_DD  PORT123
1      AAA_123_BBB_DD      123
2   STD123_AAA_BBB_DD   STD123
3      123_AAA_BB_DDD      NaN
4      456_AAA_BBB_DD      456
5      AAA_BBB_456_DD      456
6   AAA_MAN789_BBB_DD      NaN
7      AAA_BBB_789_DD      789
8      AAA_000_BBB_DD      NaN
The output I want is:
                 Col1       Col2
0  AAA_BBB_PORT123_DD    PORT123
1      AAA_123_BBB_DD        123
2   STD123_AAA_BBB_DD     STD123
3      123_AAA_BB_DDD        123
4      456_AAA_BBB_DD        456
5      AAA_BBB_456_DD        456
6   AAA_MAN789_BBB_DD     MAN789
7      AAA_BBB_789_DD        789
8      AAA_000_BBB_DD  Not Found
A uj5u.com user replied:
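For the rows with missing values in Col2, call a custom function that returns the first piece of Col1 found in the list values. To run the function only on those rows, use DataFrame.loc with the same mask on both sides of the assignment: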
values = ['123','456','789']
m = df['Col2'].isna()
f = lambda x: next((y for y in x.split('_') if y in values), 'Not Found')
df.loc[m, 'Col2'] = df.loc[m, 'Col1'].apply(f)
print (df)
             Col1       Col2
0  AAA_BBB_123_DD      123.0
1  AAA_123_BBB_DD      123.0
2  123_AAA_BBB_DD      123.0
3  123_AAA_BB_DDD        123
4  456_AAA_BBB_DD      456.0
5  AAA_BBB_456_DD      456.0
6  AAA_789_BBB_DD        789
7  AAA_BBB_789_DD      789.0
8  AAA_000_BBB_DD  Not Found
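The test y in values only matches a piece that is exactly equal to a listed value, so it does not cover the case from the question's edit, where a piece such as PORT123 merely contains 123. A minimal sketch of one way to handle that, assuming substring containment is the intended rule. The helper name first_containing and the small sample frame are made up for illustration:

```python
import numpy as np
import pandas as pd

# Small sample taken from the edited question (Col2 holds strings or NaN).
df = pd.DataFrame({
    "Col1": ["AAA_BBB_PORT123_DD", "123_AAA_BB_DDD",
             "AAA_MAN789_BBB_DD", "AAA_000_BBB_DD"],
    "Col2": ["PORT123", np.nan, np.nan, np.nan],
})
values = ["123", "456", "789"]

def first_containing(text, needles=values, default="Not Found"):
    """Hypothetical helper: first '_'-separated piece containing any needle."""
    return next(
        (piece for piece in text.split("_")
         if any(needle in piece for needle in needles)),
        default,
    )

# Same masking pattern as above: only rows with a missing Col2 are touched.
m = df["Col2"].isna()
df.loc[m, "Col2"] = df.loc[m, "Col1"].apply(first_containing)
print(df)
```

Because the check is plain substring containment, a piece like X1234 would also match 123. If that is too loose, a stricter rule (for example a regular expression) would be needed.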
A uj5u.com user replied:
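import numpy as np
import pandas as pd

df = pd.DataFrame([
    {"col1": "456_AAA_BBB_DD", "col2": "123"},
    {"col1": "456_AAA_BBB_DD", "col2": np.nan},
    {"col1": "000_AAA_BBB_DD", "col2": np.nan},
])
values = ['123', '456', '789']

# Split col1 on "_" and take the first piece found in values, else "Not Found".
# Only rows where col2 is null are assigned; the right-hand side is aligned by index.
df.loc[df['col2'].isnull(), 'col2'] = df['col1'].str.split("_").apply(
    lambda row: next((x for x in row if x in values), "Not Found")
)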
Initial DataFrame:
             col1  col2
0  456_AAA_BBB_DD   123
1  456_AAA_BBB_DD   NaN
2  000_AAA_BBB_DD   NaN
Output:
             col1       col2
0  456_AAA_BBB_DD        123
1  456_AAA_BBB_DD        456
2  000_AAA_BBB_DD  Not Found
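The assignment in this answer puts a full-length Series on the right-hand side while the left-hand side selects only some rows. This works because .loc aligns the assigned Series on index labels. A minimal sketch with made-up data (the names target and replacement are illustrative only):

```python
import numpy as np
import pandas as pd

# Column with two missing entries, and a full-length replacement Series.
target = pd.Series(["a", np.nan, np.nan], index=[0, 1, 2])
replacement = pd.Series(["x", "y", "z"], index=[0, 1, 2])

mask = target.isna()
# Only the masked rows are written; values are matched by index label,
# so row 0 keeps "a" even though replacement also has a value for it.
target.loc[mask] = replacement
print(target.tolist())
```

Masking the right-hand side as well, as the other answer does, avoids computing replacement values for rows that will not be written.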
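df.loc[df['col2'].isnull(), 'col2'] updates column col2 only in the rows where col2 is null.
First we split col1 with df['col1'].str.split("_").
Then we search each resulting list for an element that is in values.
(x for x in row if x in values) returns a generator object.
next lets us take only the first value from the generator. Its second argument is the default value, returned when the generator yields nothing.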
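The generator-plus-next step can be tried on its own, without pandas. A small sketch in plain Python (the function name first_match is hypothetical, used only for this illustration):

```python
values = ["123", "456", "789"]

def first_match(pieces, default="Not Found"):
    # Generator expression: evaluated lazily, one piece at a time.
    matches = (x for x in pieces if x in values)
    # next() returns the first item, or `default` if the generator is empty.
    return next(matches, default)

print(first_match("AAA_789_456_DD".split("_")))  # 789
print(first_match("AAA_000_BBB_DD".split("_")))  # Not Found
```

Without the second argument, next raises StopIteration when nothing matches, which is why the default matters here.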
