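I want to create a column in the pandas DataFrame df based on another column, ID. For IDs that contain the string SAT, I want to extract the numbers joined by the special character "-" and put the extracted value into a new column named new_col. If ID does not contain SAT, new_col should stay NaN.
df looks like this: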
Date ID Time
0 2007-01-10 SAT 1 HHSP 900
1 2007-01-10 DOUBLE 7 HHSP 900
2 2007-01-10 SAT GF-06-5CSBG.431 1000
3 2007-01-10 MA HYDRO HHSP 900
4 2007-01-10 2.233 HHSP 900
5 2007-01-10 SAT L2-15-3CSB1.252 1000
6 2007-01-10 SECTION 6 HHSP 900
Expected output:
Date ID Time new_col
0 2007-01-10 SAT 1 HHSP 900 NaN
1 2007-01-10 DOUBLE 7 HHSP 900 NaN
2 2007-01-10 SAT GF-06-5CSBG.431 1000 06-5
3 2007-01-10 MA HYDRO HHSP 900 NaN
4 2007-01-10 2.233 HHSP 900 NaN
5 2007-01-10 SAT L2-15-3CSB1.252 1000 15-3
6 2007-01-10 SECTION 6 HHSP 900 NaN
* In row 5, 15-3 is extracted instead of 2-15 because L2 is not purely numeric.
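The rule in the question can be sketched on plain strings with Python's re module. This is an illustrative sketch, not code from the question: the helper name extract_sat_range is hypothetical, and it assumes that "the digits must be preceded by '-' or whitespace" is what rules out 2-15 in row 5.

```python
import re

# A '-' or whitespace character, followed by a captured group of
# digits, a '-', and more digits.
PATTERN = re.compile(r"[-\s](\d+-\d+)")


def extract_sat_range(id_value):
    """Hypothetical helper: first dash-joined digit group in a SAT ID, else None."""
    if "SAT" not in id_value:
        return None
    match = PATTERN.search(id_value)
    return match.group(1) if match else None


for id_value in ["SAT 1 HHSP", "SAT GF-06-5CSBG.431", "SAT L2-15-3CSB1.252", "2.233 HHSP"]:
    print(id_value, "->", extract_sat_range(id_value))
```

In SAT L2-15-3CSB1.252 the 2 follows the letter L, which is not in [-\s], so the first place the pattern can start is the - before 15.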
Answer from a uj5u.com user:
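Use Series.str.extract to capture digits joined by -, requiring a - or whitespace character right before them, and apply it only to the rows that Series.str.contains selects for SAT: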
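m = df['ID'].str.contains('SAT')  # True where ID contains "SAT"
# '-' or whitespace, then digits-dash-digits; rows outside m become NaN on assignment
df['new_col'] = df.loc[m, 'ID'].str.extract(r'[-\s](\d+-\d+)', expand=False)
print(df)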
Date ID Time new_col
0 2007-01-10 SAT 1 HHSP 900 NaN
1 2007-01-10 DOUBLE 7 HHSP 900 NaN
2 2007-01-10 SAT GF-06-5CSBG.431 1000 06-5
3 2007-01-10 MA HYDRO HHSP 900 NaN
4 2007-01-10 2.233 HHSP 900 NaN
5 2007-01-10 SAT L2-15-3CSB1.252 1000 15-3
6 2007-01-10 SECTION 6 HHSP 900 NaN
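For an end-to-end run, here is a self-contained version of the same mask-and-extract approach, assuming the sample frame from the question is typed in by hand (Date is kept as plain strings here). It passes regex=False to str.contains so SAT is treated as a literal substring; for this pattern that gives the same mask as the default.

```python
import pandas as pd

# Sample data typed in from the question.
df = pd.DataFrame({
    "Date": ["2007-01-10"] * 7,
    "ID": [
        "SAT 1 HHSP",
        "DOUBLE 7 HHSP",
        "SAT GF-06-5CSBG.431",
        "MA HYDRO HHSP",
        "2.233 HHSP",
        "SAT L2-15-3CSB1.252",
        "SECTION 6 HHSP",
    ],
    "Time": [900, 900, 1000, 900, 900, 1000, 900],
})

# Boolean mask: True where ID contains the literal substring "SAT".
m = df["ID"].str.contains("SAT", regex=False)

# Extract only for the masked rows; assignment aligns on the index,
# so every row outside the mask gets NaN.
df["new_col"] = df.loc[m, "ID"].str.extract(r"[-\s](\d+-\d+)", expand=False)
print(df)
```

expand=False makes str.extract return a Series for the single capture group, so it assigns cleanly to one column.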
If SAT is always at the start of the value, you can skip the mask and anchor the pattern instead:
df['new_col'] = df['ID'].str.extract(r'^SAT.*[-\s](\d+-\d+)', expand=False)
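The mask version and the anchored version agree on the question's sample, but they can differ on other inputs. The sketch below uses hypothetical IDs (not from the question) to show two differences: the anchored pattern ignores IDs where SAT is not at the start, and its greedy .* picks the last dash group rather than the first. The non-greedy .*? variant is an added comparison, not part of the original answer.

```python
import pandas as pd

# Hypothetical IDs, not from the question, chosen to show where the
# approaches disagree.
ids = pd.Series([
    "SAT GF-06-5CSBG.431",  # SAT at the start, one dash group
    "OLD SAT X-12-7",       # SAT present but not at the start
    "SAT A-1-2 B-3-4",      # two dash groups
    "DOUBLE 7-8",           # no SAT at all
])
group = r"[-\s](\d+-\d+)"

# Mask approach: extract only where "SAT" appears anywhere, NaN elsewhere.
mask = ids.str.contains("SAT", regex=False)
contains_version = ids[mask].str.extract(group, expand=False).reindex(ids.index)

# Anchored approach: the greedy .* makes it take the LAST matching group.
anchored_version = ids.str.extract(r"^SAT.*" + group, expand=False)

# Non-greedy variant: still anchored, but takes the FIRST matching group.
lazy_version = ids.str.extract(r"^SAT.*?" + group, expand=False)

print(pd.DataFrame({
    "ID": ids,
    "contains": contains_version,
    "anchored": anchored_version,
    "lazy": lazy_version,
}))
```

If an ID can contain more than one dash group and the first one is wanted, the non-greedy form keeps the anchored approach consistent with the mask approach.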
