I have a DataFrame:
,nsn,nsn2,cage,part_number,company_name
0,6520-01-533-3775 ,6520015333775 ,1TDD0,973-0404,
1,6520-01-533-3775 ,6520015333775 ,4N2Q0,973-0404,
3,5995-01-633-1445 ,5995016331445 ,0RAG4,945923,"OHIO ASSOCIATED ENTERPRISES, LLC"
4,5331-00-157-6630 ,5331001576630 ,99167,3803-28,HAMILTON SUNDSTRAND CORPORATION
5,2915-00-908-6032 ,2915009086032 ,06848,2523862,HONEYWELL INTERNATIONAL INC.
6,5905-01-446-7000 ,5905014467000 ,63005,23054911,ROLLS-ROYCE CORPORATION
7,2840-01-440-7755 ,2840014407755 ,99207,5124T01G01,GENERAL ELECTRIC COMPANY
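I want to add a matches column that takes only two string values, possible and close. The rule is: if the part number consists entirely of digits (possibly including the "-" symbol), the value is close; if the part number contains letters, the value is possible.
So the expected DataFrame here would be:
,nsn,nsn2,cage,part_number,company_name,matches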
0,6520-01-533-3775 ,6520015333775 ,1TDD0,973-0404,,close
1,6520-01-533-3775 ,6520015333775 ,4N2Q0,973-0404,,close
3,5995-01-633-1445 ,5995016331445 ,0RAG4,945923,"OHIO ASSOCIATED ENTERPRISES, LLC",close
4,5331-00-157-6630 ,5331001576630 ,99167,3803-28,HAMILTON SUNDSTRAND CORPORATION,close
5,2915-00-908-6032 ,2915009086032 ,06848,2523862,HONEYWELL INTERNATIONAL INC.,close
6,5905-01-446-7000 ,5905014467000 ,63005,23054911,ROLLS-ROYCE CORPORATION,close
7,2840-01-440-7755 ,2840014407755 ,99207,5124T01G01,GENERAL ELECTRIC COMPANY,possible
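Answer from a uj5u.com user:
Use Series.str.contains with a regular expression that allows only digits or "-" between the start of the string (^) and the end of the string ($), and set the new column with numpy.where: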
df['matches'] = np.where(df['part_number'].str.contains(r'^[0-9-]+$'), 'close', 'possible')
print (df)
                nsn           nsn2   cage part_number  \
0  6520-01-533-3775  6520015333775  1TDD0    973-0404
1  6520-01-533-3775  6520015333775  4N2Q0    973-0404
3  5995-01-633-1445  5995016331445  0RAG4      945923
4  5331-00-157-6630  5331001576630  99167     3803-28
5  2915-00-908-6032  2915009086032  06848     2523862
6  5905-01-446-7000  5905014467000  63005    23054911
7  2840-01-440-7755  2840014407755  99207  5124T01G01

                       company_name   matches
0                               NaN     close
1                               NaN     close
3  OHIO ASSOCIATED ENTERPRISES, LLC     close
4   HAMILTON SUNDSTRAND CORPORATION     close
5      HONEYWELL INTERNATIONAL INC.     close
6           ROLLS-ROYCE CORPORATION     close
7          GENERAL ELECTRIC COMPANY  possible
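For anyone who wants to try the regex approach end to end, here is a minimal, self-contained sketch. It rebuilds only the part_number column from the question (the other columns are left out for brevity), and the variable name only_digits_or_hyphens is just an illustrative choice, not something from the original answer.

```python
import numpy as np
import pandas as pd

# Part numbers copied from the question; the other columns are omitted for brevity.
df = pd.DataFrame(
    {"part_number": ["973-0404", "973-0404", "945923", "3803-28",
                     "2523862", "23054911", "5124T01G01"]},
    index=[0, 1, 3, 4, 5, 6, 7],
)

# True when the whole string consists only of digits and hyphens:
# ^ anchors the start, $ anchors the end, + requires at least one character.
only_digits_or_hyphens = df["part_number"].str.contains(r"^[0-9-]+$")

# Map the boolean mask to the two labels the question asks for.
df["matches"] = np.where(only_digits_or_hyphens, "close", "possible")
print(df)
```

Without the + quantifier the character class would match exactly one character, so multi-character part numbers would never satisfy the anchored pattern.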
Answer from a uj5u.com user:
This should work:
df['matches'] = np.where(df['part_number'].str.replace('-', '', regex=False).str.isdigit() == True, 'close', 'possible')
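Here I replace '-' with an empty string and then use the isdigit() function, which checks whether every character in the string is a digit. Based on that condition, np.where() creates the new column.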
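The two steps described above can be sketched separately to make the intermediate values visible. This is only an illustration: the Series uses a few part numbers from the question plus a hypothetical "-" entry that does not appear in the original data, added to show an edge case.

```python
import numpy as np
import pandas as pd

# Sample part numbers from the question, plus a hypothetical "-" entry
# added only to show an edge case.
parts = pd.Series(["973-0404", "945923", "5124T01G01", "-"])

# Step 1: drop the hyphens (regex=False treats "-" as a literal string).
stripped = parts.str.replace("-", "", regex=False)

# Step 2: True where every remaining character is a digit.
# An empty string yields False, so a bare "-" is not counted as numeric.
is_numeric = stripped.str.isdigit()

# Step 3: turn the boolean mask into the two labels.
matches = np.where(is_numeric, "close", "possible")
print(matches.tolist())
```

Note the difference on the "-" entry: this approach labels it possible, whereas a pattern such as ^[0-9-]+$ would accept a string made only of hyphens and label it close. Which behavior is preferable depends on the data.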
