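I have a DataFrame (atc_df) with a column named "atc" that holds fixed-length strings with a fixed structure, which can be split into five sub-code levels. Here is an example: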
     prin                         atc
0    Acarbosio                    A10BF01
1    Aceclofenac                  M01AB16
2    Aciclovir                    J05AB01
3    Acido acetilsalicilico       B01AC06
4    Acido alendronico            M05BA04
...  ...                          ...
324  Voriconazolo                 J02AC03
325  Zofenopril                   C09AA15
326  Zofenopril idroclorotiazide  C09BA15
327  Zolmitriptan                 N02CC03
328  Zonisamide                   N03AX15
I have a function that, given an ATC code, returns its five sub-codes:
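def atc_split(atc_str):
    # Slice the 7-character ATC code into its five levels.
    atc1 = atc_str[0]    # level 1: one letter
    atc2 = atc_str[1:3]  # level 2: two digits
    atc3 = atc_str[3]    # level 3: one letter
    atc4 = atc_str[4]    # level 4: one letter
    atc5 = atc_str[5:7]  # level 5: two digits
    return (atc1, atc2, atc3, atc4, atc5)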
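Two questions:
Is there a more efficient or elegant way to split an ATC code into its five sub-codes?
How can I best apply this function to the atc_df DataFrame to add five new columns (atc1..atc5) to each row?
Thanks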
Answer:
Use str.extract (the DataFrame is called df_atc here; it is the question's atc_df):
df_atc[[f"atc{i + 1}" for i in range(5)]] = df_atc["atc"].str.extract(r"(\w)(\d{2})(\w)(\w)(\d{2})")
>>> df_atc
   prin                    atc      atc1  atc2  atc3  atc4  atc5
0  Acarbosio               A10BF01  A     10    B     F     01
1  Aceclofenac             M01AB16  M     01    A     B     16
2  Aciclovir               J05AB01  J     05    A     B     01
3  Acido acetilsalicilico  B01AC06  B     01    A     C     06
4  Acido alendronico       M05BA04  M     05    B     A     04
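
To try the str.extract approach end to end, here is a minimal self-contained sketch. The three-row sample frame is copied from rows in the question. The named groups (atc1 to atc5), the stricter character classes, and the ^...$ anchors are my own additions rather than part of the answer above; they assume every code is exactly one letter, two digits, two letters, and two digits.

```python
import pandas as pd

# Small excerpt of the question's data, used only for illustration.
df_atc = pd.DataFrame({
    "prin": ["Acarbosio", "Aceclofenac", "Aciclovir"],
    "atc": ["A10BF01", "M01AB16", "J05AB01"],
})

# Named groups become the column names of the extracted frame.
pattern = (
    r"^(?P<atc1>[A-Z])(?P<atc2>\d{2})"
    r"(?P<atc3>[A-Z])(?P<atc4>[A-Z])(?P<atc5>\d{2})$"
)

# One column per group; rows that do not match the pattern get NaN.
parts = df_atc["atc"].str.extract(pattern)

# Attach the five new columns to the original frame by index.
df_atc = df_atc.join(parts)
print(df_atc)
```

With named groups there is no need to build the list of column names separately, and a malformed code shows up as a row of NaN values instead of a silently wrong split.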
If you want each code to include the preceding levels as well, you can slice with .str:
df_atc["atc1"] = df_atc["atc"].str[0]
df_atc["atc2"] = df_atc["atc"].str[:3]
df_atc["atc3"] = df_atc["atc"].str[:4]
df_atc["atc4"] = df_atc["atc"].str[:5]
df_atc["atc5"] = df_atc["atc"]
>>> df_atc
   prin                    atc      atc1  atc2  atc3  atc4   atc5
0  Acarbosio               A10BF01  A     A10   A10B  A10BF  A10BF01
1  Aceclofenac             M01AB16  M     M01   M01A  M01AB  M01AB16
2  Aciclovir               J05AB01  J     J05   J05A  J05AB  J05AB01
3  Acido acetilsalicilico  B01AC06  B     B01   B01A  B01AC  B01AC06
4  Acido alendronico       M05BA04  M     M05   M05B  M05BA  M05BA04
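
The question also asked how to apply the original atc_split function to the frame. One possible way, sketched here under the assumption that every code is a 7-character string, is to map the function over the column and expand the returned tuples into five columns. The two-row sample frame is an excerpt from the question, and the variable names cols and parts are illustrative.

```python
import pandas as pd

def atc_split(atc_str):
    """Split a 7-character ATC code into its five sub-codes."""
    return (atc_str[0], atc_str[1:3], atc_str[3], atc_str[4], atc_str[5:7])

# Small excerpt of the question's data, used only for illustration.
df_atc = pd.DataFrame({
    "prin": ["Acido acetilsalicilico", "Zofenopril"],
    "atc": ["B01AC06", "C09AA15"],
})

cols = [f"atc{i + 1}" for i in range(5)]

# Call atc_split once per code, then turn the list of tuples into a frame
# that shares the original index so the join aligns row by row.
parts = pd.DataFrame(
    df_atc["atc"].map(atc_split).tolist(),
    columns=cols,
    index=df_atc.index,
)

df_atc = df_atc.join(parts)
print(df_atc)
```

This calls a Python function once per row, so on large frames the vectorized str.extract or .str slicing shown above is likely to be faster. The per-row version is mainly useful when the splitting logic already lives in a function you want to reuse.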
