I have a pandas DataFrame with many variables:
df.columns
Out[0]:
Index(['COUNADU_SOIL_P_NUMBER_16_DA_B_VE_count_nr_lesion_PRATZE',
'COUNEGG_SOIL_P_NUMBER_50_DA_B_VT_count_nr_lesion_PRATZE',
'COUNJUV_SOIL_P_NUMBER_128_DA_B_V6_count_nr_lesion_PRATZE',
'COUNADU_SOIL_P_SAUDPC_150_DA_B_V6_lesion_saudpc_PRATZE',
'CONTRO_SOIL_P_pUNCK_150_DA_B_V6_lesion_p_control_PRATZE',
'COUNJUV_SOIL_P_p_0_100_16_DA_B_V6_lesion_incidence_PRATZE',
'COUNADU_SOIL_P_p_0_100_50_DA_B_VT_lesion_incidence_PRATZE',
'COUNEGG_SOIL_P_p_0_100_128_DA_B_VT_lesion_incidence_PRATZE',
'COUNEGG_SOIL_P_NUMBER_50_DA_B_V6_count_nr_spiral_HELYSP',
'COUNJUV_SOIL_P_NUMBER_128_DA_B_V10_count_nr_spiral_HELYSP', # and so on
I only want to keep the number that is followed by DA, so the first column would become 16_DA. I have been using the pandas function findall():
df.columns.str.findall(r'[0-9]*\_DA')
Out[595]:
Index([ ['16_DA'], ['50_DA'], ['128_DA'], ['150_DA'], ['150_DA'],
['16_DA'], ['50_DA'], ['128_DA'], ['50_DA'], ['128_DA'], ['150_DA'],
['150_DA'], ['50_DA'], ['128_DA'],
But this returns a list for each column, which I want to avoid. I would like to end up with a column index that looks like this:
df.columns
Out[595]:
Index('16_DA', '50_DA', '128_DA', '150_DA', '150_DA',
'16_DA', '50_DA', '128_DA', '50_DA', '128_DA', '150_DA',
Is there a smoother way to do this?
A uj5u.com user replied:
You can use .str.join(", ") to join all matches found in each column name with a comma and a space:
df.columns.str.findall(r'\d+_DA').str.join(", ")
Or, use str.extract to get only the first match:
df.columns.str.extract(r'(\d+_DA)', expand=False)
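To make the str.extract suggestion concrete, here is a minimal sketch that assigns the result back to the columns. The three-column frame and its values are made up for illustration; only the column names come from the question.

```python
import pandas as pd

# Hypothetical one-row frame; the column names are copied from the question.
df = pd.DataFrame(
    [[1, 2, 3]],
    columns=[
        "COUNADU_SOIL_P_NUMBER_16_DA_B_VE_count_nr_lesion_PRATZE",
        "COUNEGG_SOIL_P_NUMBER_50_DA_B_VT_count_nr_lesion_PRATZE",
        "COUNJUV_SOIL_P_NUMBER_128_DA_B_V6_count_nr_lesion_PRATZE",
    ],
)

# With one capture group and expand=False, extract returns an Index of
# strings (not lists), so it can be assigned straight back to df.columns.
df.columns = df.columns.str.extract(r"(\d+_DA)", expand=False)

print(list(df.columns))
```

Note that a column name with no match would come back as NaN, so it is probably worth checking for missing labels before overwriting the columns.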
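A uj5u.com user replied:
from typing import List

pattern = r'[0-9]*\_DA'
# findall returns one list of matches per column; sum(..., []) concatenates them
flattened: List[str] = sum(df.columns.str.findall(pattern), [])
# Join every match into a single comma-separated string
output: str = ",".join(flattened)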
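If a flat list of labels is wanted rather than one joined string, the flattening step can also be done with itertools.chain, which is swapped in here for sum(..., []). This is only a sketch: the two column names are taken from the question, and the \d+ pattern is an assumption that at least one digit is required.

```python
from itertools import chain

import pandas as pd

# Two column names copied from the question, held in a plain Index.
columns = pd.Index(
    [
        "COUNADU_SOIL_P_SAUDPC_150_DA_B_V6_lesion_saudpc_PRATZE",
        "COUNEGG_SOIL_P_p_0_100_128_DA_B_VT_lesion_incidence_PRATZE",
    ]
)

pattern = r"\d+_DA"  # assumption: require at least one digit before _DA

# findall yields one list of matches per column name.
matches = columns.str.findall(pattern)

# chain.from_iterable concatenates the per-column lists into one flat list.
flattened = list(chain.from_iterable(matches))

print(flattened)
```

Keep in mind that a name with zero or several matches would make the flat list shorter or longer than the number of columns, so it is only safe to assign back when every name matches exactly once.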
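A uj5u.com user replied:
Another approach:
def check_name(col: str) -> bool:
    # True when a digit-only token is immediately followed by "DA"
    parts = col.split("_")
    return any(a.isdigit() and b == "DA" for a, b in zip(parts, parts[1:]))

# Pass the predicate directly so filter calls it on each column name
list(filter(check_name, df.columns))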
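In the same regex-free spirit, a split-based helper could return the label itself instead of only filtering. The helper name da_label and the third sample name are hypothetical, added purely for illustration.

```python
from typing import List, Optional


def da_label(col: str) -> Optional[str]:
    """Return '<number>_DA' for the first digit token followed by 'DA', else None."""
    parts = col.split("_")
    # Walk over adjacent token pairs: (parts[0], parts[1]), (parts[1], parts[2]), ...
    for number, marker in zip(parts, parts[1:]):
        if number.isdigit() and marker == "DA":
            return f"{number}_DA"
    return None


names: List[str] = [
    "COUNJUV_SOIL_P_p_0_100_16_DA_B_V6_lesion_incidence_PRATZE",
    "COUNJUV_SOIL_P_NUMBER_128_DA_B_V10_count_nr_spiral_HELYSP",
    "plain_column",  # hypothetical name with no DA token
]

labels = [da_label(name) for name in names]
print(labels)
```

Because names without a number followed by DA map to None, this variant makes the unmatched columns easy to spot before renaming anything.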
Please credit the source when reposting. Link to this article: https://www.uj5u.com/qiye/390639.html
