I have the following df columns in the list used below, to make sure the columns end up in the correct order:
import pandas as pd
column_header = ['blast_id', 'labels', 'name', 'subject', 'list', 'mode', 'copy_template', 'stats',
'start_time', 'modify_time', 'schedule_time', 'email_count']
df = df[column_header]
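However, df is missing some of these columns, such as labels and name, so the selection above fails. How can I make sure that any column in column_header that is missing from df is simply added with empty values?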
uj5u.com user reply:
Assuming this input:
   copy_template  list  modify_time  blast_id  name  stats  mode
0              1     1            1         1     1      1     1
1              2     2            2         2     2      2     2
You need reindex:
column_header = ['blast_id', 'labels', 'name', 'subject', 'list', 'mode', 'copy_template', 'stats',
'start_time', 'modify_time', 'schedule_time', 'email_count']
df.reindex(columns=column_header)
Output:
   blast_id  labels  name  subject  list  mode  copy_template  stats  start_time  modify_time  schedule_time  email_count
0         1     NaN     1      NaN     1     1              1      1         NaN            1            NaN          NaN
1         2     NaN     2      NaN     2     2              2      2         NaN            2            NaN          NaN
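One detail worth spelling out: reindex returns a new DataFrame rather than modifying df in place, so the result has to be assigned. The sketch below rebuilds the assumed input from this answer and also shows the optional fill_value argument; the empty-string fill is only an illustration, not something the question asked for.

```python
import pandas as pd

column_header = ['blast_id', 'labels', 'name', 'subject', 'list', 'mode', 'copy_template', 'stats',
                 'start_time', 'modify_time', 'schedule_time', 'email_count']

# Sample frame matching the assumed input above
df = pd.DataFrame({
    'copy_template': [1, 2], 'list': [1, 2], 'modify_time': [1, 2],
    'blast_id': [1, 2], 'name': [1, 2], 'stats': [1, 2], 'mode': [1, 2],
})

# reindex does not modify df; keep the returned frame
result = df.reindex(columns=column_header)

# Optional: use a different placeholder than NaN for the newly created columns
result_filled = df.reindex(columns=column_header, fill_value='')
```

Note that reindex also drops any column of df that is not listed in column_header.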
uj5u.com user reply:
You can check for any missing columns and then set them to NaN for all rows:
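import numpy as np
# identify the expected columns that df does not have
missing_cols = [
    col for col in column_header
    if col not in df.columns
]
# fill in each one with NaN (pd.np is gone from current pandas, so use numpy directly)
for col in missing_cols:
    df[col] = np.nan
# select the columns in the expected order
df[column_header]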
Or, as a much shorter snippet:
import numpy as np
for col in column_header:
    if col not in df.columns:
        df[col] = np.nan
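If this is needed in more than one place, the loop can be wrapped in a small function. This is only a sketch: the name add_missing_columns and its fill_value parameter are made up for illustration, and it works on a copy so the caller's frame is left untouched.

```python
import numpy as np
import pandas as pd


def add_missing_columns(df, columns, fill_value=np.nan):
    """Return a copy of df with every name in `columns` present, in that order."""
    out = df.copy()
    for col in columns:
        if col not in out.columns:
            out[col] = fill_value
    # Keep only the requested columns, in the requested order
    return out[list(columns)]


# Small example frame (hypothetical data)
df = pd.DataFrame({'name': [1, 2], 'blast_id': [1, 2]})
result = add_missing_columns(df, ['blast_id', 'labels', 'name'])
```

As written, columns of df that are not in the requested list are dropped by the final selection.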
uj5u.com user reply:
One-liner (with numpy imported as np):
df[pd.Index(column_header).difference(df.columns)] = np.nan
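That line only appends the missing columns to the end of df, and Index.difference returns them sorted rather than in column_header order. To make sure all columns are in the correct order, do this instead: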
missing = pd.Index(column_header).difference(df.columns)
df[missing] = np.nan
df = df[column_header]
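To see what each step does, here is a runnable sketch of the same approach on a sample frame. The data is made up for illustration; only the three lines at the end come from this answer.

```python
import numpy as np
import pandas as pd

column_header = ['blast_id', 'labels', 'name', 'subject', 'list', 'mode', 'copy_template', 'stats',
                 'start_time', 'modify_time', 'schedule_time', 'email_count']

# Sample frame with five of the expected columns missing (hypothetical data)
df = pd.DataFrame({
    'copy_template': [1, 2], 'list': [1, 2], 'modify_time': [1, 2],
    'blast_id': [1, 2], 'name': [1, 2], 'stats': [1, 2], 'mode': [1, 2],
})

# Labels in column_header that df does not have; difference() sorts them
missing = pd.Index(column_header).difference(df.columns)
missing_list = list(missing)

df[missing] = np.nan      # new columns are appended at the end of df
df = df[column_header]    # reorder to the expected layout
```

The final selection is what restores the column_header order; without it the new columns stay at the end.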
