I have a DataFrame like this:
import pandas as pd
import numpy as np
df = pd.DataFrame({"a":[10, 13, 15, 30],
                   "b:1":[np.nan, np.nan, 13, 14],
                   "b:2":[6, 7, np.nan, np.nan]})
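I want to combine the columns whose names start with "b:" into a single column "b". In this case I could simply use df["b"] = df["b:1"].combine_first(df["b:2"]), but this is a small example of a larger DataFrame. Sometimes it also has columns such as "b:3" and so on, and there are other columns such as "c:1" and "c:2" that I do not want to merge.
Can anyone tell me how to do this, so that my final DataFrame looks like this: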
df
Out[23]:
    a   b:1  b:2     b
0  10   NaN  6.0   6.0
1  13   NaN  7.0   7.0
2  15  13.0  NaN  13.0
3  30  14.0  NaN  14.0
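Reply from a uj5u.com user:
You can use str.startswith on df.columns to select the "b:" columns and then sum along axis=1. This gives the combined value only when each row has at most one non-null entry among those columns:
# Match only the "b:" columns; str.contains('b') would also match any other name containing "b"
col_b = df.columns[df.columns.str.startswith('b:')]
# min_count=1 keeps NaN for rows where every "b:" column is NaN (the default would give 0)
df['b'] = df[col_b].sum(axis=1, min_count=1)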
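A side note on the sum approach: it matches combine_first only when the "b:" columns never overlap. Here is a minimal sketch of the difference, using a made-up frame in which one row has two values and one row has none (the data and the names `summed` and `first_valid` are assumptions for illustration, not from the question):

```python
import numpy as np
import pandas as pd

# Hypothetical data: row 0 has no "b:" value, row 1 has two.
df = pd.DataFrame({
    "a": [10, 13, 15, 30],
    "b:1": [np.nan, 5.0, 13.0, 14.0],
    "b:2": [np.nan, 7.0, np.nan, np.nan],
})

b_cols = df.columns[df.columns.str.startswith("b:")]

# Summing adds overlapping values and, by default, turns an all-NaN row into 0.
summed = df[b_cols].sum(axis=1)

# Back-filling across columns keeps the first non-null value per row,
# which is what chaining combine_first from left to right would give.
first_valid = df[b_cols].bfill(axis=1).iloc[:, 0]

print(pd.DataFrame({"summed": summed, "first_valid": first_valid}))
```

If the real data can have more than one non-null "b:" value in a row, the back-fill variant is probably the safer choice.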
Reply from a uj5u.com user:
This might help you:
from functools import reduce
import pandas as pd
import numpy as np
df = ... # define DataFrame
exclude_cols = ['c', 'd'] # Prefixes whose columns should not be merged
included_cols = []
for col in list(df.columns):
    if ':' not in col:
        continue
    base_col = col.split(':')[0]
    # Skip excluded prefixes and prefixes that were already merged
    if base_col in exclude_cols or base_col in included_cols:
        continue
    # All columns sharing this prefix, in their original order
    associated_cols = [c for c in df.columns if c.startswith(f"{base_col}:")]
    # Take the first non-null value per row, left to right
    df[base_col] = reduce(lambda x, y: x.combine_first(y), [df[c] for c in associated_cols])
    included_cols.append(base_col)
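Reply from a uj5u.com user:
You can loop over the first letters of the column names and backfill across each group:
df = pd.DataFrame({"a":[10, 13, 15, 30, 11],
                   "b:1":[np.nan, np.nan, 13, 14, np.nan],
                   "b:2":[6, 7, np.nan, np.nan, np.nan],
                   "b:3":[np.nan, np.nan, np.nan, np.nan, 11]})
df_combined = pd.DataFrame()
# Note: grouping by first letter merges every group, so any "c:" columns would be merged too
for first_letter in set([c[0] for c in df.columns]):
    cols = [c for c in df.columns if c[0] == first_letter]
    # bfill(axis=1) pulls the first non-null value of each row into the first column
    df_combined[first_letter] = df[cols].bfill(axis=1).iloc[:, 0]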
|   | b | a |
|---|---|---|
| 0 | 6 | 10 |
| 1 | 7 | 13 |
| 2 | 13 | 15 |
| 3 | 14 | 30 |
| 4 | 11 | 11 |
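Grouping by the first letter would also lump together unrelated columns that happen to share an initial. One possible variation is to group by the text before the colon instead. The sketch below uses a different technique from the loop above (transpose, groupby, first), and the names `keys` and `combined` are only illustrative:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "a": [10, 13, 15, 30, 11],
    "b:1": [np.nan, np.nan, 13, 14, np.nan],
    "b:2": [6, 7, np.nan, np.nan, np.nan],
    "b:3": [np.nan, np.nan, np.nan, np.nan, 11],
})

# Group key per column: the text before the first ":" ("b:1" -> "b", "a" -> "a").
keys = df.columns.str.split(":").str[0]

# Transpose so columns become rows, group those rows by key, and take the
# first non-null value in each group; transpose back to the original layout.
combined = df.T.groupby(keys).first().T

print(combined)
```

Like the loop above, this merges every prefix group, so columns that should stay separate (for example "c:1" and "c:2") would need to be filtered out first.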
Reply from a uj5u.com user:
Another possible solution:
df['b'] = df.T[lambda x: x.index.str.startswith('b:')].ffill().bfill().iloc[0]
Output:
    a   b:1  b:2     b
0  10   NaN  6.0   6.0
1  13   NaN  7.0   7.0
2  15  13.0  NaN  13.0
3  30  14.0  NaN  14.0
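To handle any number of "b:N" columns while leaving "c:1" and "c:2" alone, the ideas above could be wrapped in a small helper. This is only a sketch: `coalesce_prefixed` is a hypothetical name, not a pandas function, and the "c:" columns in the sample frame are invented to match the question's description.

```python
import numpy as np
import pandas as pd


def coalesce_prefixed(df, prefixes):
    """Return a copy of df with one combined column per requested prefix."""
    out = df.copy()
    for prefix in prefixes:
        # Columns such as "b:1", "b:2", ... in their existing left-to-right order.
        cols = [c for c in df.columns if str(c).startswith(f"{prefix}:")]
        if not cols:
            continue  # nothing to combine for this prefix
        # First non-null value per row, scanning the columns left to right.
        out[prefix] = df[cols].bfill(axis=1).iloc[:, 0]
    return out


df = pd.DataFrame({
    "a": [10, 13, 15, 30],
    "b:1": [np.nan, np.nan, 13, 14],
    "b:2": [6, 7, np.nan, np.nan],
    "c:1": [1, np.nan, 3, np.nan],   # assumed extra columns that must stay separate
    "c:2": [np.nan, 2, np.nan, 4],
})

result = coalesce_prefixed(df, ["b"])
print(result)
```

Only the prefixes passed in are merged, so the "c:" columns are left as they are. Priority follows the column order in the frame, so "b:1" wins over "b:2" when both have a value.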
