I have a dataset with about 700 columns. I want to join all of the columns into a single column.
Input:
id | A | B | C | D | E | F | ... | Z
0 | yes | no | yes | no | yes| no | ... | no
1 | no | no | yes | no | no | no | ... | no
2 | yes | yes| yes | yes| yes| no | ... | no
Output:
id | A | B | C | D | E | F | ... | Z | joined_column
0 | yes | no | yes | no | yes| no | ... | no | yes no yes no yes no ... no
1 | no | no | yes | no | no | no | ... | no | no no yes no no no ... no
2 | yes | yes| yes | yes| yes| no | ... | no | yes yes yes yes yes no ... no
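I have used the following in the past, but I am looking for a way to scale it to a large number of columns.
def join(df):
    rows = []  # avoid shadowing the built-in `list`
    for i in range(df.shape[0]):
        # concatenate the values of each named column, separated by spaces
        rows.append(str(df['A'][i]) + ' ' + str(df['B'][i]) + ' ' + str(df['C'][i]))
    return rows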
uj5u.com user reply:
Given df:
A B C D E F
0 yes no yes no yes no
1 no no yes no no no
2 yes yes yes yes yes no
Doing:
cols = df.columns
# As a string:
df['joined_column_str'] = df[cols].agg(' '.join, axis=1)
# As a list:
df['joined_column_list'] = df[cols].agg(list, axis=1)
Output:
A B C D E F joined_column_str joined_column_list
0 yes no yes no yes no yes no yes no yes no [yes, no, yes, no, yes, no]
1 no no yes no no no no no yes no no no [no, no, yes, no, no, no]
2 yes yes yes yes yes no yes yes yes yes yes no [yes, yes, yes, yes, yes, no]
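The string version above works because every selected column holds strings; ' '.join raises a TypeError on other types. A minimal sketch, assuming a small made-up frame with an integer id column (both the frame and the id column are illustrative assumptions), that picks the columns programmatically and casts them to str first:

```python
import pandas as pd

# Hypothetical sample frame; the integer `id` column is an assumption.
df = pd.DataFrame({
    "id": [0, 1, 2],
    "A": ["yes", "no", "yes"],
    "B": ["no", "no", "yes"],
    "C": ["yes", "yes", "yes"],
})

# Every column except `id`, however many there are.
cols = df.columns.drop("id")

# Cast to str so that ' '.join also accepts non-string values.
df["joined_column_str"] = df[cols].astype(str).agg(" ".join, axis=1)
```

Because cols is computed from df.columns, the same two lines should apply unchanged to a frame with hundreds of columns.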
uj5u.com user reply:
You can use agg to aggregate data along a given axis with a given operation:
df['joined_column'] = df.agg(' '.join, axis=1)
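Alternatively, use
df['joined'] = df.iloc[:,1:].agg(' '.join, axis=1)
if you do not want to include the first column (or any other column) in the join.
I used ' '.join to concatenate the values with a space as the separator.
axis is set to 1 because you want to join across the columns of each row, not down the rows of each column.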
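To make the role of axis concrete, here is a small sketch on a made-up two-row frame (the data is an assumption for illustration) that compares both settings:

```python
import pandas as pd

# Hypothetical all-string frame.
df = pd.DataFrame({"A": ["yes", "no"], "B": ["no", "no"], "C": ["yes", "yes"]})

# axis=1: the function receives each row, giving one string per row.
by_row = df.agg(" ".join, axis=1)

# axis=0: the function receives each column, giving one string per column.
by_col = df.agg(" ".join, axis=0)
```

Here by_row is indexed like df, so it can be assigned back as a new column, while by_col is indexed by the column names.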
uj5u.com user reply:
Consider the dataframe df
id A B ... Y Z joined_column
0 0 yes no ... yes no yes no yes no yes no yes no yes no yes no yes ...
1 1 no no ... no no no no yes no no no no no no no no no no no no ...
2 2 yes yes ... yes yes yes yes yes yes yes no yes yes yes yes yes yes...
One can use a custom lambda function inside pandas.DataFrame.apply, as follows
df['joined_column'] = df.apply(lambda x: ' '.join(x.astype(str)), axis=1)
[Out]:
id A B ... Y Z joined_column
0 0 yes no ... yes no 0 yes no yes no yes no yes no yes no yes no ye...
1 1 no no ... no no 1 no no yes no no no no no no no no no no no n...
2 2 yes yes ... yes yes 2 yes yes yes yes yes no yes yes yes yes yes y...
However, since id should not appear in joined_column, one can add x.drop('id'), as follows
df['joined_column'] = df.apply(lambda x: ' '.join(x.drop('id').astype(str)), axis=1)
[Out]:
id A B ... Y Z joined_column
0 0 yes no ... yes no yes no yes no yes no yes no yes no yes no yes ...
1 1 no no ... no no no no yes no no no no no no no no no no no no ...
2 2 yes yes ... yes yes yes yes yes yes yes no yes yes yes yes yes yes...
Alternatively, for the last case, one can also use NumPy, more specifically numpy.array, as follows
import numpy as np
df['joined_column'] = np.array([' '.join(x.astype(str)) for x in df.drop('id', axis=1).values])
[Out]:
id A B ... Y Z joined_column
0 0 yes no ... yes no yes no yes no yes no yes no yes no yes no yes ...
1 1 no no ... no no no no yes no no no no no no no no no no no no ...
2 2 yes yes ... yes yes yes yes yes yes yes no yes yes yes yes yes yes...
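As a quick sanity check, the following sketch runs the apply version and the NumPy version side by side on a made-up frame (the data is an assumption) and compares the results:

```python
import numpy as np
import pandas as pd

# Hypothetical frame with an integer `id` and string answer columns.
df = pd.DataFrame({
    "id": [0, 1, 2],
    "A": ["yes", "no", "yes"],
    "B": ["no", "no", "yes"],
    "Z": ["no", "no", "no"],
})

# Row-wise apply, dropping `id` from each row before joining.
via_apply = df.apply(lambda x: " ".join(x.drop("id").astype(str)), axis=1)

# Same join over the underlying NumPy rows, with `id` dropped up front.
via_numpy = np.array(
    [" ".join(x.astype(str)) for x in df.drop("id", axis=1).values]
)

same = via_apply.tolist() == via_numpy.tolist()
```

Both variants cast each row to str, so they should behave the same way on columns that are not strings.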
uj5u.com user reply:
Perhaps you can use a list comprehension:
df['joined_column'] = [' '.join(item) for item in df.iloc[:,1:].values]
id A B C D E F Z joined_column
0 0 yes no yes no yes no no yes no yes no yes no no
1 1 no no yes no no no no no no yes no no no no
2 2 yes yes yes yes yes no no yes yes yes yes yes no no
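The list comprehension can be wrapped in a small function to replace the join(df) helper from the question. This is only a sketch: the name join_columns and its exclude and sep parameters are hypothetical, and map(str, ...) is added so that non-string cells do not raise.

```python
import pandas as pd

def join_columns(df, exclude=("id",), sep=" "):
    """Hypothetical helper: one joined string per row, skipping `exclude`."""
    values = df.drop(columns=list(exclude)).to_numpy()
    return [sep.join(map(str, row)) for row in values]

# Hypothetical sample data.
df = pd.DataFrame({
    "id": [0, 1],
    "A": ["yes", "no"],
    "B": ["no", "no"],
    "Z": ["no", "no"],
})
df["joined_column"] = join_columns(df)
```

Excluding columns by name rather than by position keeps the call valid if the column order changes.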
uj5u.com user reply:
A possible solution:
df['joined_column'] = df.iloc[:,1:].add(' ').sum(axis=1).str.rstrip()
Another possible solution, using functools.reduce:
from functools import reduce
df['joined_column'] = reduce(lambda x, y: x + ' ' + y, [df[i] for i in df.columns[1:]])
Output:
id A B C D E F Z joined_column
0 0 yes no yes no yes no no yes no yes no yes no no
1 1 no no yes no no no no no no yes no no no no
2 2 yes yes yes yes yes no no yes yes yes yes yes no no
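The reduce variant folds the column Series together from left to right, so three columns A, B, C become (A + ' ' + B) + ' ' + C, computed element-wise. A minimal sketch on a made-up frame (the data and the astype(str) cast are assumptions added for illustration):

```python
from functools import reduce

import pandas as pd

# Hypothetical sample data; the first column is the `id` to skip.
df = pd.DataFrame({
    "id": [0, 1],
    "A": ["yes", "no"],
    "B": ["no", "no"],
    "C": ["yes", "yes"],
})

# One string Series per column after the first.
columns = [df[name].astype(str) for name in df.columns[1:]]

# Fold pairwise: ((A + ' ' + B) + ' ' + C), element-wise per row.
df["joined_column"] = reduce(lambda x, y: x + " " + y, columns)
```

Each step is a vectorized Series addition, so the number of Python-level operations grows with the number of columns rather than the number of rows.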
