I am trying to concatenate two Pandas DataFrames and am getting an InvalidIndexError. Here is some mock data:
import pandas as pd
df1 = pd.DataFrame({'col1': [1, 2, 3],
                    'col2': [4, 5, 6]})
df2 = pd.DataFrame({'col1': [7, 8, 9],
                    'col2': ['10', '11', '12'],
                    'col3': ['13', '14', '15']})
# Concat and keep only cols from df1
df3 = pd.concat([df1, df2], ignore_index=True).reindex(df1.columns, axis='columns')
Expected output:
df3
col1 col2
1 4
2 5
3 6
7 10
8 11
9 12
Full traceback:
/Applications/Anaconda/anaconda3/lib/python3.9/site-packages/pandas/core/indexes/base.py in get_indexer(self, target, method, limit, tolerance)
3440
3441 if not self._index_as_unique:
-> 3442 raise InvalidIndexError(self._requires_unique_msg)
3443
3444 if not self._should_compare(target) and not is_interval_dtype(self.dtype):
InvalidIndexError: Reindexing only valid with uniquely valued Index objects
Answer:
With your sample data, it works correctly for me.
I modified the data so that it raises the error; the cause is duplicated column names:
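A quick way to confirm this is to run the question's code unchanged on its own sample data. The sketch below simply reuses the question's frames and the same concat/reindex call. Note that col2 ends up mixing integers (from df1) with strings (from df2), because df2 stores '10', '11', '12' as text.

```python
import pandas as pd

# Sample data from the question: column labels are unique in both frames
df1 = pd.DataFrame({'col1': [1, 2, 3], 'col2': [4, 5, 6]})
df2 = pd.DataFrame({'col1': [7, 8, 9],
                    'col2': ['10', '11', '12'],
                    'col3': ['13', '14', '15']})

# Same call as in the question: stack the rows, then keep only df1's columns
df3 = pd.concat([df1, df2], ignore_index=True).reindex(df1.columns, axis='columns')
print(df3)
```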
df1 = pd.DataFrame({'col1': [1, 2, 3],
                    'col2': [4, 5, 6]}).rename(columns={'col2': 'col1'})
print (df1)
col1 col1 <- col1 is duplicated
0 1 4
1 2 5
2 3 6
df2 = pd.DataFrame({'col1': [7, 8, 9],
                    'col2': ['10', '11', '12'],
                    'col3': ['13', '14', '15']})
# Concat and keep only cols from df1
df3 = pd.concat([df1, df2], ignore_index=True).reindex(df1.columns, axis='columns')
print (df3)
InvalidIndexError: Reindexing only valid with uniquely valued Index objects
You can find the duplicated labels with:
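Since the failure comes from repeated column labels, one option is to check for them before calling pd.concat, so that the error names the offending frame. The helper assert_unique_columns below is hypothetical (it is not part of pandas); it only relies on Index.is_unique and Index.duplicated.

```python
import pandas as pd


def assert_unique_columns(*frames):
    """Hypothetical guard (not a pandas API): raise a descriptive error
    if any frame has repeated column labels, before pd.concat fails."""
    for position, frame in enumerate(frames):
        if not frame.columns.is_unique:
            # duplicated() marks every repeat after the first occurrence
            repeated = frame.columns[frame.columns.duplicated()].unique().tolist()
            raise ValueError(f"frame {position} has duplicated column labels: {repeated}")


# Reproduce the answer's data: renaming col2 to col1 creates a duplicate label
df1 = pd.DataFrame({'col1': [1, 2, 3], 'col2': [4, 5, 6]}).rename(columns={'col2': 'col1'})
df2 = pd.DataFrame({'col1': [7, 8, 9],
                    'col2': ['10', '11', '12'],
                    'col3': ['13', '14', '15']})

try:
    assert_unique_columns(df1, df2)
    message = None
except ValueError as err:
    message = str(err)

print(message)  # frame 0 has duplicated column labels: ['col1']
```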
print (df1.columns[df1.columns.duplicated(keep=False)])
Index(['col1', 'col1'], dtype='object')
print (df2.columns[df2.columns.duplicated(keep=False)])
Index([], dtype='object')
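If the repeated column is only a redundant copy, an alternative to renaming is to keep just the first occurrence of each label using a boolean mask from Index.duplicated. This is a sketch under that assumption: the second col1 (values 4, 5, 6) is discarded, so it only fits when that data is not needed.

```python
import pandas as pd

df1 = pd.DataFrame({'col1': [1, 2, 3], 'col2': [4, 5, 6]}).rename(columns={'col2': 'col1'})
df2 = pd.DataFrame({'col1': [7, 8, 9],
                    'col2': ['10', '11', '12'],
                    'col3': ['13', '14', '15']})

# Mask is False for every repeat after the first occurrence of a label
df1_unique = df1.loc[:, ~df1.columns.duplicated()]

# With unique labels, the original concat + reindex call works
df3 = pd.concat([df1_unique, df2], ignore_index=True).reindex(df1_unique.columns, axis='columns')
print(df3)
```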
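The solution is to deduplicate them. Note that this relies on the private pandas method _maybe_dedup_names, which may not be available in every pandas version, and that it returns the new names rather than changing df1, so assign the result back to df1.columns: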
print (pd.io.parsers.ParserBase({'names':df1.columns})._maybe_dedup_names(df1.columns))
['col1', 'col1.1']
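Because _maybe_dedup_names is private, a version-independent option is to do the renaming yourself. The dedup_columns function below is a hypothetical helper that appends .1, .2, ... to repeated labels in the same style as the output above, skipping any suffix that is already taken. Assigning its result to df1.columns lets the original concat call run.

```python
import pandas as pd


def dedup_columns(columns):
    """Hypothetical helper: rename repeated labels to 'name.1', 'name.2', ...
    while avoiding collisions with labels that already exist."""
    seen = set()
    counts = {}
    result = []
    for name in columns:
        new_name = name
        count = counts.get(name, 0)
        # Bump the suffix until the label is not already used
        while new_name in seen:
            count += 1
            new_name = f"{name}.{count}"
        counts[name] = count
        seen.add(new_name)
        result.append(new_name)
    return result


df1 = pd.DataFrame({'col1': [1, 2, 3], 'col2': [4, 5, 6]}).rename(columns={'col2': 'col1'})
df2 = pd.DataFrame({'col1': [7, 8, 9],
                    'col2': ['10', '11', '12'],
                    'col3': ['13', '14', '15']})

df1.columns = dedup_columns(df1.columns)  # ['col1', 'col1.1']
df3 = pd.concat([df1, df2], ignore_index=True).reindex(df1.columns, axis='columns')
print(df3)
```

The renamed column col1.1 has no counterpart in df2, so it is NaN for the rows that came from df2.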
