Reshaping a DataFrame with pandas (Python)

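First of all, my English is not perfect, sorry about that.

I am using pandas in Python. I collect data indexed by timestamp in several ways.

This means a given index entry can have only 2 features available (the others being NaN, which is expected) or all of them, depending on the case.

My problem arises when I add data that has multiple values for the same index; see the example below:

Imagine this is the set we want to add new data to: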

Index col1 col2
    1   a    A
    2   b    B
    3   c    C

This is the data we will add:

Index new col 
    1      z    
    1      y

The result then looks like this:

Index col1 col2 new col
    1   a    A    NaN
    1   NaN  NaN  z
    1   NaN  NaN  y
    2   b    B    NaN
    3   c    C    NaN

Whereas I would like the result to be:

Index col1 col2 new col1 new col2
    1   a    A    z        y
    2   b    B    NaN      NaN
    3   c    C    NaN      NaN

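Instead of multiple index entries for 1 feature, I want 1 index entry with multiple features.

I don't know whether this is understandable. Put another way: for each timestamp, the number of values should equal the number of features rather than the number of index entries.

Thanks a lot for your help. If there is an existing topic on this problem that I am not aware of, please give me a link.

uj5u.com user reply:

This solution assumes that the data you need to add is a Series.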

Original df:

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randint(0, 3, size=(3, 3)), columns=list('ABC'), index=[1, 2, 3])

Data to add (a Series):

s = pd.Series(['x','y'],index = [1,1])

Solution:

df.join(s.to_frame()
        .assign(cc = lambda x: x.groupby(level=0)
                .cumcount().add(1))
        .set_index('cc',append=True)[0]
        .unstack()
        .rename('New Col{}'.format,axis=1))

Output:

   A  B  C New Col1 New Col2
1  1  2  2        x        y
2  0  1  2      NaN      NaN
3  2  2  0      NaN      NaN
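
The chained expression above can be hard to follow, so here is a sketch of the same steps written one at a time. It assumes fixed values for df instead of random ones (so the result is reproducible), and the intermediate names frame and wide are only illustrative.

```python
import pandas as pd

# Fixed values instead of random ones (an assumption, for reproducibility).
df = pd.DataFrame({'A': [1, 0, 2], 'B': [2, 1, 2], 'C': [2, 2, 0]}, index=[1, 2, 3])
s = pd.Series(['x', 'y'], index=[1, 1])

# 1. Turn the Series into a one-column frame; the column is named 0.
frame = s.to_frame()

# 2. Number the repeated values within each index label: 1, 2, ...
frame['cc'] = frame.groupby(level=0).cumcount().add(1)

# 3. Move the counter into the index, then unstack it into columns.
wide = frame.set_index('cc', append=True)[0].unstack()

# 4. Rename the columns 1, 2, ... to 'New Col1', 'New Col2', ...
wide = wide.rename('New Col{}'.format, axis=1)

# 5. Left-join onto the original frame; rows without new data get NaN.
result = df.join(wide)
print(result)
```

The key step is the cumcount counter: it gives each repeated value its own position, and that position becomes the column after unstacking.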

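uj5u.com user reply:

An alternative answer (possibly simpler, possibly less pythonic). I think you need to consider converting wide data to long data and back again (pivot and transpose are probably good things to look up for this problem), but I also think there may be an issue with your question: you don't mention new col1 and new col2 in the declaration of the subsequent arrays.

Here is how I declared your dataframes:

import pandas as pd

d = {'index': [1, 2, 3],'col1': ['a', 'b', 'c'], 'col2': ['A', 'B', 'C']}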
df = pd.DataFrame(data=d)

e1 = {'index': [1], 'new col1': ['z']}
dfe1 = pd.DataFrame(data=e1)

e2 = {'index': [1], 'new col2': ['y']}
dfe2 = pd.DataFrame(data=e2)

They look like this:

index   new col1
1       z

And this:

index   new col2
1       y

Note that I declared each of your new columns as part of its own dataframe. Once they are declared this way, it is just a matter of merging:

dfr1 = pd.merge(df, dfe1, on='index', how="outer")
dfr2 = pd.merge(dfr1, dfe2, on='index', how="outer")

The output (dfr2) looks like this:

    index   col1    col2    new col1    new col2
    1       a       A       z           y
    2       b       B       NaN         NaN
    3       c       C       NaN         NaN
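
The wide-to-long-and-back idea mentioned at the start of this answer can be sketched with pivot. This is only one possible reading of that suggestion: it assumes the new data arrives as a single column with a repeated index, as in the question, and the names long_df, n and wide are made up for the example.

```python
import pandas as pd

a = pd.DataFrame({'col1': ['a', 'b', 'c'], 'col2': ['A', 'B', 'C']}, index=[1, 2, 3])
b = pd.DataFrame({'new col': ['z', 'y']}, index=[1, 1])

# Long form: one row per (index, value) pair, with the index as a regular column.
long_df = b.rename_axis('index').reset_index()

# Number the values within each index label: 1, 2, ...
long_df['n'] = long_df.groupby('index').cumcount() + 1

# Back to wide form: one row per index label, one column per position.
wide = long_df.pivot(index='index', columns='n', values='new col')
wide.columns = [f'new col{n}' for n in wide.columns]

# Left-join onto the original frame; rows without new data get NaN.
result = a.join(wide)
print(result)
```

Unlike the two explicit merges, this does not require knowing in advance how many new columns there will be.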

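uj5u.com user reply:

I think one issue may come from the way you create your second dataframe in the first place. Indeed, expanding the number of features depending on its content is what makes the reformatting a bit annoying here (as you saw for yourself when writing the two new column names, which reflect the number of features observed for each timestamp).

Here is yet another solution, which tries to be more explicit about the steps taken than rhug123's answer.

import pandas as pd

# Initial DataFrames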
a = pd.DataFrame({'col1':['a', 'b', 'c'], 'col2':['A', 'B', 'C']}, index=range(1, 4))
b = pd.DataFrame({'new col':['z', 'y']}, index=[1, 1])

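Now the only important step is basically to transpose your second DataFrame, and here you also need to introduce the two new column names. We will group the contents of the second dataframe (y, z, ...) by index: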

c = b.groupby(b.index)['new col'].apply(list) # this has also one index per timestamp, but all features are grouped in a list

# New column names:
cols = ['New col%d' % (k + 1) for k in range(c.map(len).max())]  # one column per value in the longest group
# Expanding dataframe "c" for each new column
d = pd.DataFrame(c.to_list(), index=c.index, columns=cols)  # c.index matches the row order of c

# Merge
a.join(d, how='outer')

Output:

  col1 col2 New col1 New col2
1    a    A        z        y
2    b    B      NaN      NaN
3    c    C      NaN      NaN

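Finally, both my answer and rhug123's answer share one problem: as they stand, they do not properly handle another feature observed at a different timestamp, because values are assigned to the new columns purely by position. Not sure what the OP expects here. For example, if b is: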

  new col
1       z
1       y
2       x

The output after merging would be:

  col1 col2 New col1 New col2
1    a    A        z        y
2    b    B        x     None
3    c    C      NaN      NaN
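
The case above can be reproduced with a short sketch. It wraps the grouping steps in a hypothetical helper called widen (not part of pandas, and not in the original answer) and sizes the new columns by the longest group, which is an assumption about the intended behavior.

```python
import pandas as pd

def widen(b, col='new col', prefix='New col'):
    """Hypothetical helper: spread repeated index values of b[col] over numbered columns."""
    # One list of values per index label.
    lists = b.groupby(b.index)[col].apply(list)
    # As many new columns as the longest list needs.
    width = lists.map(len).max()
    cols = [f'{prefix}{k + 1}' for k in range(width)]
    # Shorter lists are padded with missing values.
    return pd.DataFrame(lists.to_list(), index=lists.index, columns=cols)

a = pd.DataFrame({'col1': ['a', 'b', 'c'], 'col2': ['A', 'B', 'C']}, index=range(1, 4))
b = pd.DataFrame({'new col': ['z', 'y', 'x']}, index=[1, 1, 2])

result = a.join(widen(b), how='outer')
print(result)
```

Here x ends up in New col1 next to z only because it is the first value at its timestamp. If x is really a different feature, the new data would need a feature label of its own to pivot on.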

