How to merge DataFrames with multi-level columns on a lower-level column

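I have several small datasets from a database, each listing the genes in a different biological pathway. My end goal is to find the genes that appear in more than one dataset. To do that, I tried building a DataFrame with multi-level columns from each dataset and merging them on a single column. However, it got me nowhere.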

Test sample: https://www.mediafire.com/file/bks9i9unfci0h1f/sample.rar/file

Building the multi-level columns:

import pandas as pd

df1 = pd.read_csv("Bacterial invasion of epithelial cells.csv")
df2 = pd.read_csv("C-type lectin receptor signaling pathway.csv")
df3 = pd.read_csv("Endocytosis.csv")

title1 = "Bacterial invasion of epithelial cells"
title2 = "C-type lectin receptor signaling pathway"
title3 = "Endocytosis"

final1 = pd.concat({title1: df1}, axis = 1)
final2 = pd.concat({title2: df2}, axis = 1)
final3 = pd.concat({title3: df3}, axis = 1)

I tried to merge the DataFrames on the "User ID" column with pandas.merge():

pd.merge(final1, final2, on = "User ID", how = "outer")

But I get an error. I can't use droplevel() because I need the titles at the top level, so that I can see which dataset each sample belongs to. Any suggestions?

A uj5u.com user replied:

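Since you want to see which genes appear in more than one dataset, it sounds like an inner join would be more useful. Set "User ID" as a single-level row index: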

df1 = pd.read_csv("Bacterial invasion of epithelial cells.csv").set_index('User ID')
df2 = pd.read_csv("C-type lectin receptor signaling pathway.csv").set_index('User ID')
df3 = pd.read_csv("Endocytosis.csv").set_index('User ID')

final1 = pd.concat({"Bacterial invasion of epithelial cells": df1}, axis = 1)
final2 = pd.concat({"C-type lectin receptor signaling pathway": df2}, axis = 1)
final3 = pd.concat({"Endocytosis": df3}, axis = 1)

final1.merge(final3, left_index=True, right_index=True)  # .merge(final2, left_index=True, right_index=True)

Output:

        Bacterial invasion of epithelial cells        Endocytosis
        Gene Symbol  Gene Name   Entrez Gene  Score   Gene Symbol  Gene Name   Entrez Gene  Score
User ID
P51636  CAV2         caveolin 2  858          1.3911  CAV2         caveolin 2  858          1.3911
Q03135  CAV1         caveolin 1  857          1.5935  CAV1         caveolin 1  857          1.5935

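(I have commented out the second merge, the one with final2, because it shares no genes with the other two, but you can repeat the procedure with any number of datasets.)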
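For more than two datasets, chaining merges gets repetitive. One possible shortcut, sketched here under assumptions rather than taken from the answer, is to pass a dict of frames to pd.concat with axis=1 and join="inner": the dict keys become the top column level and only the "User ID" values present in every frame are kept. The helper name combine and the toy rows below are hypothetical stand-ins for the CSV files (the GENE_* and ID_* values are made-up placeholders), and the sketch assumes each "User ID" occurs at most once per dataset.

```python
import pandas as pd

# Hypothetical toy frames standing in for the CSV files. Only the column
# names "User ID" and "Gene Symbol" and the two CAV rows come from the post.
frames = {
    "Bacterial invasion of epithelial cells": pd.DataFrame(
        {"User ID": ["P51636", "Q03135", "ID_A"],
         "Gene Symbol": ["CAV2", "CAV1", "GENE_A"]}),
    "C-type lectin receptor signaling pathway": pd.DataFrame(
        {"User ID": ["ID_B"], "Gene Symbol": ["GENE_B"]}),
    "Endocytosis": pd.DataFrame(
        {"User ID": ["P51636", "Q03135", "ID_C"],
         "Gene Symbol": ["CAV2", "CAV1", "GENE_C"]}),
}


def combine(frames, titles, join="inner"):
    """Index each chosen frame by "User ID" and put them side by side.

    The dict keys passed to pd.concat become the top level of the columns,
    so each column still shows which dataset it came from.
    """
    indexed = {title: frames[title].set_index("User ID") for title in titles}
    return pd.concat(indexed, axis=1, join=join)


# Keep only the IDs found in both selected datasets.
shared = combine(frames, ["Bacterial invasion of epithelial cells", "Endocytosis"])
print(shared)
```

Passing join="outer" instead keeps every ID and fills the gaps with NaN, which is closer to the how="outer" merge attempted in the question.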
