我有兩個要按以下條件合并的資料框
- 2個資料框之間的一列之間的共同值
.query("CHROM == Chromosome_hg38") - 一個值的區域在另一個值的區域之間
.query('START.between(Gene_start_hg38, Gene_stop_hg38) | END.between(Gene_start_hg38, Gene_stop_hg38)')
這些是我的資料框:
data = [['chr1', 1,10,'Gene1','ID1'], ['chr2',15,20,'Gene2','ID2'],['chr2',21,40,'Gene3','ID3']]
LOUEF = pd.DataFrame(data, columns = ['Chromosome_hg38', 'Gene_start_hg38', 'Gene_stop_hg38','Gene_name','geneID'])
LOUEF
Chromosome_hg38 Gene_start_hg38 Gene_stop_hg38 Gene_name geneID
0 chr1 1 10 Gene1 ID1
1 chr2 15 20 Gene2 ID2
2 chr2 21 40 Gene3 ID3
data2 = [['chr1', 3,11,'location1'], ['chr1',11,17,'location2'],['chr2',20,30,'location3'],['chr2',15,17,'location3'],['chr5',1,19,'location4']]
cnvs = pd.DataFrame(data2, columns = ['CHROM', 'START', 'END','locations'])
CHROM START END locations
0 chr1 3 11 location1
1 chr1 11 17 location2
2 chr2 20 30 location3
3 chr2 15 17 location3
4 chr5 1 19 location4
我正在使用的代碼
cnvs['Gene_name'] = (
cnvs.loc[cnvs['CHROM'].isin(LOUEF['Chromosome_hg38'])]
.reset_index().merge(LOUEF, how='cross', suffixes=('', '_'))
.query("CHROM == Chromosome_hg38")
.query('START.between(Gene_start_hg38, Gene_stop_hg38) | END.between(Gene_start_hg38, Gene_stop_hg38)')
.groupby('index')['Gene_name'].apply(list).str.join(';')
)
有了這段代碼,我得到了這個
CHROM START END locations Gene_name
0 chr1 3 11 location1 Gene1
1 chr1 11 17 location2 NaN
2 chr2 20 30 location3 Gene2;Gene2
3 chr2 15 17 location3 Gene2
4 chr5 1 19 location4 NaN
正如我對gene_name 所做的那樣,我怎樣才能對geneID 做同樣的事情而無需再次重復相同的代碼?
CHROM START END locations Gene_name geneID
0 chr1 3 11 location1 Gene1 ID1
1 chr1 11 17 location2 NaN NaN
2 chr2 20 30 location3 Gene2;Gene2 ID2;ID2
3 chr2 15 17 location3 Gene2 ID2
4 chr5 1 19 location4 NaN NaN
uj5u.com熱心網友回復:
這邊走?
for key in ("Gene_name", "geneID"):
cnvs[key] = (
cnvs.loc[cnvs['CHROM'].isin(LOUEF['Chromosome_hg38'])]
.reset_index().merge(LOUEF, how='cross', suffixes=('', '_'))
.query("CHROM == Chromosome_hg38")
.query('START.between(Gene_start_hg38, Gene_stop_hg38) | END.between(Gene_start_hg38, Gene_stop_hg38)')
.groupby('index')[key].apply(list).str.join(';')
)
轉載請註明出處,本文鏈接:https://www.uj5u.com/net/444910.html
上一篇:帶有時間列的Pandas資料框
下一篇:基于其他列值的熊貓顏色單元格
