Creating a for loop of Wilcoxon rank-sum tests in Python to generate a list of p-values?

I have a dataframe that follows this format:

df = pd.DataFrame({'subtype': ['AC', 'SCC', 'SCC', 'AC', 'AC', 'SCC', 'AC'], 
               'geneA': ['0.56', '0.74', '0.89', '0.99', '0.24', '0.76', '0.60'],
               'geneB': ['0.54', '0.73', '0.82', '0.99', '0.23', '0.74', '0.61'],
               'geneC': ['0.53', '0.72', '0.84', '0.97', '0.23', '0.76', '0.62'],
               'geneD': ['0.52', '0.77', '0.89', '0.99', '0.23', '0.75', '0.64'],
               'geneE': ['0.51', '0.77', '0.89', '0.93', '0.23', '0.76', '0.64'],
               'geneF': ['0.50', '0.79', '0.89', '0.96', '0.26', '0.73', '0.65'],
               'geneG': ['0.56', '0.78', '0.89', '0.99', '0.23', '0.76', '0.64']})

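The real one is much larger (it has about 1000 genes, i.e. columns). Each number is an mRNA abundance value.

I need to compare the AC and SCC subtypes for each gene using the Wilcoxon rank-sum test. I need to do this for every gene in the dataset, so I basically need to run it 1000 times, where group1 is the mRNA values of a gene for the AC subtype and group2 is the mRNA values of the same gene for the SCC subtype.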

from scipy.stats import ranksums
ranksums(group1, group2)
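For a single gene, the call might look like the sketch below. It assumes the values are stored as floats (the example dataframe stores them as strings) and uses only `geneA`; `result` is just an illustrative name.

```python
import pandas as pd
from scipy.stats import ranksums

# Toy data in the same shape as the question, with float values (an assumption)
df = pd.DataFrame({
    'subtype': ['AC', 'SCC', 'SCC', 'AC', 'AC', 'SCC', 'AC'],
    'geneA': [0.56, 0.74, 0.89, 0.99, 0.24, 0.76, 0.60],
})

group1 = df.loc[df['subtype'] == 'AC', 'geneA']   # AC values for geneA
group2 = df.loc[df['subtype'] == 'SCC', 'geneA']  # SCC values for geneA

# ranksums returns a result object with .statistic and .pvalue
result = ranksums(group1, group2)
print(result.statistic, result.pvalue)
```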

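I need to create a for loop that compares the mRNA values between the two subtypes/groups, AC and SCC, using the rank-sum test, and produces a list of p-values. In other words, I need to run about 1000 Wilcoxon rank-sum tests, one per gene (each column is a gene), comparing AC with SCC, and collect the resulting p-values in one long list.

How can I do this in Python? This is what I have tried, without luck.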

p_vals = []

for i in range(1000):
    new_data = subset.copy()
    permuted_labels = list(subset['subtype'].sample(n=subset.shape[0], replace=False))
    new_data['subtype'] = permuted_labels
    group1 = new_data.loc[new_data.subtype == 'AC']
    group2 = new_data.loc[new_data.subtype == 'SCC']
    ranksums = ranksums(group1, group2)
    p_vals.append(ranksums)

print(p_vals)

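I need to do something similar, but instead of computing p-values, I need to compute the fold change (FC) in mean mRNA abundance between the AC and SCC subtypes for each gene (with the AC value in the numerator of the FC). I then need to combine the per-gene FC and the rank-sum p-values into a single table. In addition, I need to use: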

from statsmodels.stats.multitest import fdrcorrection
fdrcorrection(list_of_pvalues, alpha=0.05, method='indep', is_sorted=False)
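As a rough illustration of what that call gives back: `fdrcorrection` returns two arrays, a boolean array saying which hypotheses are rejected at `alpha`, and the adjusted p-values. The p-values below are made up purely for the example.

```python
from statsmodels.stats.multitest import fdrcorrection

# Hypothetical p-values, for illustration only
list_of_pvalues = [0.001, 0.01, 0.04, 0.30, 0.75]

# Benjamini-Hochberg correction (method='indep')
rejected, pvalues_adjusted = fdrcorrection(
    list_of_pvalues, alpha=0.05, method='indep', is_sorted=False
)

print(rejected)          # boolean array, one entry per input p-value
print(pvalues_adjusted)  # adjusted p-values, in the same order as the input
```

Both outputs keep the input order, so they can be attached to a per-gene table as extra columns.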

def geneFC(df, geneColumnName):
    # function to return fold change for every gene in the matrix

    ac = df[(df['subtype'] == 'AC')]
    scc = df[(df['subtype'] == 'SCC')]

    acGene = ac[geneColumnName]
    sccGene = scc[geneColumnName]

    return acGene.mean()/sccGene.mean()


genes = list(df.columns) # list of genes from df columns
genes.remove('subtype') # removes "subtype" from list

fc_values = [] # list of FC values to fill
for gene in genes: # loops through list of genes
    fc_values.append(geneFC(df, gene)) # adds FC value of gene to list

Reply from a uj5u.com user:

I think I have a working solution, but I'm not sure why it returns exactly the same p-values. Is this a property of the data you provided?
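One way to check whether this comes from the data is to look at the ranks directly. The sketch below enters the example values as floats (an assumption; the post stores them as strings) and sums the ranks of the AC samples within each gene; `ac_rank_sums` is an illustrative name. In the example data the AC samples always occupy ranks 1, 2, 3 and 7, so every gene has the same AC rank sum of 13. Since the rank-sum test depends only on the ranks and the group sizes, identical p-values are expected here.

```python
import pandas as pd

df = pd.DataFrame({
    'subtype': ['AC', 'SCC', 'SCC', 'AC', 'AC', 'SCC', 'AC'],
    'geneA': [0.56, 0.74, 0.89, 0.99, 0.24, 0.76, 0.60],
    'geneB': [0.54, 0.73, 0.82, 0.99, 0.23, 0.74, 0.61],
    'geneC': [0.53, 0.72, 0.84, 0.97, 0.23, 0.76, 0.62],
    'geneD': [0.52, 0.77, 0.89, 0.99, 0.23, 0.75, 0.64],
    'geneE': [0.51, 0.77, 0.89, 0.93, 0.23, 0.76, 0.64],
    'geneF': [0.50, 0.79, 0.89, 0.96, 0.26, 0.73, 0.65],
    'geneG': [0.56, 0.78, 0.89, 0.99, 0.23, 0.76, 0.64],
})

genes = [c for c in df.columns if c != 'subtype']

# Rank the seven samples within each gene (1 = lowest value)
ranks = df[genes].rank()

# Sum of the ranks held by the AC samples, one value per gene
ac_rank_sums = ranks[df['subtype'] == 'AC'].sum()
print(ac_rank_sums)
```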

import pandas as pd
from scipy.stats import ranksums

df = pd.DataFrame({'subtype': ['AC', 'SCC', 'SCC', 'AC', 'AC', 'SCC', 'AC'], 
           'geneA': ['0.56', '0.74', '0.89', '0.99', '0.24', '0.76', '0.60'],
           'geneB': ['0.54', '0.73', '0.82', '0.99', '0.23', '0.74', '0.61'],
           'geneC': ['0.53', '0.72', '0.84', '0.97', '0.23', '0.76', '0.62'],
           'geneD': ['0.52', '0.77', '0.89', '0.99', '0.23', '0.75', '0.64'],
           'geneE': ['0.51', '0.77', '0.89', '0.93', '0.23', '0.76', '0.64'],
           'geneF': ['0.50', '0.79', '0.89', '0.96', '0.26', '0.73', '0.65'],
           'geneG': ['0.56', '0.78', '0.89', '0.99', '0.23', '0.76', '0.64']})

def geneRankSum(df, geneColumnName):
    # function to return rank sum for given gene

    ac = df[(df['subtype'] == 'AC')]
    scc = df[(df['subtype'] == 'SCC')]

    acGene = ac[geneColumnName]
    sccGene = scc[geneColumnName]

    return ranksums(acGene, sccGene).pvalue


genes = list(df.columns) # list of genes from df columns
genes.remove('subtype') # removes "subtype" from list

# the example values are strings, so convert them to floats once, up front
data = df[genes].astype(float)
data['subtype'] = df['subtype']

pvalues = [] # list of pvalues to fill
for gene in genes: # loops through list of genes
    pvalues.append(geneRankSum(data, gene)) # adds pvalue of gene to list

print(pvalues)

def geneFC(df, geneColumnName):
    # function to return fold change (mean AC / mean SCC) for given gene

    ac = df[(df['subtype'] == 'AC')]
    scc = df[(df['subtype'] == 'SCC')]

    acGene = ac[geneColumnName]
    sccGene = scc[geneColumnName]

    return acGene.mean()/sccGene.mean()


fc_values = [] # list of FC values to fill
for gene in genes: # loops through list of genes
    fc_values.append(geneFC(data, gene)) # adds FC value of gene to list

print(fc_values)
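The question also asks for the FC and p-values in one table, with FDR correction. A minimal sketch of one way to do that is below, assuming float values and only three of the example genes to keep it short; the column names `fold_change`, `pvalue`, `rejected` and `pvalue_fdr` are made up for illustration.

```python
import pandas as pd
from scipy.stats import ranksums
from statsmodels.stats.multitest import fdrcorrection

df = pd.DataFrame({
    'subtype': ['AC', 'SCC', 'SCC', 'AC', 'AC', 'SCC', 'AC'],
    'geneA': [0.56, 0.74, 0.89, 0.99, 0.24, 0.76, 0.60],
    'geneB': [0.54, 0.73, 0.82, 0.99, 0.23, 0.74, 0.61],
    'geneC': [0.53, 0.72, 0.84, 0.97, 0.23, 0.76, 0.62],
})

is_ac = df['subtype'] == 'AC'
is_scc = df['subtype'] == 'SCC'
genes = [c for c in df.columns if c != 'subtype']

rows = []
for gene in genes:
    ac = df.loc[is_ac, gene]
    scc = df.loc[is_scc, gene]
    rows.append({
        'gene': gene,
        'fold_change': ac.mean() / scc.mean(),  # AC in the numerator
        'pvalue': ranksums(ac, scc).pvalue,     # Wilcoxon rank-sum p-value
    })

results = pd.DataFrame(rows).set_index('gene')

# Benjamini-Hochberg correction across all genes
rejected, adjusted = fdrcorrection(
    results['pvalue'], alpha=0.05, method='indep', is_sorted=False
)
results['rejected'] = rejected
results['pvalue_fdr'] = adjusted

print(results)
```

Building a list of dicts and converting it once keeps the loop simple; with about 1000 genes this should still run quickly.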
