Conditionally subtracting Pandas DataFrame rows based on row name

I am working with a large dataset, but the problem can be summarized with the following smaller one:

import pandas as pd
df = pd.DataFrame({"Filename":["fileName1_uniqueTag1", "fileName2_uniqueTag1", "fileName3_uniqueTag1", "fileName1_uniqueTag2", "fileName2_uniqueTag2", "fileName3_uniqueTag2"], 
                   "measurement":[1336.564888, 1090.852579, 990.320323, 1202.522612, 1098.045258, 923.600277],})
print(df)
>>>
               Filename  measurement
0  fileName1_uniqueTag1  1336.564888
1  fileName2_uniqueTag1  1090.852579
2  fileName3_uniqueTag1   990.320323
3  fileName1_uniqueTag2  1202.522612
4  fileName2_uniqueTag2  1098.045258
5  fileName3_uniqueTag2   923.600277

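The "Filename" column contains three distinct base file names, each of which appears with two unique tags. The goal is to compute, for each file, the ratio of the uniqueTag2 measurement to the uniqueTag1 measurement. The result should look like this: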

    Filename  uniqueTag2/uniqueTag1
0  fileName1               0.899711
1  fileName2               1.006593
2  fileName3               0.932627

I can list the three distinct file names and the two distinct tags using the following:

nameList = df["Filename"].tolist()
fileNames = []                              #empty list to fill with different base file names
uniqueTags = []                             #empty list to fill with unique tags
for name in nameList:                       #iterate through list of full file names
    subStrings = name.split("_")            #splits each base file name at the underscore
    if subStrings[0] not in fileNames:      #if the base file name isn't already in the file names list...
        fileNames.append(subStrings[0])     #append it
    if subStrings[1] not in uniqueTags:     #if the unique tag isn't already in the unique tags list...
        uniqueTags.append(subStrings[1])    #append it

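I think I could turn the file names into the index and use df.at to access the individual measurements, but that looks very messy, and I am sure there must be a better way to do this with built-in Pandas functionality. Any suggestions?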

uj5u.com community member replied:

You can do something like this, which is fairly simple, using str.split():

df[['Filename','uniquetag']] = df['Filename'].str.split('_', expand=True)
tag1 = df.loc[df['uniquetag'] == 'uniqueTag1'].set_index('Filename')['measurement']
tag2 = df.loc[df['uniquetag'] == 'uniqueTag2'].set_index('Filename')['measurement']
tag2 / tag1
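
To get the result as a DataFrame shaped like the one in the question, the division above can be wrapped up roughly as follows. This is a minimal sketch, assuming every Filename contains exactly one underscore and each base name appears once per tag; the column names `base` and `tag` and the variable `ratio` are illustrative, not from the original answer.

```python
import pandas as pd

df = pd.DataFrame({
    "Filename": ["fileName1_uniqueTag1", "fileName2_uniqueTag1", "fileName3_uniqueTag1",
                 "fileName1_uniqueTag2", "fileName2_uniqueTag2", "fileName3_uniqueTag2"],
    "measurement": [1336.564888, 1090.852579, 990.320323,
                    1202.522612, 1098.045258, 923.600277],
})

# Split into base name and tag, keeping the original Filename column intact
# ("base" and "tag" are hypothetical column names chosen for this sketch).
parts = df["Filename"].str.split("_", expand=True)
df["base"] = parts[0]
df["tag"] = parts[1]

# One Series per tag, indexed by base name so the division aligns on it.
tag1 = df.loc[df["tag"] == "uniqueTag1"].set_index("base")["measurement"]
tag2 = df.loc[df["tag"] == "uniqueTag2"].set_index("base")["measurement"]

# Name the resulting Series and its index, then turn it back into a DataFrame.
ratio = (
    (tag2 / tag1)
    .rename("uniqueTag2/uniqueTag1")
    .rename_axis("Filename")
    .reset_index()
)
print(ratio)
```

Because the two Series share the base name as their index, pandas aligns the division by name rather than by row position, so the row order of the two tags in the source data does not matter.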

uj5u.com community member replied:

Try

df[['one','two']] = df.Filename.str.split("_",expand=True)

Then use groupby on those two columns.
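
The groupby step is not spelled out in this answer, so here is one possible reading of it. It is a sketch under the assumption that each (base name, tag) pair occurs exactly once, so `first()` simply picks that single measurement; the variable names `wide` and `result` are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "Filename": ["fileName1_uniqueTag1", "fileName2_uniqueTag1", "fileName3_uniqueTag1",
                 "fileName1_uniqueTag2", "fileName2_uniqueTag2", "fileName3_uniqueTag2"],
    "measurement": [1336.564888, 1090.852579, 990.320323,
                    1202.522612, 1098.045258, 923.600277],
})

# Split the full name into base name ("one") and tag ("two").
df[["one", "two"]] = df["Filename"].str.split("_", expand=True)

# Group by both columns, then move the tag level into the columns:
# one row per base name, one column per tag.
wide = df.groupby(["one", "two"])["measurement"].first().unstack("two")

# Column-wise division gives the ratio for every base name at once.
result = (
    (wide["uniqueTag2"] / wide["uniqueTag1"])
    .rename("uniqueTag2/uniqueTag1")
    .rename_axis("Filename")
    .reset_index()
)
print(result)
```

If a (base name, tag) pair could occur more than once in the real data, `first()` would silently drop the extra rows, so an explicit aggregation such as `mean()` may be a better fit there.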


Tags: Python, pandas DataFrame
