Reporting the closest value with np.isclose in a pandas DataFrame

I currently have two DataFrames. One contains a list of masses (in the column 'mass_pos'):

        entry  mass Precursor  Monoisotopic  mass_pos  masses match
0       KGTLP   110     KGTLP        581.69    691.69          True
1       KGTLP   125     KGTLP        581.69    706.69          True
2       KGTLP   133     KGTLP        581.69    714.69          True
3       KGTLP   139     KGTLP        581.69    720.69          True
4       KGTLP   153     KGTLP        581.69    734.69          True
      ...   ...       ...           ...       ...           ...
355675  GTKKP    42     GTKKP        596.70    638.70          True
355676  GTKKP    43     GTKKP        596.70    639.70          True
355677  GTKKP   210     GTKKP        596.70    806.70          True
355678  GTKKP   226     GTKKP        596.70    822.70          True
355679  GTKKP     0     GTKKP        596.70    596.70          True

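The other DataFrame looks like this:

As you can see, I used np.isclose to check whether any value in the second DataFrame lies within some tolerance of a 'mass_pos' value in the first DataFrame, and then appended the resulting boolean to the first df. This is how I did it: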

import numpy as np

tolerance = tol_in  # margin of error
match_mass = lambda x: np.any(np.isclose(x, mass_q_sequence['Mass'], atol=tolerance))
df_seq2['masses match'] = df_seq2['mass_pos'].apply(match_mass)
df_seq2 = df_seq2[df_seq2['masses match']]  # drop all rows without a match
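
One detail worth keeping in mind with the snippet above: np.isclose(a, b) tests |a - b| <= atol + rtol * |b|, and rtol defaults to 1e-05. For masses around 700 this widens the window by roughly 0.007 on top of atol. The sketch below uses made-up numbers (the two masses and the tolerance of 0.01 are illustrative assumptions, not the asker's data) to show how passing rtol=0 makes atol a purely absolute tolerance:

```python
import numpy as np

reference = np.array([700.000])   # hypothetical 'Mass' value
observed = 700.015                # hypothetical 'mass_pos' value, 0.015 away
tolerance = 0.01                  # hypothetical absolute tolerance

# Default rtol=1e-05 adds 1e-05 * 700 = 0.007, so the window is 0.017.
with_default_rtol = bool(np.any(np.isclose(observed, reference, atol=tolerance)))

# rtol=0 leaves only the absolute tolerance of 0.01.
absolute_only = bool(np.any(np.isclose(observed, reference, rtol=0, atol=tolerance)))

print(with_default_rtol, absolute_only)
```

Whether the relative term matters depends on how tight the tolerance is compared with the masses involved.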

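I have come to realize that I need to calculate the ppm error, which involves finding the error between the 'mass_pos' and 'Mass' values, so a simple boolean output is no longer enough. Is there a way to report the difference between these values, or to append the matching value from the second df to the rows of the first df that satisfy the boolean?

Essentially, I just need to report which values in the second df satisfy the boolean in the first.

A uj5u.com user replied:

If I understand correctly, you just want to find the closest value from the second DataFrame.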

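import numpy as np

# Convert to NumPy arrays: a pandas Series has no reshape method,
# and positional indexing is needed in the last step.
masses = mass_q_sequence['Mass'].to_numpy()
mass_pos = df_seq2['mass_pos'].to_numpy()
# Use broadcasting to find the index of the closest mass for each mass_pos:
closest_mass_indices = np.argmin(np.abs(masses.reshape(1, -1) - mass_pos.reshape(-1, 1)), axis=1)
df_seq2['closest_mass'] = masses[closest_mass_indices]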
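
Building on this, the difference and a ppm error can be reported next to each row, and the original tolerance can still be applied as a filter. The sketch below uses small made-up frames in place of df_seq2 and mass_q_sequence. The column names 'mass_diff' and 'ppm_error', the tolerance value, and the choice of the matched 'Mass' as the denominator of the ppm formula are assumptions for illustration, not something stated in the question.

```python
import numpy as np
import pandas as pd

# Made-up stand-ins for the two DataFrames in the question.
df_seq2 = pd.DataFrame({"mass_pos": [691.69, 706.69, 720.69]})
mass_q_sequence = pd.DataFrame({"Mass": [691.70, 706.60, 900.00]})
tolerance = 0.05  # hypothetical absolute tolerance

masses = mass_q_sequence["Mass"].to_numpy()
mass_pos = df_seq2["mass_pos"].to_numpy()

# (n_rows, n_masses) matrix of absolute differences via broadcasting.
diffs = np.abs(mass_pos[:, None] - masses[None, :])
closest = diffs.argmin(axis=1)

df_seq2["closest_mass"] = masses[closest]
df_seq2["mass_diff"] = df_seq2["mass_pos"] - df_seq2["closest_mass"]
# ppm error relative to the matched mass (assumed convention).
df_seq2["ppm_error"] = df_seq2["mass_diff"] / df_seq2["closest_mass"] * 1e6
# Same idea as the original boolean column, now derived from the difference.
df_seq2["masses match"] = df_seq2["mass_diff"].abs() <= tolerance

matched = df_seq2[df_seq2["masses match"]]
print(matched)
```

For a frame as large as the one in the question (about 355,000 rows), the full difference matrix can use a lot of memory. Sorting the masses and looking up neighbours with np.searchsorted is a common alternative in that case.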

