我正在嘗試在我的 pandas 資料框中創建一個新列,這將是其他兩列的差異,但新列的值與列值之間的差異有很大不同。我聽說 'float' 值通常不能精確減去,所以我嘗試通過將列的 dtypes 更改為 'int64' 將此處的十進制值轉換為整數(如此處建議的
編輯:這也是我的 CSV 資料的一部分:
names,0_scores,0_logfoldchanges,0_pvals,0_pvals_adj,1_scores,1_logfoldchanges,1_pvals,1_pvals_adj,2_scores,2_logfoldchanges,2_pvals,2_pvals_adj,3_scores,3_logfoldchanges,3_pvals,3_pvals_adj,4_scores,4_logfoldchanges,4_pvals,4_pvals_adj,5_scores,5_logfoldchanges,5_pvals,5_pvals_adj,6_scores,6_logfoldchanges,6_pvals,6_pvals_adj,7_scores,7_logfoldchanges,7_pvals,7_pvals_adj,8_scores,8_logfoldchanges,8_pvals,8_pvals_adj
0610005C13Rik,-0.06806567,-1.3434665,0.9457333570044608,0.9996994148075796,-0.06571575,-2.952315,0.9476041278614572,0.9998906553041256,0.17985639,1.9209933,0.8572653106998014,0.9994124851941415,-0.0023527155,0.85980946,0.9981228063933416,0.9993920957240323,0.0021153346,0.08053488,0.9983122084427253,0.9993417421686092,0.07239167,2.6473796,0.9422902189641795,0.9998255096296015,-0.029918168,-18.44805,0.9761323166853361,0.998901292435457,-0.021452557,-18.417543,0.9828846479876278,0.9994515175269552,-0.011279659,-18.393742,0.9910003250967939,0.9994694916208285
0610006L08Rik,-0.015597747,-15.159286,0.9875553033428832,0.9996994148075796,-0.015243248,-15.13933,0.9878381189626457,0.9998906553041256,-0.008116434,-14.795435,0.9935240935555751,0.9994124851941415,-0.0073064035,-14.765995,0.9941703851753109,0.9993920957240323,-0.0068988753,-14.752146,0.9944955375479378,0.9993417421686092,0.100005075,18.888618,0.9203402935001026,0.9998255096296015,-0.004986361,-14.696446,0.9960214758176429,0.998901292435457,-0.0035754263,-14.665947,0.9971472286106732,0.9994515175269552,-0.0018799432,-14.64215,0.9985000232597367,0.9994694916208285
0610009B22Rik,0.7292792,-0.015067068,0.46583086269639506,0.9070814087688549,0.42489842,0.18173021,0.67091072915639,0.9998906553041256,17.370018,1.0877438,1.3918130408174961e-67,6.801929840389262e-67,-6.5684495,-1.237505,5.084194721546539e-11,3.930798968247645e-10,-5.6669636,-0.42557448,1.4535041077956595e-08,5.6533712043729706e-08,-3.5668032,-0.5939982,0.0003613625764821466,0.001766427013499565,-7.15373,-1.7427195,8.445118618740649e-13,4.689924532441606e-12,-2.6011736,-0.66274893,0.009290541915973735,0.05767076032846401,1.7334439,1.2316034,0.08301681426158236,0.3860271115408991
理想情況下,我想創建一個“diff_logfoldchanges0”列,該列等于“0_logfoldchanges”列中的值減去“1_logfoldchanges”列中的值。在下面的 CSV 資料中,我認為可能是“-1.3434665 - -2.952315”、“-15.159286 - -15.13933”和“-0.015067068 - 0.18173021”。
uj5u.com熱心網友回復:
pd.read_csv默認情況下使用快速但不太精確的方式讀取浮點數
import pandas as pd
import io
csv = """names,0_scores,0_logfoldchanges,0_pvals,0_pvals_adj,1_scores,1_logfoldchanges,1_pvals,1_pvals_adj,2_scores,2_logfoldchanges,2_pvals,2_pvals_adj,3_scores,3_logfoldchanges,3_pvals,3_pvals_adj,4_scores,4_logfoldchanges,4_pvals,4_pvals_adj,5_scores,5_logfoldchanges,5_pvals,5_pvals_adj,6_scores,6_logfoldchanges,6_pvals,6_pvals_adj,7_scores,7_logfoldchanges,7_pvals,7_pvals_adj,8_scores,8_logfoldchanges,8_pvals,8_pvals_adj
0610005C13Rik,-0.06806567,-1.3434665,0.9457333570044608,0.9996994148075796,-0.06571575,-2.952315,0.9476041278614572,0.9998906553041256,0.17985639,1.9209933,0.8572653106998014,0.9994124851941415,-0.0023527155,0.85980946,0.9981228063933416,0.9993920957240323,0.0021153346,0.08053488,0.9983122084427253,0.9993417421686092,0.07239167,2.6473796,0.9422902189641795,0.9998255096296015,-0.029918168,-18.44805,0.9761323166853361,0.998901292435457,-0.021452557,-18.417543,0.9828846479876278,0.9994515175269552,-0.011279659,-18.393742,0.9910003250967939,0.9994694916208285
0610006L08Rik,-0.015597747,-15.159286,0.9875553033428832,0.9996994148075796,-0.015243248,-15.13933,0.9878381189626457,0.9998906553041256,-0.008116434,-14.795435,0.9935240935555751,0.9994124851941415,-0.0073064035,-14.765995,0.9941703851753109,0.9993920957240323,-0.0068988753,-14.752146,0.9944955375479378,0.9993417421686092,0.100005075,18.888618,0.9203402935001026,0.9998255096296015,-0.004986361,-14.696446,0.9960214758176429,0.998901292435457,-0.0035754263,-14.665947,0.9971472286106732,0.9994515175269552,-0.0018799432,-14.64215,0.9985000232597367,0.9994694916208285
0610009B22Rik,0.7292792,-0.015067068,0.46583086269639506,0.9070814087688549,0.42489842,0.18173021,0.67091072915639,0.9998906553041256,17.370018,1.0877438,1.3918130408174961e-67,6.801929840389262e-67,-6.5684495,-1.237505,5.084194721546539e-11,3.930798968247645e-10,-5.6669636,-0.42557448,1.4535041077956595e-08,5.6533712043729706e-08,-3.5668032,-0.5939982,0.0003613625764821466,0.001766427013499565,-7.15373,-1.7427195,8.445118618740649e-13,4.689924532441606e-12,-2.6011736,-0.66274893,0.009290541915973735,0.05767076032846401,1.7334439,1.2316034,0.08301681426158236,0.3860271115408991"""
data = pd.read_csv(io.StringIO(csv))
print(data["0_logfoldchanges"][0]) # -1.3434665000000001 instead of -1.3434665
這種差異很小(小于原始值的萬億分之一),并且通常不可見,因為顯示幕會將其四舍五入,因此在大多數情況下我不會稱其為“顯著”(相對于精度/準確度而言,它可能微不足道輸入資料),但如果您通過在 Python 解釋器中手動鍵入相同的數字來檢查您的計算,它確實會顯示出來。
要更準確地讀取值,請使用float_precision="round_trip":
data = pd.read_csv(io.StringIO(csv), float_precision="round_trip")
減法現在產生預期值(與傳統的 python 減法相同):
difference = data["0_logfoldchanges"] - data["1_logfoldchanges"]
print(difference[0] == -1.3434665 - -2.952315) # checking first row - True
這并不是因為浮點本身不精確,而是特定于 pandas 讀取 CSV 檔案的方式。這是浮點舍入的一個很好的指南。通常,轉換為整數不會有幫助,除非有時在處理貨幣或其他具有精確十進制表示的數量時。
uj5u.com熱心網友回復:
您是否嘗試使用subPandas 的方法?我已經多次對浮點值進行了這些算術運算,沒有任何問題。
請嘗試https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.sub.html
轉載請註明出處,本文鏈接:https://www.uj5u.com/houduan/422947.html
標籤:
上一篇:列操作-python
下一篇:如何正確減去兩個UNIX時間戳列
