將字串轉換為浮點值-有解無憂

我從 pdf 檔案中復制了一個包含三列的表格。我在此處附上 PDF 的螢屏截圖：

將字串轉換為浮點值

padj列中的值是指數值，但是，當您從 pdf 復制到 excel 然后用 pandas 打開它時，這些是字串或物件資料型別。因此，這些值不能被決議為浮點數或數值。我需要這些值作為浮點數，而不是字串。有人可以幫我一些建議嗎？到目前為止，這是我嘗試過的。

csv然后使用編碼在python中打開excel或檔案escape_unicode以規避UnicodeDecodeError

## open the file
df       = pd.read_csv("S2_GSE184956.csv",header=0,sep=',',encoding='unicode_escape')[["DEGs","LFC","padj"]]

   df.head()
        DEGs   padj          LFC
    0   JUNB    1.5 ×10-8   -1.273329
    1   HOOK2   2.39×10-7   -1.109320
    2   EGR1    3.17×10-6   -4.187828
    3   DUSP1   3.95×10-6   -3.251030
    4   IL6     3.95×10-6   -3.415500
    5   ARL4C   5.06×10-6   -2.147519
    6   NR4A2   2.94×10-4   -3.001167
    7   CCL3L1  4.026×10-4  -5.293694


# Convert the string to float by replacing the x10- with exponential sign 
    df['padj'] = df['padj'].apply(lambda x: (unidecode(x).replace('x10-','x10-e'))).astype(float)
That threw an error,

    ValueError: could not convert string to float: '1.5 x10-e8'

任何建議，將不勝感激。謝謝

uj5u.com熱心網友回復：

將字串轉換為浮點值

uj5u.com熱心網友回復：

如果你想要一個數值矢量解，你可以使用：

df['float'] = (df['padj'].str.extract(r'(\d (?:\.\d ))\s*×10(.?\d )')
 .apply(pd.to_numeric).pipe(lambda d: d[0].mul(10.**d[1]))
)

輸出：

     DEGs        padj       LFC         float
0    JUNB   1.5 ×10-8 -1.273329  1.500000e-08
1   HOOK2   2.39×10-7 -1.109320  2.390000e-07
2    EGR1   3.17×10-6 -4.187828  3.170000e-06
3   DUSP1   3.95×10-6 -3.251030  3.950000e-06
4     IL6   3.95×10-6 -3.415500  3.950000e-06
5   ARL4C   5.06×10-6 -2.147519  5.060000e-06
6   NR4A2   2.94×10-4 -3.001167  2.940000e-04
7  CCL3L1  4.026×10-4 -5.293694  4.026000e-04

中間的：

df['padj'].str.extract('(\d (?:\.\d ))\s*×10(.?\d )')

       0   1
0    1.5  -8
1   2.39  -7
2   3.17  -6
3   3.95  -6
4   3.95  -6
5   5.06  -6
6   2.94  -4
7  4.026  -4

轉載請註明出處，本文鏈接：https://www.uj5u.com/net/521118.html

標籤：Python熊猫数据框麻木的

上一篇：二維陣列的布爾索引

下一篇：熊貓資料框不創建新列