Split a string containing decimals and words, and create columns from the unique names in that string with Pandas/Python

I have this string:

''
'Storage:9.22Checkoff:6.90InElevation:0.00OutCharge:0.00Freightother:0.00'
''

So the first thing I need to do is separate the values into the following form:

''
Storage:9.22 Checkoff:6.90 In_Elevation:0.00 Out_Charge:0.00 Freight_other:0.00
''

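I will be iterating over many rows with similar values, so as soon as I see a name (and it is unique) I need to create a new column and assign the value I found to that particular row. In the end it should look like this: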

''
------------------------------------------------------------------
| Storage | Checkoff | In_Elevation | Out_Charge | Freight_other |
------------------------------------------------------------------
|  9.22   |  6.90    |    0.00      |   0.00     |    0.00       |
------------------------------------------------------------------
''

I have tried at least a couple of examples to start splitting the string, but none of them gives me what I actually need.

Here is one:

'''
word = ""
value = ""

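for i in range(0, len(df['Original'])):
    for j in df['Original'][i]:
        if j.isalpha():
            word = word + j      # every letter is appended to one running string
        elif j.isdecimal():
            value = value + j    # every digit is appended to another
        elif j.isascii():
            # ':' and '.' end up here and are dropped
            pass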
'''

But this is the result:

'''
StorageCheckoffInElevationOutChargeFreightotherStorageCheckoffMiscellaneousChargesPremiumFreightStorageCheckoffOptionPremiumsforMinimumPriceContractsFITRUCKDiscountsFORAILCarryCostStorageCheckoffFreightInElevationOutChargeFreightotherStorageCheckoffFreightWeighingChgsFORAILCheckoffInElevationOutChargeFreightotherStorageCheckoffFreightMiscellaneousChargesStorageCheckoffInElevationOutChargeFreightotherInElevationOutChargeDiscountsFreightother
922690000000000061014372018602158602167642563191927552232584968331307341840509672628262873068122661185213241367192248181900000000074061234124424074596189800000000000000016635000
'''

To add the columns to the DataFrame, I am using the following snippet:

'''
cols = [i for i in new[0].unique()]
df1 = pd.DataFrame( index=range(len(cols)), columns=cols)
df1
'''

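This might work, but I still need to split the string correctly, and none of the approaches I have tried gives me the desired output. If I use a regular expression, it separates the words from the values, but then there is no way to map which value corresponds to which word.

As always, any hints or suggestions would be greatly appreciated.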

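Answer from a uj5u.com user:

Use Series.str.extractall with capture groups to get the colon-separated words and numeric values (allowing parentheses to denote negative values). Then pivot the resulting DataFrame into the proper shape. Because the extraction pairs each label with its value, the labels can even appear in a different order in individual strings, as in the sample I created below.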
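To see why two capture groups solve the "which value belongs to which word" problem, here is a minimal standard-library sketch on a single string. The names `PAIR`, `pairs`, and `row` are only illustrative, and the pattern is the same idea as the one described above, not a different technique.

```python
import re

# Group 1 is the label (lazy, up to the colon); group 2 is the amount,
# where parentheses are allowed so "(2.00)" is captured whole.
PAIR = re.compile(r'(.*?):([()0-9.]+)')

text = 'Storage:9.22Checkoff:6.90InElevation:0.00OutCharge:0.00Freightother:0.00'

pairs = PAIR.findall(text)   # list of (label, value) tuples, in order of appearance
row = dict(pairs)            # label -> value, so each value stays attached to its name

print(pairs[:2])             # [('Storage', '9.22'), ('Checkoff', '6.90')]
print(row['OutCharge'])      # 0.00
```

Each match carries both the name and the number, so nothing has to be re-aligned afterwards; `extractall` does the same thing across a whole Series.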

Sample data

import pandas as pd
s = pd.Series(['Storage:9.22Checkoff:6.90InElevation:0.00OutCharge:0.00Freightother:0.00',
               'Checkoff:6.97Storage:19.22InElevation:0.00OutCharge:10.00Freightother:56.55',
               'Checkoff:(2.00)Storage:19.22InElevation:0.00OutCharge:10.00Freightother:56.55'])

Code

df = s.str.extractall(r'(.*?):([\(\)0-9.]+)').reset_index()
df = df.pivot(index='level_0', columns=0, values=1).rename_axis(index=None, columns=None)

print(df)
#  Checkoff Freightother InElevation OutCharge Storage
#0     6.90         0.00        0.00      0.00    9.22
#1     6.97        56.55        0.00     10.00   19.22
#2   (2.00)        56.55        0.00     10.00   19.22
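Note that the extracted values are still strings, including the parenthesized `(2.00)`. If numbers are needed, one possible follow-up is sketched below. It assumes that parentheses mean a negative amount, as the answer suggests; the helper `to_number` and the shortened sample Series are hypothetical and not part of the original answer.

```python
import pandas as pd

# Shortened sample in the same format as above (illustrative only).
s = pd.Series(['Storage:9.22Checkoff:6.90',
               'Checkoff:(2.00)Storage:19.22'])

wide = (s.str.extractall(r'(.*?):([\(\)0-9.]+)')
          .reset_index()
          .pivot(index='level_0', columns=0, values=1)
          .rename_axis(index=None, columns=None))

def to_number(col):
    # Assumption: "(x)" means -x. Missing cells are treated as not negative.
    negative = col.str.startswith('(', na=False)
    magnitude = pd.to_numeric(col.str.strip('()'))
    # Keep the magnitude where the cell was not parenthesized, negate it otherwise.
    return magnitude.where(~negative, -magnitude)

numeric = wide.apply(to_number)
print(numeric)
```

Converting column by column keeps the wide layout intact and leaves any label that is absent from a given row as NaN.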

Please credit the source when reposting. Link to this article: https://www.uj5u.com/yidong/408951.html
