I have this string:
'Storage:9.22Checkoff:6.90InElevation:0.00OutCharge:0.00Freightother:0.00'
So the first thing I need to do is split it into separate values of the following form:
Storage:9.22 Checkoff:6.90 In_Elevation:0.00 Out_Charge:0.00 Freight_other:0.00
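I will be iterating over many rows with similar values, so I have to make sure that as soon as I see a name that has not appeared before, a new column is created and the value I found is assigned to it for that particular row. In the end it should look like this: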
------------------------------------------------------------------
| Storage | Checkoff | In_Elevation | Out_Charge | Freight_other |
------------------------------------------------------------------
| 9.22    | 6.90     | 0.00         | 0.00       | 0.00          |
------------------------------------------------------------------
I have already tried at least a couple of examples to start splitting the string, but they do not give me what I actually need.
Here is one:
'''
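word = ""
value = ""
# Walk every character of every string in the 'Original' column
for i in range(len(df['Original'])):
    for j in df['Original'][i]:
        if j.isalpha():
            word = word + j    # all letters end up in one long string
        elif j.isdecimal():
            value = value + j  # all digits end up in another
        elif j.isascii():
            #print(j)
            pass               # ':' and '.' are dropped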
'''
But this is the result:
'''
StorageCheckoffInElevationOutChargeFreightotherStorageCheckoffMiscellaneousChargesPremiumFreightStorageCheckoffOptionPremiumsforMinimumPriceContractsFITRUCKDiscountsFORAILCarryCostStorageCheckoffFreightInElevationOutChargeFreightotherStorageCheckoffFreightWeighingChgsFORAILCheckoffInElevationOutChargeFreightotherStorageCheckoffFreightMiscellaneousChargesStorageCheckoffInElevationOutChargeFreightotherInElevationOutChargeDiscountsFreightother
922690000000000061014372018602158602167642563191927552232584968331307341840509672628262873068122661185213241367192248181900000000074061234124424074596189800000000000000016635000
'''
To add the columns to the DataFrame, I am using the following snippet:
'''
# `new` is a DataFrame built earlier in my code (not shown here)
cols = list(new[0].unique())
df1 = pd.DataFrame(index=range(len(cols)), columns=cols)
df1
'''
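This might work, but I still need to split the string correctly, and none of the methods I have tried gives me the output I want. If I use a regular expression, it separates the words from the values, but then there is no way to map which value belongs to which word.
As always, any tips or suggestions would be much appreciated.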
Reply from a uj5u.com user:
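Use Series.str.extractall with capture groups to pull out each word and its numeric value, which are separated by a colon (the value pattern also allows parentheses, which denote negative values). Then pivot the resulting DataFrame into the desired shape. Because the extraction pairs each label with its value, the labels can even appear in a different order within individual strings, as in the sample I created below.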
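To see why capture groups keep each label tied to its value, here is a minimal sketch on a single string using only the standard-library re module. The pattern is a little stricter than the one in this answer (letters only for the label), and the helper name split_pairs is made up for illustration.

```python
import re

# Label = run of letters, value = run of digits, dots and parentheses.
# Stricter than the answer's (.*?) label group; an assumption for illustration.
PAIR = re.compile(r'([A-Za-z]+):([()0-9.]+)')

def split_pairs(text):
    """Return a {label: value} dict for one string; values stay as strings."""
    # findall yields one (label, value) tuple per match, so the pairing
    # between a word and its number is never lost.
    return dict(PAIR.findall(text))

row = 'Storage:9.22Checkoff:6.90InElevation:0.00OutCharge:0.00Freightother:0.00'
pairs = split_pairs(row)
print(pairs)
```

A list of such dicts, one per row, could also be passed to pd.DataFrame, which fills labels that are missing from a row with NaN.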
Sample data
import pandas as pd

s = pd.Series([
    'Storage:9.22Checkoff:6.90InElevation:0.00OutCharge:0.00Freightother:0.00',
    'Checkoff:6.97Storage:19.22InElevation:0.00OutCharge:10.00Freightother:56.55',
    'Checkoff:(2.00)Storage:19.22InElevation:0.00OutCharge:10.00Freightother:56.55',
])
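Code
# Group 1 captures the label before the colon; group 2 captures the value
# (digits, dots, and parentheses for negatives)
df = s.str.extractall(r'(.*?):([\(\)0-9.]+)').reset_index()
# One row per original string, one column per label
df = df.pivot(index='level_0', columns=0, values=1).rename_axis(index=None, columns=None)
print(df)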
#   Checkoff Freightother InElevation OutCharge Storage
# 0     6.90         0.00        0.00      0.00    9.22
# 1     6.97        56.55        0.00     10.00   19.22
# 2   (2.00)        56.55        0.00     10.00   19.22
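The extracted values are still strings, and the column names lack the underscores asked for in the question. A possible follow-up, sketched here in plain Python, converts parenthesised amounts to negative floats and renames the labels. The names parse_amount and RENAMES are hypothetical, and the mapping is written by hand because Freightother has no capital letter to split on.

```python
def parse_amount(text):
    """Convert '9.22' to 9.22 and an accounting-style '(2.00)' to -2.0."""
    text = text.strip()
    if text.startswith('(') and text.endswith(')'):
        return -float(text[1:-1])
    return float(text)

# Hypothetical mapping to the underscored names shown in the question.
RENAMES = {
    'InElevation': 'In_Elevation',
    'OutCharge': 'Out_Charge',
    'Freightother': 'Freight_other',
}

# One row of extracted label/value pairs, used as a stand-in for a DataFrame row.
raw = {'Checkoff': '(2.00)', 'Storage': '19.22', 'InElevation': '0.00'}
clean = {RENAMES.get(label, label): parse_amount(value) for label, value in raw.items()}
print(clean)
```

On the pivoted DataFrame, the same helpers could be applied with df.apply(lambda col: col.map(parse_amount, na_action='ignore')) followed by df.rename(columns=RENAMES); na_action='ignore' leaves the NaN cells of rows that lack a label untouched.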