Splitting columns in a CSV file - 有解無憂

I have a CSV file that is very messy. The first column is fine, but all the remaining data is packed into the second column, in the form VariableName1=Variable1, VariableName2=Variable2, VariableName3=Variable3, ...

<div class="jp-RenderedText jp-OutputArea-output" data-mime-type="text/plain">
<pre>              var1                                            var2  \
1    SfgvbdvbUJ05-1  var3=10,var4=/a/n/anghelo_rujo_edited-...   
2      OLBCANGR15  var3=10,var4=/c/a/cangrande_test.jpg,a...   
3        ZAMdvFIA19  var3=10,var4=/p/i/pierluigi_zampaglion...   
4        VINMUL18  var3=10,var4=/r/u/rudi_vindimian_mulle...   
5        PRACLA16  var3=10,var4=/p/r/pracla16_podere_prad...   
..            ...                                                ...   
175        WALLIM  var3=25,var4=/w/a/walcher_limoncello_w...   
239       SMROS20  var3=10,var4=/s/e/sella_e_mosca_rosato...   
288     SAELAMB19  var3=10,var6=Modena,bottleml=750,box_size=1...   
343        DILABB  var3=40,var4=/d/i/dilabb_distillerie_l...   
357       VANER19  var3=10,var4=/v/a/valdibella_kerasos_v...   

     var4  ...  var9  var10  var11  
1          NaN  ...   NaN           NaN            NaN  
2          NaN  ...   NaN           NaN            NaN  
3          NaN  ...   NaN           NaN            NaN  
4          NaN  ...   NaN           NaN            NaN  
5          NaN  ...   NaN           NaN            NaN  
..         ...  ...   ...           ...            ...  
175        NaN  ...   NaN           NaN            NaN  
239        NaN  ...   NaN           NaN            NaN  
288        NaN  ...   NaN           NaN            NaN  
343        NaN  ...   NaN           NaN            NaN  
357        NaN  ...   NaN           NaN            NaN  


</pre>
</div>

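I loaded the second column as separate new data and split it on the comma. But I cannot turn the VariableName1=Variable1 entries into columns named after VariableName.

When I tried to do this with String Contains, I got stuck on the = sign.

Please help. I am having trouble processing this CSV. What I want is each value under its own column name: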

var1          var2         var3         var4
ZAMffFIA19     10           2         /a/n/anghelo_rujo_edited...
VINMUfgvL18    25           1         /r/u/rudi_vindimian_mulle...

A uj5u.com user replied:

Suppose you have a file like this:

123     A=2,B=asdjhf,C=jhdkfhskdf,D=1254
54878754    A=45786,D=asgfd,C=1234

and the file is not large, you can build the dataframe line by line:

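import pandas as pd

columns = ["sku", "A", "B", "C", "D"]
rows = []  # one dict per line of the file

with open("data_mangled.csv") as f:
    for line in f:
        if not line.strip():
            continue  # skip blank lines
        # first column and the key=value column are separated by whitespace
        col1, col2 = line.split()
        d = {"sku": col1}
        for item in col2.split(","):
            k, v = item.split("=")
            d[k] = v
        for col in columns:   # add potentially missing columns as None
            if col not in d:
                d[col] = None
        rows.append(d)

# build the dataframe once; DataFrame.append was removed in pandas 2.0
df = pd.DataFrame(rows, columns=columns)
print(df)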

This also handles the cases where some keys are missing from the second column or appear in a different order.

Output:

        sku      A       B           C      D
0       123      2  asdjhf  jhdkfhskdf   1254
1  54878754  45786    None        1234  asgfd
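
The same per-line parsing can also be applied to a column that is already loaded in a dataframe. The following is only a sketch of that variant, not the answerer's code: parse_pairs and the column name raw are hypothetical names, and the two sample rows are the ones from the example file above.

```python
import pandas as pd


def parse_pairs(text):
    """Turn 'A=2,B=x' into {'A': '2', 'B': 'x'}; items without '=' are skipped."""
    pairs = {}
    for item in text.split(","):
        key, sep, value = item.partition("=")
        if sep:  # only keep well-formed key=value items
            pairs[key.strip()] = value.strip()
    return pairs


# Sample rows copied from the example file; "raw" is a hypothetical column name.
df = pd.DataFrame(
    {
        "sku": ["123", "54878754"],
        "raw": ["A=2,B=asdjhf,C=jhdkfhskdf,D=1254", "A=45786,D=asgfd,C=1234"],
    }
)

# One dict per row; pandas aligns the keys into columns and fills gaps with NaN.
parsed = pd.DataFrame(df["raw"].map(parse_pairs).tolist(), index=df.index)
result = pd.concat([df[["sku"]], parsed], axis=1)
print(result)
```

Unlike the loop above, this variant marks missing keys as NaN rather than None, and it silently drops items that have no = sign.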

Edit: for your specific data:

import pandas as pd

rows = []  # one dict per data line

with open("data_real.txt") as f:
    # use the first line as column names in the dataframe
    col_names = f.readline().strip().split(",")
    print(col_names)

    for line in f:
        # start from the header columns so keys missing from a row stay None
        d = dict.fromkeys(col_names)
        # lines have more than 2 columns, but the trailing values are empty
        # so the format is col1,large_col2,,,,,,,
        col1, *col2 = line.rstrip("\n").split(",")
        d["sku"] = col1
        for item in col2:
            try:
                if item.strip():  # disregard the empty trailing columns
                    k, v = item.split("=")
                    # we split on comma, so have occasional "
                    k = k.strip('"')
                    v = v.strip('"')
                    d[k] = v
            except ValueError:
                # the item is not a single key=value pair
                print("Could not assign to column:", d["sku"], item)
        rows.append(d)

# build the dataframe once; DataFrame.append was removed in pandas 2.0
df = pd.DataFrame(rows)
print(df)
df.to_csv("data_parsed.csv")  # save

One of the column values is not in key=value format, so the script reports: Could not assign to column: PRACLA16 16 months less
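
If the second field in the real file is quoted (the stray " characters hint at this, but it is an assumption about the file), the standard csv module can keep that field in one piece, so the commas inside it are not mistaken for column separators. Below is a minimal sketch on made-up inline data. parse_row is a hypothetical helper and the sample values are invented for illustration.

```python
import csv
import io

# Made-up sample in the assumed layout: col1,"k=v,k=v",
sample = io.StringIO(
    'var1,var2,var3\n'
    'AAA01,"var3=10,var4=/x/y/z.jpg",\n'
    'BBB02,"var3=25,no key here",\n'
)


def parse_row(row):
    """Split the second field into a dict; also return any malformed items."""
    record, rejected = {"sku": row[0]}, []
    for item in row[1].split(","):
        key, sep, value = item.partition("=")
        if sep:
            record[key.strip()] = value.strip()
        elif item.strip():
            rejected.append(item.strip())  # no '=' found, report it later
    return record, rejected


reader = csv.reader(sample)
header = next(reader)  # skip the header line
records, problems = [], []
for row in reader:
    record, rejected = parse_row(row)
    records.append(record)
    problems.extend((record["sku"], item) for item in rejected)

print(records)
print(problems)
```

Collecting the malformed items in a list, rather than printing them inside the loop, makes it easier to review them after the whole file has been read.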

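Note: DataFrame.append was deprecated in pandas 1.4 and removed in pandas 2.0 (this is a pandas change, not a Python one), so code that appends row by row with it fails on current pandas. The fix is to convert each dict to a dataframe and join the two dataframes with pd.concat, or to collect the dicts in a list and build the dataframe once.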
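
The replacement mentioned in the note could look like the sketch below. It reuses the two example rows from the first answer. The names rows, df_once and df_concat are only illustrative.

```python
import pandas as pd

rows = [
    {"sku": "123", "A": "2", "B": "asdjhf", "C": "jhdkfhskdf", "D": "1254"},
    {"sku": "54878754", "A": "45786", "D": "asgfd", "C": "1234"},
]

# Option 1: build the dataframe once from all row dicts.
df_once = pd.DataFrame(rows, columns=["sku", "A", "B", "C", "D"])

# Option 2: wrap each dict in a one-row dataframe and join them with pd.concat,
# which is the closest substitute for df.append(d, ignore_index=True).
frames = [pd.DataFrame([d]) for d in rows]
df_concat = pd.concat(frames, ignore_index=True)

print(df_once)
print(df_concat)
```

Both options leave the missing B value of the second row as NaN. Option 1 avoids creating a small dataframe per row, which matters once the file grows.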

A uj5u.com user replied:

Edit: using extract instead of replace:

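# df is the loaded dataframe; its 'data' column holds the key=value string
keys = ['alchool', 'animal', 'alt_image']
for item in keys:
    # capture everything after "key=" up to the next comma or the end of the string
    df[item] = df['data'].str.extract(f'{item}=(.*?)(?:,|$)', expand=False)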
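
To see what the extraction does, here is a self-contained sketch on an invented two-row frame. The column name data and the key names come from the snippet above. The sample values are made up, and the slightly stricter pattern is a variation, not the answerer's exact regex.

```python
import pandas as pd

# Invented sample values; only the key names come from the snippet above.
df = pd.DataFrame(
    {"data": ["alchool=12.5,animal=no,alt_image=front", "animal=yes,alchool=40"]}
)

keys = ["alchool", "animal", "alt_image"]
for item in keys:
    # (?:^|,) anchors the key at the start of an item; [^,]* stops at the next comma.
    pattern = rf"(?:^|,){item}=([^,]*)"
    # With one capture group and expand=False, str.extract returns a Series.
    df[item] = df["data"].str.extract(pattern, expand=False)

print(df[keys])
```

Keys that do not occur in a row come back as NaN, so the order of the keys inside the string does not matter.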

Please credit the source when reposting. Link to this article: https://www.uj5u.com/caozuo/441537.html
