Cleaning DataFrame columns that hold single-element arrays of different types

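I am working with a large DataFrame that has many columns. Some of the columns store their data as arrays, in some cases arrays nested inside arrays, each wrapping a single value. I need to convert those columns so that every cell holds the plain value rather than an array. I have tried flattening and squeezing in various ways, but could not get the output I am looking for. The following code reproduces the data format I am currently working with: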

import pandas as pd
a = [[[10]],[[20]],[[30]],[[40]]]
b=[[50],[60],[70],[80]]
c=[90,100,110,120]
df = pd.DataFrame(list(zip(a,b,c)),columns=['a','b','c'])
print(df)

The output of the above is:

        a     b    c
0  [[10]]  [50]   90
1  [[20]]  [60]  100
2  [[30]]  [70]  110
3  [[40]]  [80]  120

However, I would like to get the following output:

    a   b    c
0  10  50   90
1  20  60  100
2  30  70  110
3  40  80  120

Any suggestions on how to solve this would be really helpful.

The head of the actual DataFrame looks like this:

           acoeff         bcoeff  refdiff  ref18
0  [[0.33907555]]  [11.51908656]    0.000  0.001
1  [[0.34024954]]  [11.45693353]    0.001  0.001
2  [[0.34134777]]  [11.40045124]    0.002  0.001
3  [[0.34297324]]  [11.33036004]    0.004  0.001
4  [[0.34373931]]   [11.2991075]    0.005  0.001

The same head in dictionary format:

{'acoeff': {0: '[[0.33907555]]', 1: '[[0.34024954]]', 2: '[[0.34134777]]', 3: '[[0.34297324]]', 4: '[[0.34373931]]'}, 'bcoeff': {0: '[11.51908656]', 1: '[11.45693353]', 2: '[11.40045124]', 3: '[11.33036004]', 4: '[11.2991075]'}, 'refdiff': {0: 0.0, 1: 0.001, 2: 0.002, 3: 0.004, 4: 0.005}, 'ref18': {0: 0.001, 1: 0.001, 2: 0.001, 3: 0.001, 4: 0.001}}

Answer from a uj5u.com user:

If the values are strings

Strip the [] characters and convert to numbers:

(df.update(df.select_dtypes(exclude='number')
             .apply(lambda c: pd.to_numeric(c.str.strip('[]'))))
 )
print(df)
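
For the string case, here is a minimal sketch of a variant that assigns the converted values column by column, so that each cleaned column ends up with a numeric dtype. The three-row frame is copied from the head shown in the question; the loop and the name str_cols are illustrative choices, not part of the original answer.

```python
import pandas as pd

# First three rows of the real data, as given in the question's dictionary.
# Note that the bracketed values are strings, not lists.
df = pd.DataFrame({
    'acoeff': ['[[0.33907555]]', '[[0.34024954]]', '[[0.34134777]]'],
    'bcoeff': ['[11.51908656]', '[11.45693353]', '[11.40045124]'],
    'refdiff': [0.0, 0.001, 0.002],
    'ref18': [0.001, 0.001, 0.001],
})

# Columns that are not already numeric (hypothetical helper name)
str_cols = df.select_dtypes(exclude='number').columns

for col in str_cols:
    # Remove any leading/trailing '[' or ']' characters, then parse as numbers
    df[col] = pd.to_numeric(df[col].str.strip('[]'))

print(df.dtypes)
```

Because str.strip('[]') removes every leading and trailing bracket character, the same line handles both the singly and the doubly bracketed columns.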

If the values are real lists

You can use the str accessor to un-nest the lists:

df['a'].str[0].str[0]

Output:

0    10
1    20
2    30
3    40
Name: a, dtype: int64

To automate this a bit, you can use a recursive function:

def unnest(x):
    from pandas.api.types import is_numeric_dtype
    if is_numeric_dtype(x):
        return x
    else:
        return unnest(x.str[0])

df2 = df.apply(unnest)

A variant that uses the first item of each Series to determine the nesting level:

def unnest(x):
    if len(x)>0 and isinstance(x.iloc[0], list):
        return unnest(x.str[0])
    else:
        return x

df2 = df.apply(unnest)

Output:

    a   b    c
0  10  50   90
1  20  60  100
2  30  70  110
3  40  80  120

Arbitrary nesting

If the nesting depth varies from cell to cell, you can apply the same logic to each element:

def unnest(x):
    if isinstance(x, list) and len(x)>0:
        return unnest(x[0])
    else:
        return x

# On pandas 2.1 and later, DataFrame.map is the newer name for applymap
df2 = df.applymap(unnest)
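
The element-wise idea above can be sketched for other container types as well. This assumes the cells might hold NumPy arrays or tuples rather than plain lists, which the question does not confirm; unnest_any is a hypothetical name, and the small frame is made up for illustration.

```python
import numpy as np
import pandas as pd

def unnest_any(x):
    """Peel off containers until a scalar (or an empty container) remains."""
    while True:
        if isinstance(x, np.ndarray):
            if x.size == 0:
                return x
            x = x.flat[0]          # first element, whatever the array's shape
        elif isinstance(x, (list, tuple)) and len(x) > 0:
            x = x[0]
        else:
            return x

# Illustrative data: nested lists, NumPy arrays, and mixed depth per cell
df = pd.DataFrame({
    'a': [[[10]], [[20]], [[30]]],
    'b': [np.array([50]), np.array([60]), np.array([70])],
    'c': [90, [100], (110,)],
})

# Series.map works cell by cell and sidesteps the applymap/DataFrame.map naming
df2 = df.apply(lambda col: col.map(unnest_any))
print(df2)
```

Like the original function, this keeps only the first element of each container, so it is only appropriate when every cell really wraps a single value.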

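Answer from a uj5u.com user:

Maybe not the best solution, but it works.

import numpy as np

def ravel_series(s):
    try:
        # Join the per-cell arrays and flatten the result to 1-D
        return np.concatenate(s).ravel()
    except ValueError:
        # Columns of plain scalars cannot be concatenated; return them unchanged
        return s

df.apply(ravel_series)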
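
To show why the try/except is there, here is a self-contained sketch run on the question's sample data. np.concatenate raises ValueError when given zero-dimensional items, which is what the cells of column c are, so that column falls through to the except branch. Wrapping the result in a Series and checking its length are additions made for this sketch, not part of the answer above.

```python
import numpy as np
import pandas as pd

a = [[[10]], [[20]], [[30]], [[40]]]
b = [[50], [60], [70], [80]]
c = [90, 100, 110, 120]
df = pd.DataFrame(list(zip(a, b, c)), columns=['a', 'b', 'c'])

def ravel_series(s):
    try:
        # Each cell becomes a small array; join them and flatten to 1-D
        flat = np.concatenate(s.to_numpy()).ravel()
    except ValueError:
        # Raised for columns of plain scalars such as 'c'
        return s
    if flat.size != len(s):
        # Some cell held more than one value; leave the column untouched
        return s
    # Keep the original index and name (an addition in this sketch)
    return pd.Series(flat, index=s.index, name=s.name)

result = df.apply(ravel_series)
print(result)
```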

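Answer from a uj5u.com user:

You can try this.

Code:

def clean(el):
    # Doubly nested cell, e.g. [[10]]
    if isinstance(el, list) and any(isinstance(i, list) for i in el):
        return el[0][0]
    # Singly nested cell, e.g. [50]
    elif isinstance(el, list):
        return el[0]
    # Already a scalar
    return el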

df['a'] = df.a.apply(clean)
df['b'] = df.b.apply(clean)

print(df)

Output:

    a   b    c
0  10  50   90
1  20  60  100
2  30  70  110
3  40  80  120


Tags: Python, pandas, dataframe, numpy
