Is there a more concise way to conditionally split data?

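I'd like to know whether there is a more concise way to do the following. It essentially takes column-oriented data and splits it by row, placing each row into one of three categories according to the row's final entry (its label in y_test):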

xi_test_0 = [xi_test_sc[i] for i in range(len(xi_test_sc)) if y_test[i] == 0]
xii_test_0 = [xii_test_sc[i] for i in range(len(xii_test_sc)) if y_test[i] == 0]
y_test_0 = [y_test[i] for i in range(len(y_test)) if y_test[i] == 0]

xi_test_1 = [xi_test_sc[i] for i in range(len(xi_test_sc)) if y_test[i] == 1]
xii_test_1 = [xii_test_sc[i] for i in range(len(xii_test_sc)) if y_test[i] == 1]
y_test_1 = [y_test[i] for i in range(len(y_test)) if y_test[i] == 1]

xi_test_2 = [xi_test_sc[i] for i in range(len(xi_test_sc)) if y_test[i] == 2]
xii_test_2 = [xii_test_sc[i] for i in range(len(xii_test_sc)) if y_test[i] == 2]
y_test_2 = [y_test[i] for i in range(len(y_test)) if y_test[i] == 2]

A uj5u.com user replied:

Create a set of boolean arrays, one for each condition you want to split on:

split_index = {i: y_test == i for i in range(3)}

Then group the data using these boolean indices.

xi_test_split = {i: xi_test_sc[idx, :] for i, idx in split_index.items()}
xii_test_split = {i: xii_test_sc[idx, :] for i, idx in split_index.items()}
y_test_split = {i: y_test[idx] for i, idx in split_index.items()}  # y_test is 1D, so it takes no second index
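
The approach above can be tried end to end with a minimal sketch like the one below. The sample arrays are made up for illustration, and it assumes the feature arrays are NumPy arrays with one row per sample and that y_test is a 1D NumPy array of labels 0, 1, 2. Plain `[idx]` indexing is used throughout because it selects rows for both 1D and 2D arrays.

```python
import numpy as np

# Made-up sample data (assumption): two 2D feature arrays and a 1D label array.
xi_test_sc = np.array([[0.1, 0.2], [0.3, 0.4], [0.5, 0.6], [0.7, 0.8]])
xii_test_sc = np.array([[1.0], [2.0], [3.0], [4.0]])
y_test = np.array([0, 2, 1, 0])

# One boolean mask per class label.
split_index = {i: y_test == i for i in range(3)}

# Select the rows belonging to each class from every array.
xi_test_split = {i: xi_test_sc[idx] for i, idx in split_index.items()}
xii_test_split = {i: xii_test_sc[idx] for i, idx in split_index.items()}
y_test_split = {i: y_test[idx] for i, idx in split_index.items()}

print(xi_test_split[0])  # rows 0 and 3 of xi_test_sc
print(y_test_split[0])   # the two labels equal to 0
```

Here xi_test_split[0] plays the role of xi_test_0 from the question, xi_test_split[1] the role of xi_test_1, and so on.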

A uj5u.com user replied:

xi_test_0, xi_test_1, and so on are all initialized by an expression that differs only in a single numeric constant, so first turn them into lists:

xi_test = [
    [xi_test_sc[i] for i in range(len(xi_test_sc)) if y_test[i] == j]
    for j in range(3)
]
xii_test = [
    [xii_test_sc[i] for i in range(len(xii_test_sc)) if y_test[i] == j] 
    for j in range(3)
]
y_test_n = [
    [y_test[i] for i in range(len(y_test)) if y_test[i] == j] 
    for j in range(3)
]

We can also simplify each individual list comprehension by zipping the two lists we're iterating over instead of using range:

xi_test = [
    [a for a, b in zip(xi_test_sc, y_test) if b == j]
    for j in range(3)
]
xii_test = [
    [a for a, b in zip(xii_test_sc, y_test) if b == j]
    for j in range(3)
]
y_test_n = [
    [a for a, b in zip(y_test, y_test) if b == j]
    for j in range(3)
]

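Now that there is less code to look at, it's easy to see that these three expressions also differ in just one value, so let's turn the whole thing into a dictionary: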

test_data = {
    name: [[a for a, b in zip(sc_data, y_test) if b == j] for j in range(3)]
    for name, sc_data in (
        ('xi', xi_test_sc), ('xii', xii_test_sc), ('y', y_test)
    )
}

This produces the same data as your original code, without all the copying and pasting. xi_test_0 is now test_data['xi'][0], y_test_1 is now test_data['y'][1], and so on.

Or, if you'd still like to assign each list to its own named variable rather than putting the names in a dictionary:

xi, xii, y = (
    [[a for a, b in zip(sc_data, y_test) if b == j] for j in range(3)] 
    for sc_data in (xi_test_sc, xii_test_sc, y_test)
)
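
To check that the refactoring keeps the original behavior, the sketch below runs the dictionary version next to one of the question's index-based comprehensions. The sample lists and the helper name split_by_label are assumptions introduced for illustration; they are not part of the original code.

```python
def split_by_label(values, labels, n_classes=3):
    """Group values into one list per label 0..n_classes-1 (hypothetical helper)."""
    return [[v for v, lab in zip(values, labels) if lab == j] for j in range(n_classes)]

# Made-up sample data (assumption): plain Python lists of equal length.
xi_test_sc = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6], [0.7, 0.8]]
xii_test_sc = [10, 20, 30, 40]
y_test = [0, 2, 1, 0]

test_data = {
    name: split_by_label(data, y_test)
    for name, data in (("xi", xi_test_sc), ("xii", xii_test_sc), ("y", y_test))
}

# The question's index-based comprehension for class 0, for comparison.
xi_test_0 = [xi_test_sc[i] for i in range(len(xi_test_sc)) if y_test[i] == 0]

print(test_data["xi"][0] == xi_test_0)  # True
print(test_data["y"])                   # [[0, 0], [1], [2]]
```

Pulling the inner comprehension into a function is only one way to avoid repeating it; the inline dictionary comprehension above does the same work.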

A uj5u.com user replied:

If these are all numpy arrays, you can use masks:

xi_test_0 = xi_test_sc[y_test==0]
y_test_0  = y_test[y_test==0]

xi_test_1  = xi_test_sc[y_test== 1]
xii_test_1 = xii_test_sc[y_test == 1]

... and so on ...

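Or, if you'd like the result to be indexable by 0, 1, 2, matching the _0, _1, _2 suffixes of your variable names, build all three masks at once and apply them one row at a time. The classes can differ in size, so the result is in general a list of arrays rather than a rectangular 2D matrix:

mask = y_test == np.arange(3)[:,None]  # shape (3, N): row j marks where y_test == j
xi_test_n  = [xi_test_sc[m] for m in mask]
xii_test_n = [xii_test_sc[m] for m in mask]
...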
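
The broadcast comparison is easier to follow on concrete data. The following is a minimal sketch with made-up arrays, assuming y_test is a 1D NumPy array of labels and xi_test_sc has one row per sample. It applies one row of the mask at a time, a choice made here because each class may contain a different number of samples.

```python
import numpy as np

# Made-up sample data (assumption).
xi_test_sc = np.array([[0.1, 0.2], [0.3, 0.4], [0.5, 0.6], [0.7, 0.8]])
y_test = np.array([0, 2, 1, 0])

# Comparing shape (N,) against shape (3, 1) broadcasts to a (3, N) boolean array;
# row j is True wherever y_test == j.
mask = y_test == np.arange(3)[:, None]

# Apply one mask row per class, giving a list with one array per label.
xi_test_n = [xi_test_sc[row] for row in mask]

print(mask.shape)                          # (3, 4)
print([part.shape for part in xi_test_n])  # [(2, 2), (1, 2), (1, 2)]
```

With this layout, xi_test_n[0] corresponds to xi_test_0, xi_test_n[1] to xi_test_1, and so on.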

Please credit the source when reposting. Link to this article: https://www.uj5u.com/yidong/384714.html

Tags: Python, numpy, classification