Select values from a list based on boolean values in a numpy array

For each sublist in array b, return the values from list a at the positions where the sublist holds a positive boolean (i.e. True).

import pandas as pd
import numpy as np

a = pd.Series([1, 3, 5, 7, 9])  # values to choose from
b = np.array([[False, True, False, True, False],  # based on bools
              [False, False, False, False, False]])

out = []
for i, v in enumerate(b):
    out.append([])
    for j in range(len(v)):
        if v[j]:
            out[i].append(a[j])

out = np.array(out, dtype=object)  # ragged result [[3, 7], []]; dtype=object is needed on recent NumPy

# In the first sublist, True is at indices 1 and 3, which correspond to values 3 and 7.
# In the second sublist there is no True, hence it is empty.

The above seems too laborious, and it probably does not use numpy vectorization (so it is slow on large data).

Answer:

Your Series is 1d; b is a 2d array. The Series also has a row index, which a plain array does not.

In [70]: a.shape, b.shape
Out[70]: ((5,), (2, 5))

In [71]: a
Out[71]: 
0    1
1    3
2    5
3    7
4    9
dtype: int64

We can use a 1d row of b, with shape (5,), to select elements from a:

In [72]: a[b[0,:]]
Out[72]: 
1    3
3    7
dtype: int64

In [73]: a[b[1,:]]
Out[73]: Series([], dtype: int64)

Since the rows produce results of different lengths, we can't do the selection in one step. a[b] gives an error, a mismatch between (5,) and (2,).

It may be simpler to use a 1d array version of a, which has no row index:

In [103]: A = a.to_numpy(); A
Out[103]: array([1, 3, 5, 7, 9], dtype=int64)

Apply one row of b as the index:

In [104]: A[b[0]]
Out[104]: array([3, 7], dtype=int64)

And do this iteratively for all rows:

In [105]: [A[row] for row in b]
Out[105]: [array([3, 7], dtype=int64), array([], dtype=int64)]

We can build a (2,5) array from A and apply b as a boolean mask, but the result will be 1d, with no indication that the second row selected nothing:

In [106]: np.vstack((A,A))
Out[106]: 
array([[1, 3, 5, 7, 9],
       [1, 3, 5, 7, 9]], dtype=int64)

In [107]: np.vstack((A,A))[b]
Out[107]: array([3, 7], dtype=int64)

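Indexing with one row of b, or with b itself, is what I would call a "whole array" operation. But the separate rows of b cannot be used that way all at once; that needs Python-level iteration.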
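
If the Python-level loop becomes a bottleneck, one possible workaround is to do the selection once with the full 2d mask and then cut the flat result back into per-row pieces, using the number of True values in each row. This is only a sketch and not part of the original answer; `flat`, `counts`, and `rows` are names chosen for illustration.

```python
import numpy as np

A = np.array([1, 3, 5, 7, 9])
b = np.array([[False, True, False, True, False],
              [False, False, False, False, False]])

# One whole-array selection: view A as if repeated for each row of b, then mask.
flat = np.broadcast_to(A, b.shape)[b]

# The number of True values per row tells us where each row's values end.
counts = b.sum(axis=1)

# Split the flat result at the cumulative row boundaries.
rows = np.split(flat, np.cumsum(counts)[:-1])
print(rows)
```

The result is still a list of arrays of different lengths, so the ragged shape does not go away; only the selection itself is done in a single step.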

There are some other ways of using A and b:

Multiplication works, with b treated as an array of 0s and 1s:

In [111]: A*b
Out[111]: 
array([[0, 3, 0, 7, 0],
       [0, 0, 0, 0, 0]], dtype=int64)
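
One caveat with the multiplication is that a 0 in the result could be either an unselected position or a genuine 0 in A. A minimal sketch of an alternative using `np.where`, assuming a sentinel value such as -1 never occurs in the data (the sentinel and the name `filled` are assumptions for illustration):

```python
import numpy as np

A = np.array([1, 3, 5, 7, 9])
b = np.array([[False, True, False, True, False],
              [False, False, False, False, False]])

# Keep A where b is True, otherwise use the sentinel; A broadcasts across the rows of b.
filled = np.where(b, A, -1)
print(filled)
```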

There is also masked_array, a subclass of array:

In [112]: np.ma.masked_array(np.vstack((A,A)),~b)
Out[112]: 
masked_array(
  data=[[--, 3, --, 7, --],
        [--, --, --, --, --]],
  mask=[[ True, False,  True, False,  True],
        [ True,  True,  True,  True,  True]],
  fill_value=999999,
  dtype=int64)
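
The masked array keeps the (2,5) shape, so per-row information stays available. A short sketch of how it might be queried, using the `count` and `compressed` methods of `np.ma` arrays (the variable names are illustrative):

```python
import numpy as np

A = np.array([1, 3, 5, 7, 9])
b = np.array([[False, True, False, True, False],
              [False, False, False, False, False]])

# Mask is True where a value should be hidden, hence ~b.
m = np.ma.masked_array(np.vstack((A, A)), ~b)

# Number of unmasked (selected) values in each row.
per_row_counts = m.count(axis=1)

# Selected values of a single row as a plain ndarray.
first_row = m[0].compressed()
second_row = m[1].compressed()
print(per_row_counts, first_row, second_row)
```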

The list of arrays from [105] can be turned into an object dtype array:

In [115]: np.array([A[row] for row in b],object)
Out[115]: array([array([3, 7], dtype=int64), array([], dtype=int64)], dtype=object)

This is 1d, with shape (2,). Sometimes that is useful, but in terms of performance it is not an improvement over the list.

Answer:

You can simply use:

a2 = a.to_numpy()
out = [a2[x] for x in b]

Output: [array([3, 7]), array([], dtype=int64)]

Answer:

Or just use a row of array b as a mask, for example:

out = a[b[0]].to_numpy()
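
The line above handles only the first row of b. A sketch that extends it to every row while keeping the Series index labels (the name `per_row` is illustrative, not from the answer):

```python
import numpy as np
import pandas as pd

a = pd.Series([1, 3, 5, 7, 9])
b = np.array([[False, True, False, True, False],
              [False, False, False, False, False]])

# Each row of b is a 1d boolean mask of length 5, matching the Series.
per_row = [a[row] for row in b]
for s in per_row:
    print(s.index.tolist(), s.tolist())
```

Call `.to_numpy()` on each element if plain arrays are wanted instead of Series.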
