Apply cumcount to each group of identical integers

Suppose I have the following array of integers in ascending order (some may be negative):

a = np.array([ 1,  1,  1,  1, 10, 10, 20, 20, 20, 30, 40, 40, 40, 40])

I want to turn it into this:

a = np.array([ 1,  2,  3,  4, 10, 11, 20, 21, 22, 30, 40, 41, 42, 43])

...where each integer is incremented by its position within its group of identical integers, so for the first group of 1s:

  1 1 1 1  <--- these are the numbers from the array
  0 1 2 3  <--- these are counts of the number for its group
  -------
  1 2 3 4

Is there a more efficient way to do this than the following?

a = np.array([ 1,  1,  1,  1, 10, 10, 20, 20, 20, 30, 40, 40, 40, 40])
ones = (a == np.pad(a, (1,0))[:-1]).astype(int)
ones[ones == 0] = -np.diff(np.concatenate(([0.], np.cumsum(ones != 0)[ones == 0])))
new_a = a + ones.cumsum()

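Note that the array will always be sorted in ascending order (low to high), and the numbers are always integers, some of which may be negative.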
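The requirement can be pinned down with a slow reference implementation, useful only for checking faster versions against. This is a minimal sketch using itertools.groupby; the function name cumcount_reference is made up for illustration and is not part of NumPy.

```python
from itertools import groupby

import numpy as np


def cumcount_reference(a):
    """Slow reference: add to each element its 0-based position within its run."""
    out = []
    for value, run in groupby(a.tolist()):
        # enumerate yields 0, 1, 2, ... for the elements of this run
        out.extend(value + k for k, _ in enumerate(run))
    return np.array(out, dtype=a.dtype)


a = np.array([1, 1, 1, 1, 10, 10, 20, 20, 20, 30, 40, 40, 40, 40])
print(cumcount_reference(a))
```

Because it loops in Python, this is expected to be much slower than any vectorized version on large arrays.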

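Explanation, in case it is unclear:

With the help of this post, I actually already have something working. What I am doing now is generating an array like this, where 0 marks the first of a group of identical numbers and 1 marks the rest: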

1  1  1  1 10 10 20 20 20 30 40 40 40 40
0  1  1  1  0  1  0  1  1  0  0  1  1  1
^ first 1   ^ first 10     ^ first 30
                  ^ first 20  ^ first 40

Then I use the technique from the post linked above to take a cumulative sum of that array that restarts at every 0:

# Shift `a` by one and compare it with the original array
>>> ones = (a == np.pad(a, (1,0))[:-1]).astype(int)
>>> ones
array([0, 1, 1, 1, 0, 1, 0, 1, 1, 0, 0, 1, 1, 1])

# This line is from the linked post (modified, of course)
>>> ones[ones == 0] = -np.diff(np.concatenate(([0.], np.cumsum(ones != 0)[ones == 0])))
>>> ones
array([ 0,  1,  1,  1, -3,  1, -1,  1,  1, -2,  0,  1,  1,  1])

>>> ones.cumsum()
array([0, 1, 2, 3, 0, 1, 0, 1, 2, 0, 0, 1, 2, 3])

Now we can add the resulting array to the original array:

>>> a
array([ 1,  1,  1,  1, 10, 10, 20, 20, 20, 30, 40, 40, 40, 40])

>>> a + ones.cumsum()
array([ 1,  2,  3,  4, 10, 11, 20, 21, 22, 30, 40, 41, 42, 43])
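One edge case in the approach above: np.pad pads with 0 by default, so if the array itself starts with 0, the shifted comparison treats that first element as a repeat. The sketch below builds the same marker array from np.diff instead, so no padding value is involved. It assumes a non-empty array, and the variable name starts is introduced here for illustration.

```python
import numpy as np

a = np.array([0, 0, 5, 5, 5])

# 0 marks the first element of each run, 1 marks the rest.
# np.diff compares neighbours directly, so the first element is always a run start.
ones = np.r_[0, (np.diff(a) == 0).astype(int)]

# Same reset trick as above: at each run start, subtract the number of
# repeats accumulated since the previous run start.
starts = ones == 0
ones[starts] = -np.diff(np.r_[0, np.cumsum(ones)[starts]])

new_a = a + ones.cumsum()
print(new_a)
```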

A uj5u.com user replied:

Using np.unique may be a little more elegant:

u, i = np.unique(a, return_index=True)   # Indices where the sums restart
b = np.ones_like(a)
b[i] = u
b[i[1:]] -= np.add.reduceat(b, i)[:-1]   # Subtract the sum of the prior region from the next
result = b.cumsum()
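To see what np.add.reduceat contributes in the snippet above, here is a small sketch that computes the per-run sums of b before the correction is applied. The name run_sums is introduced here for illustration.

```python
import numpy as np

a = np.array([1, 1, 1, 1, 10, 10, 20, 20, 20, 30, 40, 40, 40, 40])
u, i = np.unique(a, return_index=True)

b = np.ones_like(a)
b[i] = u

# Sum of b over each run: b[i[0]:i[1]], b[i[1]:i[2]], ..., b[i[-1]:]
run_sums = np.add.reduceat(b, i)

print(i)         # start index of each run
print(run_sums)  # per-run sum, equal to the last output value of that run
```

Subtracting each run's sum from the start of the next run is what makes the final cumsum restart at the next run's own value.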

Since the array is already sorted, you can skip np.unique and get the run starts directly:

i = np.r_[0, np.flatnonzero(np.diff(a)) + 1]  # Get the indices directly from the diff
b = np.ones_like(a)
b[i] = a[i]
b[i[1:]] -= np.add.reduceat(b, i)[:-1]
result = b.cumsum()

But wait, the sum of each region is just its length plus its starting value minus one. That eliminates the need to sum twice:

i = np.r_[0, np.flatnonzero(np.diff(a)) + 1]
b = np.ones_like(a)
b[i] = a[i]
b[i[1:]] -= np.diff(i) + a[i[:-1]] - 1  # Simpler way to sum the prior region
result = b.cumsum()
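The claim that each region sums to its length plus its starting value minus one can be checked numerically. This is a small sketch on a made-up array containing negative values; the names lengths, by_summing and by_formula are introduced here for illustration.

```python
import numpy as np

a = np.array([-5, -5, -5, 0, 0, 7])
i = np.r_[0, np.flatnonzero(np.diff(a)) + 1]   # index where each run starts
lengths = np.diff(np.r_[i, a.size])            # number of elements in each run

b = np.ones_like(a)
b[i] = a[i]

by_summing = np.add.reduceat(b, i)   # explicit per-run sum
by_formula = lengths + a[i] - 1      # start value plus (length - 1) ones

print(by_summing, by_formula)
```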

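You can simplify a little further. Since a[i[k]] is the start of a run, a[i[k] - 1] is the same as a[i[k - 1]]. In other words, the last element of the previous run is the same as the start of the previous run, so a[i[k]] - a[i[k - 1]] is simply the diff at index i[k] - 1:

d = np.diff(a)
i = np.r_[0, np.flatnonzero(d) + 1]
b = np.ones_like(a)
b[0] = a[0]
b[i[1:]] = d[i[1:] - 1] - np.diff(i) + 1 # Current region minus prior, reusing diff
result = b.cumsum()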

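Either of the last two versions should perform better than what you are currently doing.

The code above was written for simplicity and speed. If you want to make it shorter and more illegible, and you are using Python 3.8+, you can start throwing in walrus operators:

i = np.r_[0, np.flatnonzero(d := np.diff(a)) + 1]
(b := np.ones_like(a))[0] = a[0]
b[i[1:]] = d[i[1:] - 1] - np.diff(i) + 1
result = b.cumsum()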

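Because the right-hand side of an assignment statement is evaluated before its target, and left to right within it, the walrus assignments have to sit on the right-hand side if you want to cram everything into one final travesty:

(b := np.ones_like(a))[0] = a[0]
b[i[1:]] = (d := np.diff(a))[(i := np.r_[0, np.flatnonzero(d) + 1])[1:] - 1] - np.diff(i) + 1
result = b.cumsum()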

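Another variant along the same lines, again binding i on the right-hand side so that it exists by the time the target is evaluated:

(b := np.ones_like(a))[i] = a[(i := np.r_[0, np.flatnonzero(np.diff(a)) + 1])]
b[i[1:]] -= np.diff(i) + a[i[:-1]] - 1
result = b.cumsum()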
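A different formulation, not taken from the answer above, computes the position within each run as the global index minus the index where the run starts. This is a minimal sketch; the function name add_cumcount is hypothetical, and it assumes a 1-D integer array in which equal values are adjacent, which holds for sorted input.

```python
import numpy as np


def add_cumcount(a):
    """Add to each element its 0-based position within its run of equal values."""
    a = np.asarray(a)
    if a.size == 0:
        return a.copy()
    starts = np.r_[0, np.flatnonzero(np.diff(a)) + 1]   # index where each run begins
    lengths = np.diff(np.r_[starts, a.size])            # number of elements in each run
    # Position within a run = global index minus the start index of that run
    return a + np.arange(a.size) - np.repeat(starts, lengths)


a = np.array([1, 1, 1, 1, 10, 10, 20, 20, 20, 30, 40, 40, 40, 40])
print(add_cumcount(a))
```

This avoids the cumulative sum entirely at the cost of two temporary arrays of the same size as a. Whether it is faster than the cumsum versions would need to be measured on your data.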

A uj5u.com user replied:

I am not sure whether this is very efficient, but it is a one-liner:

np.hstack([x + np.r_[:x.size] for x in np.split(a, np.flatnonzero(np.diff(a)) + 1)])
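Unpacked into steps, the one-liner can be read as follows. This is a sketch with intermediate names (cuts, runs, pieces) introduced for illustration, and np.arange used in place of the equivalent np.r_[:n].

```python
import numpy as np

a = np.array([1, 1, 1, 1, 10, 10, 20, 20, 20, 30, 40, 40, 40, 40])

# Indices where a new run starts (index 0 is implicit)
cuts = np.flatnonzero(np.diff(a)) + 1
# One sub-array per run of equal values
runs = np.split(a, cuts)
# Add 0, 1, 2, ... to the elements of each run
pieces = [x + np.arange(x.size) for x in runs]
# Stitch the runs back together
result = np.hstack(pieces)

print(result)
```

The list comprehension runs once per group in Python, so the cost grows with the number of distinct values rather than only with the array length.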

When reposting, please credit the source. Link to this article: https://www.uj5u.com/caozuo/400514.html
