Why do sum(array) and numpy.count_nonzero(array) give different answers for large binary arrays when the array is uint8?

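I have 3D arrays filled with ones and zeros (created with pyellipsoid). The arrays are uint8. I want to know the number of ones. I used sum(sum(sum(array))) to do this, and it works for small arrays (up to about 5000 entries). For arrays with a known number of nonzero entries, I compared sum(sum(sum(array))) with numpy.count_nonzero(array). For larger arrays, the answer from "sum" is always wrong and lower than it should be.

If I use float64 arrays, it works for large arrays too. If I change the data type to uint8, it does not work.

Why is that? I am sure there is a very simple reason, but I cannot find the answer.

Example with a small array: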

import numpy as np

test = np.zeros((2,2,2))
test[0,0,0] = 1
test[1,0,0] = 1
In: test
Out: 
array([[[1., 0.],
        [0., 0.]],

       [[1., 0.],
        [0., 0.]]])
In: sum(sum(sum(test)))
Out: 2.0

Large example (8000 entries: a single zero and 7999 ones):

test_big=np.ones((20,20,20))
test_big[0,0,0] = 0
test_big
Out[77]: 
array([[[0., 1., 1., ..., 1., 1., 1.],
        [1., 1., 1., ..., 1., 1., 1.],
        [1., 1., 1., ..., 1., 1., 1.],
        ...,
        [1., 1., 1., ..., 1., 1., 1.],
        [1., 1., 1., ..., 1., 1., 1.],
        [1., 1., 1., ..., 1., 1., 1.]],

       [[1., 1., 1., ..., 1., 1., 1.],
        [1., 1., 1., ..., 1., 1., 1.],
        [1., 1., 1., ..., 1., 1., 1.],
        ...,
        [1., 1., 1., ..., 1., 1., 1.],
        [1., 1., 1., ..., 1., 1., 1.],
        [1., 1., 1., ..., 1., 1., 1.]],

       [[1., 1., 1., ..., 1., 1., 1.],
        [1., 1., 1., ..., 1., 1., 1.],
        [1., 1., 1., ..., 1., 1., 1.],
        ...,
        [1., 1., 1., ..., 1., 1., 1.],
        [1., 1., 1., ..., 1., 1., 1.],
        [1., 1., 1., ..., 1., 1., 1.]],

       ...,

       [[1., 1., 1., ..., 1., 1., 1.],
        [1., 1., 1., ..., 1., 1., 1.],
        [1., 1., 1., ..., 1., 1., 1.],
        ...,
        [1., 1., 1., ..., 1., 1., 1.],
        [1., 1., 1., ..., 1., 1., 1.],
        [1., 1., 1., ..., 1., 1., 1.]],

       [[1., 1., 1., ..., 1., 1., 1.],
        [1., 1., 1., ..., 1., 1., 1.],
        [1., 1., 1., ..., 1., 1., 1.],
        ...,
        [1., 1., 1., ..., 1., 1., 1.],
        [1., 1., 1., ..., 1., 1., 1.],
        [1., 1., 1., ..., 1., 1., 1.]],

       [[1., 1., 1., ..., 1., 1., 1.],
        [1., 1., 1., ..., 1., 1., 1.],
        [1., 1., 1., ..., 1., 1., 1.],
        ...,
        [1., 1., 1., ..., 1., 1., 1.],
        [1., 1., 1., ..., 1., 1., 1.],
        [1., 1., 1., ..., 1., 1., 1.]]])
In: sum(sum(sum(test_big)))
Out: 7999.0

So far so good. Here, the data type of the sum output is float64. But if I now change the data type of the array to the type that is used with pyellipsoid (uint8)...

In: test_big = test_big.astype('uint8')
In: sum(sum(sum(test_big)))
Out: 2879

So obviously 2879 is not 7999. Here, the data type of the sum output is int32 (-2147483648 to 2147483647) so this should be big enough for 7999, right...? I guess it has something to do with the data type, but how? Why?

Any answer would be appreciated. This is not urgent. I am just curious what I am missing. (It's my first post, so I hope this is understandable). Thanks!

(I am using Spyder in Anaconda on Windows, if that is of any help.)

A uj5u.com user replied:

The problem is as you guessed: there is an integer overflow. If you look at sum(sum(test_big)), you will notice that the values there are already wrong.
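To make the wrong intermediate values visible, here is a minimal sketch that rebuilds the 20x20x20 array from the question directly as uint8 and inspects each nesting level of the built-in sum(). The names level1 and level2 are only illustrative.

```python
import numpy as np

# Same data as in the question: 8000 entries, one zero and 7999 ones.
test_big = np.ones((20, 20, 20), dtype=np.uint8)
test_big[0, 0, 0] = 0

# First level: adds 20 arrays of shape (20, 20); no entry exceeds 20.
level1 = sum(test_big)
# Second level: adds 20 rows of length 20; the true totals would be 399 or 400.
level2 = sum(level1)

print(level1.dtype, level1.max())  # uint8 20
print(level2.dtype)                # uint8
print(level2[:3].tolist())         # [143, 144, 144]
```

The second level should hold 399 and 400, but a uint8 can only store 0 to 255, so the stored values are 143 and 144.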

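What you missed is that the overflow can occur inside the sum() calls that compute the partial sums. Python's built-in sum() adds the uint8 sub-arrays to one another, so the intermediate results are still uint8 and wrap around at 256. The int32 you observed applies only to the final scalar result.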
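The exact figure 2879 can be reproduced with plain modular arithmetic. The sketch below is a pure-Python model, not NumPy itself: it assumes that the partial sums wrap modulo 256 and that the last step is added up in a wider integer, as it was in the asker's environment (the reported int32). Depending on the NumPy version, that last step may itself stay uint8 and give a different wrong number.

```python
n = 20

# True column totals after the second level of summation:
# one column contains the single zero, so it totals 399 instead of 400.
col_totals = [n * n - 1] + [n * n] * (n - 1)

# A uint8 keeps only the value modulo 256.
wrapped = [t % 256 for t in col_totals]

print(wrapped[:3])      # [143, 144, 144]
print(sum(wrapped))     # 2879, the value seen in the question
print(sum(col_totals))  # 7999, the correct count
```

Each of the 20 partial sums loses exactly 256, which accounts for the gap: 7999 - 20 * 256 = 2879.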

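My suggestion is to sum this array with np.sum(), because it gives the correct total here: by default it accumulates small integer types such as uint8 in a platform-sized integer rather than in the array's own data type.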
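As a short sketch of the options for counting the ones, the following compares numpy.count_nonzero() with np.sum() on the uint8 array from the question. Passing dtype=np.int64 is an optional extra that makes the accumulator type explicit. The variable names are illustrative.

```python
import numpy as np

test_big = np.ones((20, 20, 20), dtype=np.uint8)
test_big[0, 0, 0] = 0

# Counts entries that are not zero; does not depend on the accumulator type.
count = int(np.count_nonzero(test_big))

# Sums over all axes at once, using NumPy's default wider accumulator.
total = int(test_big.sum())

# Same sum with the accumulator type chosen explicitly.
total64 = int(test_big.sum(dtype=np.int64))

print(count, total, total64)  # 7999 7999 7999
```

For an array that holds only zeros and ones, all three agree. count_nonzero() states the intent most directly and would also stay correct if the array later held values other than 0 and 1.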
