Commonly Used NumPy Statistical Functions
Contents
1. Sum: numpy.sum(a, axis=None) / a.sum(axis=None)
2. Mean: numpy.mean(a, axis=None) / a.mean(axis=None)
3. Weighted average: numpy.average(a, axis=None, weights=None)
4. Standard deviation: numpy.std(a, axis=None) / a.std(axis=None)
5. Variance: numpy.var(a, axis=None) / a.var(axis=None)
6. Minimum / maximum: numpy.amin(a, axis=None) / numpy.min(a, axis=None) / a.min(axis=None), and numpy.amax(a, axis=None) / numpy.max(a, axis=None) / a.max(axis=None)
7. Flat (1-D) index of the minimum: numpy.argmin(a, axis=None) / a.argmin(axis=None)
8. Flat (1-D) index of the maximum: numpy.argmax(a, axis=None) / a.argmax(axis=None)
9. Converting a flat index to an index in the original shape: numpy.unravel_index(index, shape)
10. Median: numpy.median(a, axis=None)
11. Range (maximum minus minimum): numpy.ptp(a, axis=None) / a.ptp(axis=None)
12. Percentile: numpy.percentile(a, q, axis=None)
Import the module first: import numpy as np
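1. Sum
1. numpy.sum(a, axis=None) / a.sum(axis=None)
Computes the sum of the elements of array a along the given axis. axis may be an integer or a tuple of integers; if no axis is given, the sum of all elements is returned. If a has shape (d0, d1, ..., dn) and axis=(m1, m2, ..., mi), the result has shape (d0, d1, ..., dn) with the axes m1, m2, ..., mi removed, and each element is the sum of the elements along axes m1, m2, ..., mi.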
Example:
import numpy as np
a = np.arange(24).reshape((2, 3, 4))
print("Array a:\n", a)
print("np.sum(a):", np.sum(a))  # sum of all elements
print("np.sum(a, axis=0):\n", np.sum(a, axis=0))  # sum along axis 0 (the outermost axis)
print("np.sum(a, axis=1):\n", np.sum(a, axis=1))  # sum along axis 1
print("np.sum(a, axis=(0, 1)):\n", np.sum(a, axis=(0, 1)))  # sum over axes 0 and 1
print("np.sum(a, axis=(0, 2)):\n", np.sum(a, axis=(0, 2)))  # sum over axes 0 and 2
Output:
Array a:
[[[ 0 1 2 3]
[ 4 5 6 7]
[ 8 9 10 11]]
[[12 13 14 15]
[16 17 18 19]
[20 21 22 23]]]
np.sum(a): 276
np.sum(a, axis=0):
[[12 14 16 18] # 0+12=12 1+13=14 ...
[20 22 24 26] # 4+16=20 5+17=22
[28 30 32 34]]
np.sum(a, axis=1):
[[12 15 18 21] # 0+4+8=12 1+5+9=15 ...
[48 51 54 57]] # 12+16+20=48 13+17+21=51
np.sum(a, axis=(0, 1)):
[60 66 72 78] # 0+4+8+12+16+20=60 1+5+9+13+17+21=66...
np.sum(a, axis=(0, 2)):
[ 60 92 124] # 0+1+2+3+12+13+14+15=60 4+5+6+7+16+17+18+19=92....
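The shape rule above can be checked directly: reducing over axes (0, 2) drops those two axes, and the same numbers come out if the axes are summed one at a time. This is a minimal sketch reusing the a from the example; the names s and step are only for illustration.

```python
import numpy as np

a = np.arange(24).reshape((2, 3, 4))

# Reducing over axes 0 and 2 drops both axes: shape (2, 3, 4) -> (3,)
s = np.sum(a, axis=(0, 2))
print(s.shape)     # (3,)
print(s.tolist())  # [60, 92, 124]

# Same result, summing one axis at a time.
# Reduce axis 2 first so that axis 0 keeps its number.
step = a.sum(axis=2).sum(axis=0)
print(np.array_equal(s, step))  # True
```

Reducing the higher-numbered axis first is a deliberate choice: after a.sum(axis=0) the old axis 2 would become axis 1.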
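2. Mean
2. numpy.mean(a, axis=None) / a.mean(axis=None)
Computes the arithmetic mean of the elements of array a along the given axis. axis may be an integer or a tuple of integers. If axis is omitted, the mean of all elements is returned; otherwise the mean is taken over the specified axes. If a has shape (d0, d1, ..., dn) and axis=(m1, m2, ..., mi), the result has shape (d0, d1, ..., dn) with the axes m1, m2, ..., mi removed, and each element is the mean of all elements along axes m1, m2, ..., mi. For integer input the result is float64.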
Example (reusing a from above):
print("Array a:\n", a)
print("np.mean(a):", np.mean(a))  # mean of all elements
print("np.mean(a, axis=0):\n", np.mean(a, axis=0))  # mean along axis 0
print("np.mean(a, axis=(0, 2)):\n", np.mean(a, axis=(0, 2)))  # mean over axes 0 and 2
Output:
Array a:
[[[ 0 1 2 3]
[ 4 5 6 7]
[ 8 9 10 11]]
[[12 13 14 15]
[16 17 18 19]
[20 21 22 23]]]
np.mean(a): 11.5
np.mean(a, axis=0):
[[ 6. 7. 8. 9.] # (0+12)/2=6 (1+13)/2=7...
[10. 11. 12. 13.] # (4+16)/2=10 (5+17)/2=11...
[14. 15. 16. 17.]] # (8+20)/2=14 (9+21)/2=15..
np.mean(a, axis=(0, 2)):
[ 7.5 11.5 15.5] # (0+1+2+3+12+13+14+15)/8=7.5 ...
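The mean over several axes is simply the sum over those axes divided by the number of elements folded into each output value. A short sketch with the same a; n, m and manual are illustrative names.

```python
import numpy as np

a = np.arange(24).reshape((2, 3, 4))

# Each output value averages over axis 0 (size 2) and axis 2 (size 4): 8 elements
n = a.shape[0] * a.shape[2]

m = np.mean(a, axis=(0, 2))
manual = np.sum(a, axis=(0, 2)) / n

print(n)                       # 8
print(m.tolist())              # [7.5, 11.5, 15.5]
print(np.allclose(m, manual))  # True
print(m.dtype)                 # float64 (integer input, floating-point mean)
```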
3.numpy.average(a,axis=None,weights=None)
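Computes the weighted average of the elements of array a along the given axis. weights is an array of weights. In general it must have the same shape as a (weights.shape == a.shape); alternatively, when a single integer axis is given, weights may be a 1-D array whose length equals the size of a along that axis. If weights is not given, every element gets the same weight and the result is the same as .mean().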
Example (reusing a from above):
print("Array a:\n", a)
print("np.average(a, axis=0):\n", np.average(a, axis=0))
print("np.average(a, axis=0, weights=[10, 1]):\n", np.average(a, axis=0, weights=[10, 1]))
wei = np.random.randint(1, 60, (2, 3, 4))  # random integer weights with the same shape as a
print("Weight array:", wei)
print("np.average(a, axis=(0, 2), weights=wei):\n", np.average(a, axis=(0, 2), weights=wei))
Output:
Array a:
[[[ 0 1 2 3]
[ 4 5 6 7]
[ 8 9 10 11]]
[[12 13 14 15]
[16 17 18 19]
[20 21 22 23]]]
np.average(a, axis=0):
[[ 6. 7. 8. 9.]
[10. 11. 12. 13.]
[14. 15. 16. 17.]]
np.average(a, axis=0, weights=[10, 1]):
[[ 1.09090909 2.09090909 3.09090909 4.09090909] # (0*10+12*1)/(10+1)=1.0909
[ 5.09090909 6.09090909 7.09090909 8.09090909] # (4*10+16*1)/(10+1)=5.0909
[ 9.09090909 10.09090909 11.09090909 12.09090909]]
Weight array: [[[37 5 50 9]
[ 9 40 17 42]
[45 4 41 29]]
[[17 24 29 37]
[20 8 14 37]
[ 3 1 48 14]]]
np.average(a, axis=(0, 2), weights=wei):
[ 7.73557692 10.92513369 13.96756757] # (0*37+1*5+2*50+3*9+12*17+13*24+14*29+15*37)/(37+5+50+9+17+24+29+37)=7.7355
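The weighted average is \(\sum_i w_i x_i / \sum_i w_i\) over the reduced axes. The sketch below, assuming the same a, recomputes np.average by hand for the 1-D weights [10, 1] along axis 0; reshaping the weights to (2, 1, 1) is just one way to make them broadcast along axis 0. It also checks that equal full-shape weights reduce to the plain mean.

```python
import numpy as np

a = np.arange(24).reshape((2, 3, 4))

# 1-D weights along axis 0: weight 10 for a[0], weight 1 for a[1]
w = np.array([10, 1])
avg = np.average(a, axis=0, weights=w)

# By hand: sum(w * x) / sum(w); reshape w to (2, 1, 1) so it broadcasts along axis 0
manual = (a * w.reshape(2, 1, 1)).sum(axis=0) / w.sum()
print(np.allclose(avg, manual))     # True
print(round(float(avg[0, 0]), 4))   # 1.0909, i.e. (0*10 + 12*1) / 11

# Full-shape weights with a tuple of axes; all-equal weights give the ordinary mean
equal = np.ones_like(a)
print(np.allclose(np.average(a, axis=(0, 2), weights=equal),
                  np.mean(a, axis=(0, 2))))  # True
```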
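3. Standard deviation / variance
4. numpy.std(a, axis=None) / a.std(axis=None)
5. numpy.var(a, axis=None) / a.var(axis=None)
std(a, axis=None) computes the population standard deviation of the elements of a along the given axis (not to be confused with the sample standard deviation): \(\sigma=\sqrt{{\frac 1N}\sum_{i=1}^N(x_i-\overline x)^2}\)
std is short for standard deviation (also called the root-mean-square deviation).
var(a, axis=None) computes the population variance of the elements of a along the given axis: \(\sigma^2={\frac {\sum_{i=1}^N(x_i-\overline x)^2}N}\)
var is short for variance. Both functions divide by N by default; passing ddof=1 divides by N-1 instead and gives the sample versions.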
Example:
b = np.random.randint(1, 30, (2, 3, 4))
print("Array b:\n", b)
print("np.std(b, axis=2):\n", np.std(b, axis=2))  # standard deviation along axis 2
print("np.var(b, axis=2):\n", np.var(b, axis=2))  # variance along axis 2
Output:
Array b:
[[[16 8 27 24]
[12 15 25 8]
[11 19 15 26]]
[[29 15 18 24]
[17 8 4 15]
[ 2 28 10 21]]]
np.std(b, axis=2):
[[7.39509973 6.28490254 5.53962995]
[5.40832691 5.24404424 9.98436277]]
np.var(b, axis=2):
[[54.6875 39.5 30.6875]
[29.25 27.5 99.6875]]
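Let's verify one value by hand: take the row 12 15 25 8 along axis 2 (b[0, 1]) and compute its standard deviation.
The mean is \(\overline x=15\),
so the population standard deviation is \(\sigma=\sqrt{\frac {(12-15)^2+(15-15)^2+(25-15)^2+\left(8-15\right)^2}{4}}=\sqrt{39.5}\approx6.284902544988\)
and the variance is \(\sigma^2=39.5\), matching the output above.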
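The same hand check can be scripted, which also makes the population/sample distinction concrete. A minimal sketch on the row 12 15 25 8; pop_var is an illustrative name, and ddof is NumPy's own parameter for the divisor N - ddof.

```python
import numpy as np

x = np.array([12, 15, 25, 8])

# Population statistics (NumPy's default, ddof=0): divide by N
pop_var = np.mean((x - x.mean()) ** 2)   # (9 + 0 + 100 + 49) / 4
print(pop_var, np.var(x))                # 39.5 39.5
print(np.isclose(np.std(x), np.sqrt(39.5)))  # True

# Sample statistics (ddof=1): divide by N - 1
print(np.isclose(np.var(x, ddof=1), 158 / 3))            # True
print(np.isclose(np.std(x, ddof=1), np.sqrt(158 / 3)))   # True
```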
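4. Maximum / minimum
6. numpy.amin(a, axis=None) / numpy.min(a, axis=None) / a.min(axis=None)
Returns the minimum along axis; if no axis is given, returns the minimum of all elements.
numpy.amax(a, axis=None) / numpy.max(a, axis=None) / a.max(axis=None)
Returns the maximum along axis; if no axis is given, returns the maximum of all elements.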
Example:
c = np.random.randint(1, 60, (2, 3, 4))
print("Array c:\n", c)
print("np.min(c): ", np.min(c))
print("np.amin(c, axis=1):\n", np.amin(c, axis=1))
print("c.min(axis=2): \n", c.min(axis=2))
print("-"*20 + 'separator' + '-'*20)
print("np.max(c): ", np.max(c))
print("np.amax(c, axis=1):\n", np.amax(c, axis=1))
print("c.max(axis=2):\n", c.max(axis=2))
Output:
Array c:
[[[15 50 24 6]
[ 2 8 27 53]
[52 23 9 35]]
[[17 38 42 20]
[ 4 32 9 17]
[48 39 17 40]]]
np.min(c): 2
np.amin(c, axis=1):
[[ 2 8 9 6]
[ 4 32 9 17]]
c.min(axis=2):
[[ 6 2 9]
[17 4 17]]
--------------------separator--------------------
np.max(c): 53
np.amax(c, axis=1):
[[52 50 27 53]
[48 39 42 40]]
c.max(axis=2):
[[50 53 52]
[42 32 48]]
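Strictly speaking, a.min() and the other a.xxx() forms are not NumPy module-level functions but methods of the ndarray object; they return the same results as the corresponding np.xxx functions.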
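Because the example draws c from np.random.randint, its values change on every run. The sketch below hard-codes the c printed above so the function form, the np.amin/np.amax aliases and the method form can be compared on fixed data.

```python
import numpy as np

# The array c printed in the example above, hard-coded for reproducibility
c = np.array([[[15, 50, 24, 6], [2, 8, 27, 53], [52, 23, 9, 35]],
              [[17, 38, 42, 20], [4, 32, 9, 17], [48, 39, 17, 40]]])

# Function form, alias and method form agree
print(np.array_equal(np.amin(c, axis=1), c.min(axis=1)))      # True
print(np.array_equal(np.min(c, axis=1), np.amin(c, axis=1)))  # True

print(c.min(axis=1).tolist())  # [[2, 8, 9, 6], [4, 32, 9, 17]]
print(c.max(axis=2).tolist())  # [[50, 53, 52], [42, 32, 48]]
```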
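5. Indices of the extreme values
7. numpy.argmin(a, axis=None) / a.argmin(axis=None)
Returns the index of the minimum. Without axis, the array is treated as flattened to 1-D and the returned index refers to that flattened array; with axis, it returns the position of the minimum along that axis for every other index. If the minimum occurs more than once, the first occurrence is returned.
8. numpy.argmax(a, axis=None) / a.argmax(axis=None)
Returns the index of the maximum, following the same rules as argmin.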
Example (this run used a different random c, so its values differ from the previous example):
print("Array c:\n", c)
print("c.argmax(): ", c.argmax())
print("np.argmax(c, axis=2):\n", np.argmax(c, axis=2))
print("-"*20 + 'separator' + '-'*20)
print("np.argmin(c): ", np.argmin(c))
print("c.argmin(axis=1):\n", c.argmin(axis=1))
Output:
Array c:
[[[50 44 13 16]
[26 23 31 35]
[ 5 21 42 8]]
[[ 6 53 10 57]
[14 5 18 38]
[40 31 4 55]]]
c.argmax(): 15 # after flattening, 57 is at index 15
np.argmax(c, axis=2):
[[0 3 2] # along axis 2: 50 at 0, 35 at 3, 42 at 2, 57 at 3, 38 at 3, 55 at 3
[3 3 3]]
--------------------separator--------------------
np.argmin(c): 22
c.argmin(axis=1):
[[2 2 0 2]
[0 1 2 1]]
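The indices returned with an axis can be used to pull the extreme values back out. This sketch hard-codes the c from the example and uses np.take_along_axis (available since NumPy 1.15); flat_idx, idx and vals are illustrative names.

```python
import numpy as np

# The array c from the argmax/argmin example above, hard-coded
c = np.array([[[50, 44, 13, 16], [26, 23, 31, 35], [5, 21, 42, 8]],
              [[6, 53, 10, 57], [14, 5, 18, 38], [40, 31, 4, 55]]])

# Without axis: the index refers to the flattened array
flat_idx = c.argmax()
print(flat_idx, c.ravel()[flat_idx])  # 15 57

# With axis=2: one index per row; take_along_axis picks the values back out
idx = np.argmax(c, axis=2)                                  # shape (2, 3)
vals = np.take_along_axis(c, idx[..., np.newaxis], axis=2)  # shape (2, 3, 1)
print(np.array_equal(vals[..., 0], c.max(axis=2)))          # True
```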
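9. numpy.unravel_index(index, shape)
Converts a flat (1-D) index into the corresponding multi-dimensional index for an array of the given shape. It is typically used together with argmax and argmin from section 5.
Example:
print("Array c:\n", c)
print(np.unravel_index(np.argmax(c), c.shape))
Output:
Array c: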
[[[22 4 28 56]
[45 34 3 22]
[59 43 43 27]]
[[32 35 47 53]
[ 7 27 41 18]
[40 32 30 43]]]
(0, 2, 0) # 59 is the maximum of the array; its index is (0, 2, 0)
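A round-trip check makes the conversion explicit: the flat index from argmax maps to a position in c.shape, and np.ravel_multi_index converts it back. The array is the c printed above, hard-coded; flat and pos are illustrative names.

```python
import numpy as np

c = np.array([[[22, 4, 28, 56], [45, 34, 3, 22], [59, 43, 43, 27]],
              [[32, 35, 47, 53], [7, 27, 41, 18], [40, 32, 30, 43]]])

flat = np.argmax(c)                    # 8: position of 59 in c.ravel()
pos = np.unravel_index(flat, c.shape)  # the tuple (0, 2, 0)
print(c[pos])                          # 59

# np.ravel_multi_index performs the reverse conversion
print(np.ravel_multi_index(pos, c.shape) == flat)  # True
```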
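6. Median
10. numpy.median(a, axis=None)
Returns the median of the array along the given axis; if no axis is given, returns the median of all elements. When the number of values is even, the median is the mean of the two middle values.
Example:
print("Array c:\n", c)
print("np.median(c): ", np.median(c))
Output:
Array c: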
[[[17 59 14 23]
[27 59 6 12]
[43 16 27 17]]
[[12 10 5 17]
[21 55 18 42]
[41 36 40 5]]]
np.median(c): 19.5
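The 19.5 above comes from averaging the two middle values of the 24 sorted elements. This sketch hard-codes the printed c and reproduces that step; s and mid are illustrative names.

```python
import numpy as np

# The array c printed in the median example above, hard-coded
c = np.array([[[17, 59, 14, 23], [27, 59, 6, 12], [43, 16, 27, 17]],
              [[12, 10, 5, 17], [21, 55, 18, 42], [41, 36, 40, 5]]])

s = np.sort(c, axis=None)  # axis=None flattens before sorting: 24 values
mid = s.size // 2          # 12

# Even count: the median is the mean of the two middle values, s[11] and s[12]
print(s[mid - 1], s[mid], (s[mid - 1] + s[mid]) / 2)  # 18 21 19.5
print(np.median(c))                                   # 19.5

# Odd count: the median is the middle value itself
print(np.median([3, 1, 2]))                           # 2.0
```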
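7. Other functions
11. numpy.ptp(a, axis=None) / a.ptp(axis=None)
ptp ("peak to peak") computes the difference between the maximum and the minimum along the given axis; if axis is omitted, all elements are used. Note that the a.ptp() method was removed in NumPy 2.0; use np.ptp(a) there.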
Example:
print("Array c:\n", c)
print("np.ptp(c): ", np.ptp(c))
print("np.ptp(c, axis=1):\n", np.ptp(c, axis=1))  # c.ptp(axis=1) gives the same result on NumPy 1.x
Output:
Array c:
[[[35 28 18 38]
[44 56 7 24]
[ 4 59 2 24]]
[[55 56 5 27]
[18 44 22 1]
[ 3 30 20 43]]]
np.ptp(c): 58 # 59-1=58
np.ptp(c, axis=1):
[[40 31 16 14] # 44-4=40 59-28=31 ...
[52 26 17 42]]
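Since ptp is just maximum minus minimum, it can be cross-checked against max and min. This sketch hard-codes the printed c and calls np.ptp rather than the method, so it also runs on NumPy 2.x.

```python
import numpy as np

c = np.array([[[35, 28, 18, 38], [44, 56, 7, 24], [4, 59, 2, 24]],
              [[55, 56, 5, 27], [18, 44, 22, 1], [3, 30, 20, 43]]])

r = np.ptp(c, axis=1)
print(r.tolist())                                        # [[40, 31, 16, 14], [52, 26, 17, 42]]
print(np.array_equal(r, c.max(axis=1) - c.min(axis=1)))  # True
print(np.ptp(c))                                         # 58
```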
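12. numpy.percentile(a, q, axis=None)
a: the input array; q: the percentile(s) to compute, between 0 and 100 inclusive; axis: the axis along which the percentiles are computed.
Intuitively, the q-th percentile is a value with about q% of the data at or below it and about (100-q)% at or above it. By default NumPy interpolates linearly between the two nearest sorted values, so the result need not be an element of a.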
Example:
d = np.random.randint(1, 40, (2, 5))
print("Array d:\n", d)
print("np.percentile(d, 40): ", np.percentile(d, 40))
print("np.percentile(d, 40, axis=1):\n", np.percentile(d, 40, axis=1))
Output:
Array d:
[[39 15 35 17 39]
[20 12 36 19 10]]
np.percentile(d, 40): 18.200000000000003
np.percentile(d, 40, axis=1):
[27.8 16.2]
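The 18.2 above is not an element of d; it comes from linear interpolation in the sorted data. The helper below, percentile_linear, is hypothetical (written only for this illustration) and sketches NumPy's default linear method on a 1-D sequence: the target position is q/100 * (n - 1), and the result interpolates between the two neighbouring sorted values.

```python
import numpy as np

def percentile_linear(values, q):
    """Hypothetical helper: sketch of NumPy's default 'linear' percentile for 1-D data."""
    s = np.sort(np.asarray(values, dtype=float))
    pos = (q / 100) * (s.size - 1)  # fractional position in the sorted data
    lo = int(np.floor(pos))
    hi = min(lo + 1, s.size - 1)
    frac = pos - lo
    return s[lo] + frac * (s[hi] - s[lo])

d = np.array([[39, 15, 35, 17, 39],
              [20, 12, 36, 19, 10]])

# Flattened data sorted: 10 12 15 17 19 20 35 36 39 39; position 0.4 * 9 = 3.6
print(np.isclose(percentile_linear(d.ravel(), 40), np.percentile(d, 40)))  # True
print([round(float(percentile_linear(row, 40)), 1) for row in d])          # [27.8, 16.2]
```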
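Many of these functions also take a keepdims=False parameter. keepdims controls whether the reduced dimensions are kept: with keepdims=True, each reduced axis is kept with size 1, so the result has the same number of dimensions as the input (it is still wrapped in the extra [] brackets).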
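The practical benefit of keepdims=True is that the result broadcasts back against the original array. A short sketch with the a from section 1; centered is an illustrative name.

```python
import numpy as np

a = np.arange(24).reshape((2, 3, 4))

print(np.sum(a, axis=1).shape)                 # (2, 4)
print(np.sum(a, axis=1, keepdims=True).shape)  # (2, 1, 4)

# The kept size-1 axis lets the mean broadcast against a,
# here subtracting each column mean within every block
centered = a - a.mean(axis=1, keepdims=True)
print(np.allclose(centered.mean(axis=1), 0))   # True
```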
Please credit the source when reposting. Article link: https://www.uj5u.com/houduan/184542.html
