Notes and summary compiled after studying the Mathematics for Machine Learning Specialization on Coursera.

Contents

Part 1: Linear Algebra
- 1. Vector operations
  - 1.1 Dot or inner product
  - 1.2 Scalar and vector projection
- 2. Basis
- 3. Matrices
  - Multiplying a matrix by a vector
- 4. Change of basis
- 5. Gram-Schmidt process for constructing an orthonormal basis
- 6. Transformation in a plane or other object
- 7. Eigenstuff (eigendecomposition)
- 8. PageRank

Part 2: Multivariate Calculus
- 1. Definition of a derivative
- 2. Time saving rules
- 3. Derivatives of named functions
- 4. Derivative structures
- 5. Taylor series
  - 5.1 Maclaurin series
  - 5.2 Taylor series
- 6. Optimization and vector calculus

Part 3: PCA (Principal Component Analysis)
- 1. 1-D datasets
- 2. Definite symmetric matrix
- 3. Higher-dimensional datasets
- 4. Effect of linear transformations on the mean and variance
- 5. Dot product
- 6. Inner product
  - Extending the dot product to other data types
- 7. Projection
  - 7.1 Projection onto 1D subspaces
  - 7.2 Projection onto $k$-dimensional subspaces
- 8. PCA derivation
  - 8.1 Setting up ($X_n=\sum_{i=1}^D\beta_{in}b_i$, $\tilde{X}_n=\sum_{i=1}^M\beta_{in}b_i$, $J=\frac{1}{N}\sum_{n=1}^N\|X_n-\tilde{X}_n\|^2$, $S=\frac{1}{N}\sum_{n=1}^N X_nX_n^T$)
  - 8.2 Finding the coordinates (code) $\beta_{in}$ ($\beta_{in}=X_n^Tb_i$)
  - 8.3 Rewriting the formula ($X_n-\tilde{X}_n=\sum_{i=M+1}^D (b_i^TX_n)b_i$)
  - 8.4 Redefining $J = B'B'^TS$
  - 8.5 Solving for $b_i$
- 9. Key steps of the PCA algorithm
  - 9.1 z-score transformation
  - 9.2 Projection matrix computation
  - 9.3 Projection
- 10. PCA in high dimensions (optimization for high-dimensional data)

Part 1: Linear Algebra

1. Vector operations

Vector addition is commutative:
$$r + s = s + r$$
Multiplying by a scalar scales the vector:
$$2r = r + r$$
The squared length of a vector is the sum of its squared components:
$$\|r\|^2 = \sum_i r_i^2$$

1.1 Dot or inner product

The dot product is a special case of an inner product.
$$r \cdot s = \sum_i r_i s_i$$

Commutative:
$$r \cdot s = s \cdot r$$
Distributive over addition:
$$r \cdot (s + t) = r \cdot s + r \cdot t$$
Associative with scalar multiplication:
$$r \cdot (as) = a(r \cdot s)$$
Relation to length:
$$r \cdot r = \|r\|^2$$
Relation to the angle $\theta$ between $r$ and $s$:
$$r \cdot s = \|r\| \|s\| \cos\theta$$

1.2 Scalar and vector projection

Scalar projection of the vector $s$ onto the vector $r$:
$$\frac{r \cdot s}{\|r\|}$$
Vector projection of the vector $s$ onto the vector $r$:
$$\frac{r \cdot s}{r \cdot r}\, r$$
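
The dot product and the two projections above can be checked numerically. This is a minimal sketch that assumes NumPy is available; the function names and the sample vectors are illustrative and are not taken from the course.

```python
import numpy as np

def scalar_projection(s, r):
    """Signed length of the shadow of s along r: (r . s) / |r|."""
    return np.dot(r, s) / np.linalg.norm(r)

def vector_projection(s, r):
    """Component of s that lies along r: ((r . s) / (r . r)) r."""
    return (np.dot(r, s) / np.dot(r, r)) * r

# Illustrative vectors (assumed, not from the notes).
r = np.array([3.0, 4.0])
s = np.array([2.0, 0.0])

dot = np.dot(r, s)
cos_theta = dot / (np.linalg.norm(r) * np.linalg.norm(s))
scalar = scalar_projection(s, r)
vector = vector_projection(s, r)
```

The length of the vector projection equals the absolute value of the scalar projection, which is a quick consistency check between the two formulas.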

2. Basis

A basis is a set of $n$ vectors that:

- are not linear combinations of each other (they are linearly independent)
- span the space

The space is then $n$-dimensional.

The textbook definition: in a linear space $V$, suppose there exist $n$ elements $a_1, a_2, \dots, a_n$ such that

- $a_1, a_2, \dots, a_n$ are linearly independent;
- every element $a$ of $V$ can be expressed as a linear combination of $a_1, a_2, \dots, a_n$.

Then $a_1, a_2, \dots, a_n$ is called a basis of the linear space $V$, and $n$ is called the dimension of $V$. A linear space that contains only the zero element has no basis, and its dimension is defined to be 0.
A linear space of dimension $n$ is called an $n$-dimensional linear space, denoted $V_n$.
(Tongji University, Linear Algebra, 5th edition, Chapter 6, Section 2)
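
One way to test the two conditions for vectors in $\mathbb{R}^n$ is to stack them as columns and compare the rank with $n$. This is a sketch under the assumption that NumPy is available; `is_basis` is a hypothetical helper name, and the rank test is a standard technique rather than something stated in the notes.

```python
import numpy as np

def is_basis(vectors):
    """Return True if the given vectors form a basis of R^n.

    n vectors in R^n are linearly independent and span the space
    exactly when the matrix with those vectors as columns has rank n.
    """
    V = np.column_stack(vectors)
    n_rows, n_cols = V.shape
    return bool(n_rows == n_cols and np.linalg.matrix_rank(V) == n_cols)

# Illustrative inputs (assumed): the second pair is linearly dependent.
good = [np.array([1.0, 0.0]), np.array([1.0, 1.0])]
bad = [np.array([1.0, 2.0]), np.array([2.0, 4.0])]
```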

3. Matrices

An array of $m \times n$ numbers $a_{ij}$ $(i = 1, 2, \cdots, m;\ j = 1, 2, \dots, n)$ arranged in $m$ rows and $n$ columns is called a matrix with $m$ rows and $n$ columns, or an $m \times n$ matrix for short, written
$$A = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{pmatrix}$$
(Tongji University, Linear Algebra, 5th edition, Chapter 2, Section 1)

Multiplying a matrix by a vector

$$\begin{bmatrix} a & b \\ c & d \end{bmatrix} \begin{bmatrix} e \\ f \end{bmatrix} = \begin{bmatrix} ae + bf \\ ce + df \end{bmatrix}$$

Multiplying a vector by a matrix can be read as a transformation: the matrix $A$ maps the vector $r$ to $r'$,
$$Ar = r'$$
Scalars factor out:
$$A(nr) = n(Ar) = nr'$$
Distributive law:
$$A(r + s) = Ar + As$$
Identity matrix:
$$I = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$$
Clockwise rotation by an angle $\theta$:
$$\begin{bmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{bmatrix}$$
Determinant of a 2×2 matrix:
$$|A| = \det A = \det \begin{bmatrix} a & b \\ c & d \end{bmatrix} = ad - bc$$
Inverse of a 2×2 matrix (it exists only when $ad - bc \neq 0$):
$$\begin{bmatrix} a & b \\ c & d \end{bmatrix}^{-1} = \frac{1}{ad - bc} \begin{bmatrix} d & -b \\ -c & a \end{bmatrix}$$
Summation convention for multiplying matrices $A$ and $B$:
$$AB = C$$
$$c_{ik} = (ab)_{ik} = \sum_j a_{ij} b_{jk}$$
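
The rotation, determinant and inverse formulas above can be tried out directly. This is a small sketch assuming NumPy; the helper names and the example matrix are illustrative only.

```python
import numpy as np

def rotation_clockwise(theta):
    """2x2 matrix that rotates a vector clockwise by theta radians."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, s],
                     [-s, c]])

def inverse_2x2(A):
    """Inverse of a 2x2 matrix via the closed-form formula."""
    (a, b), (c, d) = A
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is singular (ad - bc = 0)")
    return np.array([[d, -b],
                     [-c, a]]) / det

# Turning the y-axis unit vector clockwise by 90 degrees gives the x-axis unit vector.
R = rotation_clockwise(np.pi / 2)
rotated = R @ np.array([0.0, 1.0])

# Illustrative matrix (assumed): its determinant is 1*4 - 2*3 = -2.
A = np.array([[1.0, 2.0],
              [3.0, 4.0]])
A_inv = inverse_2x2(A)
```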

4. Change of basis

Change from an original basis to a new, primed basis. The columns of the transformation matrix $B$ are the new basis vectors expressed in the original coordinate system, so
$$Br' = r$$
where $r'$ is the vector in the $B$-basis and $r$ is the vector in the original basis. Equivalently,
$$r' = B^{-1}r$$
If a matrix $A$ is orthonormal (all of its columns have unit length and are orthogonal to each other), then
$$A^T = A^{-1}$$
Stated as a theorem: a matrix $A$ is an orthogonal matrix if and only if its column vectors are all unit vectors and are pairwise orthogonal. The condition $A^T = A^{-1}$ means
$$A^TA = E$$
where $E$ is the identity matrix. Writing $A = (a_1, a_2, \dots, a_n)$ in terms of its columns, this is
$$\begin{bmatrix} a_1^T \\ a_2^T \\ \vdots \\ a_n^T \end{bmatrix} (a_1, a_2, \dots, a_n) = E$$
that is,
$$(a_i^Ta_j) = (\delta_{ij})$$
which is equivalent to the $n^2$ relations
$$a_i^Ta_j = \delta_{ij} = \begin{cases} 1 & \text{when } i = j \\ 0 & \text{when } i \neq j \end{cases}$$
Because $A^T = A^{-1}$ also gives $AA^T = E$, the same conclusion holds for the row vectors of $A$.
(Further reading: Tongji University, Linear Algebra, 5th edition, Chapter 6, Section 3)
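
The change of basis $r' = B^{-1}r$ and the shortcut $A^T = A^{-1}$ for orthonormal matrices can be illustrated as follows. This is a sketch assuming NumPy; the matrices and the vector are made-up examples.

```python
import numpy as np

# Columns of B are the new basis vectors, written in the original coordinates.
B = np.array([[3.0, 1.0],
              [1.0, 1.0]])
r = np.array([5.0, 3.0])

# r' = B^{-1} r; solving the linear system avoids forming the inverse explicitly.
r_prime = np.linalg.solve(B, r)

# A rotation matrix is an orthonormal example: unit-length, mutually orthogonal columns.
theta = 0.3
A = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

# For an orthonormal basis the transpose acts as the inverse.
r_in_A = A.T @ r
```

Here `r_prime` holds the coordinates of `r` in the `B`-basis, so `B @ r_prime` recovers `r`.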

5. Gram-Schmidt process for constructing an orthonormal basis

Start with $n$ linearly independent basis vectors $v = \{v_1, v_2, \dots, v_n\}$. Then
$$e_1 = \frac{v_1}{\|v_1\|}$$
$$u_2 = v_2 - (v_2 \cdot e_1)e_1, \quad \text{so} \quad e_2 = \frac{u_2}{\|u_2\|}$$
and so on: $u_3 = v_3 - (v_3 \cdot e_1)e_1 - (v_3 \cdot e_2)e_2$ is the remaining part of $v_3$ that is not composed of the preceding $e$-vectors, and likewise for the later vectors.
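
The procedure above can be sketched in a few lines. This assumes NumPy; `gram_schmidt`, its `tol` parameter and the input vectors are illustrative choices, and the loop follows the classical form of the formulas (each $v$ is projected onto the previously built $e$-vectors).

```python
import numpy as np

def gram_schmidt(vectors, tol=1e-12):
    """Return an orthonormal basis (as rows) built from linearly independent vectors."""
    basis = []
    for v in vectors:
        u = v.astype(float)
        # Subtract the parts of v that lie along the e-vectors found so far.
        for e in basis:
            u = u - np.dot(v, e) * e
        norm = np.linalg.norm(u)
        if norm < tol:
            raise ValueError("input vectors are linearly dependent")
        basis.append(u / norm)
    return np.array(basis)

# Illustrative input (assumed): three independent vectors in R^3.
v = [np.array([1.0, 1.0, 0.0]),
     np.array([1.0, 0.0, 1.0]),
     np.array([0.0, 1.0, 1.0])]
E = gram_schmidt(v)
```

Since the rows of `E` are orthonormal, `E @ E.T` should be the identity matrix up to rounding.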

6. Transformation in a plane or other object

First transform into the basis referred to the reflection plane, or whichever object is involved: $E^{-1}$.
Then do the reflection or other transformation in the plane of the object: $T_E$.
Then transform back into the original basis: $E$.
So the transformed vector is
$$r' = E T_E E^{-1} r$$
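
As a worked instance of $r' = E T_E E^{-1} r$, the sketch below reflects a vector across the line $y = x$. It assumes NumPy; the choice of mirror line and test vector is illustrative.

```python
import numpy as np

# Columns of E: e1 lies along the mirror line y = x, e2 is perpendicular to it.
E = np.array([[1.0, -1.0],
              [1.0,  1.0]]) / np.sqrt(2)

# In the E-basis the reflection simply flips the component along e2.
T_E = np.array([[1.0,  0.0],
                [0.0, -1.0]])

# E is orthonormal here, so E^{-1} equals E^T.
T = E @ T_E @ E.T

r = np.array([1.0, 0.0])
r_reflected = T @ r
```

The combined matrix `T` swaps the two coordinates, which is what a reflection across $y = x$ should do.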

7. Eigenstuff (eigendecomposition)

To investigate the characteristics of the $n$ by $n$ matrix $A$, look for solutions to the equation
$$Ax = \lambda x$$
where $\lambda$ is a scalar eigenvalue. Eigenvalues satisfy the condition
$$(A - \lambda I)x = 0$$
where $I$ is the $n$ by $n$ identity matrix. A nonzero solution $x$ exists only when $\det(A - \lambda I) = 0$.
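
The eigenvalue equation can be verified numerically with NumPy's `np.linalg.eig`. The matrix below is an assumed example; its eigenvalues are 1 and 3.

```python
import numpy as np

# Illustrative symmetric matrix (assumed).
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

# eig returns the eigenvalues and a matrix whose columns are the eigenvectors.
eigenvalues, eigenvectors = np.linalg.eig(A)

# Residual of A x = lambda x for each eigenpair.
residuals = [
    np.linalg.norm(A @ eigenvectors[:, i] - eigenvalues[i] * eigenvectors[:, i])
    for i in range(len(eigenvalues))
]

# Each eigenvalue makes A - lambda I singular.
dets = [np.linalg.det(A - lam * np.eye(2)) for lam in eigenvalues]
```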

8. PageRank

To find the dominant eigenvector of the link matrix $L$, the power method can be applied iteratively, starting from a uniform initial vector $r$:
$$r^{i+1} = Lr^i$$
A damping factor $d$ can be implemented to stabilize this method:
$$r^{i+1} = dLr^i + \frac{1-d}{n}$$
where $n$ is the number of pages.
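
A minimal sketch of the damped power method follows, assuming NumPy. The three-page link matrix, the default `d=0.85`, the tolerance and the function name are all illustrative assumptions; the matrix is taken to be column-stochastic (column $j$ holds the probabilities of moving from page $j$ to each other page).

```python
import numpy as np

def pagerank(L, d=0.85, tol=1e-10, max_iter=1000):
    """Iterate r <- d L r + (1 - d) / n from a uniform start until r stops changing."""
    n = L.shape[0]
    r = np.full(n, 1.0 / n)
    for _ in range(max_iter):
        r_next = d * (L @ r) + (1.0 - d) / n
        if np.linalg.norm(r_next - r, 1) < tol:
            return r_next
        r = r_next
    return r

# Assumed mini-web: A links to B and C, B links to C, C links to A.
L = np.array([[0.0, 0.0, 1.0],
              [0.5, 0.0, 0.0],
              [0.5, 1.0, 0.0]])
ranks = pagerank(L)
```

Because every column of `L` sums to one, the ranks keep summing to one throughout the iteration.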

Part 2: Multivariate Calculus

1. Definition of a derivative

$$f'(x) = \frac{\mathrm{d}f(x)}{\mathrm{d}x} = \lim_{\Delta x \to 0} \frac{f(x + \Delta x) - f(x)}{\Delta x}$$
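
The limit definition suggests a simple numerical approximation: evaluate the difference quotient for a small but finite $\Delta x$. The sketch below uses only the Python standard library; the step size and the test function are assumptions chosen for illustration.

```python
def forward_difference(f, x, dx=1e-6):
    """Approximate f'(x) by the difference quotient with a small, finite dx."""
    return (f(x + dx) - f(x)) / dx

def cube(x):
    return x ** 3

# The exact derivative of x^3 at x = 2 is 3 * 2^2 = 12.
approx = forward_difference(cube, 2.0)
```

The approximation is not exact, because $\Delta x$ never actually reaches zero; its error shrinks roughly in proportion to the step.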

2. Time saving rules

Sum rule:
$$\frac{\mathrm{d}}{\mathrm{d}x}\big(f(x) + g(x)\big) = \frac{\mathrm{d}}{\mathrm{d}x}f(x) + \frac{\mathrm{d}}{\mathrm{d}x}g(x)$$
Power rule:
$$f(x) = ax^b$$
$$f'(x) = abx^{b-1}$$
Product rule:
$$A(x) = f(x)g(x)$$
$$A'(x) = f'(x)g(x) + f(x)g'(x)$$
Chain rule:
If $h = h(p)$ and $p = p(m)$, then
$$\frac{\mathrm{d}h}{\mathrm{d}m} = \frac{\mathrm{d}h}{\mathrm{d}p} \times \frac{\mathrm{d}p}{\mathrm{d}m}$$
Total derivative:
For the function $f(x, y, z, \dots)$, where each variable is a function of the parameter $t$, the total derivative is
$$\frac{\mathrm{d}f}{\mathrm{d}t} = \frac{\partial f}{\partial x}\frac{\mathrm{d}x}{\mathrm{d}t} + \frac{\partial f}{\partial y}\frac{\mathrm{d}y}{\mathrm{d}t} + \frac{\partial f}{\partial z}\frac{\mathrm{d}z}{\mathrm{d}t} + \dots$$
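
The total derivative can be checked against a direct numerical derivative. The sketch below uses only the standard library and an assumed example, $f(x, y) = x^2 y$ with $x = \cos t$ and $y = \sin t$; none of these choices come from the notes.

```python
import math

def f(x, y):
    return x ** 2 * y

def x_of(t):
    return math.cos(t)

def y_of(t):
    return math.sin(t)

def total_derivative(t):
    """df/dt = (df/dx)(dx/dt) + (df/dy)(dy/dt), with each factor worked out by hand."""
    x, y = x_of(t), y_of(t)
    df_dx = 2 * x * y
    df_dy = x ** 2
    dx_dt = -math.sin(t)
    dy_dt = math.cos(t)
    return df_dx * dx_dt + df_dy * dy_dt

def numerical_derivative(t, h=1e-6):
    """Central difference of the composed function t -> f(x(t), y(t))."""
    def g(u):
        return f(x_of(u), y_of(u))
    return (g(t + h) - g(t - h)) / (2 * h)

t0 = 0.7
analytic = total_derivative(t0)
numeric = numerical_derivative(t0)
```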

3. Derivatives of named functions

$$\frac{\partial}{\partial x}\frac{1}{x} = -\frac{1}{x^2}$$
$$\frac{\partial}{\partial x}\sin x = \cos x$$
$$\frac{\partial}{\partial x}\cos x = -\sin x$$
$$\frac{\partial}{\partial x}\exp x = \exp x$$

4. Derivative structures

$$f = f(x, y, z)$$

Jacobian:
$$\mathbf{J}_f = \begin{bmatrix} \dfrac{\partial f}{\partial x}, & \dfrac{\partial f}{\partial y}, & \dfrac{\partial f}{\partial z} \end{bmatrix}$$
Hessian:
$$\mathbf{H}_f = \begin{bmatrix} \dfrac{\partial^2 f}{\partial x^2} & \dfrac{\partial^2 f}{\partial x \partial y} & \dfrac{\partial^2 f}{\partial x \partial z} \\ \dfrac{\partial^2 f}{\partial y \partial x} & \dfrac{\partial^2 f}{\partial y^2} & \dfrac{\partial^2 f}{\partial y \partial z} \\ \dfrac{\partial^2 f}{\partial z \partial x} & \dfrac{\partial^2 f}{\partial z \partial y} & \dfrac{\partial^2 f}{\partial z^2} \end{bmatrix}$$
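
To make the two structures concrete, the sketch below writes out the Jacobian and Hessian of an assumed example function, $f(x, y, z) = x^2 y + yz$, and compares the Jacobian with a finite-difference estimate. It assumes NumPy; the function and helper names are illustrative.

```python
import numpy as np

def f(p):
    x, y, z = p
    return x ** 2 * y + y * z

def jacobian(p):
    """Row vector of first partial derivatives of f, worked out by hand."""
    x, y, z = p
    return np.array([2 * x * y, x ** 2 + z, y])

def hessian(p):
    """Matrix of second partial derivatives of f, worked out by hand."""
    x, y, z = p
    return np.array([[2 * y, 2 * x, 0.0],
                     [2 * x, 0.0,   1.0],
                     [0.0,   1.0,   0.0]])

def numerical_jacobian(func, p, h=1e-6):
    """Central-difference estimate of each partial derivative."""
    p = np.asarray(p, dtype=float)
    grad = np.zeros_like(p)
    for i in range(p.size):
        step = np.zeros_like(p)
        step[i] = h
        grad[i] = (func(p + step) - func(p - step)) / (2 * h)
    return grad

p0 = np.array([1.0, 2.0, 3.0])
J = jacobian(p0)
H = hessian(p0)
J_numeric = numerical_jacobian(f, p0)
```

The Hessian comes out symmetric, as expected for a function with continuous second derivatives.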

5. Taylor series

5.1 Maclaurin series

$$f(x) = f(0) + f'(0)x + \frac{1}{2}f''(0)x^2 + \dots = \sum_{n=0}^{\infty}\frac{f^{(n)}(0)}{n!}x^n$$

5.2 Taylor series

Univariate:
$$f(x) = f(c) + f'(c)(x - c) + \frac{1}{2}f''(c)(x - c)^2 + \dots = \sum_{n=0}^{\infty}\frac{f^{(n)}(c)}{n!}(x - c)^n$$

Equivalently, expanding around $x$ with a step $\Delta x$:
$$f(x + \Delta x) = f(x) + f'(x)\Delta x + \frac{1}{2}f''(x)\Delta x^2 + \dots = \sum_{n=0}^{\infty}\frac{f^{(n)}(x)}{n!}\Delta x^n$$

Multivariate:
$$f(x) = f(c) + \mathbf{J}_f(c)(x - c) + \frac{1}{2}(x - c)^T\mathbf{H}_f(c)(x - c) + \dots$$
Here $x$ is the vector of variables and $c$ is the constant vector at which the expansion is taken.
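
A truncated Maclaurin series is easy to evaluate for $\exp$, because every derivative of $\exp$ at 0 equals 1. The sketch below uses only the standard library; the function name and the chosen orders are illustrative.

```python
import math

def maclaurin_exp(x, order):
    """Sum of the Maclaurin series of exp(x) up to and including the x^order term."""
    return sum(x ** n / math.factorial(n) for n in range(order + 1))

second_order = maclaurin_exp(1.0, 2)    # 1 + 1 + 1/2
tenth_order = maclaurin_exp(1.0, 10)
```

Adding more terms brings the approximation closer to `math.e`.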

6. Optimization and vector calculus

Newton-Raphson:
$$x_{i+1} = x_i - \frac{f(x_i)}{f'(x_i)}$$
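
The Newton-Raphson update can be sketched as a short loop. This uses only the standard library; the stopping tolerance, the iteration cap and the example equation $x^2 - 2 = 0$ are assumptions made for illustration.

```python
def newton_raphson(f, f_prime, x0, tol=1e-12, max_iter=50):
    """Repeat x <- x - f(x) / f'(x) until successive iterates agree to within tol."""
    x = x0
    for _ in range(max_iter):
        slope = f_prime(x)
        if slope == 0:
            raise ZeroDivisionError("derivative vanished; try another starting point")
        x_next = x - f(x) / slope
        if abs(x_next - x) < tol:
            return x_next
        x = x_next
    return x

# Root of x^2 - 2, starting from x0 = 1.
root = newton_raphson(lambda x: x ** 2 - 2, lambda x: 2 * x, x0=1.0)
```

The method needs a reasonable starting point: it can fail where $f'(x)$ is zero or wander if the start is far from a root.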
Grad:
$$\nabla f = \begin{bmatrix} \dfrac{\partial f}{\partial x} \\ \dfrac{\partial f}{\partial y} \\ \dfrac{\partial f}{\partial z} \end{bmatrix}$$
Directional gradient:
$$\nabla f \cdot \hat{r}$$
Gradient descent:
$$s_{n+1} = s_n - \gamma \nabla f$$
Lagrange multipliers $\lambda$:
$$\nabla f = \lambda \nabla g$$
$$\begin{bmatrix} \dfrac{\partial f}{\partial x} \\ \dfrac{\partial f}{\partial y} \end{bmatrix} = \lambda \begin{bmatrix} \dfrac{\partial g}{\partial x} \\ \dfrac{\partial g}{\partial y} \end{bmatrix}$$
$$\nabla \mathcal{L}(x, y, \lambda) = \begin{bmatrix} \dfrac{\partial f}{\partial x} - \lambda \dfrac{\partial g}{\partial x} \\ \dfrac{\partial f}{\partial y} - \lambda \dfrac{\partial g}{\partial y} \\ -g(x, y) \end{bmatrix} = 0$$
Least squares, $\chi^2$ minimization:
$$\chi^2 = \sum_i^n \frac{(y_i - y(x_i; a_k))^2}{\sigma_i^2}$$
Criterion:
$$\nabla \chi^2 = 0$$
Update rule (the constant factor 2 from differentiating the square is absorbed into $\gamma$):
$$a_{next} = a_{cur} - \gamma \nabla \chi^2 = a_{cur} + \gamma \sum_i^n \frac{y_i - y(x_i; a_k)}{\sigma_i^2} \frac{\partial y}{\partial a_k}$$
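
The least-squares update can be sketched for a straight-line model $y(x; a) = a_0 + a_1 x$. This assumes NumPy and the conventional weighting of each squared residual by $1/\sigma_i^2$; the model, the data, the step size `gamma` and the iteration count are all illustrative assumptions.

```python
import numpy as np

def model(x, a):
    """Straight line y = a[0] + a[1] * x (an assumed model)."""
    return a[0] + a[1] * x

def grad_chi_squared(x, y, sigma, a):
    """Gradient of chi^2 with respect to (a0, a1), using dy/da0 = 1 and dy/da1 = x."""
    weighted_residual = (y - model(x, a)) / sigma ** 2
    return -2.0 * np.array([np.sum(weighted_residual),
                            np.sum(weighted_residual * x)])

def fit(x, y, sigma, a_start, gamma=0.01, steps=5000):
    """Plain gradient descent: a_next = a_cur - gamma * grad(chi^2)."""
    a = np.asarray(a_start, dtype=float)
    for _ in range(steps):
        a = a - gamma * grad_chi_squared(x, y, sigma, a)
    return a

# Noise-free data generated from a = (1, 2), with unit uncertainties.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 1.0 + 2.0 * x
sigma = np.ones_like(x)
a_fit = fit(x, y, sigma, a_start=[0.0, 0.0])
```

The step size matters: if `gamma` is too large for the data at hand the iteration diverges, and if it is too small convergence becomes slow.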

Part 3: PCA (Principal Component Analysis)

1. 1-D datasets

Given a data set $D = \{x_1, \cdots, x_N\}$, $x_n \in \mathbb{R}$:

Mean value
$$E[D] = \mu = \frac{1}{N}\sum_{n=1}^{N}x_n$$
Variance
$$V[D] = E[(x_n - \mu)^2] = \frac{1}{N}\sum_{n=1}^{N}(x_n - \mu)^2$$
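
Both statistics can be computed directly from their definitions. The sketch below uses only the standard library; the data values are made up. It divides by $N$, matching the formula above, rather than by $N - 1$.

```python
def mean(data):
    return sum(data) / len(data)

def variance(data):
    """Average squared deviation from the mean (divides by N, not N - 1)."""
    mu = mean(data)
    return sum((x - mu) ** 2 for x in data) / len(data)

data = [1.0, 2.0, 3.0, 4.0]
mu = mean(data)
var = variance(data)
```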

2. Definite symmetric matrix

Given a symmetric real matrix $M \in \mathbb{R}^{n \times n}$: if $z^TMz > 0$ for all nonzero $z \in \mathbb{R}^n$, then $M$ is a positive definite matrix. If $z^TMz \geq 0$ for all $z \in \mathbb{R}^n$, then $M$ is a positive semi-definite matrix.
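
Checking $z^TMz$ for every $z$ is not practical, but for a symmetric matrix the definition is equivalent to a condition on the eigenvalues: all positive for positive definite, all non-negative for positive semi-definite. The sketch below relies on that standard equivalence, which is not stated in the notes, and assumes NumPy; the function name, tolerance and test matrices are illustrative.

```python
import numpy as np

def definiteness(M, tol=1e-12):
    """Classify a symmetric real matrix by the signs of its eigenvalues."""
    if not np.allclose(M, M.T):
        raise ValueError("M must be symmetric")
    eigenvalues = np.linalg.eigvalsh(M)  # eigenvalue routine for symmetric matrices
    if np.all(eigenvalues > tol):
        return "positive definite"
    if np.all(eigenvalues >= -tol):
        return "positive semi-definite"
    return "not positive semi-definite"

pd_example = definiteness(np.array([[2.0, -1.0], [-1.0, 2.0]]))   # eigenvalues 1 and 3
psd_example = definiteness(np.array([[1.0, 1.0], [1.0, 1.0]]))    # eigenvalues 0 and 2
neither = definiteness(np.array([[1.0, 0.0], [0.0, -1.0]]))       # eigenvalues -1 and 1
```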

3. Higher-dimensional datasets

Given a data set $X = \{x_1, \cdots, x_N\}$, $x_n \in \mathbb{R}^{D \times 1}$, stacked column by column so that $X \in \mathbb{R}^{D \times N}$:
$$X = \begin{bmatrix} x_{1,1} & x_{1,2} & \cdots & x_{1,N} \\ x_{2,1} & x_{2,2} & \cdots & x_{2,N} \\ \vdots & \vdots & \ddots & \vdots \\ x_{D,1} & x_{D,2} & \cdots & x_{D,N} \end{bmatrix}$$

Mean value
$$\mu = E[X] = \frac{1}{N}\sum_{n=1}^{N}x_n = \begin{bmatrix} \mu_1 \\ \mu_2 \\ \vdots \\ \mu_D \end{bmatrix} \in \mathbb{R}^{D \times 1}$$
Variance
$$\begin{aligned} V[X] &= \frac{1}{N}\sum_{n=1}^N (x_n - \mu)(x_n - \mu)^T \\ &= \frac{1}{N}\sum_{n=1}^N \begin{bmatrix} (x_{1,n} - \mu_1)^2 & (x_{1,n} - \mu_1)(x_{2,n} - \mu_2) & \cdots & (x_{1,n} - \mu_1)(x_{D,n} - \mu_D) \\ (x_{2,n} - \mu_2)(x_{1,n} - \mu_1) & (x_{2,n} - \mu_2)^2 & \cdots & (x_{2,n} - \mu_2)(x_{D,n} - \mu_D) \\ \vdots & \vdots & \ddots & \vdots \\ (x_{D,n} - \mu_D)(x_{1,n} - \mu_1) & (x_{D,n} - \mu_D)(x_{2,n} - \mu_2) & \cdots & (x_{D,n} - \mu_D)^2 \end{bmatrix} \in \mathbb{R}^{D \times D} \end{aligned}$$
The variance of $D$-dimensional data is a $D \times D$ matrix. The diagonal element $a_{ii}$ is the variance of the $i$-th dimension, and every other element $a_{ij}$ is the covariance between the $i$-th and $j$-th dimensions.
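
The covariance matrix can be computed directly from the outer-product formula. This sketch assumes NumPy and the column-per-data-point layout used above; the data values and the function name are illustrative. The result is cross-checked against `np.cov` with `bias=True`, which also normalizes by $N$.

```python
import numpy as np

def covariance_matrix(X):
    """Covariance of a data matrix X with shape (D, N): one column per data point."""
    N = X.shape[1]
    mu = X.mean(axis=1, keepdims=True)   # column vector of per-dimension means
    centered = X - mu
    return centered @ centered.T / N

# Two dimensions, four data points; the second dimension is twice the first.
X = np.array([[1.0, 2.0, 3.0, 4.0],
              [2.0, 4.0, 6.0, 8.0]])
V = covariance_matrix(X)
```

The diagonal of `V` holds the per-dimension variances and the off-diagonal entries hold the covariance between the two dimensions.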

4. Effect of linear transformations on the mean and variance

Given a data set $\mathcal{D} = \{x_1, \cdots, x_N\}$ with $x_n \in \mathbb{R}^{D \times 1}$, so that $\mathcal{D} \in \mathbb{R}^{D \times N}$ when the points are stacked as columns, with
$$E[\mathcal{D}] = \mu$$
$$V[\mathcal{D}] = Q$$
apply the linear transformation
$$x'_i = Ax_i + b$$
to every point. Then
$$E[\mathcal{D}'] = A\mu + b$$
$$V[\mathcal{D}'] = AQA^T$$
where $\mathcal{D}' = \{x'_1, x'_2, \cdots, x'_N\}$. The shift $b$ moves the mean but leaves the variance unchanged.
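
The two identities above (the mean maps to $A\mu + b$ and the covariance to $AQA^T$) hold exactly for the sample statistics, which makes them easy to verify. The sketch assumes NumPy; the random data, `A` and `b` are arbitrary illustrative choices.

```python
import numpy as np

def mean_and_cov(X):
    """Mean (as a column vector) and covariance of data stored one point per column."""
    mu = X.mean(axis=1, keepdims=True)
    centered = X - mu
    return mu, centered @ centered.T / X.shape[1]

rng = np.random.default_rng(0)
X = rng.normal(size=(2, 500))        # 500 two-dimensional points
A = np.array([[2.0, 1.0],
              [0.0, 3.0]])
b = np.array([[1.0],
              [-1.0]])

mu, Q = mean_and_cov(X)
X_new = A @ X + b                     # apply x' = A x + b to every column
mu_new, Q_new = mean_and_cov(X_new)
```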

5. Dot product

Dot product
$$x^Ty = \sum_{d=1}^D x_d y_d, \quad x, y \in \mathbb{R}^D$$
Length
$$\|x\| = \sqrt{x^Tx}$$
Angle $\omega$ between the vectors $x$ and $y$
$$\cos\omega = \frac{x^Ty}{\|x\|\|y\|}$$

6. Inner product

Consider a vector space $V$. A positive definite, symmetric, bilinear mapping $\langle\cdot,\cdot\rangle: V \times V \to \mathbb{R}$ is called an inner product on $V$.

Symmetric: for all $x, y \in V$,
$$\langle x, y\rangle = \langle y, x\rangle$$
Positive definite: for all $x \in V \backslash \{0\}$,
$$\langle x, x\rangle > 0, \quad \langle 0, 0\rangle = 0$$
Bilinear: for all $x, y, z \in V$ and $\lambda \in \mathbb{R}$,
$$\langle \lambda x + y, z\rangle = \lambda\langle x, z\rangle + \langle y, z\rangle$$
$$\langle x, \lambda y + z\rangle = \lambda\langle x, y\rangle + \langle x, z\rangle$$
Length of a vector $x \in V$
$$\|x\| = \sqrt{\langle x, x\rangle}$$
Distance between two vectors $x, y \in V$
$$d(x, y) = \|x - y\| = \sqrt{\langle x - y, x - y\rangle}$$
Angle $\omega$ between two vectors $x, y \in V$
$$\cos\omega = \frac{\langle x, y\rangle}{\|x\|\|y\|}$$
where $\|x\|$ is defined via the inner product as $\sqrt{\langle x, x\rangle}$.

Extending the dot product to other data types

Inner product for continuous data (functions)

$$\langle f, g\rangle = \int_a^b f(x)g(x)\,\mathrm{d}x$$

Inner product for random variables

$$\langle X, Y\rangle = \mathrm{Cov}(X, Y)$$
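
The inner product for functions can be approximated by numerical integration. The sketch below uses only the standard library and a midpoint rule; the helper name, the number of sub-intervals and the choice of $\sin$ and $\cos$ on $[-\pi, \pi]$ are illustrative assumptions.

```python
import math

def inner_product(f, g, a, b, n=10_000):
    """Approximate <f, g> = integral from a to b of f(x) g(x) dx with the midpoint rule."""
    h = (b - a) / n
    return h * sum(f(a + (k + 0.5) * h) * g(a + (k + 0.5) * h) for k in range(n))

# sin and cos are orthogonal on [-pi, pi]; the squared length of sin there is pi.
sin_cos = inner_product(math.sin, math.cos, -math.pi, math.pi)
sin_sin = inner_product(math.sin, math.sin, -math.pi, math.pi)
sin_length = math.sqrt(sin_sin)
```

With this inner product, lengths and angles between functions follow from the same formulas as for ordinary vectors.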

7. Projection

7.1 Projection onto 1D subspaces

Consider a vector space $V$ and a subspace $U$ of $V$. With a basis vector $b$ of $U$, we obtain the orthogonal projection of any $x \in V$ onto $U$ via
$$\pi_u(x) = \lambda b, \quad \lambda = \frac{b^Tx}{b^Tb} = \frac{b^Tx}{\|b\|^2}$$
where $\lambda$ is the coordinate of $\pi_u(x)$ with respect to $b$.
The projection matrix $P$ is
$$P = \frac{bb^T}{b^Tb} = \frac{bb^T}{\|b\|^2}$$
such that
$$\pi_u(x) = Px$$
for all $x \in V$.
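
A short numerical instance of the 1D projection formulas, assuming NumPy; the vectors `b` and `x` are made-up examples.

```python
import numpy as np

b = np.array([1.0, 2.0, 2.0])   # basis vector of the 1D subspace
x = np.array([3.0, 0.0, 3.0])   # vector to project

lam = (b @ x) / (b @ b)          # coordinate of the projection with respect to b
P = np.outer(b, b) / (b @ b)     # projection matrix b b^T / (b^T b)
projection = P @ x
```

Two properties are worth checking: the residual `x - projection` is orthogonal to `b`, and applying `P` twice changes nothing.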

7.2 Projection onto $k$-dimensional subspaces

Consider a vector space $V$ and a subspace $U$ of $V$. With basis vectors $b_1, \cdots, b_k$ of $U$, we obtain the orthogonal projection of any $x \in V$ onto $U$ via
$$\pi_u(x) = B\lambda, \quad \lambda = (B^TB)^{-1}B^Tx$$
$$B = (b_1 | \cdots | b_k) \in \mathbb{R}^{n \times k}$$
where $\lambda$ holds the coordinates of $\pi_u(x)$ with respect to the basis $b_1, \cdots, b_k$ of $U$.
The projection matrix $P$ is
$$P = B(B^TB)^{-1}B^T$$
such that
$$\pi_u(x) = Px$$
for all $x \in V$.
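
The same check works for a subspace spanned by several basis vectors. This sketch assumes NumPy; the matrix `B` (two basis vectors of a plane in $\mathbb{R}^3$) and the vector `x` are illustrative.

```python
import numpy as np

# Columns b_1, b_2 span a plane in R^3; they need not be orthogonal.
B = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [0.0, 1.0]])
x = np.array([1.0, 2.0, 3.0])

# lambda = (B^T B)^{-1} B^T x, computed by solving the normal equations.
lam = np.linalg.solve(B.T @ B, B.T @ x)
projection = B @ lam

# Explicit projection matrix P = B (B^T B)^{-1} B^T.
P = B @ np.linalg.inv(B.T @ B) @ B.T
```

The residual `x - projection` is orthogonal to every column of `B`, which is what makes the projection orthogonal.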

8. PCA derivation

8.1 Setting up ($X_n=\sum_{i=1}^D\beta_{in}b_i$, $\tilde{X}_n=\sum_{i=1}^M\beta_{in}b_i$, $J=\frac{1}{N}\sum_{n=1}^N\|X_n-\tilde{X}_n\|^2$, $S=\frac{1}{N}\sum_{n=1}^N X_nX_n^T$)

Given a data set $X = \{X_1, \cdots, X_N\}$ with $X_n \in \mathbb{R}^D$ and $\mathrm{E}[X_n] = 0$, originally expressed in the basis $A = (a_1, \cdots, a_D)$, project it onto a new orthonormal basis $B = (b_1, \cdots, b_D)$, $b_i \in \mathbb{R}^D$.

The data covariance matrix is
$$\begin{aligned} S &= \frac{1}{N}\sum_{n=1}^N (X_n - \mu)(X_n - \mu)^T \\ &= \frac{1}{N}\sum_{n=1}^N X_nX_n^T \quad (\mu = \mathrm{E}[X_n] = 0) \end{aligned}$$
Each data point can be written in the new basis as
$$X_n = \sum_{i=1}^D \beta_{in}b_i$$
The goal is to represent $X_n$, which lives in a $D$-dimensional space, in a lower, $M$-dimensional space,
$$\tilde{X}_n = \sum_{i=1}^M \beta_{in}b_i$$
with the minimum difference between $X_n$ and $\tilde{X}_n$. The cost function is the average squared reconstruction error
$$J = \frac{1}{N}\sum_{n=1}^N \|X_n - \tilde{X}_n\|^2$$

8.2 Finding the coordinates (code) $\beta_{in}$ ($\beta_{in} = X_n^Tb_i$)

For $i \leq M$, differentiate $J$ with respect to $\beta_{in}$. The summation index is written as $j$ to keep it distinct from the fixed index $i$:
$$\begin{aligned} \frac{\partial J}{\partial \beta_{in}} &= \frac{\partial J}{\partial \tilde{X}_n}\frac{\partial \tilde{X}_n}{\partial \beta_{in}} \\ &= -\frac{2}{N}(X_n - \tilde{X}_n)^Tb_i \\ &= -\frac{2}{N}\Big(X_n - \sum_{j=1}^M \beta_{jn}b_j\Big)^Tb_i \quad \Big(\tilde{X}_n = \sum_{j=1}^M \beta_{jn}b_j\Big) \\ &= -\frac{2}{N}\Big(X_n^Tb_i - \sum_{j=1}^M \beta_{jn}b_j^Tb_i\Big) \quad (\beta_{jn} \text{ is a scalar}) \\ &= -\frac{2}{N}(X_n^Tb_i - \beta_{in}b_i^Tb_i) \quad (\text{ONB: } b_j^Tb_i = 0 \text{ for } j \neq i) \\ &= -\frac{2}{N}(X_n^Tb_i - \beta_{in}) \quad (\text{ONB: } b_i^Tb_i = 1) \end{aligned}$$
Setting this derivative to zero gives $\beta_{in} = X_n^Tb_i$.
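
The result $\beta_{in} = X_n^Tb_i$ can be sanity-checked numerically: for an orthonormal basis, the coordinates obtained by plain dot products should match the least-squares coordinates that minimize the reconstruction error. This is a sketch assuming NumPy; the dimensions, the random data point and the use of a QR factorization to build an orthonormal basis are all illustrative choices, not part of the derivation.

```python
import numpy as np

rng = np.random.default_rng(0)
D, M = 4, 2

# Orthonormal basis: the columns of Q from a QR factorization play the role of b_1..b_D.
Q, _ = np.linalg.qr(rng.normal(size=(D, D)))
x = rng.normal(size=D)               # one data point X_n

beta = Q.T @ x                       # beta_i = x^T b_i for every i
x_tilde = Q[:, :M] @ beta[:M]        # reconstruction from the first M coordinates

# Independent check: coordinates that minimize ||x - B_M beta||^2 by least squares.
beta_ls, *_ = np.linalg.lstsq(Q[:, :M], x, rcond=None)
```

Using all $D$ coordinates reproduces `x` exactly, while keeping only the first $M$ leaves a residual that is orthogonal to the retained basis vectors.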

Please credit the source when reposting. Link to this article: https://www.uj5u.com/qita/208946.html
