What is a gradient?

I. Mathematical background:

Function: $z = y^2 - x^2$
1. Partial derivatives: $\frac{\partial z}{\partial x}=-2x$, $\frac{\partial z}{\partial y}=2y$
2. Gradient: $\nabla z=\left(\frac{\partial z}{\partial x},\frac{\partial z}{\partial y}\right)=(-2x,2y)$
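
As a quick sanity check on the formulas above, here is a minimal Python sketch that compares the analytic gradient of $z = y^2 - x^2$ with a central-difference approximation. The function names (`z`, `grad_z`, `numerical_grad`), the step `h`, and the evaluation point (1, 2) are illustrative choices, not part of the original derivation.

```python
def z(x, y):
    """The example function z = y^2 - x^2."""
    return y**2 - x**2


def grad_z(x, y):
    """Analytic gradient (dz/dx, dz/dy) = (-2x, 2y)."""
    return (-2.0 * x, 2.0 * y)


def numerical_grad(f, x, y, h=1e-6):
    """Central-difference approximation of the gradient (h is an assumed step)."""
    dz_dx = (f(x + h, y) - f(x - h, y)) / (2 * h)
    dz_dy = (f(x, y + h) - f(x, y - h)) / (2 * h)
    return (dz_dx, dz_dy)


analytic = grad_z(1.0, 2.0)
numeric = numerical_grad(z, 1.0, 2.0)
print(analytic)  # (-2.0, 4.0)
print(numeric)   # close to the analytic values, up to rounding error
```

At the point (1, 2) the gradient is (-2, 4): z decreases as x grows and increases as y grows.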

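One defining feature of neural networks is that they learn from data samples. The loss function is the handle that lets us determine the parameters automatically: the parameters we are looking for are the ones that minimize the loss. This is where the derivative comes in.
The derivative makes it easy to locate extrema: setting $f'(x)=0$ and solving the resulting equation gives the candidate extremum points. However, the ordinary derivative is defined only for functions of a single variable, while real-world problems usually involve many dimensions (attributes). A single derivative cannot do the job there, so we introduce the gradient.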
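
To make the one-dimensional case concrete, here is a small worked example. The function $f(x)=x^2-4x+5$ is chosen purely for illustration and does not appear in the original text.

$$
f(x)=x^2-4x+5,\qquad f'(x)=2x-4=0 \;\Rightarrow\; x=2,\qquad f''(x)=2>0,\qquad f(2)=1
$$

Because the second derivative is positive, the stationary point $x=2$ is a minimum, with minimum value 1.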

With the gradient, the parameters can be updated iteratively using $\theta_{t+1}=\theta_{t}-a_t\nabla f(\theta_t)$, where $a_t$ is the step size (learning rate) at iteration $t$.

Function: $f(\theta_1,\theta_2)=\theta_1^2+\theta_2^2$
Objective: $\underset{\theta_1,\theta_2}{\min}\,f(\theta_1,\theta_2)$
Update rules (with a constant step size $a$, both applied simultaneously):
1. $\theta_1\leftarrow\theta_1-a\frac{\partial}{\partial\theta_1}f(\theta_1,\theta_2)=\theta_1-2a\theta_1$
2. $\theta_2\leftarrow\theta_2-a\frac{\partial}{\partial\theta_2}f(\theta_1,\theta_2)=\theta_2-2a\theta_2$
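
The update rules above can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: the starting point (3, -4), the step size `a=0.1`, and the number of steps are assumptions chosen so that the iteration visibly converges.

```python
def f(theta1, theta2):
    """Objective f(theta1, theta2) = theta1^2 + theta2^2."""
    return theta1**2 + theta2**2


def grad_f(theta1, theta2):
    """Analytic gradient: (2 * theta1, 2 * theta2)."""
    return (2.0 * theta1, 2.0 * theta2)


def gradient_descent(theta1, theta2, a=0.1, steps=100):
    """Apply the update rule `steps` times with a fixed step size a (assumed)."""
    for _ in range(steps):
        g1, g2 = grad_f(theta1, theta2)
        # Update both parameters simultaneously, using the same old gradient.
        theta1, theta2 = theta1 - a * g1, theta2 - a * g2
    return theta1, theta2


theta1, theta2 = gradient_descent(3.0, -4.0)
print(theta1, theta2, f(theta1, theta2))  # both parameters end up very close to 0
```

With $a=0.1$, each step multiplies both parameters by $1-2a=0.8$, so they shrink geometrically toward the minimizer $(0, 0)$. A step size of $a \ge 1$ would make this particular iteration fail to converge.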