Hands-On Deep Learning (4): Implementing the Backward Pass of a Neural Network

Original post: https://www.cnblogs.com/greentomlee/p/12314064.html

github: https://github.com/Leezhen2014/python_deep_learning

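Part 2 introduced computing a neural network's gradients by numerical differentiation. Numerical differentiation is simple and easy to implement, but slow to compute. This chapter introduces a more efficient way to compute gradients: graph-based error backpropagation.

According to the book Deep Learning from Scratch, error backpropagation can be implemented in two ways: one based on mathematical formulas (the approach used in Part 2) and one based on computational graphs. The two are the same in essence and differ only in how they are expressed. For the computational-graph approach, see Stanford's open course CS231n, led by Fei-Fei Li, or the Theano tutorial (Further readings / Graph Structures).

Our earlier error propagation was formula-based, which is clearly tedious for whoever writes the code.

This time we switch to the computational-graph approach.

Backward is one stage of neural network training. In this stage, feedback adjusts the weights of each node in the network so as to reach the best weight parameters. The loss value is the starting point of this feedback: it measures the gap between the prediction and the label, and it is computed by the loss function.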

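TODO:

This chapter first explains two common loss functions,

then introduces the gradient, which is used to update node weights,

and finally implements backward and trains on the MNIST dataset.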

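4.1 Loss functions

There are many kinds of loss functions; this article covers only two: mean squared error and cross-entropy.

The cross-entropy is then rewritten as an n-batch cross-entropy.

Mean squared error

$$ E = \frac{1}{2} \sum_k (y_k - t_k)^2 $$

Here y_k is the network output and t_k is the label. The implementation is as follows: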

    import numpy as np

    def mean_squared_error(y, t):
        return 0.5*np.sum((y-t)**2)

Cross-entropy error

The formula and implementation of the cross-entropy are as follows:

$$ E = -\sum_k t_k \log y_k $$

    def cross_entropy_error(y,t):
        delta = 1e-7  # small constant that keeps np.log away from log(0)
        return -np.sum(t*np.log(y+delta))
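To make the two losses concrete, here is a minimal sketch that evaluates both on a made-up 10-class example. The label `t` and the two prediction vectors `y_good` and `y_bad` are illustrative assumptions, not data from the article; the two functions are copied from the definitions above.

```python
import numpy as np

def mean_squared_error(y, t):
    return 0.5 * np.sum((y - t) ** 2)

def cross_entropy_error(y, t):
    delta = 1e-7  # avoids log(0)
    return -np.sum(t * np.log(y + delta))

# Hypothetical one-hot label: the correct class is index 2.
t = np.array([0, 0, 1, 0, 0, 0, 0, 0, 0, 0])

# Prediction that puts most of the probability on the correct class.
y_good = np.array([0.1, 0.05, 0.6, 0.0, 0.05, 0.1, 0.0, 0.1, 0.0, 0.0])
# Prediction that puts most of the probability on a wrong class (index 7).
y_bad = np.array([0.1, 0.05, 0.1, 0.0, 0.05, 0.1, 0.0, 0.6, 0.0, 0.0])

mse_good, mse_bad = mean_squared_error(y_good, t), mean_squared_error(y_bad, t)
ce_good, ce_bad = cross_entropy_error(y_good, t), cross_entropy_error(y_bad, t)

print("MSE:", mse_good, mse_bad)
print("CE :", ce_good, ce_bad)
```

Both losses come out smaller for `y_good` than for `y_bad`, which is the behaviour a loss function needs in order to guide training.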

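Mini-batch version of the loss function

Both loss functions above are defined for a single sample (1-batch); they are not well suited to n-batch input.

The loss function needs a small modification. Taking cross-entropy as an example, the n-batch formula and code are as follows, where N is the batch size:

$$ E = -\frac{1}{N} \sum_n \sum_k t_{nk} \log y_{nk} $$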

    def cross_entropy_error(y,t):
        # batch version of the cross-entropy
        if y.ndim == 1:
            t = t.reshape(1, t.size)
            y = y.reshape(1, y.size)
        batch_size = y.shape[0] # batch

        delta = 1e-7
        return -np.sum(t*np.log(y+delta))/batch_size
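As a quick sanity check of the batch formula, the sketch below compares the batch loss with the average of the per-sample losses. The 2-sample, 3-class arrays `y_batch` and `t_batch` are made up for illustration; the function body is the one shown above.

```python
import numpy as np

def cross_entropy_error(y, t):
    # Promote a single sample to a batch of size 1.
    if y.ndim == 1:
        t = t.reshape(1, t.size)
        y = y.reshape(1, y.size)
    batch_size = y.shape[0]
    delta = 1e-7
    return -np.sum(t * np.log(y + delta)) / batch_size

# Hypothetical batch: 2 samples, 3 classes, one-hot labels.
y_batch = np.array([[0.7, 0.2, 0.1],
                    [0.1, 0.3, 0.6]])
t_batch = np.array([[1, 0, 0],
                    [0, 1, 0]])

batch_loss = cross_entropy_error(y_batch, t_batch)

# Calling the same function on each row exercises the 1-D branch.
per_sample = [cross_entropy_error(y, t) for y, t in zip(y_batch, t_batch)]
mean_of_samples = np.mean(per_sample)

print(batch_loss, mean_of_samples)
```

The two numbers agree, so dividing by `batch_size` makes the loss comparable across different batch sizes.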

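4.2 Gradient

This section covers the implementation of the gradient method and does not touch the network's backward algorithm; it lays the groundwork for the backward algorithm in the next section.

The essence of neural network learning is to take the error between the data's labels and the predictions, i.e. the loss value, and to modify the weights according to that error.

The error between label and prediction is measured by the loss function, which yields the loss value.

How to modify the weights can be found by solving for an extremum, as in calculus.

A point where the derivative is 0 is a stationary point, where an extremum may lie, so in principle the extremum could be found in one shot by solving for where the derivative is zero. With a large number of samples or a large W, however, this is infeasible: with many variables the partial derivatives have to be computed many times, which is time-consuming and laborious.

The gradient method makes up for exactly this shortcoming.

4.2.1 Implementing the gradient computation

The advantage of the gradient method is that the partial derivatives of all variables are computed together and collected into a vector. A vector assembled this way, such as $\left(\frac{\partial f}{\partial x_0}, \frac{\partial f}{\partial x_1}\right)$, is called the gradient (note: from Deep Learning from Scratch).

Next comes the implementation of the gradient. A computer cannot directly obtain a derivative or partial derivative, but an approximate value can be computed numerically with a small step h:

$$ \frac{\partial f}{\partial x} \approx \frac{f(x+h) - f(x-h)}{2h} $$

The corresponding implementation is as follows: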

    def numerical_gradient_1d(f, x):
        '''
        Numerical differentiation: compute the gradient of f at x.
        :param f: function
        :param x: point at which the gradient is evaluated (1-D float array)
        :return: gradient of f at x
        '''
        h = 1e-4 # 0.0001
        grad = np.zeros_like(x)

        for idx in range(x.size):
            tmp_val = x[idx] # cache the original value
            x[idx] = float(tmp_val) + h
            fxh1 = f(x) # f(x+h)

            x[idx] = tmp_val - h
            fxh2 = f(x) # f(x-h)
            grad[idx] = (fxh1 - fxh2) / (2*h)

            x[idx] = tmp_val # restore the original value

        return grad
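The code above uses the central difference (f(x+h) - f(x-h)) / 2h rather than the one-sided forward difference (f(x+h) - f(x)) / h. The sketch below illustrates why, comparing both against a known derivative. The helper names `forward_difference` and `central_difference` and the test function x^3 are assumptions chosen for illustration only.

```python
import numpy as np

def forward_difference(f, x, h=1e-4):
    # One-sided estimate: error shrinks proportionally to h.
    return (f(x + h) - f(x)) / h

def central_difference(f, x, h=1e-4):
    # Two-sided estimate: error shrinks proportionally to h**2.
    return (f(x + h) - f(x - h)) / (2 * h)

def cube(x):
    return x ** 3

x0 = 2.0
exact = 3 * x0 ** 2  # analytic derivative of x**3 at x0

err_forward = abs(forward_difference(cube, x0) - exact)
err_central = abs(central_difference(cube, x0) - exact)

print("forward difference error:", err_forward)
print("central difference error:", err_central)
```

With the same step h, the central difference is several orders of magnitude closer to the analytic value, at the cost of one extra function evaluation per coordinate.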

The above computes the gradient for a 1-D array. It can be extended to 2-D arrays with numerical_gradient_2d:

    def numerical_gradient_2d(f, X):
        if X.ndim == 1:
            return numerical_gradient_1d(f, X)
        else:
            grad = np.zeros_like(X)

            # compute the gradient row by row
            for idx, x in enumerate(X):
                grad[idx] = numerical_gradient_1d(f, x)

            return grad

    # General version: np.nditer walks over every element of an array of any dimension.
    def numerical_gradient(f, x):
        '''
        Numerical differentiation: compute the gradient of f at x.
        :param f: function
        :param x: point at which the gradient is evaluated (float array of any shape)
        :return: gradient of f at x, with the same shape as x
        '''
        h = 1e-4 # 0.0001
        grad = np.zeros_like(x)

        it = np.nditer(x, flags=['multi_index'], op_flags=['readwrite'])
        while not it.finished:
            idx = it.multi_index
            tmp_val = x[idx]
            x[idx] = tmp_val + h
            fxh1 = f(x) # f(x+h)

            x[idx] = tmp_val - h
            fxh2 = f(x) # f(x-h)
            grad[idx] = (fxh1 - fxh2) / (2*h)

            x[idx] = tmp_val # restore the original value
            it.iternext()

        return grad
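One practical pitfall with this function is worth a sketch: it perturbs `x` in place and allocates `grad` with `np.zeros_like(x)`, so it silently misbehaves on integer arrays, where `x + h` is truncated back to an integer. The example below reuses the function above; `sum_of_squares` is a hypothetical test function.

```python
import numpy as np

def numerical_gradient(f, x):
    h = 1e-4
    grad = np.zeros_like(x)  # inherits the dtype of x
    it = np.nditer(x, flags=['multi_index'], op_flags=['readwrite'])
    while not it.finished:
        idx = it.multi_index
        tmp_val = x[idx]
        x[idx] = tmp_val + h   # truncated if x has an integer dtype
        fxh1 = f(x)
        x[idx] = tmp_val - h   # truncated if x has an integer dtype
        fxh2 = f(x)
        grad[idx] = (fxh1 - fxh2) / (2 * h)
        x[idx] = tmp_val
        it.iternext()
    return grad

def sum_of_squares(x):
    return np.sum(x ** 2)

# Integer input: the +/- h perturbations are lost, so the result is meaningless.
grad_int = numerical_gradient(sum_of_squares, np.array([3, 4]))

# Float input: the result is close to the analytic gradient (6, 8).
grad_float = numerical_gradient(sum_of_squares, np.array([3.0, 4.0]))

print(grad_int, grad_float)
```

Passing float arrays, for example via `x.astype(float)`, avoids the problem.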

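4.2.2 Verifying that the gradient computation works

As a rigorous programmer, every module you write should be verified at least once, to reduce debugging later on.

Here we simply verify the gradient of $f(x_0, x_1) = x_0^2 + x_1^2$ at the point $(3, 4)$.

From calculus, the gradient is $\left(\frac{\partial f}{\partial x_0}, \frac{\partial f}{\partial x_1}\right) = (2x_0, 2x_1)$.

So at $(3, 4)$ the gradient is $(6, 8)$.

Now let's verify.

Snippet 1: implementation of function: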

    def function(x):
        return x[0]**2 + x[1]**2

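The plot of function is produced by the code below. The minimum of the whole function is at (0, 0), so the negative gradient at (3, 4) should point toward (0, 0); that is, the gradient components themselves should be positive.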

    from matplotlib import pyplot as plot  # for drawing figures
    import numpy as np  # for handling data

    def show():
        figure = plot.figure()
        # create the 3-D axes (older Matplotlib may also need
        # `from mpl_toolkits.mplot3d import Axes3D` to register the projection)
        axes = figure.add_subplot(projection='3d')
        X = np.arange(-10, 10, 0.25)
        Y = np.arange(-10, 10, 0.25)
        X, Y = np.meshgrid(X, Y)  # build the grid of sample points
        Z = function([X, Y])
        axes.plot_surface(X, Y, Z, cmap='rainbow')  # draw the surface with the rainbow colormap
        plot.show()  # display the figure

    if __name__ == '__main__':
        # show()  # uncomment to display the surface plot
        print(numerical_gradient(function, np.array([3.0,4.0])))

Corresponding output:

[6. 8.]

The check passes, so the gradient code is written correctly.
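A single point is a fairly weak test. A slightly stronger check, sketched below, compares the numerical gradient with the analytic gradient (2x0, 2x1) at several random points. The compact `numerical_gradient` here is a re-implementation with `np.ndindex` for self-containment, and `analytic_gradient`, the random seed, and the sampling range are assumptions made for this sketch.

```python
import numpy as np

def numerical_gradient(f, x, h=1e-4):
    # Central difference over every element of x (x must be a float array).
    grad = np.zeros_like(x)
    for idx in np.ndindex(x.shape):
        tmp_val = x[idx]
        x[idx] = tmp_val + h
        fxh1 = f(x)
        x[idx] = tmp_val - h
        fxh2 = f(x)
        grad[idx] = (fxh1 - fxh2) / (2 * h)
        x[idx] = tmp_val  # restore the original value
    return grad

def function(x):
    return x[0] ** 2 + x[1] ** 2

def analytic_gradient(x):
    # Hand-derived gradient of x0**2 + x1**2.
    return 2 * x

rng = np.random.default_rng(0)
points = rng.uniform(-5, 5, size=(5, 2))  # five random 2-D points

max_error = max(
    np.max(np.abs(numerical_gradient(function, p.copy()) - analytic_gradient(p)))
    for p in points
)
print("largest deviation from the analytic gradient:", max_error)
```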

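4.2.3 Implementing the gradient method

The main task of neural network learning is to find the optimal parameters during training, namely the parameters that minimize the loss function. Here the neural network can be denoted g(x).

In general, the parameter space of a neural network is large and the loss function is complicated, so the gradient is typically used to search for the minimum of g(x). Note, however, that the negative gradient only gives the direction in which the function value decreases fastest at each point; there is no guarantee that this direction leads to the true minimum.

Even so, moving against the gradient reduces the loss as much as is locally possible. By repeatedly stepping in that direction, the loss function gradually decreases (this process is called the gradient method).

The gradient method is a common way to solve optimization problems in machine learning. Depending on the objective, it is divided into gradient descent and gradient ascent.

Mathematically, with learning rate $\eta$:

$$ x \leftarrow x - \eta \frac{\partial f}{\partial x} $$

Using the formula above and numerical_gradient, the gradient method is implemented as follows: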

    def gradient_descent(f, init_x, learning_rate=0.01, step_num=200):
        '''
        Optimize the objective function step by step and find the point that minimizes it.
        :param f: objective function
        :param init_x: starting point
        :param learning_rate: learning rate
        :param step_num: number of iterations
        :return: the point reached after step_num updates
        '''
        x = init_x
        for i in range(step_num):
            grad = numerical_gradient(f,x)
            x = x - learning_rate*grad # implements the update formula
        return x

    if __name__ == '__main__':
        # show()
        #print(numerical_gradient(function, np.array([3.0,4.0])))
        # note: despite its name, min_value holds the minimizing point, not the function value
        min_value = gradient_descent(function, init_x=np.array([3.0,4.0]))
        print(min_value)
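The result depends heavily on the learning rate. The sketch below reruns the same descent with several learning rates; the specific values 0.1, 1e-10 and 10.0 are picked here only to illustrate the three regimes, and `numerical_gradient` is a compact re-implementation so that the block runs on its own.

```python
import numpy as np

def numerical_gradient(f, x, h=1e-4):
    grad = np.zeros_like(x)
    for idx in np.ndindex(x.shape):
        tmp_val = x[idx]
        x[idx] = tmp_val + h
        fxh1 = f(x)
        x[idx] = tmp_val - h
        fxh2 = f(x)
        grad[idx] = (fxh1 - fxh2) / (2 * h)
        x[idx] = tmp_val
    return grad

def gradient_descent(f, init_x, learning_rate=0.01, step_num=200):
    x = init_x
    for _ in range(step_num):
        x = x - learning_rate * numerical_gradient(f, x)
    return x

def function(x):
    return x[0] ** 2 + x[1] ** 2

start = np.array([3.0, 4.0])

x_default = gradient_descent(function, start)                     # still some way from (0, 0)
x_good = gradient_descent(function, start, learning_rate=0.1)      # converges to about (0, 0)
x_tiny = gradient_descent(function, start, learning_rate=1e-10)    # barely moves
x_large = gradient_descent(function, start, learning_rate=10.0)    # overshoots and blows up

for name, x in [("default", x_default), ("good", x_good),
                ("tiny", x_tiny), ("large", x_large)]:
    print(name, x)
```

A learning rate that is too small leaves the point almost where it started, while one that is too large makes every step overshoot the minimum, so the iterate moves far away from (0, 0) instead of toward it.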

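4.3 Implementing the learning algorithm

This section describes the learning algorithm as pseudocode; the concrete implementation is given in Section 4.4.

Neural network learning uses the gradient method (see Section 4.2). Based on it, learning proceeds in the following four steps:

Step 1: get a mini-batch

Select part of the data from the dataset; this part is called a mini-batch. The goal is now to reduce the loss function value on the mini-batch.

Step 2: compute the gradient

To reduce the loss function value, compute the gradient with respect to each weight parameter. The negative gradient is the direction in which the loss decreases fastest.

Step 3: update the parameters

Since the negative gradient indicates the direction of decreasing loss, update the weight parameters by a small step in that direction.

Step 4: iterate

Repeat Steps 1, 2 and 3.

The above is gradient descent. Because the mini-batch is chosen at random at every step, it is called stochastic gradient descent (SGD). Note: see Deep Learning from Scratch.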
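The four steps above can be sketched on a problem much smaller than a neural network. The example below fits a line y = w*x + b with stochastic gradient descent; the toy data, the mean-squared-error loss, and all hyperparameters are assumptions chosen for illustration and are not part of the article's code.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy dataset: y = 2x + 1 plus a little noise.
x_data = rng.uniform(-1, 1, size=200)
y_data = 2.0 * x_data + 1.0 + 0.01 * rng.normal(size=200)

w, b = 0.0, 0.0          # parameters to learn
learning_rate = 0.1
batch_size = 20

for step in range(1000):                      # Step 4: repeat
    # Step 1: draw a random mini-batch.
    idx = rng.choice(x_data.size, batch_size)
    x_batch, y_batch = x_data[idx], y_data[idx]

    # Step 2: gradient of the loss 0.5 * mean((w*x + b - y)**2) on the mini-batch.
    err = w * x_batch + b - y_batch
    grad_w = np.mean(err * x_batch)
    grad_b = np.mean(err)

    # Step 3: take a small step against the gradient.
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b

print("learned w, b:", w, b)
```

The learned parameters end up close to the values 2 and 1 used to generate the data, even though each update only sees 20 of the 200 samples.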

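4.4 Training a neural network on the MNIST dataset

This section combines the earlier code to implement a two-layer neural network and trains it on the MNIST dataset, to walk through the whole learning process. The process is kept as simple as possible.

For simplicity, the input layer is 784*1 and is followed by two hidden layers with 50 and 10 neurons. Because the last layer already has as many outputs as the output layer needs, an identity function suffices there (the code below then passes these 10 values through softmax to obtain class probabilities). The loss function is cross-entropy.

The network structure is as follows:

[Figure: network structure]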
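To make the layer sizes concrete, the following sketch traces the array shapes through the two affine layers for one mini-batch. The batch size of 100 and the random input are assumptions for illustration; the activation functions are omitted because they do not change any shape.

```python
import numpy as np

batch_size, input_size, hidden_size, output_size = 100, 784, 50, 10

# Random stand-ins for a mini-batch of flattened 28x28 images and for the weights.
x = np.random.rand(batch_size, input_size)
W1 = 0.01 * np.random.randn(input_size, hidden_size)
b1 = np.zeros(hidden_size)
W2 = 0.01 * np.random.randn(hidden_size, output_size)
b2 = np.zeros(output_size)

a1 = np.dot(x, W1) + b1    # (100, 784) @ (784, 50) -> (100, 50)
a2 = np.dot(a1, W2) + b2   # (100, 50)  @ (50, 10)  -> (100, 10)

print(x.shape, a1.shape, a2.shape)
```

Each bias vector is broadcast across the batch dimension, so one row of `a2` holds the 10 class scores of one input image.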

Implementation of the network structure:

    # -*- coding: utf-8 -*-
    # @File  : two_layer_net.py
    # @Author: lizhen
    # @Date  : 2020/1/28
    # @Desc  : network that uses gradients

    from src.common.functions import *
    from src.common.gradient import numerical_gradient
    import numpy as np

    class TwoLayerNet:

        def __init__(self, input_size, hidden_size, output_size, weight_init_std=0.01):

            # small random weights, zero biases
            self.params = {}
            self.params['W1'] = weight_init_std * np.random.randn(input_size, hidden_size)
            self.params['b1'] = np.zeros(hidden_size)
            self.params['W2'] = weight_init_std * np.random.randn(hidden_size, output_size)
            self.params['b2'] = np.zeros(output_size)

        def predict(self, x):
            W1, W2 = self.params['W1'], self.params['W2']
            b1, b2 = self.params['b1'], self.params['b2']

            a1 = np.dot(x, W1) + b1
            z1 = sigmoid(a1)
            a2 = np.dot(z1, W2) + b2
            y = softmax(a2)

            return y

        # x: input data, t: labels
        def loss(self, x, t):
            y = self.predict(x)

            return cross_entropy_error(y, t)

        def accuracy(self, x, t):
            y = self.predict(x)
            y = np.argmax(y, axis=1)
            t = np.argmax(t, axis=1)

            accuracy = np.sum(y == t) / float(x.shape[0])
            return accuracy

        # x: input data, t: labels
        def numerical_gradient(self, x, t):
            # W is ignored: numerical_gradient perturbs self.params in place
            loss_W = lambda W: self.loss(x, t)

            grads = {}
            grads['W1'] = numerical_gradient(loss_W, self.params['W1'])
            grads['b1'] = numerical_gradient(loss_W, self.params['b1'])
            grads['W2'] = numerical_gradient(loss_W, self.params['W2'])
            grads['b2'] = numerical_gradient(loss_W, self.params['b2'])

            return grads

        def gradient(self, x, t):
            W1, W2 = self.params['W1'], self.params['W2']
            b1, b2 = self.params['b1'], self.params['b2']
            grads = {}

            batch_num = x.shape[0]

            # forward
            a1 = np.dot(x, W1) + b1
            z1 = sigmoid(a1)
            a2 = np.dot(z1, W2) + b2
            y = softmax(a2)

            # backward
            dy = (y - t) / batch_num
            grads['W2'] = np.dot(z1.T, dy)
            grads['b2'] = np.sum(dy, axis=0)

            dz1 = np.dot(dy, W2.T)
            da1 = sigmoid_grad(a1) * dz1
            grads['W1'] = np.dot(x.T, da1)
            grads['b1'] = np.sum(da1, axis=0)

            return grads
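The class offers two ways to obtain gradients, so it is worth checking that they agree. The article does not show `src.common.functions`, so the sketch below does not import it; instead it assumes the usual definitions of `sigmoid`, `softmax` and the batch cross-entropy, and rebuilds a tiny network (4 inputs, 3 hidden units, 2 outputs) with the same forward and backward formulas as `gradient` above. All sizes, the seed, and the helper names `loss` and `backprop_gradient` are illustrative assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(a):
    a = a - np.max(a, axis=-1, keepdims=True)  # subtract the row max for numerical stability
    e = np.exp(a)
    return e / np.sum(e, axis=-1, keepdims=True)

def cross_entropy_error(y, t):
    return -np.sum(t * np.log(y + 1e-7)) / y.shape[0]

def loss(params, x, t):
    z1 = sigmoid(x @ params['W1'] + params['b1'])
    y = softmax(z1 @ params['W2'] + params['b2'])
    return cross_entropy_error(y, t)

def backprop_gradient(params, x, t):
    # Same formulas as TwoLayerNet.gradient, with sigmoid'(a1) written as z1 * (1 - z1).
    z1 = sigmoid(x @ params['W1'] + params['b1'])
    y = softmax(z1 @ params['W2'] + params['b2'])
    dy = (y - t) / x.shape[0]
    da1 = (dy @ params['W2'].T) * z1 * (1.0 - z1)
    return {'W1': x.T @ da1, 'b1': da1.sum(axis=0),
            'W2': z1.T @ dy, 'b2': dy.sum(axis=0)}

def numerical_gradient(f, x, h=1e-4):
    grad = np.zeros_like(x)
    for idx in np.ndindex(x.shape):
        tmp_val = x[idx]
        x[idx] = tmp_val + h
        fxh1 = f(x)
        x[idx] = tmp_val - h
        fxh2 = f(x)
        grad[idx] = (fxh1 - fxh2) / (2 * h)
        x[idx] = tmp_val
    return grad

rng = np.random.default_rng(0)
params = {'W1': 0.01 * rng.normal(size=(4, 3)), 'b1': np.zeros(3),
          'W2': 0.01 * rng.normal(size=(3, 2)), 'b2': np.zeros(2)}
x = rng.random((5, 4))
t = np.eye(2)[rng.integers(0, 2, size=5)]  # random one-hot labels

analytic = backprop_gradient(params, x, t)
# The lambda ignores its argument because params[k] is perturbed in place.
diffs = {k: np.mean(np.abs(analytic[k] - numerical_gradient(lambda _: loss(params, x, t), params[k])))
         for k in params}
print(diffs)
```

The mean absolute differences come out very small for all four parameters, which is the evidence one wants before switching the training loop to the much faster analytic `gradient`.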

Training code:

    # -*- coding: utf-8 -*-
    # @File  : train_neuralnet.py
    # @Author: lizhen
    # @Date  : 2020/2/2
    # @Desc  : implementation for Part 3: using gradients

    import numpy as np
    import matplotlib.pyplot as plt
    from src.datasets.mnist import load_mnist
    from src.test.two_layer_net import TwoLayerNet

    # load the training data
    (x_train, t_train), (x_test, t_test) = load_mnist(normalize=True, one_hot_label=True)

    network = TwoLayerNet(input_size=784, hidden_size=50, output_size=10)

    iters_num = 10000  # number of iterations
    train_size = x_train.shape[0]
    batch_size = 100
    learning_rate = 0.1

    train_loss_list = []
    train_acc_list = []
    test_acc_list = []

    # number of iterations per epoch (integer division keeps the modulo test below exact)
    iter_per_epoch = max(train_size // batch_size, 1)

    for i in range(iters_num):
        # sample a random mini-batch
        batch_mask = np.random.choice(train_size, batch_size)
        x_batch = x_train[batch_mask]
        t_batch = t_train[batch_mask]

        # compute the gradient
        grad = network.numerical_gradient(x_batch, t_batch)
        # grad = network.gradient(x_batch, t_batch) # faster

        # update the weights
        for key in ('W1', 'b1', 'W2', 'b2'):
            network.params[key] -= learning_rate * grad[key]

        loss = network.loss(x_batch, t_batch)
        train_loss_list.append(loss)

        # once per epoch, record accuracy on the full training and test sets
        if i % iter_per_epoch == 0:
            train_acc = network.accuracy(x_train, t_train)
            test_acc = network.accuracy(x_test, t_test)
            train_acc_list.append(train_acc)
            test_acc_list.append(test_acc)
            print("train acc, test acc , loss | " + str(train_acc) + ", " + str(test_acc)+", "+str(loss))


    # plot the accuracy curves
    markers = {'train': 'o', 'test': 's'}
    x = np.arange(len(train_acc_list))
    plt.plot(x, train_acc_list, label='train acc')
    plt.plot(x, test_acc_list, label='test acc', linestyle='--')

    plt.xlabel("epochs")
    plt.ylabel("accuracy")
    plt.ylim(0, 1.0)
    plt.legend(loc='lower right')
    plt.show()

    # plt.plot(train_loss_list,label='loss value')
    # plt.show()

Output:

[Figure: training output]

The loss curve and the accuracy curve are as follows:

[Figure: loss and accuracy curves]

Monday, February 3, 2020

My heart is not a stone, it cannot be rolled; my heart is not a mat, it cannot be rolled up.
