Recurrent Neural Networks -- SimpleRNN and a PyTorch Implementation - 有解無憂

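In PyTorch, the main tools for a SimpleRNN are nn.RNN and nn.RNNCell. The difference is that nn.RNN takes a whole sequence as input, while nn.RNNCell takes a single time step, so we have to handle the passing of state between time steps ourselves. nn.RNN is simpler to use, but to understand more deeply how a SimpleRNN works, I decided to demonstrate both.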
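To make that division of labor concrete, here is a minimal pure-Python sketch (no PyTorch; the names rnn_cell_step and run_sequence are hypothetical). The first function plays the role of nn.RNNCell and advances one time step; the second plays the role of nn.RNN and runs the time-step loop itself, returning every hidden state plus the last one. Omitting the bias and storing vectors as plain lists are simplifying assumptions, not how PyTorch implements it.

```python
import math
from typing import Callable, List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def rnn_cell_step(x: Vector, h: Vector, U: Matrix, W: Matrix,
                  act: Callable[[float], float] = math.tanh) -> Vector:
    """One time step, in the spirit of nn.RNNCell: h' = act(U x + W h), no bias."""
    return [
        act(sum(u * xj for u, xj in zip(U[i], x))
            + sum(w * hj for w, hj in zip(W[i], h)))
        for i in range(len(W))  # one entry per hidden unit
    ]


def run_sequence(xs: List[Vector], h0: Vector, U: Matrix, W: Matrix,
                 act: Callable[[float], float] = math.tanh) -> Tuple[List[Vector], Vector]:
    """Whole sequence, in the spirit of nn.RNN: loops the cell over the time
    steps and returns (all hidden states, last hidden state)."""
    outputs, h = [], h0
    for x in xs:  # with nn.RNNCell, this loop is our job
        h = rnn_cell_step(x, h, U, W, act)
        outputs.append(h)
    return outputs, h


if __name__ == "__main__":
    xs = [[1.0, 2.0], [2.0, 0.0], [3.0, 1.0], [-1.0, -5.0]]
    U, W = [[-1.0, 3.0]], [[2.0]]
    outputs, h_n = run_sequence(xs, [0.0], U, W)
    # Looping over the cell by hand gives the same hidden states
    h, manual = [0.0], []
    for x in xs:
        h = rnn_cell_step(x, h, U, W)
        manual.append(h)
    print(outputs == manual, h_n == outputs[-1])
```

The sequence-level function adds nothing but the loop, which is why nn.RNN is the more convenient of the two, while stepping the cell manually exposes each intermediate state.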
———————————————————————————————————————

from torch import nn

nn.RNNCell(input_size: int, hidden_size: int, bias: bool = True, nonlinearity: str = 'tanh')

These are the arguments for initializing an RNNCell; the official PyTorch documentation explains each of them in detail.
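The shapes of the cell's learnable parameters follow directly from input_size and hidden_size. The sketch below is a hypothetical pure-Python helper (not a PyTorch API) that simply records the shapes listed in the PyTorch nn.RNNCell documentation; with bias=False the two bias entries are absent (PyTorch registers them as None).

```python
def rnn_cell_param_shapes(input_size, hidden_size, bias=True):
    """Parameter shapes of nn.RNNCell as listed in the PyTorch docs (hypothetical helper)."""
    shapes = {
        "weight_ih": (hidden_size, input_size),   # input-to-hidden weight, U in this post
        "weight_hh": (hidden_size, hidden_size),  # hidden-to-hidden weight, W in this post
    }
    if bias:
        shapes["bias_ih"] = (hidden_size,)
        shapes["bias_hh"] = (hidden_size,)
    return shapes


# The example in this post: input_size=2, hidden_size=1, bias=False
print(rnn_cell_param_shapes(2, 1, bias=False))
# -> {'weight_ih': (1, 2), 'weight_hh': (1, 1)}
```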
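The PyTorch documentation also gives the update formula, $h' = \tanh(W_{ih} x + b_{ih} + W_{hh} h + b_{hh})$. Personally, I think the two biases here can be merged into a single bias, and in fact that is exactly how the "flower book" (Goodfellow, Bengio and Courville, Deep Learning) writes it:

$$h^{(t)} = \tanh\left(b + W h^{(t-1)} + U x^{(t)}\right)$$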
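Written out with this post's notation ($U = W_{ih}$, $W = W_{hh}$), the merge is a one-line rearrangement: only the sum $b_{ih} + b_{hh}$ ever enters the pre-activation, so a single bias $b$ is equivalent.

```latex
\begin{aligned}
h' &= \tanh\left(W_{ih}\,x + b_{ih} + W_{hh}\,h + b_{hh}\right) \\
   &= \tanh\left(\left(b_{ih} + b_{hh}\right) + W_{hh}\,h + W_{ih}\,x\right) \\
   &= \tanh\left(b + W h + U x\right), \qquad b = b_{ih} + b_{hh},\; U = W_{ih},\; W = W_{hh}
\end{aligned}
```

With $h = h^{(t-1)}$ and $x = x^{(t)}$, this is the book's single-bias form.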
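For the detailed origins, development and many variants of RNNs, interested readers can consult monographs or blog posts; I won't repeat them here. Let's go straight to an example~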

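$x$ is the input sequence: sequence length 4, batch_size 1, 2 features;
$h$ is the hidden unit; its initial state $h^{(0)}$ is 0, with shape (1,1);
$o$ is the output, a sequence of length 4 (here each $o^{(t)}$ is simply $h^{(t)}$); to simplify the computation, I use ReLU as the activation function;
$W$ is the hidden-to-hidden weight, shape (1,1), value [[2]]; in PyTorch it can be read and set via rnn_cell.weight_hh.data;
$U$ is the input-to-hidden weight, shape (1,2), value [[-1,3]]; in PyTorch it can be read and set via rnn_cell.weight_ih.data;
To simplify the computation, no bias is used, so bias=False (the cell's bias_ih and bias_hh are then None).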

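The manual computation uses $h^{(t)} = \mathrm{ReLU}\left(W h^{(t-1)} + U x^{(t)}\right)$ with the inputs $x^{(1)}, \dots, x^{(4)} = [1,2], [2,0], [3,1], [-1,-5]$ (the values used in the code below):

$$
\begin{aligned}
h^{(1)} &= \mathrm{ReLU}\left(2 \cdot 0 + (-1 \cdot 1 + 3 \cdot 2)\right) = \mathrm{ReLU}(5) = 5 \\
h^{(2)} &= \mathrm{ReLU}\left(2 \cdot 5 + (-1 \cdot 2 + 3 \cdot 0)\right) = \mathrm{ReLU}(8) = 8 \\
h^{(3)} &= \mathrm{ReLU}\left(2 \cdot 8 + (-1 \cdot 3 + 3 \cdot 1)\right) = \mathrm{ReLU}(16) = 16 \\
h^{(4)} &= \mathrm{ReLU}\left(2 \cdot 16 + (-1 \cdot (-1) + 3 \cdot (-5))\right) = \mathrm{ReLU}(18) = 18
\end{aligned}
$$

Note that every pre-activation, and hence every h, is positive. Since the activation is ReLU, applying it or not makes no difference to the final result.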
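The claim that ReLU has no effect here can be checked numerically. This small pure-Python sketch (scalar hidden state, no bias, the example's U and W; the function name run is hypothetical) runs the recurrence once with ReLU and once with no activation at all, and records the pre-activations.

```python
def run(xs, U, W, act):
    """Scalar SimpleRNN without bias: h_t = act(U . x_t + W * h_{t-1}), with h_0 = 0."""
    h, hs, pre = 0.0, [], []
    for x in xs:
        a = sum(u * xi for u, xi in zip(U, x)) + W * h  # pre-activation
        pre.append(a)
        h = act(a)
        hs.append(h)
    return hs, pre


xs = [(1.0, 2.0), (2.0, 0.0), (3.0, 1.0), (-1.0, -5.0)]
U, W = (-1.0, 3.0), 2.0

relu_hs, pre = run(xs, U, W, lambda a: max(0.0, a))
identity_hs, _ = run(xs, U, W, lambda a: a)

print(relu_hs)                  # [5.0, 8.0, 16.0, 18.0]
print(relu_hs == identity_hs)   # True: ReLU never clips anything here
print(all(a > 0 for a in pre))  # True: every pre-activation is positive
```

With a different input, say a large negative second feature at the first step, the pre-activation would go negative and the two runs would diverge.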

The PyTorch implementation is as follows:

import torch
from torch import nn

# Use the GPU when available, otherwise fall back to the CPU
device = 'cuda' if torch.cuda.is_available() else 'cpu'

# simpleRNN cell
seq, batch_size = 4, 1
input_size, hidden_size = 2, 1
rnn_cell = nn.RNNCell(input_size=input_size, hidden_size=hidden_size,
                      bias=False, nonlinearity='relu').to(device)
# Set U (input-to-hidden) and W (hidden-to-hidden) in place
with torch.no_grad():
    rnn_cell.weight_ih.copy_(torch.tensor([[-1., 3.]]))
    rnn_cell.weight_hh.copy_(torch.tensor([[2.]]))
# x: (seq, batch_size, input_size) = (4, 1, 2)
x = torch.tensor([[[1., 2.]], [[2., 0.]], [[3., 1.]], [[-1., -5.]]], device=device)
# h0: (batch_size, hidden_size) = (1, 1)
hx = torch.zeros(batch_size, hidden_size, device=device)
print('this is x:\n', x)
print('this is h0:\n', hx)
output = []
for step in range(seq):
    hx = rnn_cell(x[step], hx)  # the new hx is the input to the next time step
    output.append(hx)
print('these are hx')
for i in range(seq):
    print(output[i])

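Printed output: the four hidden states $h^{(1)}, \dots, h^{(4)}$, each a tensor of shape (1, 1), with values 5, 8, 16 and 18.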
The version using nn.RNN instead of nn.RNNCell is as follows:

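# RNN
device = 'cuda' if torch.cuda.is_available() else 'cpu'
seq, batch_size = 4, 1
input_size, hidden_size = 2, 1
num_layers = 1
rnn = nn.RNN(input_size=input_size, hidden_size=hidden_size,
             bias=False, nonlinearity='relu', num_layers=num_layers).to(device)
# For nn.RNN the layer-0 weights are weight_ih_l0 (U) and weight_hh_l0 (W)
with torch.no_grad():
    rnn.weight_ih_l0.copy_(torch.tensor([[-1., 3.]]))
    rnn.weight_hh_l0.copy_(torch.tensor([[2.]]))
rnn.flatten_parameters()  # re-pack the weights into one contiguous block (no-op on CPU)
# x: (seq, batch_size, input_size) = (4, 1, 2), since batch_first=False by default
x = torch.tensor([[[1., 2.]], [[2., 0.]], [[3., 1.]], [[-1., -5.]]], device=device)
# h0: (num_layers * num_directions, batch_size, hidden_size) = (1, 1, 1)
hx = torch.zeros(num_layers, batch_size, hidden_size, device=device)
print('this is x:\n', x)
print('this is h0:\n', hx)
output, hn = rnn(x, hx)
print('these are hx')
print(output)
print('this is hn')
print(hn)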

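One main difference is that nn.RNN takes a num_layers argument (1 by default, set explicitly here), so naturally hx gains an extra dimension and has three levels of brackets: its shape is (1,1,1), i.e. (num_layers*num_directions, batch_size, hidden_size).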
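The shape rule above can be written down as a small helper. This is a hypothetical pure-Python function (not part of PyTorch) that encodes the shapes the PyTorch documentation gives for nn.RNN with the default batch_first=False; num_directions is 2 when bidirectional=True and 1 otherwise.

```python
def rnn_shapes(seq_len, batch_size, hidden_size, num_layers=1, bidirectional=False):
    """Expected tensor shapes for nn.RNN with batch_first=False (hypothetical helper)."""
    num_directions = 2 if bidirectional else 1
    return {
        # initial hidden state passed in as hx
        "h0": (num_layers * num_directions, batch_size, hidden_size),
        # top-layer hidden state at every time step
        "output": (seq_len, batch_size, num_directions * hidden_size),
        # final hidden state of every layer and direction
        "h_n": (num_layers * num_directions, batch_size, hidden_size),
    }


# The example in this post: seq_len=4, batch_size=1, hidden_size=1
print(rnn_shapes(4, 1, 1))
# -> {'h0': (1, 1, 1), 'output': (4, 1, 1), 'h_n': (1, 1, 1)}
```

Only the first dimension of h0 and h_n grows with num_layers, while output always carries the top layer's states for every time step.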
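As for why flatten_parameters() is called, see this expert's blog post. Briefly, it re-packs the RNN weights into one contiguous block of memory so that cuDNN can use its fast path (on the CPU it does nothing). Without it I got a warning, so it is better to call it.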
Next, rnn(x, hx) returns two outputs: one is the sequence formed by $o^{(i)}, i = 1, 2, 3, 4$, and the other is $h^{(4)}$, which here equals $o^{(4)}$ and would serve as the input to the next time step (although the sequence already ends here).
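Printed output: output has shape (4, 1, 1) and holds the same four hidden states, 5, 8, 16 and 18; hn has shape (1, 1, 1) and holds the last of them, 18.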
All of the above results agree with the manual computation.

Please credit the source when reposting. Link to this article: https://www.uj5u.com/houduan/230999.html

Tags: python
