記得剛開始學TensorFlow的時候，那給我折磨的呀，我一直在想這個TensorFlow官方為什么搭建個網路還要畫什么靜態圖呢，把簡單的事情弄得麻煩死了，直到這幾天我開始接觸Pytorch，發現Pytorch是就是不用搭建靜態圖的Tensorflow版本，就想在用numpy一樣，并且封裝了很多深度學習高級API，numpy資料和Tensor資料相互轉換不用搭建會話了，只需要一個轉換函式，搭建起了numpy和TensorFlow愛的橋梁，

　　Pytorch自17年推出以來，一度有趕超TensorFlow的趨勢，是因為Pytorch采用動態圖機制，替代Numpy使用GPU的功能，搭建網路靈活，

Pytorch和TensorFlow的區別：

TensorFlow是基于靜態計算圖的，靜態計算圖是先定義后運行，一次定義多次運行（Tensorflow 2.0也開始使用動態計算圖）
PyTorch是基于動態圖的，是在運行的程序中被定義的，在運行的時候構建，可以多次構建多次運行

上手難度：tensorflow 1 > tensorflow 2 > pytorch

工業界：tensorflow 1>tensorflow 2 > pytorch

學術界：pytorch > tesnroflow 2 > tesnorflow 1(已經被谷歌拋棄)

pytorch的優點

GPU加速
自動求導
常用網路層

如何表達字串：1、one-hot，如果詞庫很大會造成矩陣很大，占記憶體，2、embeding（word2vec、glove）

pytorch的資料型別

Data type	CPU tensor	GPU tensor
torch.float32	torch.FloatTensor	torch.cuda.FloatTensor
torch.floar64	torch.DoubleTensor	torch.cuda.DoubleTensor
torch.int32	torch.IntTensor	torch.cuda.IntTensor
torch.int64	torch.LongTensor	torch.cuda.LongTensor

即便是同一個變數同時部署在CPU和GPU上面是不一樣的

這篇文章的所有代碼，請務必手敲！！！

安裝

這個網址包含pytorch與cuda的對應關系

由于我的cuda是 10.0

我選擇的安裝命令是：

pip install torch==1.4.0 torchvision==0.5.0 -f https://download.pytorch.org/whl/torch_stable.html

pip install torch==1.4.0+cu100 torchvision==0.5.0+cu100 -f https://download.pytorch.org/whl/torch_stable.html

pip install pytorch torchvision cuda100 -c pytorch

官網給的安裝命令是：

pip install torch==1.2.0 torchvision==0.4.0 -f https://download.pytorch.org/whl/torch_stable.html

創建張量

　　torch的資料型別torch.float32、torch.floar64、torch.float16、torch.int8、torch.int16、torch.int32、torch.int64，當資料在GPU上時，資料型別需要加上cuda，例：torch.cuda.FloatTensor

tesnor.dim()：求張量的階數

tensor.shape/tensor.size()：獲取張量的shape

tensor.reshape()/tensor.view()：修改張量的shape

tesnor.item()：如果我們的張量只有一個數值，可以使用.item()獲取，多用于神經網路損失值

import torch

a = torch.randn(2, 3)
print(a.type())  # torch.FloatTensor
a = a.cuda()
print(a.type())  # torch.cuda.FloatTensor

0階\0維張量

a = torch.tensor(1.3)
print(a)  # tensor(1.3000)
print(a.shape)  # torch.Size([])
print(a.size())  # torch.Size([])
print(len(a.shape))  # 0

1階張量

a = torch.tensor([1.1])
b = torch.tensor([1.1, 2.2])

print(a)    # tensor([1.1000])
print(b)    # tensor([1.1000, 2.2000])
print(torch.FloatTensor(1)) # 1個亂數 
# tensor([0.])
print(torch.FloatTensor(2,2))   # 兩行兩列的亂數
# tensor([[ 0.0000e+00,  0.0000e+00],
#         [-1.5690e-31,  4.5909e-41]])

將numpy資料轉換為torch資料

x = np.array([[1, 2], [3, 4]])
y = torch.from_numpy(x)  # 轉換為 torch資料
z = y.numpy()  　　　　　　# 轉換為 numpy 資料

2階張量

a = torch.randn(2,3)
print(a)
# tensor([[ 0.6078, -0.0300,  0.8677],
#         [ 0.4085, -2.9589, -0.0837]])
print(a.shape)  # torch.Size([2, 3])

print(a.size(0))    # 2
print(a.size(1))    # 3
print(a.shape[1])   # 3

3階

　　多用于RNN、input、batch

a = torch.rand(1, 2, 3)
print(a)
# tensor([[[0.9094, 0.8469, 0.2710],
#          [0.7306, 0.3788, 0.9464]]])
print(a.shape)  # torch.Size([1, 2, 3])
print(a[0])
# tensor([[0.9094, 0.8469, 0.2710],
#         [0.7306, 0.3788, 0.9464]])

4階

a = torch.rand(2,3,28,28)
print(a)
# tensor([[[[0.7709, 0.7338, 0.7997,  ..., 0.2025, 0.2864, 0.1989],
#           [0.8917, 0.0605, 0.4229,  ..., 0.7809, 0.9371, 0.0639],
#           [0.1946, 0.0166, 0.9767,  ..., 0.9593, 0.2626, 0.9426],
#           ...,
#           ...,
#           [0.4095, 0.3448, 0.9255,  ..., 0.6079, 0.8952, 0.0065],
#           [0.7871, 0.9928, 0.6331,  ..., 0.7503, 0.6191, 0.8684],
#           [0.5359, 0.3247, 0.8745,  ..., 0.7800, 0.8538, 0.7997]]]])

print(a.shape)  # torch.Size([2, 3, 28, 28])
print(a.dim())  # 4

將numpy資料轉換為torch資料

torch.from_numpy()

a = np.array([2, 3, 3])
print(torch.from_numpy(a))
# tensor([2, 3, 3], dtype=torch.int32)

a = np.ones([2, 3])
print(torch.from_numpy(a))
# tensor([[1., 1., 1.],
#         [1., 1., 1.]], dtype=torch.float64)

直接指定tensor的數值

print(torch.tensor([2.,3.2]))
# tensor([2.0000, 3.2000])
print(torch.FloatTensor([2.,3.2]))
# tensor([2.0000, 3.2000])
print(torch.tensor([[2.,3.2],[1.,22.3]]))
# tensor([[ 2.0000,  3.2000],
#         [ 1.0000, 22.3000]])

定義未初始化張量

print(torch.empty(2,3))
# tensor([[2.5657e-05, 6.3199e-43, 2.5657e-05],
#         [6.3199e-43, 2.5855e-05, 6.3199e-43]])
print(torch.FloatTensor(2,3))
# tensor([[2.5804e-05, 6.3199e-43, 8.4078e-45],
#         [0.0000e+00, 1.4013e-45, 0.0000e+00]])
print(torch.IntTensor(2,3))
# tensor([[937482688,       451,         1],
#         [        0,         1,         0]], dtype=torch.int32)

設定tensor資料的默認型別type

print(torch.tensor([1.2,3]).type())

torch.set_default_tensor_type(torch.DoubleTensor)
print(torch.tensor([1.3,3]).type())

torch.ones(size)/zero(size)/eye(size)：回傳全為1/0/單位對角張量

>>> torch.ones(2, 3)
tensor([[ 1.,  1.,  1.],
        [ 1.,  1.,  1.]])

>>> torch.ones(5)
tensor([ 1.,  1.,  1.,  1.,  1.])

torch.full(size, fill_value)：回傳以size大小填充fill_value的張量

torch.rand(size)：回傳[0, 1)之間的均勻分布張量

torch.randn(size)：均值為0，方差為1的正態分布

torch.*_like(input)：回傳一個和input shape一樣的張量，*可以為rand、randn...

torch.randint(low=0, high, size)：回傳shape=size，[low, high)之間的隨機整數

torch.arange()：和np.arange類似用法

torch.linspace(start, end, step=1000)：回傳start和end之間等距steps點的一維步長張量，

torch.logspace(start, end, steps=1000, base=10.0)：回傳$base^{start}$和$base^{end}$之間等距steps點的一維步長張量，

torch.randperm(n)：回傳從0到n-1的整數的隨機排列

b = torch.rand(4)
idx = torch.randperm(4)
print(b)    # tensor([0.0224, 0.7826, 0.5529, 0.2261])
print(idx)  # tensor([0, 2, 1, 3])
print(b[idx])   # tensor([0.5573, 0.6121, 0.6581, 0.1892])

索引與切片操作

a = torch.rand(4, 3, 28, 28)
print(a.shape)  # torch.Size([4, 3, 28, 28])

# 索引
print(a[0, 0].shape)  # torch.Size([28, 28])
print(a[0, 0, 2, 4])  # tensor(0.1152)

#  切片
print(a[:2].shape)  # torch.Size([2, 3, 28, 28])
print(a[:2, :2, :, :].shape)  # torch.Size([2, 2, 28, 28])
print(a[:2, -1:, :, :].shape)  # torch.Size([2, 1, 28, 28])

# ...的用法
print(a[...].shape)  # torch.Size([4, 3, 28, 28])
print(a[0, ...].shape)  # torch.Size([3, 28, 28])
print(a[:, 1, ...].shape)  # torch.Size([4, 28, 28])
print(a[..., :2].shape)  # torch.Size([4, 3, 28, 2])

掩碼取值

x = torch.rand(3, 4)
# tensor([[0.0864, 0.8583, 0.9847, 0.6263],
#         [0.4546, 0.1105, 0.5902, 0.7919],
#         [0.3894, 0.8882, 0.3354, 0.1561]])

mask = x.ge(0.5)    # ge 是符號 >
# tensor([[False,  True,  True,  True],
#         [False, False,  True,  True],
#         [False,  True, False, False]])

print(torch.masked_select(x, mask))
# tensor([0.8583, 0.9847, 0.6263, 0.5902, 0.7919, 0.8882])

通過torch.take取值

src = https://www.cnblogs.com/LXP-Never/p/torch.tensor([[4,3,5],[6,7,8]])
print(torch.take(src, torch.tensor([0,2,5])))
# tensor([4, 5, 8])

維度變換

torch tensor有兩個維度變換方法

tensor.reshape()
tensor.view()

squeeze：去除對應維度為1的維度

unsqueeze：往對應位置索引插入一個維度

a = torch.rand(4, 1, 28, 28)
b = a.unsqueeze(0)
print(b.shape)  # torch.Size([1, 4, 1, 28, 28])

a = torch.rand(4, 1, 1, 28)
b = a.squeeze()
print(b.shape)  # torch.Size([4, 28])
b = a.squeeze(1)
print(b.shape)  # torch.Size([4, 1, 28])

tensor.expand(*size)：回傳具有單個尺寸擴展到更大尺寸的張量

tesnor.repeat(*size)：沿指定尺寸重復此張量

x = torch.tensor([[1], [2], [3]])
print(x.size())  # torch.Size([3, 1])
print(x.expand(3, 4))
# tensor([[ 1,  1,  1,  1],
#         [ 2,  2,  2,  2],
#         [ 3,  3,  3,  3]])
print(x.expand(-1, 4))  # -1表示不改變維度的大小
# tensor([[ 1,  1,  1,  1],
#         [ 2,  2,  2,  2],
#         [ 3,  3,  3,  3]])

b = torch.tensor([1, 2, 3])  # torch.Size([1, 3])
print(b.repeat(4, 2).shape)  # torch.Size([4, 6])
print(b.repeat(4, 2, 1).size())  # torch.Size([4, 2, 3])

tensor.transpose()：調換張量指定維度的順序

tensor.permute()：將張量按指定順序排列

b = torch.rand(4, 3, 28, 32)

print(b.transpose(1, 3).shape)  # torch.Size([4, 32, 28, 3])
print(b.permute(0, 2, 3, 1).shape)  # torch.Size([4, 28, 32, 3])

數學運算

加法

a = torch.rand(3,4)
b = torch.rand(4)   # dim=1
# 加法
print(a+b)
print(torch.add(a,b))
# 減法
print(a-b)
print(torch.sub(a,b))

# 乘法
a = torch.rand(2,3)
b = torch.rand(2,3)   # dim=1
print(a*b)  # 對應元素相乘

a = torch.rand(2,3)
b = torch.rand(3,2)   # dim=1
# 矩陣相乘
print(torch.mm(a,b))    # shape=(2,2)
print(torch.matmul(a,b))    # shape=(2,2)
print(a@b)  # shape=(2,2)

平方

a = torch.full([2,2], 2)    # 創建一個shape=[2,2]值為2的陣列
print(a.pow(2))
print(a**2)

平方根

a = torch.full([2,2], 4)    # 創建一個shape=[2,2]值為2的陣列
print(a.sqrt())    # 平方根
print(a**(0.5))

e的指數冥：torch.exp()

取對數：torch.log()

tensor.floor()：向下取整

tensor.ceil() ：向上取整

tensor.round()：四舍五入

tensor.trunc()：取整數值

tensor.frac()：取小數值

tensor.clamp(min,max)：不足最小值的變成最小值，大于最大值的變成最大值

torch.mean()：求均值

torch.sum()：求和

torch.max\torch.min：求最大最小值

torch.prod(input, dtype=None) ：回傳input中所有元素的乘積

torch.argmin(input)\torch.argmax(input)：回傳input張量中所有元素的最小值\最大值的索引

torch.where(condition, x, y)：如果符合條件回傳x，如果不符合條件回傳y

torch.gather(input, dim, index)：沿dim指定的軸收集值

input：輸入tensor
dim：索引所沿的軸
index：要收集的元素的索引

t = torch.tensor([[1,2],[3,4]])
torch.gather(t, 1, torch.tensor([[0,0],[1,0]]))
# tensor([[ 1,  1],
#         [ 4,  3]])

autograd：自動求導

autograd 包為張量上的所有操作提供了自動求導機制，它是一個在運行時定義(define-by-run）的框架，這意味著反向傳播是根據代碼如何運行來決定的，并且每次迭代可以是不同的.

讓我們用一些簡單的例子來看看吧，

張量

　　如果torch.Tesnor 的屬性 .requires_grad 設定為True，那么autograd 會追蹤對于該張量的所有操作，當完成計算后可以通過呼叫 .backward()，來自動計算所有的梯度，這個 torch.Tensor 張量的所有梯度將會自動累加到 .grad屬性上，

　　如果要阻止一個張量被跟蹤歷史，可以呼叫 .detach() 方法將其與計算歷史分離，并阻止它未來的計算記錄被跟蹤，

為了防止跟蹤歷史記錄(和使用記憶體），可以將代碼塊包裝在 with torch.no_grad(): 中，在評估模型時特別有用，因為模型可能具有 requires_grad = True 的可訓練的引數，但是我們不需要在此程序中對他們進行梯度計算，

Tensor 和 Function 互相連接生成了一個無圈圖(acyclic graph)，它編碼了完整的計算歷史，每個張量都有一個 .grad_fn 屬性，該屬性參考了創建 Tensor 自身的Function(除非這個張量是用戶手動創建的，即這個張量的 grad_fn 是 None )，

如果需要計算導數，可以在 Tensor 上呼叫 .backward()，如果 Tensor 是一個標量(即它包含一個元素的資料），則不需要為 backward() 指定任何引數，但是如果它有更多的元素，則需要指定一個 gradient 引數，該引數是形狀匹配的張量，

有一部分有點難，需要多看幾遍，

https://pytorch.apachecn.org/docs/1.2/beginner/blitz/autograd_tutorial.html

如果設定torch.tensor_1(requires_grad=True)，那么會追蹤所有對該張量tensor_1的所有操作，

import torch

# 創建一個張量并設定 requires_grad=True 用來追蹤他的計算歷史
x = torch.ones(2, 2, requires_grad=True)
print(x)
# tensor([[1., 1.],
#         [1., 1.]], requires_grad=True)

當Tensor完成一個計算程序，每個張量都會自動生成一個.grad_fn屬性

# 對張量進行計算操作，grad_fn已經被自動生成了，
y = x + 2
print(y)
# tensor([[3., 3.],
#         [3., 3.]], grad_fn=<AddBackward>)
print(y.grad_fn)
# <AddBackward object at 0x00000232535FD860>

# 對y進行一個乘法操作
z = y * y * 3
out = z.mean()

print(z)
# tensor([[27., 27.],
#         [27., 27.]], grad_fn=<MulBackward>) 
print(out)
# tensor(27., grad_fn=<MeanBackward1>)

.requires_grad_(...) 可以改變張量的requires_grad屬性，

import torch

a = torch.randn(2, 2)
a = ((a * 3) / (a - 1))
print(a.requires_grad)      # 默認是requires_grad = False
a.requires_grad_(True)
print(a.requires_grad)      # True
b = (a * a).sum()
print(b.grad_fn)        # <SumBackward0 object at 0x000002325360B438>

梯度

回顧到上面

import torch

# 創建一個張量并設定 requires_grad=True 用來追蹤他的計算歷史
x = torch.ones(2, 2, requires_grad=True)
print(x)
# tensor([[1., 1.],
#         [1., 1.]], requires_grad=True)


# 對張量進行計算操作，grad_fn已經被自動生成了，
y = x + 2
print(y)
# tensor([[3., 3.],
#         [3., 3.]], grad_fn=<AddBackward>)
print(y.grad_fn)
# <AddBackward object at 0x00000232535FD860>

# 對y進行一個乘法操作
z = y * y * 3
out = z.mean()

print(z)
# tensor([[27., 27.],
#         [27., 27.]], grad_fn=<MulBackward>)
print(out)
# tensor(27., grad_fn=<MeanBackward1>)

讓我們來反向傳播，運行 out.backward() ，等于out.backward(torch.tensor(1.))

對out進行反向傳播，$out = \frac{1}{4}\sum_i z_i$，其中$z_i = 3(x_i+2)^2$，因為方向傳播中torch.tensor=1（out.backward中的引數）因此$z_i\bigr\rvert_{x_i=1} = 27$

對于梯度$\frac{\partial out}{\partial x_i} = \frac{3}{2}(x_i+2)$，把$x_i=1$代入$\frac{\partial out}{\partial x_i}\bigr\rvert_{x_i=1} = \frac{9}{2} = 4.5$

print(out)  # tensor(27., grad_fn=<MeanBackward1>)

print("*"*50)
out.backward()
# 列印梯度
print(x.grad)
# tensor([[4.5000, 4.5000],
#         [4.5000, 4.5000]])

對吃栗子找到規律，才能看懂

import torch

x = torch.randn(3, requires_grad=True)

y = x * 2
while y.data.norm() < 1000:
    y = y * 2

print(y)  # tensor([-920.6895, -115.7301, -867.6995], grad_fn=<MulBackward>)
gradients = torch.tensor([0.1, 1.0, 0.0001], dtype=torch.float)

# 把gradients代入y的反向傳播中
y.backward(gradients)   

# 計算梯度
print(x.grad)   # tensor([ 51.2000, 512.0000,   0.0512])

為了防止跟蹤歷史記錄，可以將代碼塊包裝在with torch.no_grad()：中，在評估模型時特別有用，因為模型的可訓練引數的屬性可能具有requires_grad = True，但是我們不需要梯度計算，

print(x.requires_grad)      # True
print((x ** 2).requires_grad)   # True

with torch.no_grad():
    print((x ** 2).requires_grad)   # False

神經網路

神經網路是基于自動梯度 (autograd)來定義一些模型，一個 nn.Module 包括各個層和一個 forward(input) 方法，它會回傳output，

神經網路訓練程序包括以下幾點：

定義一個包含可訓練引數的神經網路
迭代整個輸入
通過神經網路處理輸入
計算損失(loss)
反向傳播梯度到神經網路的引數
更新網路的引數，典型的用一個簡單的更新方法：weight = weight - learning_rate *gradient

定義網路

　　我們先來定義一個網路，處理輸入，呼叫backword

import torch
import torch.nn as nn
import torch.nn.functional as F


class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        # 輸入影像channel:1，輸出channel:6; 5*5卷積核
        self.conv1 = nn.Conv2d(1, 6, 5)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        # 前向傳播
        x = F.max_pool2d(F.relu(self.conv1(x)), (2, 2))
        # 如果核大小是正方形，則只能指定一個數字
        x = F.max_pool2d(F.relu(self.conv2(x)), 2)
        x = x.view(-1, self.num_flat_features(x))   # reshape 成二維，方便做全連接操作
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

    def num_flat_features(self, x):
        size = x.size()[1:]  # 除去 batch 維度的其他維度
        num_features = 1
        for s in size:
            num_features *= s
        return num_features


net = Net()     # 列印模型結構
print(net)
# Net(
#   (conv1): Conv2d(1, 6, kernel_size=(5, 5), stride=(1, 1))
#   (conv2): Conv2d(6, 16, kernel_size=(5, 5), stride=(1, 1))
#   (fc1): Linear(in_features=400, out_features=120, bias=True)
#   (fc2): Linear(in_features=120, out_features=84, bias=True)
#   (fc3): Linear(in_features=84, out_features=10, bias=True))

我們只需要定義 forward 函式，backward函式會在使用autograd時自動被定義，backward函式用來計算導數，可以在 forward 函式中使用任何針對張量的操作和計算，

一個模型可訓練的引數可以通過呼叫 net.parameters() 回傳：

params = list(net.parameters())
print(len(params))       # 10
print(params[0].size())  # 第一個卷積層的權重 torch.Size([6, 1, 3, 3])

讓我們隨機生成一個 shape=(1, 1, 32, ,32) 的資料輸入模型

# 輸入shape格式：[batch_size, nChannels, Height, Width]
input = torch.randn(1, 1, 32, 32)
out = net(input)
print(out.shape)    # torch.Size([1, 10])
print(out)
# tensor([[ 0.0227, -0.0823,  0.1665,  0.2153,  0.0732,  0.0471,  0.0019,  0.0664,
#          -0.0576,  0.0238]], grad_fn=<AddmmBackward>)

清零所有引數的梯度快取，然后進行隨機梯度的反向傳播：

net.zero_grad()     # 把所有引數梯度快取器置零
out.backward(torch.randn(1, 10))    # 用隨機的梯度來反向傳播

損失函式

　　我們這里計算均方誤差 $loss=nn.MSELoss(模型預測值-目標)$

output = net(input)             # torch.Size([1, 10])
target = torch.randn(10)        # 生成一個隨機資料作為target
target = target.reshape(1,-1)   # [1, 10]
mse_loss = nn.MSELoss()
loss_value = mse_loss(output, target)
print(loss_value)   # tensor(0.5513, grad_fn=<MseLossBackward>)

現在，如果使用loss的.grad_fn屬性跟蹤反向傳播程序，會看到計算圖如下：

input -> conv2d -> relu -> maxpool2d -> conv2d -> relu -> maxpool2d
      -> view -> linear -> relu -> linear -> relu -> linear
      -> MSELoss
      -> loss

所以，當我們呼叫loss.backward()，整張圖開始關于loss微分，圖中所有設定了requires_grad=True的張量的.grad屬性累積著梯度張量，

為了說明這一點，讓我們向后跟蹤幾步：

print(loss.grad_fn)  # MSELoss
# <MseLossBackward object at 0x7fab77615278>
print(loss.grad_fn.next_functions[0][0])  # Linear
# <AddmmBackward object at 0x7fab77615940>
print(loss.grad_fn.next_functions[0][0].next_functions[0][0])  # ReLU
# <AccumulateGrad object at 0x7fab77615940>

反向傳播

　　為了實作損失函式的梯度反向傳播，我們需要使用 loss.backward() 來反向傳播權重，我們需要清零現有的梯度，否則梯度將會與已有的梯度累加，

現在，我們將呼叫loss.backward()，并查看conv1層的偏置(bias）在反向傳播前后的梯度，

net.zero_grad()     # 清零所有引數的梯度
print('反向傳播之前的 conv1.bias.grad 梯度')
print(net.conv1.bias.grad)
# tensor([0., 0., 0., 0., 0., 0.])

loss.backward()

print('反向傳播之后的 conv1.bias.grad 梯度')
print(net.conv1.bias.grad)
# tensor([-0.0118,  0.0125, -0.0085, -0.0225,  0.0125,  0.0235])

更新權重

最簡單的更新規則是隨機梯度下降法(SGD）:

weight = weight - learning_rate * gradient

我們先簡單的的使用python代碼來實作一下：

learning_rate = 0.01
for f in net.parameters():
    f.data.sub_(f.grad.data * learning_rate)

方式在pytorch中官方有一個優化器包torch.optim，包含了不同的優化器：SGD、Nesterov-SGD、Adam、RMSProp等，使用方法如下

import torch.optim as optim

optimizer = optim.SGD(net.parameters(), lr=0.01)    # 創建 SGD 優化器

optimizer.zero_grad()   # 清零梯度快取
output = net(input)
loss = criterion(output, target)    # 損失函式
loss.backward()     # 損失函式的梯度反向傳播
optimizer.step()    # 更新引數

影像分類器

torch有一個叫做totchvision 的包，支持加載類似Imagenet，CIFAR10，MNIST 等公共資料集的資料加載模塊 torchvision.datasets

支持加載影像資料資料轉換模塊 torch.utils.data.DataLoader，

本節我們使用CIFAR10資料集，它包含十個類別：‘airplane’, ‘automobile’, ‘bird’, ‘cat’, ‘deer’, ‘dog’, ‘frog’, ‘horse’, ‘ship’, ‘truck’，CIFAR-10 中的影像尺寸為33232，也就是RGB的3層顏色通道，每層通道內的尺寸為32*32，

訓練一個影像分類器

pytorch創建了一個包torchvision，其中包含了針對Imagenet、CIFAR10、MNIST等常用資料集的資料加載器(data loaders），還有對圖片資料變形的操作，

這一節我們將訓練一個cifar10的影像分類器，步驟如下：

使用torchvision加載并且歸一化CIFAR10的訓練和測驗資料集
定義一個卷積神經網路
定義一個損失函式
在訓練樣本資料上訓練網路
在測驗樣本資料上測驗網路

1.加載并標準化CIFAR10

　　使用torchvision加載CIFAR10，torchvision 資料集的輸出是范圍在[0,1]之間的 PILImage，我們將他們轉換成歸一化范圍為[-1, 1]之間的張量 Tensors，

import torch
import torchvision
import torchvision.transforms as transforms

transform = transforms.Compose(
    [transforms.ToTensor(),
     transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])

# 下載訓練資料集
trainset = torchvision.datasets.CIFAR10(root='./data', train=True, download=True, transform=transform)
# 下載測驗資料集
testset = torchvision.datasets.CIFAR10(root='./data', train=False, download=True, transform=transform)
# 訓練集加載器
trainloader = torch.utils.data.DataLoader(trainset, batch_size=4, shuffle=True, num_workers=0)
# 測驗集加載器
testloader = torch.utils.data.DataLoader(testset, batch_size=4, shuffle=False, num_workers=0)

classes = ('plane', 'car', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck')

現在讓我們可視化部分訓練資料

import matplotlib.pyplot as plt
import numpy as np


def imshow(img):
    img = img / 2 + 0.5  # 去標準化
    npimg = img.numpy()  # 轉換成numpy資料
    plt.imshow(np.transpose(npimg, (1, 2, 0)))
    plt.show()

# 隨機獲取圖片
dataiter = iter(trainloader)
images, labels = dataiter.next()

# 顯示圖片
imshow(torchvision.utils.make_grid(images))
# 列印圖片標簽
print("".join("%6s" % classes[labels[j]] for j in range(4)))
#   frog truck  bird   cat

2.定義卷積神經網路

將上一節定義的神經網路拿過來，并將其修改成輸入為3通道影像

import torch.nn as nn
import torch.nn.functional as F


class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = x.view(-1, 16 * 5 * 5)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x


net = Net()

3.定義損失函式和優化器

　　我們使用分類的交叉熵損失函式和隨機梯度下降SGD(使用momentum），

import torch.optim as optim

criterion = nn.CrossEntropyLoss()   # 損失函式實體化
optimizer = optim.SGD(net.parameters(), lr=0.001, momentum=0.9)     # 優化器實體化

4.訓練網路

　　遍歷我們的資料迭代器，并將輸入“喂”給網路和優化函式，

for epoch in range(2):
    running_loss = 0.0
    for i, data in enumerate(trainloader, 0):
        inputs, labels = data   # 獲取資料
        optimizer.zero_grad()   # 清零引數的梯度

        # --- forward + backward + optimize ---  #
        outputs = net(inputs)   # 前向傳播
        loss = criterion(outputs, labels)   # 計算損失
        loss.backward()     # 損失梯度反向傳播
        optimizer.step()    # 引數更新

        running_loss += loss.item()
        # 每2000個小批量列印一次
        if i % 2000 == 1999:
            print("[%d, %5d] loss: %.3f" % (epoch + 1, i+1, running_loss/2000))
            running_loss = 0.0
print("訓練完成")
# [1,  2000] loss: 2.182
# [1,  4000] loss: 1.819
# [1,  6000] loss: 1.648
# [1,  8000] loss: 1.569
# [1, 10000] loss: 1.511
# [1, 12000] loss: 1.473
# [2,  2000] loss: 1.414
# [2,  4000] loss: 1.365
# [2,  6000] loss: 1.358
# [2,  8000] loss: 1.322
# [2, 10000] loss: 1.298
# [2, 12000] loss: 1.282
# 訓練完成

5.使用測驗資料測驗網路

　　接來下對比神經網路輸出的標簽和正確樣本(ground-truth），并檢測網路的預測精度，如果預測是正確的，我們將樣本添加到正確預測的串列中，

第一步，我們先顯示測驗集中的影像

# 測驗資料集
dataiter = iter(testloader)
images, labels = dataiter.next()

# Ground Truth
imshow(torchvision.utils.make_grid(images))
print('GroundTruth: ', ' '.join('%5s' % classes[labels[j]] for j in range(4)))
# GroundTruth:    cat  ship  ship plane

# 預測值
outputs = net(images)
_, predicted = torch.max(outputs, 1)
print('Predicted: ', ' '.join('%5s' % classes[predicted[j]] for j in range(4)))
# Predicted:    dog  ship  ship plane

預測對了兩個，讓我們看看網路在整個資料集上的表現，

correct = 0
total = 0
with torch.no_grad():
    for data in testloader:
        images, labels = data   # 測驗集圖片和標簽
        # images.shape torch.Size([4, 3, 32, 32])
        # labels.shape torch.Size([4])
        outputs = net(images)   # 預測標簽 torch.Size([4, 10])
        # .data 回傳和outputs的相同資料tensor, 但不會加入到x的計算歷史里
        _, predicted = torch.max(outputs.data, dim=1)   # 回傳最大值和最大值的索引
        total += labels.size(0) # 計算一共有多少個測驗資料集
        # .item()得到一個元素張量里面的元素值
        correct += (predicted == labels).sum().item()   # 預測正確的數量
print('該網路對10000張測驗影像的精度: %d %%' % (100 * correct / total))
# 該網路對10000張測驗影像的精度: 55 %

正確率有55%，看來網路學到了東西，那么哪些是表現好的類呢？哪些是表現的差的類呢？

class_correct = list(0. for i in range(10))
class_total = list(0. for i in range(10))
with torch.no_grad():
    for data in testloader:
        images, labels = data
        # images.shape torch.Size([4, 3, 32, 32])
        # labels.shape torch.Size([4])
        outputs = net(images)       # 預測標簽 torch.Size([4, 10])
        _, predicted = torch.max(outputs, 1)
        c = (predicted == labels).squeeze()     # 去掉維數為1的的維度
        for i in range(4):
            label = labels[i]
            class_correct[label] += c[i].item()
            class_total[label] += 1


for i in range(10):
    print('Accuracy of %5s : %2d %%' % (
        classes[i], 100 * class_correct[i] / class_total[i]))
# Accuracy of plane : 70 %
# Accuracy of   car : 70 %
# Accuracy of  bird : 28 %
# Accuracy of   cat : 25 %
# Accuracy of  deer : 37 %
# Accuracy of   dog : 60 %
# Accuracy of  frog : 66 %
# Accuracy of horse : 62 %
# Accuracy of  ship : 69 %
# Accuracy of truck : 61 %

在GPU上訓練

在GPU上訓練，我么要將神經網路轉到GPU上，前提條件是CUDA可以用，讓我們首先定義下我們的設備為第一個可見的cuda設備，

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

# Assume that we are on a CUDA machine, then this should print a CUDA device:

print(device)
# cuda:0

然后這些方法將遞回遍歷所有模塊，并將它們的引數和緩沖區轉換為CUDA張量：

net.to(device)

請記住，我們不得不將輸入和目標在每一步都送入GPU：

inputs, labels = inputs.to(device), labels.to(device)

資料并行處理

　　在這一章節中我們教大家使用DataParallel來使用多GPU

我們把模型放入GPU中

device = torch.device("cuda: 0")    # 實體化cuda設備
model.to(device)

將張量復制到GPU設備上：

mytensor = my_tensor.to(device)

PyTorch 默認只會使用一個 GPU，因此我們可以使用 DataParallel 讓模型在多個GPU上并行運行

model = nn.DataParallel(model)

輸入和引數

import torch
import torch.nn as nn
from torch.utils.data import Dataset, DataLoader

input_size = 5
output_size = 2
batch_size = 30
data_size = 100

# 設備
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

虛擬資料集

　　制造一個隨機的資料集，只需實作__getitem__，

class RandomDataset(Dataset):
    def __init__(self, size, length):
        self.len = length
        self.data = torch.randn(length, size)

    def __getitem__(self, index):
        return self.data[index]

    def __len__(self):
        return self.len

rand_loader = DataLoader(dataset=RandomDataset(input_size, data_size),
            batch_size=batch_size, shuffle=True)

簡單模型

　　作為演示，我們的模型只接受一個輸入，執行一個線性操作，然后得到結果，然而，你能在任何模型(CNN，RNN，Capsule Net等）上使用DataParallel，

class Model(nn.Module):
    def __init__(self, input_size, output_size):
        super(Model, self).__init__()
        self.fc = nn.Linear(input_size, output_size)

    def forward(self, input):
        output = self.fc(input)
        print("In Model: input size", input.size(), "output size", output.size())
        return output

創建模型和資料并行

我們先要檢查模型是否有多個GPU，如果有我們再使用nn.DataParallel，然后我們可以把模型放在GPU上model.to(device)

model = Model(input_size, output_size)  # 模型實體化
if torch.cuda.device_count() >= 1:
    print("我們有", torch.cuda.device_count(), "個GPUs!")
    model = nn.DataParallel(model)  # 設定資料并行

model.to(device)    # 將模型放到cuda上面

運行模型，現在我們可以看到輸入和輸出張量的大小了

for data in rand_loader:
    input = data.to(device)     # 將資料放到cuda上面
    output = model(input)
    print("Outside: input size", input.size(), "output_size", output.size())
    print("---------------------------------------------------------")

輸出

我們有 1 個GPUs!
In Model: input size torch.Size([30, 5]) output size torch.Size([30, 2])
Outside: input size torch.Size([30, 5]) output_size torch.Size([30, 2])
---------------------------------------------------------
In Model: input size torch.Size([30, 5]) output size torch.Size([30, 2])
Outside: input size torch.Size([30, 5]) output_size torch.Size([30, 2])
---------------------------------------------------------
In Model: input size torch.Size([30, 5]) output size torch.Size([30, 2])
Outside: input size torch.Size([30, 5]) output_size torch.Size([30, 2])
---------------------------------------------------------
In Model: input size torch.Size([10, 5]) output size torch.Size([10, 2])
Outside: input size torch.Size([10, 5]) output_size torch.Size([10, 2])
---------------------------------------------------------

　　當我們對30個輸入和輸出進行批處理時，我們和期望的一樣得到30個輸入和30個輸出，但是若有多個GPU，會得到如下的結果，

如果我們有2個GPU我們可以看到以下結果

我們有 2 個GPUs!
In Model: input size torch.Size([15, 5]) output size torch.Size([15, 2])
In Model: input size torch.Size([15, 5]) output size torch.Size([15, 2])
Outside: input size torch.Size([30, 5]) output_size torch.Size([30, 2])
---------------------------------------------------------
In Model: input size torch.Size([15, 5]) output size torch.Size([15, 2])
In Model: input size torch.Size([15, 5]) output size torch.Size([15, 2])
Outside: input size torch.Size([30, 5]) output_size torch.Size([30, 2])
---------------------------------------------------------
In Model: input size torch.Size([15, 5]) output size torch.Size([15, 2])
In Model: input size torch.Size([15, 5]) output size torch.Size([15, 2])
Outside: input size torch.Size([30, 5]) output_size torch.Size([30, 2])
---------------------------------------------------------
In Model: input size torch.Size([5, 5]) output size torch.Size([5, 2])
In Model: input size torch.Size([5, 5]) output size torch.Size([5, 2])
Outside: input size torch.Size([10, 5]) output_size torch.Size([10, 2])
---------------------------------------------------------

DataParallel自動的劃分資料，并將作業發送到多個GPU上的多個模型，DataParallel會在每個模型完成作業后，收集與合并結果然后回傳給你，

TensorBoard可視化

我通過from torch.utils.tensorboard import SummaryWriter匯入tensorboard有問題，因此我選擇通過tesnorboardX，

from tensorboardX import SummaryWriter

創建事件物件：writer = SummaryWriter(logdir)

寫入圖片資料：writer.add_image(tag, img_tensor, global_step=None)

寫入標量資料：writer.add_scalar(tag=, scalar_value, global_step=None)

關閉事件物件：writer.close()

在事件檔案夾 ./events 中打開cmd，輸入

tensorboard --logdir=runs
# 或者
tensorboard --logdir "./"

然后在瀏覽器中輸入

https://localhost:6006/ 即可顯示，

保存和加載模型

torch.save：保存模型，序列化物件保存到磁盤，常見的PyTorch約定是使用.pt或 .pth檔案擴展名保存模型，

torch.load：加載模型，目標檔案反序列化到記憶體中

torch.nn.Module.load_state_dict：使用反序列化的state_dict 加載模型的引數字典

state_dict：python字典，包括具有可學習引數的層、每層的引數張量、優化器以及優化器超引數

　　為了充分了解state_dict，我們看下面例子：

import torch.nn as nn
import torch.nn.functional as F
from torch import optim


class TheModelClass(nn.Module):
    def __init__(self):
        super(TheModelClass, self).__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = x.view(-1, 16 * 5 * 5)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x


model = TheModelClass() # 初始化模型
optimizer = optim.SGD(model.parameters(), lr=0.001, momentum=0.9)   # 初始化optimizer

print("Model的state_dict:")
for param_tensor in model.state_dict():
    print(param_tensor, "\t", model.state_dict()[param_tensor].size())

print("Optimizer的state_dict:")
for var_name in optimizer.state_dict():
    print(var_name, "\t", optimizer.state_dict()[var_name])
    
# Model的state_dict:
# conv1.weight      torch.Size([6, 3, 5, 5])
# conv1.bias      torch.Size([6])
# conv2.weight      torch.Size([16, 6, 5, 5])
# conv2.bias      torch.Size([16])
# fc1.weight      torch.Size([120, 400])
# fc1.bias      torch.Size([120])
# fc2.weight      torch.Size([84, 120])
# fc2.bias      torch.Size([84])
# fc3.weight      torch.Size([10, 84])
# fc3.bias      torch.Size([10])
# Optimizer的state_dict:
# state      {}
# param_groups      [{'lr': 0.001, 'momentum': 0.9, 'dampening': 0, 'weight_decay': 0, 'nesterov': False, 'params': [3251954079208, 3251954079280, 3251954079352, 3251954079424, 3251954079496, 3251954079568, 3251954079640, 3251954079712, 3251954079784, 3251954079856]}]

推理模型的保存和加載

保存

torch.save(model.state_dict(), PATH)

加載

model = TheModelClass(*args, **kwargs)
model.load_state_dict(torch.load(PATH))
model.eval()

記住，model.eval()在運行推理之前，注意要將批處理規范化層設定為評估模式，

整個模型的保存和加載

保存

torch.save(model, PATH)

加載

model = torch.load(PATH)
model.eval()

記住，model.eval()在運行推理之前，注意要將批處理規范化層設定為評估模式，

保存和加載用于推理或繼續訓練的常規檢查點

保存

torch.save({
    'epoch': epoch,
    'model_state_dict': model.state_dict(),
    'optimizer_state_dict': optimizer.state_dict(),
    'loss': loss,
    ...
    }, PATH)

加載

model = TheModelClass(*args, **kwargs)
optimizer = TheOptimizerClass(*args, **kwargs)

checkpoint = torch.load(PATH)
model.load_state_dict(checkpoint['model_state_dict'])
optimizer.load_state_dict(checkpoint['optimizer_state_dict'])
epoch = checkpoint['epoch']
loss = checkpoint['loss']

model.eval()
# 或
model.train()

常見的PyTorch約定是使用.tar檔案擴展名保存這些檢查點，

參考

Github一個Pytorch從入門到精通比較好的教程

PyTorch模型訓練實用教程

簡單易上手的PyTorch中文檔案：https://github.com/fendouai/pytorch1.0-cn

【檔案】

英文版官方檔案
中文版官方檔案
PyTorch中文檔案

知乎pytorch搜索頁

【視頻】

莫煩pytorch
[曉唦帶你讀]《動手學深度學習》(PyTorch版)完結！4萬播放
pytorch 入門學習（目前見過最好的pytorch學習視頻）12萬播放
《PyTorch深度學習實踐》完結合集 4萬播放

如何成為Pytorch大神

學好深度學習的基礎知識
學習PyTorch官方tutorial
學習GitHub以及各種博客上的教程(別人創建好的list)
閱讀documentation，使用論壇 https://discuss.pytorch.org/
跑通以及學習開源PyTorch專案
閱讀深度學習模型paper，學習別人的模型實作
通過閱讀paper，自己實作模型
自己創造模型(也可以寫paper)

轉載請註明出處，本文鏈接：https://www.uj5u.com/qita/73700.html

標籤：其他

上一篇：遵循統一的機器學習框架理解高斯混合模型（GMM）

下一篇：讀---白話大資料與機器學習

Pytorch入門教程