學習總結

（1）本次影像多分類中的最后一層網路不需要加激活，因為在最后的Torch.nn.CrossEntropyLoss已經包括了激活函式softmax，
（2）本次pytorch在讀影像時，將PIL影像（現在都用Pillow了）轉為Tensor，神經網路一般希望input比較小，最好在-1到1之間，最好符合正態分布，三種主流影像處理庫的比較：

庫	函式/方法	回傳值	影像像素格式	像素值范圍	影像矩陣表示
skimage	io.imread(xxx)	numpy.ndarray	RGB	[0, 255]	(H X W X C)
cv2	cv2.imread(xxx)	numpy.ndarray	BGR	[0, 255]	(H X W X C)
Pillow(PIL)	Image.open(xxx)	PIL.Image.Image物件	根據影像格式，一般為RGB	[0, 255]	—

文章目錄

學習總結
一、多分類問題
二、分布和API
三、交叉熵代碼實踐
四、代碼實踐
Reference

一、多分類問題

注意驗證/測驗的流程基本與訓練程序大體一致，不同點在于：

需要預先設定torch.no_grad，以及將model調至eval模式
不需要將優化器的梯度置零
不需要將loss反向回傳到網路
不需要更新optimizer

二、分布和API

在這里插入圖片描述
法一：把每一個類別的確定看作是一個二分類問題，利用交叉熵，

為了解決抑制問題，就不要輸出每個類別的概率，且滿足每個概率大于0和概率之和為1的條件，（二分類我們輸出的是分布，求出一個然后用1減去即可，多分類雖然也可以這樣，但是最后1減去其他所有概率的計算，還需要構建計算圖有點麻煩），
之前二分類中的交叉熵的兩項中只能有一項為0.

在這里插入圖片描述
（1）NLLLoss函式計算如下紅色框：

（2）可以直接使用torch.nn.CrossEntropyLoss（將下列紅框計算納入），注意右側是由類別生成獨熱編碼向量，

交叉熵，最后一層網路不需要激活，因為在最后的Torch.nn.CrossEntropyLoss已經包括了激活函式softmax，
（1）交叉熵手寫版本

import numpy as np
y = np.array([1, 0, 0])
z = np.array([0.2, 0.1, -0.1])
y_predict = np.exp(z) / np.exp(z).sum()
loss = (- y * np.log(y_predict)).sum()
print(loss)
# 0.9729189131256584

（2）交叉熵pytorch栗子
在這里插入圖片描述
交叉熵損失和NLL損失的區別（讀檔案）：

https://pytorch.org/doc s/stable/nn.html#crossentropyloss
https://pytorch.org/docs/stable/nn.html#nllloss
搞懂為啥：CrossEntropyLoss <==> LogSoftmax + NLLLoss

三、交叉熵代碼實踐

前面是sigmoid，后面是softmax（使得概率大于0，概率值和為1），兩點注意，

# -*- coding: utf-8 -*-
"""
Created on Mon Oct 18 22:48:55 2021

@author: 86493
"""
# 用CrossEntropyLoss計算交叉熵
import torch 
# 注意1：設定的第0個分類
y = torch.LongTensor([0])
z = torch.Tensor([[0.2, 0.1, -0.1]])
# 注意2：CrossEntropyLoss
criterion = torch.nn.CrossEntropyLoss()
loss = criterion(z, y)
print(loss.item())
# 0.9729189276695251

# 舉栗子：mini-batch:batch_size = 3
import torch
criterion = torch.nn.CrossEntropyLoss()
# 三個樣本
Y = torch.LongTensor([2, 0, 1]) 

# 第一個樣本比較吻合，loss會較小
Y_pred1 = torch.Tensor([[0.1, 0.2, 0.9], # 2 該層為原始的線性層的輸出
                        [1.1, 0.1, 0.2], # 0
                        [0.2, 2.1, 0.1]])# 1
# 第二個樣本差的比較遠，loss較大
Y_pred2 = torch.Tensor([[0.8, 0.2, 0.3], # 2 
                        [0.2, 0.3, 0.5], # 0 
                        [0.2, 0.2, 0.5]])# 1

l1 = criterion(Y_pred1, Y)
l2 = criterion(Y_pred2, Y)
print("Batch Loss1 = ", l1.data,
      "\nBatch Loss2 =", l2.data)
# Batch Loss1 =  tensor(0.4966) 
# Batch Loss2 = tensor(1.2389)

四、代碼實踐

（1）pytorch在讀影像時，將PIL影像（現在都用Pillow了）轉為Tensor，神經網路希望input比較小，最好在-1到1之間，最好符合正態分布，
在這里插入圖片描述
三種主流影像處理庫的比較：

庫	函式/方法	回傳值	影像像素格式	像素值范圍	影像矩陣表示
skimage	io.imread(xxx)	numpy.ndarray	RGB	[0, 255]	(H X W X C)
cv2	cv2.imread(xxx)	numpy.ndarray	BGR	[0, 255]	(H X W X C)
Pillow(PIL)	Image.open(xxx)	PIL.Image.Image物件	根據影像格式，一般為RGB	[0, 255]	—

在這里插入圖片描述
（2）transforms.Normalize，轉為標準正態分布即是一種映射到(0, 1)函式， Pixel norm = Pixel origin ? mean std \text { Pixel }_{\text {norm }}=\frac{\text { Pixel }_{\text {origin }}-\text { mean }}{\text { std }} Pixel norm ?= std Pixel origin ?? mean ?
（3）激活層改用relu，最后一層不用加，因為交叉熵損失函式里面已經包括了，
（4）_, predicted = torch.max(outputs.data, dim = 1)求出每一行(樣本)的最大值的下標,dim = 1即行的維度；回傳最大值和最大值所在的下標，
在這里插入圖片描述

# -*- coding: utf-8 -*-
"""
Created on Mon Oct 18 19:25:46 2021

@author: 86493
"""
import torch
from torchvision import transforms 
from torchvision import datasets
from torch.utils.data import DataLoader 
# 使用relu激活函式
import torch.nn.functional as F
import torch.optim as optim 
import torch.nn as nn
import matplotlib.pyplot as plt

losslst = []
batch_size = 64 
transform = transforms.Compose([
    # 將PIL圖片轉為Tensor
    transforms.ToTensor(),
    # 歸一化，分別為均值和標準差
    transforms.Normalize((0.1307, ), 
                         (0.3081, ))
])

# 訓練集資料
train_dataset = datasets.MNIST(root = '../dataset/mnist/',
                               train = True,
                               download = True,
                               transform = transform)
train_loader = DataLoader(train_dataset,
                          shuffle = True,
                          batch_size = batch_size)

# 測驗集資料
test_dataset = datasets.MNIST(root = '../dataset/mnist/',
                              train = False,
                              download = True,
                              transform = transform)

test_loader = DataLoader(test_dataset,
                         shuffle = False,
                         batch_size = batch_size)


# 模型
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.l1 = nn.Linear(784, 512)
        self.l2 = nn.Linear(512, 256)
        self.l3 = nn.Linear(256, 128)
        self.l4 = nn.Linear(128, 64)
        self.l5 = nn.Linear(64, 10)
        
    def forward(self, x):
        # -1位置會自動算出N，即變成(N, 784)矩陣
        x = x.view(-1, 784)
        x = F.relu(self.l1(x))
        x = F.relu(self.l2(x))
        x = F.relu(self.l3(x))
        x = F.relu(self.l4(x))
        # 注意最后一層不用加上relu,因為交叉熵已經含有softmax
        return self.l5(x)
    
model = Net()
# 交叉熵作為loss
criterion = torch.nn.CrossEntropyLoss()
# 優化器SGD加上沖量以優化訓練程序
optimizer = optim.SGD(model.parameters(),
                      lr = 0.01,
                      momentum = 0.5)

def train(epoch):
    running_loss = 0.0
    for batch_idx, data in enumerate(train_loader, 0):
        # 1.準備資料
        inputs, labels = data  # data為元組
        # 2.向前傳遞
        outputs = model(inputs)
        loss = criterion(outputs, labels)
        losslst.append(loss)
        # print(losslst,"+++++++++++++")
        # 3.反向傳播
        optimizer.zero_grad()
        loss.backward()
        # 4.更新引數
        optimizer.step()
        # 記得取loss值用item(),否則會構建計算圖  
        running_loss += loss.item()
        losslst.append(loss.item())
        
        # 每300個batch列印一次
        if batch_idx % 300 == 299:
            print('[%d, %5d] loss: %.3f' % 
                  (epoch + 1, 
                   batch_idx + 1,
                   running_loss / 300))
            running_loss = 0.0
    

def test():
    correct = 0
    total = 0 
    # 加上with這句后面就不會產生計算圖
    with torch.no_grad():
        for data in test_loader:
            images, labels = data
            outputs = model(images)
            # 求出每一行(樣本)的最大值的下標,dim = 1即行的維度
            # 回傳最大值和最大值所在的下標
            _, predicted = torch.max(outputs.data,
                                     dim = 1)
            # label矩陣為N × 1
            total += labels.size(0)
            # 猜對的數量，后面算準確率
            correct += (predicted == labels).sum().item()
    print('Accuracy on test set: %d %%' % 
          (100 * correct / total))


# print("losslst的長度為：", len(losslst))

if __name__ == '__main__':
    for epoch in range(10):
        # if epoch % 10 == 9: # 每10個輸出一次
        train(epoch)
        test()

代碼列印的loss值：

[1,   300] loss: 2.216
[1,   600] loss: 0.872
[1,   900] loss: 0.423
Accuracy on test set: 89 %
[2,   300] loss: 0.333
[2,   600] loss: 0.272
[2,   900] loss: 0.232
Accuracy on test set: 93 %
[3,   300] loss: 0.189
[3,   600] loss: 0.176
[3,   900] loss: 0.160
Accuracy on test set: 95 %
[4,   300] loss: 0.135
[4,   600] loss: 0.127
[4,   900] loss: 0.121
Accuracy on test set: 95 %
[5,   300] loss: 0.104
[5,   600] loss: 0.096
[5,   900] loss: 0.097
Accuracy on test set: 96 %
[6,   300] loss: 0.080
[6,   600] loss: 0.071
[6,   900] loss: 0.083
Accuracy on test set: 97 %
[7,   300] loss: 0.062
[7,   600] loss: 0.062
[7,   900] loss: 0.062
Accuracy on test set: 97 %
[8,   300] loss: 0.050
[8,   600] loss: 0.052
[8,   900] loss: 0.053
Accuracy on test set: 97 %
[9,   300] loss: 0.034
[9,   600] loss: 0.045
[9,   900] loss: 0.043
Accuracy on test set: 97 %
[10,   300] loss: 0.032
[10,   600] loss: 0.032
[10,   900] loss: 0.036
Accuracy on test set: 97 %

Reference

（1）pytorch中with torch.no_grad()
（2）https://www.bilibili.com/video/BV1Y7411d7Ys?p=9
（3）pytorch學習(五)—影像的加載/讀取方式
（4）解決Python報錯：RuntimeError: Can’t call numpy() on Variable that requires grad. Use var.detach().numpy()

轉載請註明出處，本文鏈接：https://www.uj5u.com/qita/325860.html

標籤：其他

上一篇：基于梅爾頻譜的音頻信號分類識別(Pytorch)

下一篇：SynFace Face: Recognition with Synthetic Data 閱讀筆記