文章目錄

13.梯度
14.激活函式
15.感知機
16.鏈式法則
17.反向傳播
18.2D函式優化實體
19.Logistic Regression
20.交叉熵
21.多分類
22.全連接層
23.激活函式與GPU加速
24. 測驗

根據龍良曲Pytorch學習視頻整理，視頻鏈接：
【計算機-AI】PyTorch學這個就夠了！
(好課推薦)深度學習與PyTorch入門實戰——主講人龍良曲

13.梯度

導數 derivative
偏微分 partial derivate
梯度 gradient（向量）

How to search for minima?

θ t + 1 = θ t ? α t ▽ f ( θ t ) \theta_{t+1}=\theta_t-\alpha_t\triangledown f(\theta_t) θt+1?=θt??αt?▽f(θt?)

Optimizer performance

initialization status 何愷明初始化方法
learning rate （learning_rate_decay）
momentum

14.激活函式

連續不可導
Sigmoid / Logistic σ ′ = σ ( 1 ? σ ) \sigma'=\sigma(1-\sigma) σ′=σ(1?σ)
torch.sigmoid()
F.sigmoid() (import torch.nn.functional as F)
Tanh
torch.tanh()
Relu
torch.relu()
F.relu() (import torch.nn.functional as F)

Typical Loss

Mean Squared Error
MSE l o s s = ∑ [ y ? ( x w + b ) ] 2 loss = \sum [y-(xw+b)]^2 loss=∑[y?(xw+b)]2
L 2 ? n o r m = ∣ ∣ y ? ( x w + b ) ∣ ∣ 2 L2-norm=||y-(xw+b)||_2 L2?norm=∣∣y?(xw+b)∣∣2?
Cross Entropy Loss
binary
multi-class
+softmax
Leave it to Logistic Regression Part
Softmax
soft version of max
S ( y i ) = e y i ∑ j e y j S(y_i)=\frac{e^{y_i}}{\sum_je^{y_j}} S(yi?)=∑j?eyj?eyi??
? p i ? p j = { p i ( 1 ? p i ) i = j ? p j ? p i i ≠ j \frac{\partial p_i}{\partial p_j}=\left\{\begin{matrix} p_i(1-p_i)&i=j \\ -p_j*p_i& i\neq j \end{matrix}\right. ?pj??pi??={pi?(1?pi?)?pj??pi??i=ji?=j?

Gradient API

torch.autograd.grad(loss, [w1, w2,...])
loss.backward()

import torch
import torch.nn.functional as F

x = torch.ones(1)
w = torch.full([1], 2.)
mse = F.mse_loss(torch.ones(1), x*w)
print(mse)  # tensor(1., grad_fn=<MseLossBackward>)

# torch.autograd.grad(mse, [w]) # RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn

print(w.requires_grad_())   # tensor([2.], requires_grad=True)

# print(torch.autograd.grad(mse, [w]))    # 動態圖未更新會報錯RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn

mse = F.mse_loss(torch.ones(1), x*w)
# print(torch.autograd.grad(mse, [w]))    # (tensor([2.]),

mse.backward()
print(w.grad)   # tensor([2.])


a = torch.rand(3, requires_grad=True)
print(a)    # tensor([0.0377, 0.4542, 0.1386], requires_grad=True)
p = F.softmax(a, dim=0)
# p.backward()    # 報錯 RuntimeError: grad can be implicitly created only for scalar outputs
# retain_graph=True 不會清除計算圖
print(torch.autograd.grad(p[0], [a], retain_graph=True))    # (tensor([ 0.1998, -0.1156, -0.0843]),)
print(torch.autograd.grad(p[1], [a], retain_graph=True))    # (tensor([-0.1156,  0.2434, -0.1278]),)
print(torch.autograd.grad(p[2], [a], retain_graph=True))    # (tensor([-0.0843, -0.1278,  0.2121]),)

15.感知機

單一輸出感知機求導
P32
? E ? w j 0 = ( O 0 ? t ) O 0 ( 1 ? O 0 ) x j 0 \frac{\partial E}{\partial w_{j0}}=(O_0-t)O_0(1-O_0)x^0_j ?wj0??E?=(O0??t)O0?(1?O0?)xj0?

多輸出Loss層 (Multi-output Perception)
在這里插入圖片描述
? E ? w j k = ( O k ? t k ) O k ( 1 ? O k ) x j 0 \frac{\partial E}{\partial w_{jk}}=(O_k-t_k)O_k(1-O_k)x^0_j ?wjk??E?=(Ok??tk?)Ok?(1?Ok?)xj0?

import torch
import torch.nn.functional as F

x = torch.randn(1, 10)
# w = torch.randn(1, 10, requires_grad=True)  # 單一層感知機
w = torch.randn(2, 10, requires_grad=True)  # 多輸出Loss層
o = torch.sigmoid(x@w.t())
print(o.shape)  # torch.Size([1, 2])

loss = F.mse_loss(torch.ones(1, 1), o)  # broadcasting
print(loss.shape)   # torch.Size([])
print(loss)   # tensor(0.2094, grad_fn=<MseLossBackward>)

loss.backward()
print(w.grad)
"""
tensor([[-2.0498e-01,  2.4619e-02, -8.0208e-04, -1.3723e-01, -1.3014e-01,
         -1.4648e-01, -7.5119e-02,  4.9381e-02,  2.7161e-01,  4.8075e-02],
        [-4.8705e-03,  5.8495e-04, -1.9058e-05, -3.2607e-03, -3.0922e-03,
         -3.4804e-03, -1.7849e-03,  1.1733e-03,  6.4536e-03,  1.1423e-03]])
"""

16.鏈式法則

import torch
import torch.nn.functional as F

x = torch.tensor(1.)
w1 = torch.tensor(2., requires_grad=True)
b1 = torch.tensor(1.)
w2 = torch.tensor(2., requires_grad=True)
b2 = torch.tensor(1.)
y1 = x * w1 + b1
y2 = y1 * w2 + b2

dy2_dy1 = torch.autograd.grad(y2, [y1], retain_graph=True)[0]
dy1_dw1 = torch.autograd.grad(y1, [w1], retain_graph=True)[0]
dy2_dw1 = torch.autograd.grad(y2, [w1], retain_graph=True)[0]

print(dy2_dy1 * dy1_dw1)    # tensor(2.)
print(dy2_dw1)  # tensor(2.)

17.反向傳播

For an output layer node k ∈ \in ∈ K ? E ? W j k = O j δ k \frac{\partial E}{\partial W_{jk}}=O_j\delta_k ?Wjk??E?=Oj?δk?

where δ k = O k ( 1 ? O k ) ( O k ? t k ) \delta _k = O_k(1-O_k)(O_k-t_k) δk?=Ok?(1?Ok?)(Ok??tk?)

For a hidden layer node j ∈ \in ∈ J ? E ? W i j = O i δ j \frac{\partial E}{\partial W_{ij}}=O_i\delta_j ?Wij??E?=Oi?δj?

where δ j = O j ( 1 ? O j ) ∑ k ∈ K δ k W j k \delta _j = O_j(1-O_j)\sum _{k \in K}\delta _k W_{jk} δj?=Oj?(1?Oj?)k∈K∑?δk?Wjk?

18.2D函式優化實體

import torch
import numpy as np
import matplotlib.pyplot as plt

def himmelblau(x):
    return (x[0] ** 2 + x[1] - 11) ** 2 + (x[0] + x[1] ** 2 - 7) ** 2

x = np.arange(-6, 6, 0.1)
y = np.arange(-6, 6, 0.1)
print('x, y range:', x.shape, y.shape)
X, Y = np.meshgrid(x, y)
print('X, Y maps', X.shape, Y.shape)
Z = himmelblau([X, Y])

fig = plt.figure('himmelblau')
ax = fig.gca(projection='3d')
ax.plot_surface(X, Y, Z)
ax.view_init(60, -30)
ax.set_xlabel('x')
ax.set_ylabel('y')
plt.show()

x = torch.tensor([0., 0.], requires_grad=True)
optimizer = torch.optim.Adam([x], lr=1e-3)
for step in range(20000):
    pred = himmelblau(x)
    optimizer.zero_grad()
    pred.backward()
    optimizer.step()
    if step % 2000 == 0:
        print('step {}: x = {}, f(x) = {}'.format(step, x.tolist(), pred.item()))
"""
x, y range: (120,) (120,)
X, Y maps (120, 120) (120, 120)
G:/Project/PYTHON/Demo/Pytorch21_7_29/12himmelblau.py:17: MatplotlibDeprecationWarning: Calling gca() with keyword arguments was deprecated in Matplotlib 3.4. Starting two minor releases later, gca() will take no keyword arguments. The gca() function should only be used to get the current axes, or if no axes exist, create new axes with default keyword arguments. To create a new axes with non-default arguments, use plt.axes() or plt.subplot().
  ax = fig.gca(projection='3d')
step 0: x = [0.0009999999310821295, 0.0009999999310821295], f(x) = 170.0
step 2000: x = [2.3331806659698486, 1.9540694952011108], f(x) = 13.730916023254395
step 4000: x = [2.9820079803466797, 2.0270984172821045], f(x) = 0.014858869835734367
step 6000: x = [2.999983549118042, 2.0000221729278564], f(x) = 1.1074007488787174e-08
step 8000: x = [2.9999938011169434, 2.0000083446502686], f(x) = 1.5572823031106964e-09
step 10000: x = [2.999997854232788, 2.000002861022949], f(x) = 1.8189894035458565e-10
step 12000: x = [2.9999992847442627, 2.0000009536743164], f(x) = 1.6370904631912708e-11
step 14000: x = [2.999999761581421, 2.000000238418579], f(x) = 1.8189894035458565e-12
step 16000: x = [3.0, 2.0], f(x) = 0.0
step 18000: x = [3.0, 2.0], f(x) = 0.0
"""

在這里插入圖片描述
不同的初始化會產生不同的解

19.Logistic Regression

Goal v.s. Approach

For regression
Goal: p r e d = y pred = y pred=y
Approach: minimize d i s t ( p r e d , y ) dist(pred, y) dist(pred,y)
For classification
Goal: maximize benchmark, e.g. accuracy
Approach1: minimize d i s t ( p θ ( y ∣ x ) , p r ( y ∣ x ) ) dist(p_\theta(y|x), p_r(y|x)) dist(pθ?(y∣x),pr?(y∣x))
Approach2: minimize d i v e r g e n c e ( p θ ( y ∣ x ) , p r ( y ∣ x ) ) divergence(p_\theta(y|x), p_r(y|x)) divergence(pθ?(y∣x),pr?(y∣x))

Q1. why not maximize accuracy?

a c c . = ∑ I ( p r e d i = = y i ) l e n ( Y ) acc.=\frac{\sum I(pred_i==y_i)}{len(Y)} acc.=len(Y)∑I(predi?==yi?)?
issues 1. gradient = 0 if accuracy unchanged but weights changed
issues 2. gradient not continuous since the number of correct is not continuous

Q2. why call logistic regression

use sigmoid
Controversial
Mse => regression
Cross Entropy => classification

20.交叉熵

Entropy

Uncertainty
measure of surprise
higher entropy = less info.
E n t r o p y = ? ∑ i P ( i ) l o g P ( i ) Entropy=-\sum_iP(i)logP(i) Entropy=?i∑?P(i)logP(i)

Lottery

import torch

a = torch.full([4], 1/4.)
print(a)    # tensor([0.2500, 0.2500, 0.2500, 0.2500])
print(a * torch.log2(a))    # tensor([-0.5000, -0.5000, -0.5000, -0.5000])
print(-(a * torch.log2(a)).sum())   # tensor(2.)

a = torch.tensor([0.1, 0.1, 0.1, 0.7])
print(a)    # tensor([0.1000, 0.1000, 0.1000, 0.7000])
print(a * torch.log2(a))    # tensor([-0.3322, -0.3322, -0.3322, -0.3602])
print(-(a * torch.log2(a)).sum())   #tensor(1.3568)

Cross Entropy

H ( p , q ) = ∑ p ( x ) l o g q ( x ) H(p,q)=\sum p(x)log\space q(x) H(p,q)=∑p(x)log q(x)
H ( p , q ) = H ( p ) + D K L ( p ∣ q ) H(p,q)=H(p)+D_{KL}(p|q) H(p,q)=H(p)+DKL?(p∣q) D K L D_{KL} DKL?指KL Divergence相對熵
P=Q: cross Entropy = Entropy
for one-hot encoding: entropy = 1log1=0

Binary Classfication
H ( P , Q ) = ? ( y l o g ( p ) + ( 1 ? y ) l o g ( 1 ? p ) ) H(P, Q)=-(ylog(p)+(1-y)log(1-p)) H(P,Q)=?(ylog(p)+(1?y)log(1?p))

Why not use MSE?

sigmoid+MSE
gradient vanish
converge slower
But, sometimes
e.g. meta-learning

Numerical Stability

import torch
import torch.nn.functional as F

x = torch.randn(1, 784)
w = torch.randn(10, 784)
logits = x @ w.t()
print(logits.size())    # torch.Size([1, 10])

print(F.cross_entropy(logits, torch.tensor([3])))   # tensor(0.1694)

pred = F.softmax(logits, dim=1)
print(pred.size())    # torch.Size([1, 10])
pred_log = torch.log(pred)
print(F.nll_loss(pred_log, torch.tensor([3])))   # tensor(0.1694)

21.多分類

import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision import datasets, transforms

# initialize
batch_size = 200
learning_rate = 0.01
epochs = 10

# load data
train_loader = torch.utils.data.DataLoader(
    datasets.MNIST('../data', train=True, download=True,
                   transform=transforms.Compose([
                       transforms.ToTensor(),
                       transforms.Normalize((0.01307, ), (0.3081, ))
                   ])),
    batch_size=batch_size, shuffle=True
)
test_loader = torch.utils.data.DataLoader(
    datasets.MNIST('../data', train=False,
                   transform=transforms.Compose([
                       transforms.ToTensor(),
                       transforms.Normalize((0.01307, ), (0.3081, ))
                   ])),
    batch_size=batch_size, shuffle=True
)

# Network Architecture
w1, b1 = torch.randn(200, 784, requires_grad=True), torch.zeros(200, requires_grad=True)
w2, b2 = torch.randn(200, 200, requires_grad=True), torch.zeros(200, requires_grad=True)
w3, b3 = torch.randn(10, 200, requires_grad=True), torch.zeros(10, requires_grad=True)

torch.nn.init.kaiming_normal_(w1)
torch.nn.init.kaiming_normal_(w2)
torch.nn.init.kaiming_normal_(w3)

def forward(x):
    x = x @ w1.t() + b1
    x = F.relu(x)
    x = x @ w2.t() + b2
    x = F.relu(x)
    x = x @ w3.t() + b3
    x = F.relu(x)
    return x

# Train
optimizer = torch.optim.SGD([w1, b1, w2, b2, w3, b3], lr=learning_rate)
criteon = nn.CrossEntropyLoss()

for epoch in range(epochs):
    for batch_idx, (data, target) in enumerate(train_loader):
        data = data.view(-1, 28 * 28)
        logits = forward(data)
        loss = criteon(logits, target)
        optimizer.zero_grad()
        loss.backward()
        # print(w1.grad.norm(), w2.grad.norm())
        optimizer.step()

        if batch_idx % 100 == 0:
            print('Train Epoch:{} [{} / {} ({:.0f}%)]\tLoss: {:.6f}'.format(
                epoch, batch_idx * len(data), len(train_loader.dataset),
                100. * batch_idx / len(train_loader), loss.item()
            ))

    test_loss = 0
    correct = 0
    for data, target in test_loader:
        data = data.view(-1, 28 * 28)
        logits = forward(data)
        test_loss += criteon(logits, target).item()
        pred = logits.data.max(1)[1]
        correct += pred.eq(target.data).sum()

    test_loss /= len(test_loader.dataset)
    print('\nTest set Average loss:{:.4f}, Accuracy: {} / {} ({:.0f}%)\n'.format(
        test_loss, correct, len(test_loader.dataset),
              100. * correct / len(test_loader.dataset)
    ))

"""
Train Epoch:0 [0 / 60000 (0%)]	Loss: 2.486867
Train Epoch:0 [20000 / 60000 (33%)]	Loss: 0.724861
Train Epoch:0 [40000 / 60000 (67%)]	Loss: 0.376032

Test set Average loss:0.0018, Accuracy: 8983 / 10000 (90%)

Train Epoch:1 [0 / 60000 (0%)]	Loss: 0.366988
Train Epoch:1 [20000 / 60000 (33%)]	Loss: 0.377176
Train Epoch:1 [40000 / 60000 (67%)]	Loss: 0.399104

Test set Average loss:0.0014, Accuracy: 9186 / 10000 (92%)

Train Epoch:2 [0 / 60000 (0%)]	Loss: 0.252696
Train Epoch:2 [20000 / 60000 (33%)]	Loss: 0.302346
Train Epoch:2 [40000 / 60000 (67%)]	Loss: 0.266919

Test set Average loss:0.0012, Accuracy: 9284 / 10000 (93%)

Train Epoch:3 [0 / 60000 (0%)]	Loss: 0.320602
Train Epoch:3 [20000 / 60000 (33%)]	Loss: 0.223881
Train Epoch:3 [40000 / 60000 (67%)]	Loss: 0.198832

Test set Average loss:0.0011, Accuracy: 9364 / 10000 (94%)

Train Epoch:4 [0 / 60000 (0%)]	Loss: 0.253680
Train Epoch:4 [20000 / 60000 (33%)]	Loss: 0.147065
Train Epoch:4 [40000 / 60000 (67%)]	Loss: 0.194152

Test set Average loss:0.0010, Accuracy: 9406 / 10000 (94%)

Train Epoch:5 [0 / 60000 (0%)]	Loss: 0.163504
Train Epoch:5 [20000 / 60000 (33%)]	Loss: 0.216691
Train Epoch:5 [40000 / 60000 (67%)]	Loss: 0.166883

Test set Average loss:0.0010, Accuracy: 9460 / 10000 (95%)

Train Epoch:6 [0 / 60000 (0%)]	Loss: 0.120956
Train Epoch:6 [20000 / 60000 (33%)]	Loss: 0.122348
Train Epoch:6 [40000 / 60000 (67%)]	Loss: 0.167381

Test set Average loss:0.0009, Accuracy: 9484 / 10000 (95%)

Train Epoch:7 [0 / 60000 (0%)]	Loss: 0.218382
Train Epoch:7 [20000 / 60000 (33%)]	Loss: 0.141006
Train Epoch:7 [40000 / 60000 (67%)]	Loss: 0.156644

Test set Average loss:0.0009, Accuracy: 9501 / 10000 (95%)

Train Epoch:8 [0 / 60000 (0%)]	Loss: 0.152702
Train Epoch:8 [20000 / 60000 (33%)]	Loss: 0.167587
Train Epoch:8 [40000 / 60000 (67%)]	Loss: 0.182679

Test set Average loss:0.0008, Accuracy: 9528 / 10000 (95%)

Train Epoch:9 [0 / 60000 (0%)]	Loss: 0.210252
Train Epoch:9 [20000 / 60000 (33%)]	Loss: 0.150022
Train Epoch:9 [40000 / 60000 (67%)]	Loss: 0.097077

Test set Average loss:0.0008, Accuracy: 9559 / 10000 (96%)

"""

22.全連接層

concisely

inherit from nn.Module
init layer in __init__
implement forward()

nn.Relu v.s. F.relu()

class-style API
function-style API

import torch
import torch.nn as nn
from torchvision import datasets, transforms
import torch.optim as optim
from visdom import Visdom

# initialize
batch_size = 200
learning_rate = 0.01
epochs = 10

# load data
train_loader = torch.utils.data.DataLoader(
    datasets.MNIST('../data', train=True, download=True,
                   transform=transforms.Compose([
                       transforms.ToTensor(),
                       transforms.Normalize((0.01307, ), (0.3081, ))
                   ])),
    batch_size=batch_size, shuffle=True
)
test_loader = torch.utils.data.DataLoader(
    datasets.MNIST('../data', train=False,
                   transform=transforms.Compose([
                       transforms.ToTensor(),
                       transforms.Normalize((0.01307, ), (0.3081, ))
                   ])),
    batch_size=batch_size, shuffle=True
)


class MLP(nn.Module):
    def __init__(self):
        super(MLP, self).__init__()

        self.model = nn.Sequential(
            nn.Linear(784, 200),
            nn.LeakyReLU(inplace=True), # inplace計算可以節省內（顯）存，還可省去反復申請和釋放記憶體時間,但會對原變數覆寫
            nn.Linear(200,200),
            nn.LeakyReLU(inplace=True),
            nn.Linear(200,10),
            nn.LeakyReLU(inplace=True),
        )

    def forward(self, x):
        x = self.model(x)
        return x

device = torch.device('cuda:0')
net = MLP().to(device)
optimizer = optim.SGD(net.parameters(), lr=learning_rate)
criteon = nn.CrossEntropyLoss().to(device)

viz = Visdom()
viz.line([0.], [0.], win='train_loss', opts=dict(title='train_loss'))
viz.line([[0.0, 0.0]], [0.], win='test', opts=dict(title='test loss&acc.', legend=['loss', 'acc.']))
global_step = -1

for epoch in range(epochs):
    for batch_idx, (data, target) in enumerate(train_loader):
        data = data.view(-1, 28 * 28)
        data, target = data.to(device), target.cuda()

        logits = net(data)
        loss = criteon(logits, target)
        optimizer.zero_grad()
        loss.backward()
        # print(w1.grad.norm(), w2.grad.norm())
        optimizer.step()

        if batch_idx % 100 == 0:
            print('Train Epoch:{} [{} / {} ({:.0f}%)]\tLoss: {:.6f}'.format(
                epoch, batch_idx * len(data), len(train_loader.dataset),
                       100. * batch_idx / len(train_loader), loss.item()
            ))

        # lines: single trace
        global_step += 1
        viz.line([loss.item()], [global_step], win='train_loss', update='append')

    test_loss = 0
    correct = 0
    for data, target in test_loader:
        data = data.view(-1, 28 * 28)
        data, target = data.to(device), target.cuda()

        logits = net(data)
        test_loss += criteon(logits, target).item()
        # pred = logits.data.max(1)[1]
        pred = logits.argmax(dim=1)
        correct += pred.eq(target).float().sum().item()

    # lines: multi-traces
    viz.line([[test_loss, correct / len(test_loader.dataset)]], [global_step], win='test', update='append')
    # visual X
    viz.images(data.view(-1, 1, 28, 28), win='x')
    viz.text(str(pred.detach().cpu().numpy()), win='pred', opts=dict(title='pred'))

    test_loss /= len(test_loader.dataset)
    print('\nTest set Average loss:{:.4f}, Accuracy: {} / {} ({:.0f}%)\n'.format(
        test_loss, correct, len(test_loader.dataset),
              100. * correct / len(test_loader.dataset)
    ))


"""
Train Epoch:0 [0 / 60000 (0%)]	Loss: 2.302269
Train Epoch:0 [20000 / 60000 (33%)]	Loss: 1.884911
Train Epoch:0 [40000 / 60000 (67%)]	Loss: 1.336271

Test set Average loss:0.0038, Accuracy: 8169 / 10000 (82%)

Train Epoch:1 [0 / 60000 (0%)]	Loss: 0.721071
Train Epoch:1 [20000 / 60000 (33%)]	Loss: 0.565047
Train Epoch:1 [40000 / 60000 (67%)]	Loss: 0.506850

Test set Average loss:0.0021, Accuracy: 8889 / 10000 (89%)

Train Epoch:2 [0 / 60000 (0%)]	Loss: 0.368552
Train Epoch:2 [20000 / 60000 (33%)]	Loss: 0.301212
Train Epoch:2 [40000 / 60000 (67%)]	Loss: 0.406262

Test set Average loss:0.0017, Accuracy: 9061 / 10000 (91%)

Train Epoch:3 [0 / 60000 (0%)]	Loss: 0.372895
Train Epoch:3 [20000 / 60000 (33%)]	Loss: 0.390528
Train Epoch:3 [40000 / 60000 (67%)]	Loss: 0.389583

Test set Average loss:0.0015, Accuracy: 9141 / 10000 (91%)

Train Epoch:4 [0 / 60000 (0%)]	Loss: 0.220136
Train Epoch:4 [20000 / 60000 (33%)]	Loss: 0.281799
Train Epoch:4 [40000 / 60000 (67%)]	Loss: 0.291274

Test set Average loss:0.0014, Accuracy: 9211 / 10000 (92%)

Train Epoch:5 [0 / 60000 (0%)]	Loss: 0.280618
Train Epoch:5 [20000 / 60000 (33%)]	Loss: 0.305418
Train Epoch:5 [40000 / 60000 (67%)]	Loss: 0.334693

Test set Average loss:0.0014, Accuracy: 9226 / 10000 (92%)

Train Epoch:6 [0 / 60000 (0%)]	Loss: 0.342200
Train Epoch:6 [20000 / 60000 (33%)]	Loss: 0.294665
Train Epoch:6 [40000 / 60000 (67%)]	Loss: 0.220197

Test set Average loss:0.0013, Accuracy: 9280 / 10000 (93%)

Train Epoch:7 [0 / 60000 (0%)]	Loss: 0.211271
Train Epoch:7 [20000 / 60000 (33%)]	Loss: 0.358451
Train Epoch:7 [40000 / 60000 (67%)]	Loss: 0.236865

Test set Average loss:0.0012, Accuracy: 9338 / 10000 (93%)

Train Epoch:8 [0 / 60000 (0%)]	Loss: 0.226759
Train Epoch:8 [20000 / 60000 (33%)]	Loss: 0.288015
Train Epoch:8 [40000 / 60000 (67%)]	Loss: 0.263826

Test set Average loss:0.0012, Accuracy: 9349 / 10000 (93%)

Train Epoch:9 [0 / 60000 (0%)]	Loss: 0.144617
Train Epoch:9 [20000 / 60000 (33%)]	Loss: 0.166465
Train Epoch:9 [40000 / 60000 (67%)]	Loss: 0.296551

Test set Average loss:0.0011, Accuracy: 9367 / 10000 (94%)
"""

23.激活函式與GPU加速

大部分時候使用ReLU激活函式

ReLU: R ( z ) = m a x ( 0 , z ) R(z)=max(0, z) R(z)=max(0,z)
Leaky ReLU
SELU
softplus

GPU accelerated

一鍵切換
device=torch.device('cuda: 0') data.to(device)
data.cuda() 不推薦

任務管理器查看GPU使用情況
在這里插入圖片描述

24. 測驗

Loss != Accuracy

When to test

test once per serveral batch
test once per epoch
epoch v.s. step?

轉載請註明出處，本文鏈接：https://www.uj5u.com/qita/293303.html

標籤：AI

上一篇：深度神經網路中的卷積

下一篇：vs2017永久配置opencv4.0.0環境

【Pytorch學習筆記】3.深度學習基礎