文章目錄
- 13.梯度
- 14.激活函式
- 15.感知機
- 16.鏈式法則
- 17.反向傳播
- 18.2D函式優化實體
- 19.Logistic Regression
- 20.交叉熵
- 21.多分類
- 22.全連接層
- 23.激活函式與GPU加速
- 24. 測驗
根據龍良曲Pytorch學習視頻整理,視頻鏈接:
【計算機-AI】PyTorch學這個就夠了!
(好課推薦)深度學習與PyTorch入門實戰——主講人龍良曲
13.梯度
- 導數 derivative
- 偏微分 partial derivate
- 梯度 gradient(向量)
How to search for minima?
- θ t + 1 = θ t ? α t ▽ f ( θ t ) \theta_{t+1}=\theta_t-\alpha_t\triangledown f(\theta_t) θt+1?=θt??αt?▽f(θt?)
Optimizer performance
- initialization status 何愷明初始化方法
- learning rate (learning_rate_decay)
- momentum
14.激活函式
- 連續不可導
- Sigmoid / Logistic
σ
′
=
σ
(
1
?
σ
)
\sigma'=\sigma(1-\sigma)
σ′=σ(1?σ)
torch.sigmoid()
F.sigmoid() (import torch.nn.functional as F) - Tanh
torch.tanh() - Relu
torch.relu()
F.relu() (import torch.nn.functional as F)
Typical Loss
- Mean Squared Error
MSE l o s s = ∑ [ y ? ( x w + b ) ] 2 loss = \sum [y-(xw+b)]^2 loss=∑[y?(xw+b)]2
L 2 ? n o r m = ∣ ∣ y ? ( x w + b ) ∣ ∣ 2 L2-norm=||y-(xw+b)||_2 L2?norm=∣∣y?(xw+b)∣∣2? - Cross Entropy Loss
binary
multi-class
+softmax
Leave it to Logistic Regression Part - Softmax
soft version of max
S ( y i ) = e y i ∑ j e y j S(y_i)=\frac{e^{y_i}}{\sum_je^{y_j}} S(yi?)=∑j?eyj?eyi??
? p i ? p j = { p i ( 1 ? p i ) i = j ? p j ? p i i ≠ j \frac{\partial p_i}{\partial p_j}=\left\{\begin{matrix} p_i(1-p_i)&i=j \\ -p_j*p_i& i\neq j \end{matrix}\right. ?pj??pi??={pi?(1?pi?)?pj??pi??i=ji?=j?
Gradient API
torch.autograd.grad(loss, [w1, w2,...])loss.backward()
import torch
import torch.nn.functional as F
x = torch.ones(1)
w = torch.full([1], 2.)
mse = F.mse_loss(torch.ones(1), x*w)
print(mse) # tensor(1., grad_fn=<MseLossBackward>)
# torch.autograd.grad(mse, [w]) # RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn
print(w.requires_grad_()) # tensor([2.], requires_grad=True)
# print(torch.autograd.grad(mse, [w])) # 動態圖未更新會報錯RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn
mse = F.mse_loss(torch.ones(1), x*w)
# print(torch.autograd.grad(mse, [w])) # (tensor([2.]),
mse.backward()
print(w.grad) # tensor([2.])
a = torch.rand(3, requires_grad=True)
print(a) # tensor([0.0377, 0.4542, 0.1386], requires_grad=True)
p = F.softmax(a, dim=0)
# p.backward() # 報錯 RuntimeError: grad can be implicitly created only for scalar outputs
# retain_graph=True 不會清除計算圖
print(torch.autograd.grad(p[0], [a], retain_graph=True)) # (tensor([ 0.1998, -0.1156, -0.0843]),)
print(torch.autograd.grad(p[1], [a], retain_graph=True)) # (tensor([-0.1156, 0.2434, -0.1278]),)
print(torch.autograd.grad(p[2], [a], retain_graph=True)) # (tensor([-0.0843, -0.1278, 0.2121]),)
15.感知機
單一輸出感知機求導

?
E
?
w
j
0
=
(
O
0
?
t
)
O
0
(
1
?
O
0
)
x
j
0
\frac{\partial E}{\partial w_{j0}}=(O_0-t)O_0(1-O_0)x^0_j
?wj0??E?=(O0??t)O0?(1?O0?)xj0?
多輸出Loss層 (Multi-output Perception)

?
E
?
w
j
k
=
(
O
k
?
t
k
)
O
k
(
1
?
O
k
)
x
j
0
\frac{\partial E}{\partial w_{jk}}=(O_k-t_k)O_k(1-O_k)x^0_j
?wjk??E?=(Ok??tk?)Ok?(1?Ok?)xj0?
import torch
import torch.nn.functional as F
x = torch.randn(1, 10)
# w = torch.randn(1, 10, requires_grad=True) # 單一層感知機
w = torch.randn(2, 10, requires_grad=True) # 多輸出Loss層
o = torch.sigmoid(x@w.t())
print(o.shape) # torch.Size([1, 2])
loss = F.mse_loss(torch.ones(1, 1), o) # broadcasting
print(loss.shape) # torch.Size([])
print(loss) # tensor(0.2094, grad_fn=<MseLossBackward>)
loss.backward()
print(w.grad)
"""
tensor([[-2.0498e-01, 2.4619e-02, -8.0208e-04, -1.3723e-01, -1.3014e-01,
-1.4648e-01, -7.5119e-02, 4.9381e-02, 2.7161e-01, 4.8075e-02],
[-4.8705e-03, 5.8495e-04, -1.9058e-05, -3.2607e-03, -3.0922e-03,
-3.4804e-03, -1.7849e-03, 1.1733e-03, 6.4536e-03, 1.1423e-03]])
"""
16.鏈式法則
import torch
import torch.nn.functional as F
x = torch.tensor(1.)
w1 = torch.tensor(2., requires_grad=True)
b1 = torch.tensor(1.)
w2 = torch.tensor(2., requires_grad=True)
b2 = torch.tensor(1.)
y1 = x * w1 + b1
y2 = y1 * w2 + b2
dy2_dy1 = torch.autograd.grad(y2, [y1], retain_graph=True)[0]
dy1_dw1 = torch.autograd.grad(y1, [w1], retain_graph=True)[0]
dy2_dw1 = torch.autograd.grad(y2, [w1], retain_graph=True)[0]
print(dy2_dy1 * dy1_dw1) # tensor(2.)
print(dy2_dw1) # tensor(2.)
17.反向傳播
For an output layer node k ∈ \in ∈ K ? E ? W j k = O j δ k \frac{\partial E}{\partial W_{jk}}=O_j\delta_k ?Wjk??E?=Oj?δk?
where δ k = O k ( 1 ? O k ) ( O k ? t k ) \delta _k = O_k(1-O_k)(O_k-t_k) δk?=Ok?(1?Ok?)(Ok??tk?)
For a hidden layer node j ∈ \in ∈ J ? E ? W i j = O i δ j \frac{\partial E}{\partial W_{ij}}=O_i\delta_j ?Wij??E?=Oi?δj?
where δ j = O j ( 1 ? O j ) ∑ k ∈ K δ k W j k \delta _j = O_j(1-O_j)\sum _{k \in K}\delta _k W_{jk} δj?=Oj?(1?Oj?)k∈K∑?δk?Wjk?
18.2D函式優化實體
import torch
import numpy as np
import matplotlib.pyplot as plt
def himmelblau(x):
return (x[0] ** 2 + x[1] - 11) ** 2 + (x[0] + x[1] ** 2 - 7) ** 2
x = np.arange(-6, 6, 0.1)
y = np.arange(-6, 6, 0.1)
print('x, y range:', x.shape, y.shape)
X, Y = np.meshgrid(x, y)
print('X, Y maps', X.shape, Y.shape)
Z = himmelblau([X, Y])
fig = plt.figure('himmelblau')
ax = fig.gca(projection='3d')
ax.plot_surface(X, Y, Z)
ax.view_init(60, -30)
ax.set_xlabel('x')
ax.set_ylabel('y')
plt.show()
x = torch.tensor([0., 0.], requires_grad=True)
optimizer = torch.optim.Adam([x], lr=1e-3)
for step in range(20000):
pred = himmelblau(x)
optimizer.zero_grad()
pred.backward()
optimizer.step()
if step % 2000 == 0:
print('step {}: x = {}, f(x) = {}'.format(step, x.tolist(), pred.item()))
"""
x, y range: (120,) (120,)
X, Y maps (120, 120) (120, 120)
G:/Project/PYTHON/Demo/Pytorch21_7_29/12himmelblau.py:17: MatplotlibDeprecationWarning: Calling gca() with keyword arguments was deprecated in Matplotlib 3.4. Starting two minor releases later, gca() will take no keyword arguments. The gca() function should only be used to get the current axes, or if no axes exist, create new axes with default keyword arguments. To create a new axes with non-default arguments, use plt.axes() or plt.subplot().
ax = fig.gca(projection='3d')
step 0: x = [0.0009999999310821295, 0.0009999999310821295], f(x) = 170.0
step 2000: x = [2.3331806659698486, 1.9540694952011108], f(x) = 13.730916023254395
step 4000: x = [2.9820079803466797, 2.0270984172821045], f(x) = 0.014858869835734367
step 6000: x = [2.999983549118042, 2.0000221729278564], f(x) = 1.1074007488787174e-08
step 8000: x = [2.9999938011169434, 2.0000083446502686], f(x) = 1.5572823031106964e-09
step 10000: x = [2.999997854232788, 2.000002861022949], f(x) = 1.8189894035458565e-10
step 12000: x = [2.9999992847442627, 2.0000009536743164], f(x) = 1.6370904631912708e-11
step 14000: x = [2.999999761581421, 2.000000238418579], f(x) = 1.8189894035458565e-12
step 16000: x = [3.0, 2.0], f(x) = 0.0
step 18000: x = [3.0, 2.0], f(x) = 0.0
"""

不同的初始化會產生不同的解
19.Logistic Regression
Goal v.s. Approach
- For regression
Goal: p r e d = y pred = y pred=y
Approach: minimize d i s t ( p r e d , y ) dist(pred, y) dist(pred,y) - For classification
Goal: maximize benchmark, e.g. accuracy
Approach1: minimize d i s t ( p θ ( y ∣ x ) , p r ( y ∣ x ) ) dist(p_\theta(y|x), p_r(y|x)) dist(pθ?(y∣x),pr?(y∣x))
Approach2: minimize d i v e r g e n c e ( p θ ( y ∣ x ) , p r ( y ∣ x ) ) divergence(p_\theta(y|x), p_r(y|x)) divergence(pθ?(y∣x),pr?(y∣x))
Q1. why not maximize accuracy?
- a c c . = ∑ I ( p r e d i = = y i ) l e n ( Y ) acc.=\frac{\sum I(pred_i==y_i)}{len(Y)} acc.=len(Y)∑I(predi?==yi?)?
- issues 1. gradient = 0 if accuracy unchanged but weights changed
- issues 2. gradient not continuous since the number of correct is not continuous
Q2. why call logistic regression
- use sigmoid
- Controversial
Mse => regression
Cross Entropy => classification
20.交叉熵
Entropy
- Uncertainty
- measure of surprise
- higher entropy = less info.
E n t r o p y = ? ∑ i P ( i ) l o g P ( i ) Entropy=-\sum_iP(i)logP(i) Entropy=?i∑?P(i)logP(i)
Lottery
import torch
a = torch.full([4], 1/4.)
print(a) # tensor([0.2500, 0.2500, 0.2500, 0.2500])
print(a * torch.log2(a)) # tensor([-0.5000, -0.5000, -0.5000, -0.5000])
print(-(a * torch.log2(a)).sum()) # tensor(2.)
a = torch.tensor([0.1, 0.1, 0.1, 0.7])
print(a) # tensor([0.1000, 0.1000, 0.1000, 0.7000])
print(a * torch.log2(a)) # tensor([-0.3322, -0.3322, -0.3322, -0.3602])
print(-(a * torch.log2(a)).sum()) #tensor(1.3568)
Cross Entropy
- H ( p , q ) = ∑ p ( x ) l o g q ( x ) H(p,q)=\sum p(x)log\space q(x) H(p,q)=∑p(x)log q(x)
-
H
(
p
,
q
)
=
H
(
p
)
+
D
K
L
(
p
∣
q
)
H(p,q)=H(p)+D_{KL}(p|q)
H(p,q)=H(p)+DKL?(p∣q)
D
K
L
D_{KL}
DKL?指KL Divergence相對熵
P=Q: cross Entropy = Entropy
for one-hot encoding: entropy = 1log1=0
Binary Classfication
H
(
P
,
Q
)
=
?
(
y
l
o
g
(
p
)
+
(
1
?
y
)
l
o
g
(
1
?
p
)
)
H(P, Q)=-(ylog(p)+(1-y)log(1-p))
H(P,Q)=?(ylog(p)+(1?y)log(1?p))
Why not use MSE?
- sigmoid+MSE
gradient vanish - converge slower
- But, sometimes
e.g. meta-learning
Numerical Stability
import torch
import torch.nn.functional as F
x = torch.randn(1, 784)
w = torch.randn(10, 784)
logits = x @ w.t()
print(logits.size()) # torch.Size([1, 10])
print(F.cross_entropy(logits, torch.tensor([3]))) # tensor(0.1694)
pred = F.softmax(logits, dim=1)
print(pred.size()) # torch.Size([1, 10])
pred_log = torch.log(pred)
print(F.nll_loss(pred_log, torch.tensor([3]))) # tensor(0.1694)
21.多分類
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision import datasets, transforms
# initialize
batch_size = 200
learning_rate = 0.01
epochs = 10
# load data
train_loader = torch.utils.data.DataLoader(
datasets.MNIST('../data', train=True, download=True,
transform=transforms.Compose([
transforms.ToTensor(),
transforms.Normalize((0.01307, ), (0.3081, ))
])),
batch_size=batch_size, shuffle=True
)
test_loader = torch.utils.data.DataLoader(
datasets.MNIST('../data', train=False,
transform=transforms.Compose([
transforms.ToTensor(),
transforms.Normalize((0.01307, ), (0.3081, ))
])),
batch_size=batch_size, shuffle=True
)
# Network Architecture
w1, b1 = torch.randn(200, 784, requires_grad=True), torch.zeros(200, requires_grad=True)
w2, b2 = torch.randn(200, 200, requires_grad=True), torch.zeros(200, requires_grad=True)
w3, b3 = torch.randn(10, 200, requires_grad=True), torch.zeros(10, requires_grad=True)
torch.nn.init.kaiming_normal_(w1)
torch.nn.init.kaiming_normal_(w2)
torch.nn.init.kaiming_normal_(w3)
def forward(x):
x = x @ w1.t() + b1
x = F.relu(x)
x = x @ w2.t() + b2
x = F.relu(x)
x = x @ w3.t() + b3
x = F.relu(x)
return x
# Train
optimizer = torch.optim.SGD([w1, b1, w2, b2, w3, b3], lr=learning_rate)
criteon = nn.CrossEntropyLoss()
for epoch in range(epochs):
for batch_idx, (data, target) in enumerate(train_loader):
data = data.view(-1, 28 * 28)
logits = forward(data)
loss = criteon(logits, target)
optimizer.zero_grad()
loss.backward()
# print(w1.grad.norm(), w2.grad.norm())
optimizer.step()
if batch_idx % 100 == 0:
print('Train Epoch:{} [{} / {} ({:.0f}%)]\tLoss: {:.6f}'.format(
epoch, batch_idx * len(data), len(train_loader.dataset),
100. * batch_idx / len(train_loader), loss.item()
))
test_loss = 0
correct = 0
for data, target in test_loader:
data = data.view(-1, 28 * 28)
logits = forward(data)
test_loss += criteon(logits, target).item()
pred = logits.data.max(1)[1]
correct += pred.eq(target.data).sum()
test_loss /= len(test_loader.dataset)
print('\nTest set Average loss:{:.4f}, Accuracy: {} / {} ({:.0f}%)\n'.format(
test_loss, correct, len(test_loader.dataset),
100. * correct / len(test_loader.dataset)
))
"""
Train Epoch:0 [0 / 60000 (0%)] Loss: 2.486867
Train Epoch:0 [20000 / 60000 (33%)] Loss: 0.724861
Train Epoch:0 [40000 / 60000 (67%)] Loss: 0.376032
Test set Average loss:0.0018, Accuracy: 8983 / 10000 (90%)
Train Epoch:1 [0 / 60000 (0%)] Loss: 0.366988
Train Epoch:1 [20000 / 60000 (33%)] Loss: 0.377176
Train Epoch:1 [40000 / 60000 (67%)] Loss: 0.399104
Test set Average loss:0.0014, Accuracy: 9186 / 10000 (92%)
Train Epoch:2 [0 / 60000 (0%)] Loss: 0.252696
Train Epoch:2 [20000 / 60000 (33%)] Loss: 0.302346
Train Epoch:2 [40000 / 60000 (67%)] Loss: 0.266919
Test set Average loss:0.0012, Accuracy: 9284 / 10000 (93%)
Train Epoch:3 [0 / 60000 (0%)] Loss: 0.320602
Train Epoch:3 [20000 / 60000 (33%)] Loss: 0.223881
Train Epoch:3 [40000 / 60000 (67%)] Loss: 0.198832
Test set Average loss:0.0011, Accuracy: 9364 / 10000 (94%)
Train Epoch:4 [0 / 60000 (0%)] Loss: 0.253680
Train Epoch:4 [20000 / 60000 (33%)] Loss: 0.147065
Train Epoch:4 [40000 / 60000 (67%)] Loss: 0.194152
Test set Average loss:0.0010, Accuracy: 9406 / 10000 (94%)
Train Epoch:5 [0 / 60000 (0%)] Loss: 0.163504
Train Epoch:5 [20000 / 60000 (33%)] Loss: 0.216691
Train Epoch:5 [40000 / 60000 (67%)] Loss: 0.166883
Test set Average loss:0.0010, Accuracy: 9460 / 10000 (95%)
Train Epoch:6 [0 / 60000 (0%)] Loss: 0.120956
Train Epoch:6 [20000 / 60000 (33%)] Loss: 0.122348
Train Epoch:6 [40000 / 60000 (67%)] Loss: 0.167381
Test set Average loss:0.0009, Accuracy: 9484 / 10000 (95%)
Train Epoch:7 [0 / 60000 (0%)] Loss: 0.218382
Train Epoch:7 [20000 / 60000 (33%)] Loss: 0.141006
Train Epoch:7 [40000 / 60000 (67%)] Loss: 0.156644
Test set Average loss:0.0009, Accuracy: 9501 / 10000 (95%)
Train Epoch:8 [0 / 60000 (0%)] Loss: 0.152702
Train Epoch:8 [20000 / 60000 (33%)] Loss: 0.167587
Train Epoch:8 [40000 / 60000 (67%)] Loss: 0.182679
Test set Average loss:0.0008, Accuracy: 9528 / 10000 (95%)
Train Epoch:9 [0 / 60000 (0%)] Loss: 0.210252
Train Epoch:9 [20000 / 60000 (33%)] Loss: 0.150022
Train Epoch:9 [40000 / 60000 (67%)] Loss: 0.097077
Test set Average loss:0.0008, Accuracy: 9559 / 10000 (96%)
"""
22.全連接層
concisely
- inherit from nn.Module
- init layer in __init__
- implement forward()
nn.Relu v.s. F.relu()
- class-style API
- function-style API
import torch
import torch.nn as nn
from torchvision import datasets, transforms
import torch.optim as optim
from visdom import Visdom
# initialize
batch_size = 200
learning_rate = 0.01
epochs = 10
# load data
train_loader = torch.utils.data.DataLoader(
datasets.MNIST('../data', train=True, download=True,
transform=transforms.Compose([
transforms.ToTensor(),
transforms.Normalize((0.01307, ), (0.3081, ))
])),
batch_size=batch_size, shuffle=True
)
test_loader = torch.utils.data.DataLoader(
datasets.MNIST('../data', train=False,
transform=transforms.Compose([
transforms.ToTensor(),
transforms.Normalize((0.01307, ), (0.3081, ))
])),
batch_size=batch_size, shuffle=True
)
class MLP(nn.Module):
def __init__(self):
super(MLP, self).__init__()
self.model = nn.Sequential(
nn.Linear(784, 200),
nn.LeakyReLU(inplace=True), # inplace計算可以節省內(顯)存,還可省去反復申請和釋放記憶體時間,但會對原變數覆寫
nn.Linear(200,200),
nn.LeakyReLU(inplace=True),
nn.Linear(200,10),
nn.LeakyReLU(inplace=True),
)
def forward(self, x):
x = self.model(x)
return x
device = torch.device('cuda:0')
net = MLP().to(device)
optimizer = optim.SGD(net.parameters(), lr=learning_rate)
criteon = nn.CrossEntropyLoss().to(device)
viz = Visdom()
viz.line([0.], [0.], win='train_loss', opts=dict(title='train_loss'))
viz.line([[0.0, 0.0]], [0.], win='test', opts=dict(title='test loss&acc.', legend=['loss', 'acc.']))
global_step = -1
for epoch in range(epochs):
for batch_idx, (data, target) in enumerate(train_loader):
data = data.view(-1, 28 * 28)
data, target = data.to(device), target.cuda()
logits = net(data)
loss = criteon(logits, target)
optimizer.zero_grad()
loss.backward()
# print(w1.grad.norm(), w2.grad.norm())
optimizer.step()
if batch_idx % 100 == 0:
print('Train Epoch:{} [{} / {} ({:.0f}%)]\tLoss: {:.6f}'.format(
epoch, batch_idx * len(data), len(train_loader.dataset),
100. * batch_idx / len(train_loader), loss.item()
))
# lines: single trace
global_step += 1
viz.line([loss.item()], [global_step], win='train_loss', update='append')
test_loss = 0
correct = 0
for data, target in test_loader:
data = data.view(-1, 28 * 28)
data, target = data.to(device), target.cuda()
logits = net(data)
test_loss += criteon(logits, target).item()
# pred = logits.data.max(1)[1]
pred = logits.argmax(dim=1)
correct += pred.eq(target).float().sum().item()
# lines: multi-traces
viz.line([[test_loss, correct / len(test_loader.dataset)]], [global_step], win='test', update='append')
# visual X
viz.images(data.view(-1, 1, 28, 28), win='x')
viz.text(str(pred.detach().cpu().numpy()), win='pred', opts=dict(title='pred'))
test_loss /= len(test_loader.dataset)
print('\nTest set Average loss:{:.4f}, Accuracy: {} / {} ({:.0f}%)\n'.format(
test_loss, correct, len(test_loader.dataset),
100. * correct / len(test_loader.dataset)
))
"""
Train Epoch:0 [0 / 60000 (0%)] Loss: 2.302269
Train Epoch:0 [20000 / 60000 (33%)] Loss: 1.884911
Train Epoch:0 [40000 / 60000 (67%)] Loss: 1.336271
Test set Average loss:0.0038, Accuracy: 8169 / 10000 (82%)
Train Epoch:1 [0 / 60000 (0%)] Loss: 0.721071
Train Epoch:1 [20000 / 60000 (33%)] Loss: 0.565047
Train Epoch:1 [40000 / 60000 (67%)] Loss: 0.506850
Test set Average loss:0.0021, Accuracy: 8889 / 10000 (89%)
Train Epoch:2 [0 / 60000 (0%)] Loss: 0.368552
Train Epoch:2 [20000 / 60000 (33%)] Loss: 0.301212
Train Epoch:2 [40000 / 60000 (67%)] Loss: 0.406262
Test set Average loss:0.0017, Accuracy: 9061 / 10000 (91%)
Train Epoch:3 [0 / 60000 (0%)] Loss: 0.372895
Train Epoch:3 [20000 / 60000 (33%)] Loss: 0.390528
Train Epoch:3 [40000 / 60000 (67%)] Loss: 0.389583
Test set Average loss:0.0015, Accuracy: 9141 / 10000 (91%)
Train Epoch:4 [0 / 60000 (0%)] Loss: 0.220136
Train Epoch:4 [20000 / 60000 (33%)] Loss: 0.281799
Train Epoch:4 [40000 / 60000 (67%)] Loss: 0.291274
Test set Average loss:0.0014, Accuracy: 9211 / 10000 (92%)
Train Epoch:5 [0 / 60000 (0%)] Loss: 0.280618
Train Epoch:5 [20000 / 60000 (33%)] Loss: 0.305418
Train Epoch:5 [40000 / 60000 (67%)] Loss: 0.334693
Test set Average loss:0.0014, Accuracy: 9226 / 10000 (92%)
Train Epoch:6 [0 / 60000 (0%)] Loss: 0.342200
Train Epoch:6 [20000 / 60000 (33%)] Loss: 0.294665
Train Epoch:6 [40000 / 60000 (67%)] Loss: 0.220197
Test set Average loss:0.0013, Accuracy: 9280 / 10000 (93%)
Train Epoch:7 [0 / 60000 (0%)] Loss: 0.211271
Train Epoch:7 [20000 / 60000 (33%)] Loss: 0.358451
Train Epoch:7 [40000 / 60000 (67%)] Loss: 0.236865
Test set Average loss:0.0012, Accuracy: 9338 / 10000 (93%)
Train Epoch:8 [0 / 60000 (0%)] Loss: 0.226759
Train Epoch:8 [20000 / 60000 (33%)] Loss: 0.288015
Train Epoch:8 [40000 / 60000 (67%)] Loss: 0.263826
Test set Average loss:0.0012, Accuracy: 9349 / 10000 (93%)
Train Epoch:9 [0 / 60000 (0%)] Loss: 0.144617
Train Epoch:9 [20000 / 60000 (33%)] Loss: 0.166465
Train Epoch:9 [40000 / 60000 (67%)] Loss: 0.296551
Test set Average loss:0.0011, Accuracy: 9367 / 10000 (94%)
"""
23.激活函式與GPU加速
大部分時候使用ReLU激活函式
- ReLU: R ( z ) = m a x ( 0 , z ) R(z)=max(0, z) R(z)=max(0,z)
- Leaky ReLU
- SELU
- softplus
GPU accelerated
- 一鍵切換
device=torch.device('cuda: 0')data.to(device)
data.cuda()不推薦
任務管理器查看GPU使用情況

24. 測驗
Loss != Accuracy
When to test
- test once per serveral batch
- test once per epoch
- epoch v.s. step?
轉載請註明出處,本文鏈接:https://www.uj5u.com/qita/293303.html
標籤:AI
上一篇:深度神經網路中的卷積
