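This tutorial introduces the fundamental concepts of PyTorch through self-contained examples.

For a more nicely formatted, illustrated version of this tutorial, see: http://studyai.com/pytorch-1.4/beginner/pytorch_with_examples.html

At its core, PyTorch provides two main features:

An n-dimensional Tensor, similar to a numpy array, but able to run on GPUs
Automatic differentiation for building and training neural networks

We will use a fully connected ReLU network as our running example. The network has a single hidden layer and is trained with gradient descent to fit random data by minimizing the squared Euclidean distance between the network output and the true output.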

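Note

You can browse and download the individual examples at the end of this page.
Tensors
Warm-up: numpy

Before introducing PyTorch, we first implement the network using numpy.

Numpy provides an n-dimensional array object and many functions for manipulating these arrays. Numpy is a generic framework for scientific computing; it knows nothing about computation graphs, deep learning, or gradients. However, we can easily use numpy to fit a two-layer network to random data by manually implementing the forward and backward passes through the network with numpy operations: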

# -*- coding: utf-8 -*-
import numpy as np

# N is batch size; D_in is input dimension;
# H is hidden dimension; D_out is output dimension.
N, D_in, H, D_out = 64, 1000, 100, 10

# Create random input and output data
x = np.random.randn(N, D_in)
y = np.random.randn(N, D_out)

# Randomly initialize weights
w1 = np.random.randn(D_in, H)
w2 = np.random.randn(H, D_out)

learning_rate = 1e-6
for t in range(500):
    # Forward pass: compute predicted y
    h = x.dot(w1)
    h_relu = np.maximum(h, 0)
    y_pred = h_relu.dot(w2)

    # Compute and print loss
    loss = np.square(y_pred - y).sum()
    print(t, loss)

    # Backprop to compute gradients of the loss with respect to w1 and w2
    grad_y_pred = 2.0 * (y_pred - y)
    grad_w2 = h_relu.T.dot(grad_y_pred)
    grad_h_relu = grad_y_pred.dot(w2.T)
    grad_h = grad_h_relu.copy()
    grad_h[h < 0] = 0
    grad_w1 = x.T.dot(grad_h)

    # Update weights
    w1 -= learning_rate * grad_w1
    w2 -= learning_rate * grad_w2
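Hand-written backward passes are easy to get subtly wrong, so it can help to compare one entry of the analytic gradient with a finite-difference estimate. The following is a minimal sketch under assumed conditions: the helper name `loss_and_grads`, the tiny layer sizes, and the step size `eps` are illustrative choices, not part of the original tutorial.

```python
import numpy as np


def loss_and_grads(x, y, w1, w2):
    """Forward and backward pass of the two-layer ReLU network."""
    h = x.dot(w1)
    h_relu = np.maximum(h, 0)
    y_pred = h_relu.dot(w2)
    loss = np.square(y_pred - y).sum()

    grad_y_pred = 2.0 * (y_pred - y)
    grad_w2 = h_relu.T.dot(grad_y_pred)
    grad_h = grad_y_pred.dot(w2.T)
    grad_h[h < 0] = 0
    grad_w1 = x.T.dot(grad_h)
    return loss, grad_w1, grad_w2


# Small, hypothetical problem sizes so the check runs instantly.
rng = np.random.default_rng(0)
x = rng.standard_normal((4, 5))
y = rng.standard_normal((4, 2))
w1 = rng.standard_normal((5, 3))
w2 = rng.standard_normal((3, 2))

loss, grad_w1, grad_w2 = loss_and_grads(x, y, w1, w2)

# Central finite difference with respect to the single entry w1[0, 0].
eps = 1e-6
w1_plus = w1.copy()
w1_plus[0, 0] += eps
w1_minus = w1.copy()
w1_minus[0, 0] -= eps
numeric = (loss_and_grads(x, y, w1_plus, w2)[0]
           - loss_and_grads(x, y, w1_minus, w2)[0]) / (2 * eps)

print(numeric, grad_w1[0, 0])  # the two values should agree closely
```

The check only probes one weight; looping over more entries gives more confidence at the cost of extra forward passes.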

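PyTorch: Tensors

Numpy is a great framework, but it cannot use GPUs to accelerate its numerical computations. For modern deep neural networks, GPUs often provide speedups of 50x or greater, so unfortunately numpy is not enough for modern deep learning.

Here we introduce the most fundamental PyTorch concept: the Tensor. A PyTorch Tensor is conceptually identical to a numpy array: it is an n-dimensional array, and PyTorch provides many functions for operating on Tensors. Behind the scenes, Tensors can keep track of a computational graph and gradients, but they are also useful as a generic tool for scientific computing.

Unlike numpy arrays, PyTorch Tensors can use GPUs to accelerate their numerical computations. To run a PyTorch Tensor on a GPU, you only need to place it on a GPU device, for example by passing device=torch.device("cuda:0") when creating it.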
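As a small illustration of device placement, the sketch below picks a GPU when PyTorch can see one and falls back to the CPU otherwise. The variable names `a`, `b`, and `c` and the tensor shapes are assumptions made for this example only.

```python
import numpy as np
import torch

# Pick the GPU when one is visible to PyTorch, otherwise stay on the CPU.
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

# Create a tensor directly on the chosen device ...
a = torch.randn(3, 4, device=device, dtype=torch.float)

# ... or move an existing CPU tensor (here built from a numpy array) onto it.
b = torch.from_numpy(np.ones((4, 2), dtype=np.float32)).to(device)

# The matrix product runs on whichever device holds a and b.
c = a.mm(b)
print(c.device, tuple(c.shape))
```

Both operands of an operation must live on the same device, which is why `b` is moved before the product is taken.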

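Here we use PyTorch Tensors to fit a two-layer network to random data. As in the numpy example above, we need to manually implement the forward and backward passes through the network: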

# -*- coding: utf-8 -*-
import torch

dtype = torch.float
device = torch.device("cpu")
# device = torch.device("cuda:0")  # Uncomment this to run on GPU

# N is batch size; D_in is input dimension;
# H is hidden dimension; D_out is output dimension.
N, D_in, H, D_out = 64, 1000, 100, 10

# Create random input and output data
x = torch.randn(N, D_in, device=device, dtype=dtype)
y = torch.randn(N, D_out, device=device, dtype=dtype)

# Randomly initialize weights
w1 = torch.randn(D_in, H, device=device, dtype=dtype)
w2 = torch.randn(H, D_out, device=device, dtype=dtype)

learning_rate = 1e-6
for t in range(500):
    # Forward pass: compute predicted y
    h = x.mm(w1)
    h_relu = h.clamp(min=0)
    y_pred = h_relu.mm(w2)

    # Compute and print loss
    loss = (y_pred - y).pow(2).sum().item()
    print(t, loss)

    # Backprop to compute gradients of the loss with respect to w1 and w2
    grad_y_pred = 2.0 * (y_pred - y)
    grad_w2 = h_relu.t().mm(grad_y_pred)
    grad_h_relu = grad_y_pred.mm(w2.t())
    grad_h = grad_h_relu.clone()
    grad_h[h < 0] = 0
    grad_w1 = x.t().mm(grad_h)

    # Update weights using gradient descent
    w1 -= learning_rate * grad_w1
    w2 -= learning_rate * grad_w2

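Autograd
PyTorch: Tensors and autograd

In the examples above, we had to manually implement both the forward and backward passes of our neural network. Manually implementing the backward pass is not a big deal for a small two-layer network, but it quickly gets very hairy for large, complex networks.

Thankfully, we can use automatic differentiation to automate the computation of backward passes in neural networks. The autograd package in PyTorch provides exactly this functionality. When using autograd, the forward pass of your network defines a computational graph; nodes in the graph are Tensors, and edges are functions that produce output Tensors from input Tensors. Backpropagating through this graph then lets you easily compute gradients.

This sounds complicated, but it is simple to use in practice. Each Tensor represents a node in a computational graph. If x is a Tensor with x.requires_grad=True, then x.grad is another Tensor holding the gradient of some scalar value with respect to x.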
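A minimal sketch of this behaviour, using an illustrative three-element tensor and the scalar s = sum(x^2), whose gradient with respect to x is 2x:

```python
import torch

# A leaf tensor that autograd should track.
x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)

# A scalar computed from x: 1 + 4 + 9.
s = (x ** 2).sum()

# Backpropagate from the scalar; this fills in x.grad.
s.backward()

print(s.item())
print(x.grad)  # holds d s / d x, which equals 2 * x
```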

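Here we use PyTorch Tensors and autograd to implement our two-layer network; now we no longer need to manually implement the backward pass through the network: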

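# -*- coding: utf-8 -*-
import torch

dtype = torch.float
device = torch.device("cpu")
# device = torch.device("cuda:0")  # Uncomment this to run on GPU

# N is batch size; D_in is input dimension;
# H is hidden dimension; D_out is output dimension.
N, D_in, H, D_out = 64, 1000, 100, 10

# Create random Tensors to hold input and outputs.
# requires_grad=False (the default) indicates that we do not need to compute
# gradients with respect to these Tensors during the backward pass.
x = torch.randn(N, D_in, device=device, dtype=dtype)
y = torch.randn(N, D_out, device=device, dtype=dtype)

# Create random Tensors for the learnable parameters of the model: the weights.
# Setting requires_grad=True indicates that we want to compute gradients with
# respect to these Tensors during the backward pass.
w1 = torch.randn(D_in, H, device=device, dtype=dtype, requires_grad=True)
w2 = torch.randn(H, D_out, device=device, dtype=dtype, requires_grad=True)

learning_rate = 1e-6
for t in range(500):
    # Forward pass: compute predicted y using operations on Tensors. These
    # are exactly the same operations we used in the previous section, but
    # we do not need to keep references to intermediate values since we are
    # not implementing the backward pass by hand.
    y_pred = x.mm(w1).clamp(min=0).mm(w2)

    # Compute and print loss using operations on Tensors.
    # loss is now a zero-dimensional (scalar) Tensor;
    # loss.item() gets the Python number held in it.
    loss = (y_pred - y).pow(2).sum()
    print(t, loss.item())

    # Use autograd to compute the backward pass. This call computes the
    # gradient of loss with respect to all Tensors with requires_grad=True.
    # After this call, w1.grad and w2.grad are Tensors holding the gradient
    # of the loss with respect to w1 and w2 respectively.
    loss.backward()

    # Manually update weights using gradient descent. Wrap the update in
    # torch.no_grad() because the weights have requires_grad=True, but we do
    # not want autograd to track this update.
    # An alternative is to operate on weight.data and weight.grad.data.
    # Recall that tensor.data gives a tensor that shares storage with the
    # original but does not track history.
    # You can also use torch.optim.SGD to achieve this.
    with torch.no_grad():
        w1 -= learning_rate * w1.grad
        w2 -= learning_rate * w2.grad

        # Manually zero the gradients after updating weights
        w1.grad.zero_()
        w2.grad.zero_()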
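To see that autograd reproduces the hand-derived gradients from the earlier sections, one can compute both on the same small problem and compare them. This is a sketch under assumed conditions: the tiny sizes, double precision, and names such as `manual_grad_w1` are illustrative choices.

```python
import torch

torch.manual_seed(0)

# Small, hypothetical problem sizes; double precision keeps the comparison tight.
N, D_in, H, D_out = 8, 6, 5, 3
x = torch.randn(N, D_in, dtype=torch.double)
y = torch.randn(N, D_out, dtype=torch.double)
w1 = torch.randn(D_in, H, dtype=torch.double, requires_grad=True)
w2 = torch.randn(H, D_out, dtype=torch.double, requires_grad=True)

# Forward pass, keeping intermediates so the manual formulas can reuse them.
h = x.mm(w1)
h_relu = h.clamp(min=0)
y_pred = h_relu.mm(w2)
loss = (y_pred - y).pow(2).sum()

# Gradients from autograd.
loss.backward()

# Gradients from the hand-written backward pass, outside autograd tracking.
with torch.no_grad():
    grad_y_pred = 2.0 * (y_pred - y)
    manual_grad_w2 = h_relu.t().mm(grad_y_pred)
    grad_h = grad_y_pred.mm(w2.t())
    grad_h[h < 0] = 0
    manual_grad_w1 = x.t().mm(grad_h)

print(torch.allclose(w1.grad, manual_grad_w1),
      torch.allclose(w2.grad, manual_grad_w2))
```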

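PyTorch: Defining new autograd functions

Under the hood, each primitive autograd operator is really two functions that operate on Tensors. The forward function computes output Tensors from input Tensors. The backward function receives the gradient of some scalar value with respect to the output Tensors, and computes the gradient of that same scalar value with respect to the input Tensors.

In PyTorch we can easily define our own autograd operator by defining a subclass of torch.autograd.Function and implementing its forward and backward functions. We can then use the new operator by calling the subclass's apply method like a function, passing Tensors containing the input data.

In this example we define our own custom autograd function for performing the ReLU nonlinearity, and use it to implement our two-layer network: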

# -*- coding: utf-8 -*-
import torch


class MyReLU(torch.autograd.Function):
    """
    We can implement our own custom autograd Functions by subclassing
    torch.autograd.Function and implementing the forward and backward
    passes, which operate on Tensors.
    """

    @staticmethod
    def forward(ctx, input):
        """
        In the forward pass we receive a Tensor containing the input and
        return a Tensor containing the output. ctx is a context object that
        can be used to stash information for the backward computation.
        Use ctx.save_for_backward to cache tensors for the backward pass.
        """
        ctx.save_for_backward(input)
        return input.clamp(min=0)

    @staticmethod
    def backward(ctx, grad_output):
        """
        In the backward pass we receive a Tensor containing the gradient of
        the loss with respect to the output, and we need to compute the
        gradient of the loss with respect to the input.
        """
        input, = ctx.saved_tensors
        grad_input = grad_output.clone()
        grad_input[input < 0] = 0
        return grad_input


dtype = torch.float
device = torch.device("cpu")
# device = torch.device("cuda:0")  # Uncomment this to run on GPU

# N is batch size; D_in is input dimension;
# H is hidden dimension; D_out is output dimension.
N, D_in, H, D_out = 64, 1000, 100, 10

# Create random Tensors to hold input and outputs.
x = torch.randn(N, D_in, device=device, dtype=dtype)
y = torch.randn(N, D_out, device=device, dtype=dtype)

# Create random Tensors for weights.
w1 = torch.randn(D_in, H, device=device, dtype=dtype, requires_grad=True)
w2 = torch.randn(H, D_out, device=device, dtype=dtype, requires_grad=True)

learning_rate = 1e-6
for t in range(500):
    # To apply our Function, we use the Function.apply method.
    # We alias it as 'relu'.
    relu = MyReLU.apply

    # Forward pass: compute predicted y using operations on Tensors; we
    # compute ReLU using our custom autograd operation.
    y_pred = relu(x.mm(w1)).mm(w2)

    # Compute and print loss
    loss = (y_pred - y).pow(2).sum()
    print(t, loss.item())

    # Use autograd to compute the backward pass.
    loss.backward()

    # Update weights using gradient descent
    with torch.no_grad():
        w1 -= learning_rate * w1.grad
        w2 -= learning_rate * w2.grad

        # Manually zero the gradients after updating weights
        w1.grad.zero_()
        w2.grad.zero_()
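A custom backward function can be validated numerically with torch.autograd.gradcheck, which compares the analytic gradient against finite differences. The sketch below repeats a ReLU function so that it is self-contained; the specific input values are an assumption, chosen to stay away from the kink at 0 where finite differences are unreliable.

```python
import torch


class MyReLU(torch.autograd.Function):
    """ReLU with a hand-written backward pass."""

    @staticmethod
    def forward(ctx, input):
        ctx.save_for_backward(input)
        return input.clamp(min=0)

    @staticmethod
    def backward(ctx, grad_output):
        input, = ctx.saved_tensors
        grad_input = grad_output.clone()
        grad_input[input < 0] = 0
        return grad_input


# gradcheck expects double-precision inputs that require gradients.
inputs = torch.tensor([[-1.5, 0.7], [2.0, -0.3]],
                      dtype=torch.double, requires_grad=True)

# Returns True when analytic and numerical gradients agree within tolerance.
ok = torch.autograd.gradcheck(MyReLU.apply, (inputs,), eps=1e-6, atol=1e-4)
print(ok)
```

A backward pass that used the wrong threshold, for instance zeroing gradients only below -2, would fail this check for the input -1.5.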

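TensorFlow: Static graphs

PyTorch autograd looks a lot like TensorFlow: in both frameworks we define a computational graph and use automatic differentiation to compute gradients. The biggest difference between the two is that TensorFlow's computational graphs are static, while PyTorch uses dynamic computational graphs.

In TensorFlow, we define the computational graph once and then execute the same graph over and over again, possibly feeding different input data to it. In PyTorch, each forward pass defines a new computational graph.

Static graphs are nice because you can optimize the graph up front; for example, a framework might decide to fuse some graph operations for efficiency, or come up with a strategy for distributing the graph across many GPUs or many machines. If you reuse the same graph over and over, this potentially costly up-front optimization can be amortized as the same graph is rerun many times.

One aspect where static and dynamic graphs differ is control flow. For some models we may wish to perform different computation for each data point; for example, a recurrent network might be unrolled for a different number of time steps for each data point, and this unrolling can be implemented as a loop. With a static graph the loop construct needs to be part of the graph; for this reason TensorFlow provides operators such as tf.scan for embedding loops into the graph. With dynamic graphs the situation is simpler: since we build the graph on the fly for each example, we can use normal imperative flow control to perform computation that differs for each input.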
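The following sketch shows data-dependent control flow in PyTorch: an ordinary Python while loop whose iteration count depends on the input values, with autograd still able to differentiate through whatever path was taken. The function name `halve_until_small` and the input values are hypothetical, chosen for illustration.

```python
import torch


def halve_until_small(x):
    """Halve x until its norm is at most 1; the loop count depends on the data."""
    steps = 0
    while x.norm() > 1.0:
        x = x * 0.5
        steps += 1
    return x, steps


# The input has norm 5.0, so the loop runs three times (5 -> 2.5 -> 1.25 -> 0.625).
x = torch.tensor([3.0, 4.0], requires_grad=True)
out, steps = halve_until_small(x)

# Backpropagate through the graph that this particular forward pass built.
out.sum().backward()

print(steps)
print(x.grad)  # each entry is 0.5 ** steps
```

A different input would trace a different number of multiplications, and the gradient would change accordingly without any special loop operator.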

To contrast with the PyTorch autograd example above, here we use TensorFlow to fit a simple two-layer network:

# -*- coding: utf-8 -*-
# This example uses the TensorFlow 1.x graph API (tf.placeholder, tf.Session).
import tensorflow as tf
import numpy as np

# First we set up the computational graph:

# N is batch size; D_in is input dimension;
# H is hidden dimension; D_out is output dimension.
N, D_in, H, D_out = 64, 1000, 100, 10

# Create placeholders for the input and target data; these will be filled
# with real data when we execute the graph.
x = tf.placeholder(tf.float32, shape=(None, D_in))
y = tf.placeholder(tf.float32, shape=(None, D_out))

# Create Variables for the weights and initialize them with random data.
# A TensorFlow Variable persists its value across executions of the graph.
w1 = tf.Variable(tf.random_normal((D_in, H)))
w2 = tf.Variable(tf.random_normal((H, D_out)))

# Forward pass: Compute the predicted y using operations on TensorFlow Tensors.
# Note that this code does not actually perform any numeric operations; it
# merely sets up the computational graph that we will later execute.
h = tf.matmul(x, w1)
h_relu = tf.maximum(h, tf.zeros(1))
y_pred = tf.matmul(h_relu, w2)

# Compute loss using operations on TensorFlow Tensors
loss = tf.reduce_sum((y - y_pred) ** 2.0)

# Compute gradient of the loss with respect to w1 and w2.
grad_w1, grad_w2 = tf.gradients(loss, [w1, w2])

# Update the weights using gradient descent. To actually update the weights
# we need to evaluate new_w1 and new_w2 when executing the graph. Note that
# in TensorFlow the act of updating the value of the weights is part of
# the computational graph; in PyTorch this happens outside the computational
# graph.
learning_rate = 1e-6
new_w1 = w1.assign(w1 - learning_rate * grad_w1)
new_w2 = w2.assign(w2 - learning_rate * grad_w2)

# Now we have built our computational graph, so we enter a TensorFlow session to
# actually execute the graph.
with tf.Session() as sess:
    # Run the graph once to initialize the Variables w1 and w2.
    sess.run(tf.global_variables_initializer())

    # Create numpy arrays holding the actual data for the inputs x and targets
    # y
    x_value = np.random.randn(N, D_in)
    y_value = np.random.randn(N, D_out)
    for _ in range(500):
        # Execute the graph many times. Each time it executes we want to bind
        # x_value to x and y_value to y, specified with the feed_dict argument.
        # Each time we execute the graph we want to compute the values for loss,
        # new_w1, and new_w2; the values of these Tensors are returned as numpy
        # arrays.
        loss_value, _, _ = sess.run([loss, new_w1, new_w2],
                                    feed_dict={x: x_value, y: y_value})
        print(loss_value)

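nn module
PyTorch: nn

Computational graphs and autograd are a very powerful paradigm for defining complex operators and automatically taking derivatives; however, for large neural networks raw autograd can be a bit too low-level.

When building neural networks we frequently think of arranging the computation into layers, some of which have learnable parameters that are optimized during learning.

In TensorFlow, packages such as Keras, TensorFlow-Slim, and TFLearn provide higher-level abstractions over raw computational graphs that are useful for building neural networks.

In PyTorch, the nn package serves the same purpose. The nn package defines a set of Modules, which are roughly equivalent to neural network layers. A Module receives input Tensors and computes output Tensors, but may also hold internal state such as Tensors containing learnable parameters. The nn package also defines a set of useful loss functions that are commonly used when training neural networks.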
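To make the idea of a Module holding internal state concrete, the sketch below inspects the parameters of a single torch.nn.Linear layer and applies a loss function from the nn package. The layer sizes and the all-zeros target are illustrative assumptions.

```python
import torch

# A Linear module mapping 4 input features to 3 output features.
layer = torch.nn.Linear(4, 3)

# Its learnable state is stored as parameters that require gradients.
for name, p in layer.named_parameters():
    print(name, tuple(p.shape), p.requires_grad)

# Calling the module computes output Tensors from input Tensors.
x = torch.randn(2, 4)
out = layer(x)

# Loss functions in the nn package are called the same way.
loss_fn = torch.nn.MSELoss(reduction='sum')
loss = loss_fn(out, torch.zeros(2, 3))
print(tuple(out.shape), loss.item())
```

Note that the weight is stored with shape (out_features, in_features), so here it is 3 by 4.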

In this example we use the nn package to implement our two-layer network:

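# -*- coding: utf-8 -*-
import torch

# N is batch size; D_in is input dimension;
# H is hidden dimension; D_out is output dimension.
N, D_in, H, D_out = 64, 1000, 100, 10

# Create random Tensors to hold inputs and outputs
x = torch.randn(N, D_in)
y = torch.randn(N, D_out)

# Use the nn package to define our model as a sequence of layers. nn.Sequential
# is a Module which contains other Modules, and applies them in sequence to
# produce its output. Each Linear Module computes output from input using a
# linear function, and holds internal Tensors for its weight and bias.
model = torch.nn.Sequential(
    torch.nn.Linear(D_in, H),
    torch.nn.ReLU(),
    torch.nn.Linear(H, D_out),
)

# The nn package also contains definitions of popular loss functions; in this
# case we use Mean Squared Error (MSE) as our loss function. With
# reduction='sum' the squared errors are summed rather than averaged.
loss_fn = torch.nn.MSELoss(reduction='sum')

learning_rate = 1e-4
for t in range(500):
    # Forward pass: compute predicted y by passing x to the model. Module
    # objects override the __call__ operator so you can call them like
    # functions. When doing so you pass a Tensor of input data to the Module
    # and it produces a Tensor of output data.
    y_pred = model(x)

    # Compute and print loss. We pass Tensors containing the predicted and
    # true values of y, and the loss function returns a Tensor containing
    # the loss.
    loss = loss_fn(y_pred, y)
    print(t, loss.item())

    # Zero the gradients before running the backward pass.
    model.zero_grad()

    # Backward pass: compute gradient of the loss with respect to all the
    # learnable parameters of the model. Internally, the parameters of each
    # Module are stored in Tensors with requires_grad=True, so this call
    # computes gradients for all learnable parameters in the model.
    loss.backward()

    # Update the weights using gradient descent. Each parameter is a Tensor,
    # so we can access its gradients through param.grad as we did before.
    with torch.no_grad():
        for param in model.parameters():
            param -= learning_rate * param.grad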

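PyTorch: optim

Up to this point we have updated the weights of our models by manually mutating the Tensors holding learnable parameters (using torch.no_grad() or .data to avoid tracking history in autograd). This is not a huge burden for simple optimization algorithms like stochastic gradient descent, but in practice we often train neural networks using more sophisticated optimizers such as AdaGrad, RMSProp, and Adam.

The optim package in PyTorch abstracts the idea of an optimization algorithm and provides implementations of commonly used optimization algorithms.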
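As a sanity check on what an optimizer does, the sketch below shows that a single step of plain torch.optim.SGD (no momentum or weight decay) produces the same parameters as the manual update used so far. The model size, learning rate, and the names `model_a` and `model_b` are assumptions made for this illustration.

```python
import copy

import torch

torch.manual_seed(0)

# Two identical copies of a small model: one updated by hand, one by optim.
model_a = torch.nn.Linear(3, 2)
model_b = copy.deepcopy(model_a)

x = torch.randn(5, 3)
y = torch.randn(5, 2)
lr = 1e-2
loss_fn = torch.nn.MSELoss(reduction='sum')

# Manual gradient descent step on model_a.
loss_fn(model_a(x), y).backward()
with torch.no_grad():
    for p in model_a.parameters():
        p -= lr * p.grad

# The same step on model_b, expressed through the optim package.
optimizer = torch.optim.SGD(model_b.parameters(), lr=lr)
optimizer.zero_grad()
loss_fn(model_b(x), y).backward()
optimizer.step()

same = all(torch.allclose(pa, pb)
           for pa, pb in zip(model_a.parameters(), model_b.parameters()))
print(same)
```

Swapping in a different optimizer class changes only the constructor line, which is the main convenience of the abstraction.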

In this example we use the nn package to define our model as before, but we optimize the model using the Adam algorithm provided by the optim package:

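# -*- coding: utf-8 -*-
import torch

# N is batch size; D_in is input dimension;
# H is hidden dimension; D_out is output dimension.
N, D_in, H, D_out = 64, 1000, 100, 10

# Create random Tensors to hold inputs and outputs
x = torch.randn(N, D_in)
y = torch.randn(N, D_out)

# Use the nn package to define our model and loss function.
model = torch.nn.Sequential(
    torch.nn.Linear(D_in, H),
    torch.nn.ReLU(),
    torch.nn.Linear(H, D_out),
)
loss_fn = torch.nn.MSELoss(reduction='sum')

# Use the optim package to define an Optimizer that will update the weights of
# the model for us. Here we use Adam; the optim package contains many other
# optimization algorithms. The first argument to the Adam constructor tells the
# optimizer which Tensors it should update.
learning_rate = 1e-4
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)
for t in range(500):
    # Forward pass: compute predicted y by passing x to the model.
    y_pred = model(x)

    # Compute and print loss.
    loss = loss_fn(y_pred, y)
    print(t, loss.item())

    # Before the backward pass, use the optimizer object to zero all of the
    # gradients for the variables it will update (the learnable parameters of
    # the model). This is because by default, gradients are accumulated in
    # buffers (i.e. not overwritten) whenever .backward() is called.
    # See the documentation of torch.autograd.backward for more details.
    optimizer.zero_grad()

    # Backward pass: compute gradient of the loss with respect to model
    # parameters
    loss.backward()

    # Calling the step function on an Optimizer makes an update to its
    # parameters
    optimizer.step()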
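The accumulation behaviour that makes zeroing necessary can be seen on a single tensor. This is a minimal sketch with an illustrative two-element weight `w`; the names `first`, `second`, and `cleared` are hypothetical.

```python
import torch

w = torch.tensor([1.0, 2.0], requires_grad=True)

# First backward pass: the gradient of sum(3 * w) with respect to w is 3.
(w * 3).sum().backward()
first = w.grad.clone()

# Second backward pass without zeroing: the new gradient is added on top.
(w * 3).sum().backward()
second = w.grad.clone()

# Zeroing resets the buffer so the next backward pass starts from scratch.
w.grad.zero_()
cleared = w.grad.clone()

print(first, second, cleared)
```

Skipping the zeroing step in a training loop would therefore make every update use the sum of all gradients seen so far.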

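PyTorch: Custom nn Modules

Sometimes you will want to specify models that are more complex than a sequence of existing Modules; in these cases you can define your own Modules by subclassing nn.Module and defining a forward method that receives input Tensors and produces output Tensors using other Modules or other autograd operations on Tensors.

In this example we implement our two-layer network as a custom Module subclass: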

# -*- coding: utf-8 -*-
import torch


class TwoLayerNet(torch.nn.Module):
    def __init__(self, D_in, H, D_out):
        """
        In the constructor we instantiate two nn.Linear modules and assign
        them as member variables.
        """
        super(TwoLayerNet, self).__init__()
        self.linear1 = torch.nn.Linear(D_in, H)
        self.linear2 = torch.nn.Linear(H, D_out)

    def forward(self, x):
        """
        In the forward function we accept a Tensor of input data and we must
        return a Tensor of output data. We can use Modules defined in the
        constructor as well as arbitrary operators on Tensors.
        """
        h_relu = self.linear1(x).clamp(min=0)
        y_pred = self.linear2(h_relu)
        return y_pred


# N is batch size; D_in is input dimension;
# H is hidden dimension; D_out is output dimension.
N, D_in, H, D_out = 64, 1000, 100, 10

# Create random Tensors to hold inputs and outputs
x = torch.randn(N, D_in)
y = torch.randn(N, D_out)

# Construct our model by instantiating the class defined above
model = TwoLayerNet(D_in, H, D_out)

# Construct our loss function and an Optimizer. The call to model.parameters()
# in the SGD constructor will contain the learnable parameters of the two
# nn.Linear modules which are members of the model.
criterion = torch.nn.MSELoss(reduction='sum')
optimizer = torch.optim.SGD(model.parameters(), lr=1e-4)
for t in range(500):
    # Forward pass: compute predicted y by passing x to the model
    y_pred = model(x)

    # Compute and print loss
    loss = criterion(y_pred, y)
    print(t, loss.item())

    # Zero gradients, perform a backward pass, and update the weights.
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
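Submodules assigned as attributes in the constructor are registered automatically, which is why model.parameters() finds their weights. The sketch below lists the registered parameters of a small module; the class name `TinyNet` and its layer sizes are hypothetical stand-ins for the network above.

```python
import torch


class TinyNet(torch.nn.Module):
    """A small two-layer network, used only to inspect parameter registration."""

    def __init__(self, D_in, H, D_out):
        super().__init__()
        self.linear1 = torch.nn.Linear(D_in, H)
        self.linear2 = torch.nn.Linear(H, D_out)

    def forward(self, x):
        return self.linear2(self.linear1(x).clamp(min=0))


model = TinyNet(4, 3, 2)

# Parameter names are prefixed with the attribute name of the owning submodule.
names = [name for name, _ in model.named_parameters()]

# Total number of scalar values the optimizer would update.
num_values = sum(p.numel() for p in model.parameters())

print(names, num_values)
```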

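PyTorch: Control flow + weight sharing

As an example of dynamic graphs and weight sharing, we implement a very strange model: a fully connected ReLU network that on each forward pass chooses a random number between 1 and 4 and uses that many hidden layers, reusing the same weights multiple times to compute the innermost hidden layers.

For this model we can use normal Python flow control to implement the loop, and we can implement weight sharing among the innermost layers by simply reusing the same Module multiple times when defining the forward pass.

We can easily implement this model as a Module subclass: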

# -*- coding: utf-8 -*-
import random
import torch


class DynamicNet(torch.nn.Module):
    def __init__(self, D_in, H, D_out):
        """
        In the constructor we construct three nn.Linear instances that we
        will use in the forward pass.
        """
        super(DynamicNet, self).__init__()
        self.input_linear = torch.nn.Linear(D_in, H)
        self.middle_linear = torch.nn.Linear(H, H)
        self.output_linear = torch.nn.Linear(H, D_out)

    def forward(self, x):
        """
        For the forward pass of the model, we randomly choose either 0, 1, 2,
        or 3 and reuse the middle_linear Module that many times to compute
        hidden layer representations.

        Since each forward pass builds a dynamic computation graph, we can use
        normal Python control-flow operators like loops or conditional
        statements when defining the forward pass of the model.

        Here we also see that it is perfectly safe to reuse the same Module
        many times when defining a computational graph. This is a big
        improvement from Lua Torch, where each Module could be used only once.
        """
        h_relu = self.input_linear(x).clamp(min=0)
        for _ in range(random.randint(0, 3)):
            h_relu = self.middle_linear(h_relu).clamp(min=0)
        y_pred = self.output_linear(h_relu)
        return y_pred


# N is batch size; D_in is input dimension;
# H is hidden dimension; D_out is output dimension.
N, D_in, H, D_out = 64, 1000, 100, 10

# Create random Tensors to hold inputs and outputs
x = torch.randn(N, D_in)
y = torch.randn(N, D_out)

# Construct our model by instantiating the class defined above
model = DynamicNet(D_in, H, D_out)

# Construct our loss function and an Optimizer. Training this strange model
# with vanilla stochastic gradient descent is tough, so we use momentum.
criterion = torch.nn.MSELoss(reduction='sum')
optimizer = torch.optim.SGD(model.parameters(), lr=1e-4, momentum=0.9)
for t in range(500):
    # Forward pass: compute predicted y by passing x to the model
    y_pred = model(x)

    # Compute and print loss
    loss = criterion(y_pred, y)
    print(t, loss.item())

    # Zero gradients, perform a backward pass, and update the weights.
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
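When a Module is reused within one forward pass, autograd sums the gradient contributions from every use. The sketch below illustrates this with a one-weight linear layer applied twice; the layer `shared`, the weight value 2, and the input value 3 are assumptions picked so the arithmetic is easy to follow.

```python
import torch

# A 1-by-1 linear layer without bias, so it simply multiplies by its weight w.
shared = torch.nn.Linear(1, 1, bias=False)
with torch.no_grad():
    shared.weight.fill_(2.0)

x = torch.tensor([[3.0]])

# Apply the same module twice: y = w * (w * x) = w^2 * x.
y = shared(shared(x))
y.sum().backward()

# The gradient combines both uses: d(w^2 * x)/dw = 2 * w * x.
print(y.item(), shared.weight.grad.item())
```

With w = 2 and x = 3 the output is 12 and the gradient is also 12, which is the sum of the contribution from each application of the shared weight.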

Examples

You can browse and download each of the examples above individually here.

