0704 - GPU Acceleration with cuda

1. Converting Data Between CPU and GPU
2. Notes on Using the GPU
3. Setting the Default GPU
4. Switching Between GPUs

Full PyTorch tutorial index: https://www.cnblogs.com/nickchen121/p/14662511.html

1. Converting Data Between CPU and GPU

In torch, the following data structures come in both a CPU and a GPU version:

Tensor
Variable (including Parameter)
nn.Module (including common layers, loss functions, and containers such as Sequential)

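Each of them has a .cuda method that converts it into the corresponding GPU object.

When moving data from the CPU to the GPU, keep two points in mind:

tensor.cuda and variable.cuda both return a new object stored on the GPU; the original data remains on the CPU.
module.cuda moves all of the module's data to the GPU and returns the module itself, so module = module.cuda() and module.cuda() have the same effect.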

Moving a Variable or an nn.Module between CPU and GPU ultimately relies on moving tensors. For example, variable.cuda actually moves variable.data to the specified GPU, while the cuda method of nn.Module moves every parameter of the module (including the parameters of its submodules) to the GPU, and a Parameter is itself essentially a Variable.
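The module behaviour described above can be checked with a minimal sketch. It uses `.to(device)` rather than `.cuda()` so that it also runs on a machine without a GPU; the CPU fallback and the names `device`, `model`, and `moved` are assumptions made for illustration.

```python
import torch as t

# Assumption: fall back to the CPU when no CUDA device is available.
device = t.device('cuda:0' if t.cuda.is_available() else 'cpu')

# A container holding a layer and a nested sub-container.
model = t.nn.Sequential(
    t.nn.Linear(3, 4),
    t.nn.Sequential(t.nn.Linear(4, 2)),
)

# Module.to (like Module.cuda) moves the parameters in place
# and returns the module itself.
moved = model.to(device)
same_object = moved is model

# Every parameter, including those of the nested sub-module,
# now reports the same device type.
param_device_types = {p.device.type for p in model.parameters()}

print(same_object, param_device_types)
```

Because the module is returned unchanged, reassigning it (`model = model.to(device)`) is optional, whereas a tensor moved to another device must be reassigned to keep the new copy.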

The examples below illustrate this; they require a machine with two GPUs.

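Note: why is the method for moving data to the GPU called .cuda rather than .gpu? Because the GPU programming interface used here is CUDA, and not every GPU supports CUDA; only some NVIDIA GPUs do. At the time this was written, torch was expected to possibly support AMD GPUs in the future, whose programming interface is OpenCL, and a .cl method was said to be reserved for supporting AMD and other GPUs later.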

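import torch as t

# Tensor test
tensor = t.Tensor(3, 4)
tensor.cuda(0)  # returns a new tensor stored on the first GPU (index 0); the original tensor is unchanged
tensor.is_cuda  # False: the original tensor is still on the CPU

tensor = tensor.cuda()  # with no device given, the first GPU is used by default
tensor.is_cuda  # True: the name now refers to the GPU copy

# Variable test (Variable is deprecated since PyTorch 0.4 and simply returns a Tensor)
variable = t.autograd.Variable(t.Tensor(3, 4))  # built from a CPU tensor
variable.cuda()  # returns a new object on the GPU
variable.is_cuda  # False: the original variable is still on the CPU (is_cuda is an attribute, not a method)

# nn.Module test
module = t.nn.Linear(3, 4)
module.cuda(1)  # moves the parameters in place to the second GPU (index 1)
module.weight.is_cuda  # True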

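class VeryBigModule(t.nn.Module):
    def __init__(self):
        super(VeryBigModule, self).__init__()
        # Create each tensor directly on its GPU, then wrap it in Parameter.
        # Calling .cuda() on a CPU Parameter returns a plain tensor that the module does not register.
        self.GiantParameter1 = t.nn.Parameter(t.randn(100000, 20000, device='cuda:0'))
        self.GiantParameter2 = t.nn.Parameter(t.randn(20000, 100000, device='cuda:1'))

    def forward(self, x):
        # Move the input to the GPU that holds each parameter before multiplying
        x = self.GiantParameter1.mm(x.cuda(0))
        x = self.GiantParameter2.mm(x.cuda(1))
        return x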

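In the VeryBigModule class, the two Parameters take up a great deal of memory, roughly 8 GB each in float32. Placing both on a single GPU could exhaust its memory, so they are placed on two separate GPUs.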
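The 8 GB figure can be reproduced with simple arithmetic. The sketch below assumes float32 storage (4 bytes per element) and ignores any allocator overhead; `tensor_bytes` is a hypothetical helper, not a torch function.

```python
def tensor_bytes(shape, bytes_per_element=4):
    """Return the storage size in bytes of a dense tensor with the given shape."""
    count = 1
    for dim in shape:
        count *= dim
    return count * bytes_per_element


p1 = tensor_bytes((100000, 20000))  # GiantParameter1, float32
p2 = tensor_bytes((20000, 100000))  # GiantParameter2, float32

print(p1 / 1e9, p2 / 1e9)  # 8.0 8.0 (GB per parameter)
```

Together the two parameters need about 16 GB, which is why splitting them across two devices is the safer choice.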

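2. Notes on Using the GPU

A few small suggestions for using the GPU:

GPU computation is fast, but its advantage does not show when the workload is small, so simple operations can be left to the CPU.
Transferring data between the CPU and the GPU is relatively slow and should be avoided where possible.
For low-precision computation, consider HalfTensor, which uses half the GPU memory of FloatTensor; watch out for numerical overflow.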
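The last suggestion can be illustrated with a short sketch. It runs on the CPU, since element size and the float16 range do not depend on the device; the sample value 70000.0 is chosen only because it exceeds the largest finite float16 value (65504).

```python
import torch as t

x32 = t.zeros(1000)   # FloatTensor (float32)
x16 = x32.half()      # HalfTensor (float16)

# Storage used by the elements of each tensor, in bytes.
bytes32 = x32.element_size() * x32.nelement()
bytes16 = x16.element_size() * x16.nelement()

# Values beyond the float16 range overflow to inf when converted.
big = t.tensor([70000.0]).half()
overflowed = bool(t.isinf(big.float()).all())

print(bytes32, bytes16, overflowed)
```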

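Note: most loss functions are also nn.Modules, yet when using the GPU we often forget to call their .cuda method. In most cases this does not raise an error, because the loss function has no learnable parameters, but in some cases it does. To be safe, and to keep the code consistent, remember to call criterion.cuda. For example: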

# Cross-entropy loss with class weights
criterion = t.nn.CrossEntropyLoss(weight=t.Tensor([1, 3]))
inp = t.autograd.Variable(t.randn(4, 2)).cuda()
target = t.autograd.Variable(t.Tensor([1, 0, 0, 1])).long().cuda()

# The next line would raise an error because weight has not been moved to the GPU
# loss = criterion(inp, target)

# This works once the criterion has been moved
criterion.cuda()
loss = criterion(inp, target)

criterion._buffers  # the weight buffer now lives on the GPU
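The reason a weighted loss needs to be moved is that its class weights are stored on the module as a buffer. A minimal, device-agnostic sketch (the CPU fallback is an assumption so that it runs without a GPU):

```python
import torch as t

# Assumption: fall back to the CPU when no CUDA device is available.
device = t.device('cuda:0' if t.cuda.is_available() else 'cpu')

criterion = t.nn.CrossEntropyLoss(weight=t.tensor([1.0, 3.0]))

# The class weights are registered as a buffer, not as a learnable parameter.
buffer_names = [name for name, _ in criterion.named_buffers()]
num_params = len(list(criterion.parameters()))

# Moving the module moves its buffers along with it.
criterion = criterion.to(device)
inp = t.randn(4, 2, device=device)
target = t.tensor([1, 0, 0, 1], device=device)

loss = criterion(inp, target)
print(buffer_names, num_params, criterion.weight.device, loss.item())
```

Creating the inputs with `device=device` keeps the inputs and the criterion's weight on the same device, which is the condition the loss computation needs.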

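3. Setting the Default GPU

Besides calling .cuda, you can use torch.cuda.device to choose which GPU is used by default, or torch.set_default_tensor_type to make the program create tensors on the GPU by default, so that .cuda does not have to be called manually.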

# If no GPU is specified, GPU 0 is used by default
x = t.cuda.FloatTensor(2, 3)
# x.get_device() == 0
y = t.FloatTensor(2, 3).cuda()
# y.get_device() == 0

# Make GPU 1 the default inside this block
with t.cuda.device(1):
    # Create a tensor on GPU 1
    a = t.cuda.FloatTensor(2, 3)

    # Move a tensor to GPU 1
    b = t.FloatTensor(2, 3).cuda()
    print(a.get_device() == b.get_device() == 1)

    c = a + b
    print(c.get_device() == 1)

    # x and y were created on GPU 0, so the result stays on GPU 0
    z = x + y
    print(z.get_device() == 0)

    # Explicitly select GPU 0
    d = t.randn(2, 3).cuda(0)
    print(d.get_device() == 0)

# Make the GPU FloatTensor the default tensor type
# (deprecated in recent PyTorch releases in favor of torch.set_default_device)
t.set_default_tensor_type('torch.cuda.FloatTensor')
a = t.ones(2, 3)
a.is_cuda  # True (is_cuda is an attribute, not a method)
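On machines where the second GPU may not exist, the same idea can be written with an explicit `torch.device` object. This is only a sketch: `pick_device` is a hypothetical helper, and falling back to the CPU is an assumption, not something the examples above do.

```python
import torch as t


def pick_device(index=0):
    """Return cuda:<index> if that GPU exists, otherwise the CPU (hypothetical helper)."""
    if t.cuda.is_available() and index < t.cuda.device_count():
        return t.device('cuda', index)
    return t.device('cpu')


device = pick_device(1)
a = t.ones(2, 3, device=device)  # created directly on the chosen device
b = t.zeros(2, 3).to(device)     # created on the CPU, then moved
c = a + b                        # the result lives on the same device as its operands

print(c.device)
```

Passing `device=` at creation time avoids the extra CPU-to-GPU copy that `.to(device)` or `.cuda()` incurs.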

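4. Switching Between GPUs

If the server has multiple GPUs, tensor.cuda() stores the tensor on the first GPU, which is equivalent to tensor.cuda(0). To use the second GPU you would have to write tensor.cuda(1) explicitly, which means changing a lot of code. There are two alternatives:

The first is to call t.cuda.set_device(1) once to select the second GPU; subsequent .cuda() calls then need no changes.
The second is to set the environment variable CUDA_VISIBLE_DEVICES. For example, with export CUDA_VISIBLE_DEVICES=1 only the second physical GPU is used, but the program sees it as the first logical GPU. CUDA_VISIBLE_DEVICES can also list several GPUs: with export CUDA_VISIBLE_DEVICES=0,2,3, the first, third, and fourth physical GPUs are mapped to the first, second, and third logical GPUs, so tensor.cuda(1) moves the tensor to the third physical GPU.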
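The logical-to-physical mapping can be sketched in plain Python. `physical_gpu` is a hypothetical helper written for illustration, and it assumes the variable contains only comma-separated integer indices.

```python
def physical_gpu(visible_devices, logical_index):
    """Map a logical GPU index to the physical index listed in CUDA_VISIBLE_DEVICES."""
    physical_ids = [int(item) for item in visible_devices.split(',') if item.strip()]
    return physical_ids[logical_index]


print(physical_gpu('1', 0))      # 1: logical GPU 0 is the second physical GPU
print(physical_gpu('0,2,3', 1))  # 2: logical GPU 1 is the third physical GPU
```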

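There are two ways to set CUDA_VISIBLE_DEVICES:

On the command line: CUDA_VISIBLE_DEVICES=0,1 python main.py
In the program: import os; os.environ["CUDA_VISIBLE_DEVICES"] = "2" (this must run before CUDA is initialized, so place it at the top of the script)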

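The above covers typical single-user setups. Real projects may also use distributed GPUs; since most users will not need this, it is not covered here. For details, see the official documentation on distributed GPU communication.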

Please credit the source when reposting. Link to this article: https://www.uj5u.com/qita/281978.html
