A Summary of Tips for the Deep Learning Framework PyTorch

1. Specifying which GPUs to use when training a model

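To make only GPU 0 visible (inside the program it appears as "cuda:0"): os.environ["CUDA_VISIBLE_DEVICES"] = "0";
To make GPUs 0 and 1 visible (they appear as "cuda:0" and "cuda:1"): os.environ["CUDA_VISIBLE_DEVICES"] = "0,1"; the order matters: the first listed device becomes cuda:0 (the default device) and the second becomes cuda:1;
The same can be set outside the training script: CUDA_VISIBLE_DEVICES=0,1 python train.py. Note that if you use cards 6 and 7 of an 8-GPU machine you run CUDA_VISIBLE_DEVICES=6,7 python train.py, but when parallelizing the model you still refer to them as 0 and 1, because the visible devices are renumbered from 0: model = nn.DataParallel(model, device_ids=[0, 1]);
Note that the GPU selection must come before any network/model operations, i.e. before CUDA is first initialized, so set os.environ at the very top of the script.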
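A minimal sketch of the in-script variant, assuming a toy nn.Linear model. It uses os.environ.setdefault instead of plain assignment so that a value already given on the command line is kept, and it also runs on a CPU-only machine:

```python
import os

# Must run before torch initializes CUDA. setdefault keeps any value already
# given on the command line (e.g. CUDA_VISIBLE_DEVICES=6,7 python train.py).
os.environ.setdefault("CUDA_VISIBLE_DEVICES", "0,1")

import torch
import torch.nn as nn

model = nn.Linear(10, 2)                 # toy model for illustration
n_gpus = torch.cuda.device_count()       # counts only the *visible* devices
if n_gpus > 0:
    model = model.cuda()
if n_gpus > 1:
    # device_ids index the visible devices, so they are always 0..n_gpus-1
    model = nn.DataParallel(model, device_ids=list(range(n_gpus)))

x = torch.randn(4, 10, device="cuda" if n_gpus > 0 else "cpu")
out = model(x)
print(out.shape)
```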

2. Viewing the input/output details of each layer

1. Install torchsummary or torchsummaryX (pip install torchsummary, pip install torchsummaryX);
2. Usage example:

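import torch
from torchvision import models

# keep the model and the dummy input on the same device
device = 'cuda' if torch.cuda.is_available() else 'cpu'
vgg16 = models.vgg16().to(device)

# 1. torchsummary usage
from torchsummary import summary
summary(vgg16, (3, 224, 224), device=device)    # (3, 224, 224) is the model's input size (C, H, W)

# 2. torchsummaryX usage
from torchsummaryX import summary as summaryX

inputx = torch.randn(1, 3, 224, 224).to(device)
summaryX(vgg16, inputx)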

The output is shown in the figure below (the output shape of each layer and the model's computational cost):
[Figure: output results]
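If you only need the parameter totals that these tools print, they can also be computed directly. A small sketch; count_parameters is a hypothetical helper name and the two-layer model is only an example:

```python
import torch.nn as nn

def count_parameters(model):
    """Hypothetical helper: (total, trainable) parameter counts of a model."""
    total = sum(p.numel() for p in model.parameters())
    trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
    return total, trainable

model = nn.Sequential(nn.Linear(10, 5), nn.ReLU(), nn.Linear(5, 2))
# (10*5 + 5) + (5*2 + 2) = 55 + 12 = 67 parameters, all trainable
print(count_parameters(model))
```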

3. Gradient clipping: preventing exploding gradients during model optimization

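import torch
import torch.nn as nn

...    # model, criterion, optimizer, inputx and target are assumed to be defined here
outputx = model(inputx)
loss = criterion(outputx, target)
optimizer.zero_grad()
loss.backward()
nn.utils.clip_grad_norm_(model.parameters(), max_norm=20, norm_type=2)    # clip after backward(), before step()
optimizer.step()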

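Arguments of nn.utils.clip_grad_norm_:

parameters: an iterable of tensors (typically model.parameters()) whose gradients will be normalized;
max_norm: the maximum allowed norm of the gradients;
norm_type: the type of norm to use, default 2 (the L2 norm);
Note that the norm is computed over all the given gradients together, as if they were concatenated into a single vector, and the gradients are rescaled in place. Gradient clipping can also add noticeable computation time on some tasks.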
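To see what clip_grad_norm_ does, here is a small sketch with one hand-made gradient (the parameter and its gradient are illustrative only). The function returns the total norm *before* clipping and scales the gradients so their total norm is at most max_norm:

```python
import torch
import torch.nn as nn

p = nn.Parameter(torch.zeros(2))
p.grad = torch.tensor([3.0, 4.0])        # L2 norm = sqrt(3^2 + 4^2) = 5

total_norm = nn.utils.clip_grad_norm_([p], max_norm=1.0, norm_type=2)
print(float(total_norm))   # the norm before clipping, 5.0
print(p.grad)              # rescaled by roughly 1/5, so its norm is (about) 1.0
```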

4. Adding a batch dimension to a single image

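During training the model input has shape (batch_size, c, h, w), while at test time you often have a single image of shape (c, h, w), so a batch dimension has to be added. Note that cv2.imread returns an array of shape (h, w, c) in BGR order, so the snippets below produce (1, h, w, c); to feed a model you would also move the channels first, e.g. with transpose((2, 0, 1)) or permute(2, 0, 1).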

import cv2
import torch
import numpy as np

imgpath = 'test.jpg'    # placeholder: path to your image

####### NumPy-based method #########
# Method 1.
image = cv2.imread(imgpath)
print(image.shape)
image = image[np.newaxis, :, :, :]
print(image.shape)   

####### PyTorch-based methods #########
# Method 2.
image = cv2.imread(imgpath)
image = torch.tensor(image)
print(image.shape)
image = image.view(1, *image.shape)
print(image.shape)

# Method 3.
image = cv2.imread(imgpath)
image = torch.tensor(image)
print(image.shape)
image = image.unsqueeze(dim=0)
print(image.shape)

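tensor.unsqueeze(dim): inserts a new dimension of size 1 at position dim; tensor.squeeze(dim): removes dimension dim if its size is 1, and leaves the tensor unchanged if that dimension is larger than 1; called without dim, squeeze() removes all dimensions of size 1.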
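A short sketch of these shape rules, including the channel move needed for an image read with cv2 (a zero tensor stands in for the image here):

```python
import torch

hwc = torch.zeros(224, 224, 3)        # layout returned by cv2.imread: (h, w, c)
chw = hwc.permute(2, 0, 1)            # -> (3, 224, 224), the layout PyTorch models expect
batched = chw.unsqueeze(dim=0)        # -> (1, 3, 224, 224)

print(batched.squeeze(dim=0).shape)   # size-1 dim removed -> (3, 224, 224)
print(batched.squeeze(dim=1).shape)   # dim 1 has size 3, nothing changes -> (1, 3, 224, 224)
print(torch.zeros(1, 3, 1, 5).squeeze().shape)   # all size-1 dims removed -> (3, 5)
```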

5. One-hot encoding

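PyTorch's cross-entropy loss (nn.CrossEntropyLoss) takes integer class indices as labels directly, so there is no need to convert them to one-hot by hand; MSE loss, on the other hand, needs one-hot targets, which you have to build yourself. A conversion example: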

import torch
class_num = 8
batch_size = 4

def one_hot(label):
	"""
	Convert a batch of class labels to one-hot vectors
	Argument:
		label: (type, tensor), the gt label, shape: (batch_size,)
	Return:
		one_hot_out: (type, tensor), the one-hot label, shape: (batch_size, class_num)
	"""
	label = label.view(-1, 1)    # (batch_size, 1); unlike resize_, this does not modify the caller's tensor
	m_zeros = torch.zeros(label.size(0), class_num)
	one_hot_out = m_zeros.scatter_(1, label, 1)    # (dim, index, value)
	return one_hot_out

label = torch.randint(0, class_num, (batch_size,))    # random labels in [0, class_num)
print(one_hot(label))

Since PyTorch 1.1, you can call torch.nn.functional.one_hot directly:

import torch
import torch.nn.functional as F

tensor = torch.arange(0, 5) % 3
one_hot = F.one_hot(tensor)

# By default F.one_hot infers the number of classes as (largest label + 1); you can also set it yourself
one_hot = F.one_hot(tensor, num_classes=10)
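A small sketch of the difference described above, with random logits as stand-in model outputs: CrossEntropyLoss consumes class indices, gives the same value as an explicit one-hot target combined with log-softmax, and MSELoss needs the float one-hot target:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(4, 3)                 # (batch_size, class_num) raw model outputs
labels = torch.tensor([0, 2, 1, 2])        # integer class indices, shape (batch_size,)

# CrossEntropyLoss takes the class indices directly ...
ce = nn.CrossEntropyLoss()(logits, labels)
# ... and equals the loss computed with an explicit one-hot target
one_hot = F.one_hot(labels, num_classes=3).float()
ce_manual = -(one_hot * F.log_softmax(logits, dim=1)).sum(dim=1).mean()

# MSELoss needs a float target with the same shape as the prediction
mse = nn.MSELoss()(torch.softmax(logits, dim=1), one_hot)
print(ce.item(), ce_manual.item(), mse.item())
```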

6. Avoiding running out of GPU memory during validation

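Validation needs no gradient computation, so you can turn off autograd, which speeds things up and saves memory; if you leave it on, the intermediate activations kept for backpropagation may exhaust GPU memory: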

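model.eval()    # switch Dropout/BatchNorm to inference behavior
with torch.no_grad():
	outputs = model(inputs)    # the forward pass must run inside no_grad; no graph is recorded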
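A quick check of what no_grad and eval() change, using a toy model with Dropout (the model and input are illustrative only):

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(8, 4), nn.Dropout(0.5))
x = torch.randn(2, 8)

model.eval()                 # Dropout becomes a no-op
with torch.no_grad():
    y = model(x)             # no computation graph is recorded

print(y.requires_grad, y.grad_fn)   # False None
with torch.no_grad():
    y2 = model(x)            # same input, eval mode -> identical output
```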

7. Learning-rate decay strategies

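Adjusting the learning rate dynamically during training (usually decreasing it over time) helps convergence and can help the model avoid getting stuck in poor local optima: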

import torch
import torch.optim as optim
from torch.optim import lr_scheduler

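# init optimizer (model and n_epoch are assumed to be defined)
optimizer = optim.Adam(model.parameters(), lr=0.001)
scheduler = lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.1)     # multiply the lr by 0.1 every 10 epochs

# train process
for epoch in range(n_epoch):
    ...     # one epoch of training, including the optimizer.step() calls
    scheduler.step()    # since PyTorch 1.1, call this after optimizer.step()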
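A runnable sketch with a toy model that records the learning rate used in each epoch, showing the StepLR schedule (0.001 for epochs 0 to 9, 0.0001 for 10 to 19, and so on):

```python
import torch
import torch.nn as nn
import torch.optim as optim
from torch.optim import lr_scheduler

model = nn.Linear(4, 1)                      # toy model for illustration
optimizer = optim.Adam(model.parameters(), lr=0.001)
scheduler = lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.1)

lrs = []
for epoch in range(25):
    # one tiny training step stands in for a full epoch
    loss = model(torch.randn(2, 4)).pow(2).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    lrs.append(optimizer.param_groups[0]['lr'])   # lr used during this epoch
    scheduler.step()                              # update the lr at the end of the epoch

print(lrs[0], lrs[10], lrs[20])   # roughly 1e-3, 1e-4, 1e-5
```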

8. Freezing the parameters of some layers during training

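When loading a pretrained model, or when fine-tuning a classification model in transfer learning, you often need to freeze the first few layers so that their features stay fixed during training: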

import torch.optim as optim
from torchvision import models

net = models.vgg16()
for name, value in net.named_parameters():
	print('name: {0}, \t grad: {1}'.format(name, value.requires_grad))

# the names must match those printed above; in torchvision's VGG16
# the first convolution layer is 'features.0'
no_grad = ['features.0.weight',
           'features.0.bias'
          ]

for name, value in net.named_parameters():
    if name in no_grad:
        value.requires_grad = False
    else:
        value.requires_grad = True

# define the optimizer, passing only the parameters that still require gradients
optimizer = optim.Adam(filter(lambda p: p.requires_grad, net.parameters()), lr=0.01)
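A small sketch that verifies the freezing actually works, using a hypothetical three-module model instead of VGG16 to keep it light: after one optimizer step the frozen first layer is unchanged, while the last layer is updated:

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)
net = nn.Sequential(nn.Linear(4, 4), nn.ReLU(), nn.Linear(4, 2))   # toy stand-in model

# freeze every parameter of the first layer (names '0.weight' and '0.bias')
for name, param in net.named_parameters():
    param.requires_grad = not name.startswith('0.')

optimizer = optim.SGD(filter(lambda p: p.requires_grad, net.parameters()), lr=0.1)

frozen_before = net[0].weight.clone()
head_bias_before = net[2].bias.clone()

loss = net(torch.randn(8, 4)).pow(2).mean()
optimizer.zero_grad()
loss.backward()
optimizer.step()

trainable = [n for n, p in net.named_parameters() if p.requires_grad]
print(trainable)                                   # ['2.weight', '2.bias']
print(torch.equal(frozen_before, net[0].weight))   # True: the frozen layer did not change
```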

9. Using different learning rates for different layers during training

During optimization you may want to give different layers different learning rates, as in the code below:

import torch.optim as optim
from torchvision import models

net = models.vgg16()
for name, value in net.named_parameters():
    print('name: {}'.format(name))

# split the layers according to key words in their names:
# feature layers: finetune; classifier layers: train from scratch.
# In torchvision's VGG16 the conv layers are under 'features', the FC layers under 'classifier'.
conv_params = []
fc_params = []
for name, params in net.named_parameters():
    if name.startswith('features'):
        conv_params += [params]
    else:
        fc_params += [params]

# define the optimizer
optimizer = optim.Adam([
                {'params': conv_params, 'lr': 1e-4},
                {'params': fc_params, 'lr': 1e-2}], weight_decay=1e-3)

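The model's parameters are split into two groups stored in a list; each group is one of the dicts above and sets its own learning rate. Options shared by all groups are passed outside the list as global defaults, like 'weight_decay' above. You can also set a global learning rate outside the list: a group that sets its own lr uses it, otherwise it falls back to the global lr, e.g. optimizer = optim.Adam([{'params': conv_params, 'lr': 1e-4}, {'params': fc_params}], lr=1e-2, weight_decay=1e-3)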
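The fallback rule can be checked by inspecting optimizer.param_groups. A minimal sketch with a hypothetical two-layer model (first layer as "backbone", second as "head"):

```python
import torch.nn as nn
import torch.optim as optim

net = nn.Sequential(nn.Linear(4, 4), nn.Linear(4, 2))   # toy backbone + head
backbone_params = list(net[0].parameters())
head_params = list(net[1].parameters())

optimizer = optim.Adam([
    {'params': backbone_params, 'lr': 1e-4},   # group-specific lr
    {'params': head_params},                   # no lr given: uses the global lr below
], lr=1e-2, weight_decay=1e-3)

for i, group in enumerate(optimizer.param_groups):
    print(i, group['lr'], group['weight_decay'])
```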

10. Saving and loading models

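During training the model has to be saved, and to use it you load the trained model. There are two main ways to save and load models in PyTorch: 1. save/load the whole model; 2. save/load only the model parameters.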

1. Basic save/load usage

Saving/loading the whole model (network structure + parameters; slower):

# save model
torch.save(model, 'net.pkl')

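# load model (the model's class must be defined/importable when loading).
# Since PyTorch 2.6 torch.load defaults to weights_only=True, so loading a whole
# pickled model needs weights_only=False; only do this for files you trust.
model = torch.load('net.pkl', weights_only=False)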

Saving/loading only the model parameters (faster, smaller files; the recommended approach):

# save model parameters
torch.save(model.state_dict(), 'net_params.pkl')

# load model parameters: build the model first, then load the parameters
model = Net()
state_dict = torch.load('net_params.pkl')
model.load_state_dict(state_dict)

2. Saving and loading a custom checkpoint

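A custom checkpoint file is usually a dict that contains:
a. the network structure: input size, output size and hidden-layer information, so the model can be rebuilt when loading;
b. the model's weights: the trained learnable parameters of every layer, obtained by calling state_dict() on the model instance (the same model.state_dict() used above when saving only the parameters);
c. the optimizer state: to continue training after saving, you must also save the optimizer's state and hyperparameters, obtained by calling state_dict() on the optimizer instance;
d. other information: for example epoch, batch_size and other hyperparameters.
So you can choose what to save, as shown below: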

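# saving a checkpoint, assuming the network class is named Net
checkpoint = {
    'model': Net(),                                   # network structure (a pickled module instance)
    'model_state_dict': model.state_dict(),
    'optimizer_state_dict': optimizer.state_dict(),
    'epoch': epoch
}

torch.save(checkpoint, 'checkpoint.pkl')

# load the model info
def load_checkpoint(filepath):
    # weights_only=False: the checkpoint contains a pickled module (PyTorch >= 2.6 defaults to True)
    checkpoint = torch.load(filepath, weights_only=False)
    model = checkpoint['model']     # network structure
    model.load_state_dict(checkpoint['model_state_dict'])    # load network parameters
    # use the same optimizer type as in training; lr=0.01 is a placeholder
    optimizer = optim.SGD(model.parameters(), lr=0.01)
    optimizer.load_state_dict(checkpoint['optimizer_state_dict'])    # load optimizer state

    for params in model.parameters():
        params.requires_grad = False

    model.eval()
    return model, optimizer

model, optimizer = load_checkpoint('checkpoint.pkl')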

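If the model is loaded for testing, set requires_grad of every layer to False to fix the parameters, and call model.eval() to put the model into evaluation mode, which mainly fixes the behavior of Dropout and BatchNormalization; otherwise the predictions may differ from run to run. If you continue training instead, call model.train() to make sure the model is in training mode.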
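A minimal, runnable variant of the round trip for resuming training, assuming a hypothetical one-layer Net. Unlike the code above it stores only state dicts plus the epoch (not a pickled Net() instance), so it loads with torch.load's default settings; the temporary path stands in for 'checkpoint.pkl':

```python
import os
import tempfile

import torch
import torch.nn as nn
import torch.optim as optim

class Net(nn.Module):                    # hypothetical stand-in network
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(4, 2)

    def forward(self, x):
        return self.fc(x)

model = Net()
optimizer = optim.Adam(model.parameters(), lr=1e-3)
model(torch.randn(3, 4)).sum().backward()   # one step so the optimizer has state to save
optimizer.step()

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'checkpoint.pth')
    torch.save({'model_state_dict': model.state_dict(),
                'optimizer_state_dict': optimizer.state_dict(),
                'epoch': 5}, path)

    # resume: rebuild the objects first, then load their saved states
    checkpoint = torch.load(path)
    model2 = Net()
    optimizer2 = optim.Adam(model2.parameters(), lr=1e-3)
    model2.load_state_dict(checkpoint['model_state_dict'])
    optimizer2.load_state_dict(checkpoint['optimizer_state_dict'])
    start_epoch = checkpoint['epoch'] + 1

model2.train()   # continue training from start_epoch
```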

3. Saving and loading across devices

A model trained on GPU, loaded on CPU (Save on GPU, Load on CPU):

device = torch.device('cpu')
model = Net()
# load all tensors onto the CPU device
model.load_state_dict(torch.load('net_params.pkl', map_location=device))
# <===> model.load_state_dict(torch.load('net_params.pkl', map_location='cpu'))

A model trained on GPU, loaded on GPU (Save on GPU, Load on GPU):

device = torch.device('cuda')
model = Net()
model.load_state_dict(torch.load('net_params.pkl'))
model.to(device)

Here the map_location argument is not needed; simply call model.to(torch.device("cuda")) to convert the initialized model to a CUDA-optimized model.

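You also need to call data = data.to(device) on the data fed to the model, i.e. move it from the CPU to the GPU. Note that my_tensor.to(device) returns a copy of my_tensor on the GPU; it does not overwrite my_tensor, so you have to reassign it yourself: my_tensor = my_tensor.to(device). (For an nn.Module, model.to(device) moves the parameters in place, which is why the code above does not reassign model.)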

A model trained on CPU, loaded on GPU (Save on CPU, Load on GPU):

device = torch.device('cuda')
model = Net()
model.load_state_dict(torch.load('net_params.pkl', map_location='cuda:0'))
model.to(device)
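A device-agnostic sketch of the same pattern that also runs without a GPU. An in-memory buffer replaces 'net_params.pkl', and nn.Linear stands in for Net:

```python
import io

import torch
import torch.nn as nn

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

model = nn.Linear(3, 1)
buffer = io.BytesIO()                    # in-memory file instead of 'net_params.pkl'
torch.save(model.state_dict(), buffer)
buffer.seek(0)

# map_location remaps every stored tensor onto `device` while loading
state_dict = torch.load(buffer, map_location=device)
restored = nn.Linear(3, 1)
restored.load_state_dict(state_dict)
restored.to(device)                      # Module.to() moves the parameters in place

x = torch.randn(2, 3)
x_dev = x.to(device)                     # Tensor.to() returns a new tensor; x is unchanged
out = restored(x_dev)
```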

11. A few GPU-related functions

import torch

# check whether CUDA is available
print(torch.cuda.is_available())

# number of GPUs
print(torch.cuda.device_count())

# name of GPU 0
print(torch.cuda.get_device_name(0))

# index of the current GPU (numbering starts from 0)
print(torch.cuda.current_device())

# move the model and the data from the CPU to the GPU
use_cuda = torch.cuda.is_available()

# method 1
if use_cuda:
    data = data.cuda()
    model.cuda()

# method 2
device = torch.device('cuda' if use_cuda else 'cpu')
data = data.to(device)
model.to(device)

12. Printing a model's feature maps during inference

Method 1: wrap the model and output the feature maps during the forward pass, by running the layers one at a time:

import os
import cv2
import numpy as np

import torch
import torchvision.models as models


class FeatureVisualization:
    input_size = 256

    def __init__(self, imgpath='', layers_idx=[1, 2], save_features_dir='.'):
        self.imgpath = imgpath
        self.layers_idx = layers_idx          # indices of the layers in net.features to save
        self.save_features_dir = save_features_dir
        self.net = models.vgg16()             # random weights; load pretrained ones for meaningful maps
        self.net.eval()

    @classmethod
    def preprocess_image(cls, imgpath):
        assert os.path.isfile(imgpath), "The image {%s} must exist!" % imgpath
        img = cv2.imread(imgpath)
        # resize
        img = cv2.resize(img, (cls.input_size, cls.input_size))
        # normalize to [0, 1], (h, w, c) -> (c, h, w), add the batch dimension
        img = (img / 255.).astype('float32').transpose((2, 0, 1))[np.newaxis, :, :, :]   # (1, 3, 256, 256)
        # <===>
        # img = (img / 255.).astype('float32').swapaxes(1, 2).swapaxes(0, 1)
        # img = np.expand_dims(img, axis=0)
        img = torch.from_numpy(img)
        return img

    def get_features(self):
        """Extract features"""
        features = {}
        inputx = self.preprocess_image(self.imgpath)
        print('inputx shape', inputx.shape)
        model = self.net
        if torch.cuda.is_available():
            inputx = inputx.cuda()
            model = model.cuda()

        x = inputx
        with torch.no_grad():
            # run the layers of the convolutional part one by one; named_modules() would
            # also yield the whole network and its containers, which cannot be chained
            for index, module in enumerate(model.features):
                x = module(x)
                if index in self.layers_idx:
                    # clone: VGG's in-place ReLU would otherwise overwrite the stored tensor
                    features[str(index)] = x.clone()
                if index >= max(self.layers_idx):
                    break
        return features

    def save_features(self):
        """Save features"""
        features = self.get_features()
        for name, feature in features.items():
            feature = self.process_feature(feature)
            cv2.imwrite(os.path.join(self.save_features_dir, name + '.jpg'), feature)

    @staticmethod
    def process_feature(feature):
        """
        Normalize the feature
        Arguments:
            feature: (type, tensor(b, c, h, w)), normalize to (0, 255)
        Return:
            (h, w) uint8 image: channel mean of the first sample (cv2.imwrite cannot save a 4-D array)
        """
        feature = feature.cpu().detach().numpy()[0].mean(axis=0)

        # use sigmoid to [0, 1]
        feature = 1.0 / (1 + np.exp(-1 * feature))
        feature = np.round(feature * 255).astype(np.uint8)
        return feature

if __name__ == '__main__':
    # 'test.jpg' is a placeholder; point it at a real image
    featurevisualization = FeatureVisualization(imgpath='test.jpg')
    featurevisualization.save_features()

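Method 2: use hooks. PyTorch hooks let you read, and even modify, the values and gradients of intermediate layers without changing the network structure between input and output (the order in which the different hooks, forward and backward run is easiest to see in nn.Module's __call__ function). There you can see that a hook registered with register_forward_hook is called after forward: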

import os
import cv2
import numpy as np

import torch
import torchvision.models as models


class FeatureVisualization:
    input_size = 256

    def __init__(self, imgpath='', layers_idx=[1, 2], save_features_dir='.'):
        self.imgpath = imgpath
        self.layers_idx = layers_idx          # indices of the layers in net.features to save
        self.save_features_dir = save_features_dir
        self.net = models.vgg16()             # random weights; load pretrained ones for meaningful maps
        self.net.eval()

    @classmethod
    def preprocess_image(cls, imgpath):
        assert os.path.isfile(imgpath), "The image {%s} must exist!" % imgpath
        img = cv2.imread(imgpath)
        # resize
        img = cv2.resize(img, (cls.input_size, cls.input_size))
        # normalize to [0, 1], (h, w, c) -> (c, h, w), add the batch dimension
        img = (img / 255.).astype('float32').transpose((2, 0, 1))[np.newaxis, :, :, :]   # (1, 3, 256, 256)
        # <===>
        # img = (img / 255.).astype('float32').swapaxes(1, 2).swapaxes(0, 1)
        # img = np.expand_dims(img, axis=0)
        img = torch.from_numpy(img)
        return img

    def get_features(self):
        """Extract features"""
        features = {}
        inputx = self.preprocess_image(self.imgpath)
        print('inputx shape', inputx.shape)
        model = self.net
        if torch.cuda.is_available():
            inputx = inputx.cuda()
            model = model.cuda()

        # closure: each hook stores the output of its layer in `features`
        def get_activation(name):
            def hook(module, input, output):
                # clone: VGG's in-place ReLU would otherwise overwrite the stored tensor
                features[name] = output.detach().clone()
            return hook

        # register hooks; VGG itself cannot be indexed, but its nn.Sequential `features` can
        handles = []
        for layer_idx in self.layers_idx:
            handles.append(model.features[layer_idx].register_forward_hook(get_activation(str(layer_idx))))

        with torch.no_grad():
            outputx = model(inputx)
        # remove every hook (not only the last one) so later forward passes are unaffected
        for handle in handles:
            handle.remove()

        return features

    def save_features(self):
        """Save features"""
        features = self.get_features()
        for name, feature in features.items():
            feature = self.process_feature(feature)
            cv2.imwrite(os.path.join(self.save_features_dir, name + '.jpg'), feature)

    @staticmethod
    def process_feature(feature):
        """
        Normalize the feature
        Arguments:
            feature: (type, tensor(b, c, h, w)), normalize to (0, 255)
        Return:
            (h, w) uint8 image: channel mean of the first sample (cv2.imwrite cannot save a 4-D array)
        """
        feature = feature.cpu().detach().numpy()[0].mean(axis=0)

        # use sigmoid to [0, 1]
        feature = 1.0 / (1 + np.exp(-1 * feature))
        feature = np.round(feature * 255).astype(np.uint8)
        return feature

if __name__ == '__main__':
    # 'test.jpg' is a placeholder; point it at a real image
    featurevisualization = FeatureVisualization(imgpath='test.jpg')
    featurevisualization.save_features()
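The hook mechanism itself can be tried without an image file or torchvision. A minimal sketch on a hypothetical small nn.Sequential standing in for vgg16().features; it also shows that removed hooks no longer fire:

```python
import torch
import torch.nn as nn

net = nn.Sequential(                         # toy stand-in for vgg16().features
    nn.Conv2d(3, 8, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.MaxPool2d(2),
)

features = {}

def get_activation(name):
    def hook(module, inputs, output):
        features[name] = output.detach()     # called right after module.forward
    return hook

handles = [net[i].register_forward_hook(get_activation(str(i))) for i in (0, 2)]
with torch.no_grad():
    out = net(torch.randn(1, 3, 32, 32))
for h in handles:
    h.remove()                               # hooks stay active until removed

for name, f in features.items():
    print(name, tuple(f.shape))              # 0 (1, 8, 32, 32) and 2 (1, 8, 16, 16)
```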

13. Converting between tensor types (three ways)

Using the dedicated conversion methods:

import torch
import torch.nn as nn
    
x = torch.randn(3, 5)
print(x)
# convert x as long
x_long = x.long()
# convert x as half
x_half = x.half()
# convert x as int 
x_int = x.int()
# convert x as double
x_double = x.double()
# convert x as float
x_float = x.float()
# convert x as char
x_char = x.char()
# convert x as byte
x_byte = x.byte()
# convert x as short
x_short = x.short()

Using the **tensor.type()** method:

import torch
import torch.nn as nn
    
x = torch.randn(3, 5)
x_int = x.type(torch.IntTensor)
print(x_int)

Using **type_as(ano_tensor)** to convert a tensor to the type of a given tensor:

import torch
import torch.nn as nn
    
x = torch.FloatTensor(5)    
y = torch.IntTensor([10, 20])
    
x_int = x.type_as(y)
assert isinstance(x_int, torch.IntTensor)
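Current PyTorch versions also accept a dtype directly through tensor.to(...), which covers the same conversions. A short sketch; note that converting floats to an integer type truncates toward zero rather than rounding:

```python
import torch

x = torch.tensor([1.7, -1.7, 2.0])
x_long = x.to(torch.int64)               # float -> int truncates toward zero: [1, -1, 2]
x_half = x.to(dtype=torch.float16)       # same as x.half()

y = torch.tensor([10, 20], dtype=torch.int32)
x_like_y = x.to(y.dtype)                 # same result type as x.type_as(y)
print(x_long, x_half.dtype, x_like_y.dtype)
```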

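This article summarizes some small tips I have accumulated while using PyTorch and will keep being updated; if you find any mistakes, corrections are very welcome!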

When reposting, please credit the source. Original link: https://www.uj5u.com/qita/255142.html
