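1. Specifying the GPU IDs when training a model
- To make only GPU 0 visible to the current process (device name "/gpu:0"; PyTorch addresses it as "cuda:0"):
import os
os.environ["CUDA_VISIBLE_DEVICES"] = "0"
- To make GPUs 0 and 1 visible (device names "/gpu:0" and "/gpu:1", in that order):
os.environ["CUDA_VISIBLE_DEVICES"] = "0,1"
The order sets the priority: device 0 is used first, then device 1.
- The same can be specified outside the training script:
CUDA_VISIBLE_DEVICES=0,1 python train.py
Note that the visible devices are renumbered from 0. If you use cards 6 and 7 of an 8-GPU machine (CUDA_VISIBLE_DEVICES=6,7 python train.py), you still refer to them as 0 and 1 when parallelizing the model:
model = nn.DataParallel(model, device_ids=[0, 1])
Note that the GPU selection must be placed before any operation on the network model.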
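The renumbering of visible devices can be illustrated without a GPU. The following is a minimal sketch, assuming integer device IDs; `logical_device_ids` is a hypothetical helper written for this illustration and is not part of PyTorch or CUDA.

```python
import os


def logical_device_ids(visible_devices: str) -> dict:
    """Map each physical GPU id listed in CUDA_VISIBLE_DEVICES to the
    logical id the process sees (hypothetical helper, illustration only)."""
    physical = [int(d) for d in visible_devices.split(",") if d.strip()]
    return {phys: logical for logical, phys in enumerate(physical)}


# Set the variable before the first CUDA call (safest: before importing torch).
os.environ["CUDA_VISIBLE_DEVICES"] = "6,7"

mapping = logical_device_ids(os.environ["CUDA_VISIBLE_DEVICES"])
print(mapping)  # {6: 0, 7: 1}

# These logical ids are what nn.DataParallel(model, device_ids=...) expects.
device_ids = list(mapping.values())
print(device_ids)  # [0, 1]
```

Physical cards 6 and 7 become logical devices 0 and 1, which is why the DataParallel call still uses [0, 1].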
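2. Viewing the input and output details of each layer
- 1. Install torchsummary or torchsummaryX (pip install torchsummary);
- 2. Usage example:
import torch
from torchvision import models
vgg16 = models.vgg16()
vgg16 = vgg16.cuda()
# 1. Using torchsummary
from torchsummary import summary
summary(vgg16, (3, 224, 224))  # (3, 224, 224) is the input size of the network
# 2. Using torchsummaryX
from torchsummaryX import summary as summaryX
inputx = torch.randn(1, 3, 224, 224).cuda()  # the input must be on the same device as the model
summaryX(vgg16, inputx)
The output, shown as a figure in the original post, lists the output shape of each layer and the computation cost of the model.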

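3. Gradient clipping: preventing exploding gradients during optimization
import torch
import torch.nn as nn
...
outputx = model(inputx)
loss = criterion(outputx, target)  # criterion and target stand for your own loss function and labels
optimizer.zero_grad()
loss.backward()
nn.utils.clip_grad_norm_(model.parameters(), max_norm=20, norm_type=2)
optimizer.step()
Arguments of nn.utils.clip_grad_norm_:
- parameters: an iterable of parameters whose gradients will be normalized;
- max_norm: the maximum norm of the gradients;
- norm_type: the type of norm to use, L2 by default;
- Note that on some tasks gradient clipping adds a considerable amount of computation time.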
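To see what the clipping call does, one can compare the total gradient norm before and after it. This is a minimal sketch on a toy linear model; the model, the scaled random inputs and the max_norm value are all assumptions chosen only to produce large gradients.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Linear(4, 2)            # toy model, for illustration only
inputs = torch.randn(8, 4) * 100   # large inputs to provoke large gradients
targets = torch.randn(8, 2)

loss = nn.functional.mse_loss(model(inputs), targets)
model.zero_grad()
loss.backward()

max_norm = 1.0
# clip_grad_norm_ rescales the gradients in place and returns the total norm
# measured BEFORE clipping.
norm_before = float(
    nn.utils.clip_grad_norm_(model.parameters(), max_norm=max_norm, norm_type=2)
)

# Recompute the total L2 norm over all gradients after clipping.
norm_after = float(sum(p.grad.pow(2).sum() for p in model.parameters()).sqrt())
print(norm_before, norm_after)
```

The gradients keep their direction; only their overall scale is reduced so that the total norm does not exceed max_norm.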
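4. Expanding the dimensions of a single image
During training the input has shape (batch_size, c, h, w), whereas at test time a single image has shape (c, h, w), so a batch dimension has to be added. Note that cv2.imread returns an (h, w, c) array, so the examples below print (h, w, c) and then (1, h, w, c).
import cv2
import torch
import numpy as np
# imgpath is the path to your own image
####### NumPy-based method #########
# Method 1.
image = cv2.imread(imgpath)
print(image.shape)
image = image[np.newaxis, :, :, :]
print(image.shape)
####### PyTorch-based methods #########
# Method 2.
image = cv2.imread(imgpath)
image = torch.tensor(image)
print(image.shape)
image = image.view(1, *image.shape)
print(image.shape)
# Method 3.
image = cv2.imread(imgpath)
image = torch.tensor(image)
print(image.shape)
image = image.unsqueeze(dim=0)
print(image.shape)
tensor.unsqueeze(dim) inserts a new dimension of size 1 at position dim. tensor.squeeze(dim) removes dimension dim if its size is 1 and does nothing otherwise; when dim is not given, squeeze() removes every dimension of size 1.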
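The squeeze and unsqueeze rules are easy to check on dummy tensors. This minimal sketch uses zero tensors as stand-ins for real images; it also shows one way to turn an (h, w, c) image, as returned by OpenCV, into a (1, c, h, w) batch.

```python
import torch

x = torch.zeros(3, 1, 4)  # stand-in tensor with one size-1 dimension

added = x.unsqueeze(0)           # new leading dimension
removed = x.squeeze(1)           # dim 1 has size 1, so it is removed
untouched = x.squeeze(0)         # dim 0 has size 3, so nothing happens
all_removed = added.squeeze()    # no dim given: every size-1 dim is removed

print(added.shape)        # torch.Size([1, 3, 1, 4])
print(removed.shape)      # torch.Size([3, 4])
print(untouched.shape)    # torch.Size([3, 1, 4])
print(all_removed.shape)  # torch.Size([3, 4])

# (h, w, c) image -> (1, c, h, w) batch
hwc = torch.zeros(32, 48, 3)
batch = hwc.permute(2, 0, 1).unsqueeze(0)
print(batch.shape)        # torch.Size([1, 3, 32, 48])
```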
5. One-hot encoding
PyTorch's cross-entropy loss takes class indices as targets and handles the one-hot step internally, so no manual conversion is needed. With MSE, the labels must be converted to one-hot manually. An example conversion:
import torch
class_num = 8
batch_size = 4
def one_hot(label):
    """
    Convert a batch of class labels to one-hot
    Argument:
        label: (type, tensor), the gt label, shape: (batch_size,)
    Return:
        one_hot_out: (type, tensor), the one-hot label, shape: (batch_size, class_num)
    """
    label = label.view(batch_size, 1)
    m_zeros = torch.zeros(batch_size, class_num)
    one_hot_out = m_zeros.scatter_(1, label, 1)  # (dim, index, value)
    return one_hot_out
label = torch.randint(0, class_num, (batch_size,))  # random int64 labels in [0, class_num)
print(one_hot(label))
Since PyTorch 1.1, one-hot encoding is available directly as torch.nn.functional.one_hot:
import torch
import torch.nn.functional as F
tensor = torch.arange(0, 5) % 3
one_hot = F.one_hot(tensor)
# F.one_hot infers the number of classes from the data; you can also set it explicitly
one_hot = F.one_hot(tensor, num_classes=10)
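The two approaches above give the same encoding. This minimal sketch checks that on made-up labels and then feeds the result to an MSE loss; the random predictions are placeholders. F.one_hot returns an integer tensor, so it is cast to float before being used as an MSE target.

```python
import torch
import torch.nn.functional as F

labels = torch.tensor([0, 2, 1, 2])  # made-up class indices
num_classes = 3

# scatter_-based encoding
via_scatter = torch.zeros(len(labels), num_classes).scatter_(1, labels.view(-1, 1), 1)
# built-in encoding; cast from int64 to float32 for use with MSE
via_builtin = F.one_hot(labels, num_classes=num_classes).float()

print(torch.equal(via_scatter, via_builtin))  # True

torch.manual_seed(0)
predictions = torch.softmax(torch.randn(4, num_classes), dim=1)  # placeholder outputs
loss = F.mse_loss(predictions, via_builtin)
print(loss.item())
```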
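6. Preventing out-of-memory errors during validation
Validation needs no differentiation, i.e. no gradient computation. Disabling autograd speeds things up and saves memory; leaving it on may exhaust GPU memory:
with torch.no_grad():
    model.eval()
    ...  # run the validation forward passes inside this block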
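A complete validation loop built on this pattern could look as follows. This is a minimal sketch: the `evaluate` function, the toy model and the random batches are assumptions made for illustration, not part of the original post.

```python
import torch
import torch.nn as nn


def evaluate(model, batches):
    """Return the classification accuracy over an iterable of (inputs, labels)."""
    model.eval()                  # put Dropout / BatchNorm into inference mode
    correct, total = 0, 0
    with torch.no_grad():         # no autograd graph is built inside this block
        for inputs, labels in batches:
            logits = model(inputs)
            assert not logits.requires_grad  # confirms autograd is off
            correct += (logits.argmax(dim=1) == labels).sum().item()
            total += labels.numel()
    return correct / total


torch.manual_seed(0)
toy_model = nn.Sequential(nn.Linear(4, 3), nn.Dropout(0.5))
toy_batches = [(torch.randn(5, 4), torch.randint(0, 3, (5,))) for _ in range(2)]
accuracy = evaluate(toy_model, toy_batches)
print(accuracy)
```

Remember to call model.train() again before resuming training, since model.eval() stays in effect after the function returns.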
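7. Learning-rate decay strategy
Adjusting the learning rate dynamically during training helps avoid getting stuck in local optima.
import torch
import torch.optim as optim
from torch.optim import lr_scheduler
# init optimizer
optimizer = optim.Adam(model.parameters(), lr=0.001)
scheduler = lr_scheduler.StepLR(optimizer, 10, 0.1)  # multiply the learning rate by 0.1 every 10 epochs
# train process
for epoch in range(n_epoch):
    ...  # train for one epoch, calling optimizer.step() for each batch
    scheduler.step()  # since PyTorch 1.1, call this after optimizer.step()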
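The step schedule above has a simple closed form: the learning rate for an epoch is base_lr * gamma ** (epoch // step_size). The sketch below works this out in plain Python; `step_lr` is a hypothetical helper for illustration, not a PyTorch function.

```python
def step_lr(base_lr: float, epoch: int, step_size: int = 10, gamma: float = 0.1) -> float:
    """Learning rate in effect during `epoch` (0-based) under step decay."""
    return base_lr * gamma ** (epoch // step_size)


# With lr=0.001, step_size=10, gamma=0.1 the rate drops tenfold at epochs 10, 20, ...
for epoch in (0, 9, 10, 19, 20):
    print(epoch, step_lr(0.001, epoch))
```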
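8. Freezing the parameters of some layers during training
When loading a pretrained model, or when fine-tuning a classification model in transfer learning, the first few layers often need to be frozen so that their features stay unchanged during training.
import torch.optim as optim
from torchvision import models
net = models.vgg16()
for name, value in net.named_parameters():
    print('name: {0}, \t grad: {1}'.format(name, value.requires_grad))
# the names must match those printed above; in torchvision's vgg16 the first conv layer is features.0
no_grad = ['features.0.weight',
           'features.0.bias'
           ]
for name, value in net.named_parameters():
    if name in no_grad:
        value.requires_grad = False
    else:
        value.requires_grad = True
# define the optimizer
optimizer = optim.Adam(filter(lambda p: p.requires_grad, net.parameters()), lr=0.01)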
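Whether a layer is really frozen can be verified by taking one optimization step and comparing its weights. This minimal sketch uses a small nn.Sequential as a stand-in for VGG16; the layer names "0.weight" and "0.bias" follow from that toy model, not from the original post.

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)
net = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))  # toy model

# Freeze the first layer; in nn.Sequential its parameters are named "0.weight" and "0.bias".
frozen = {"0.weight", "0.bias"}
for name, param in net.named_parameters():
    param.requires_grad = name not in frozen

# Hand only the trainable parameters to the optimizer.
optimizer = optim.SGD((p for p in net.parameters() if p.requires_grad), lr=0.1)

before = net[0].weight.clone()
loss = net(torch.randn(16, 4)).pow(2).mean()
loss.backward()
optimizer.step()

print(net[0].weight.grad)                  # None: no gradient was computed for it
print(torch.equal(before, net[0].weight))  # True: the frozen layer did not move
```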
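9. Setting different learning rates for different layers during training
Depending on how optimization is going, different layers may need different learning rates. The code is as follows:
import torch.optim as optim
from torchvision import models
net = models.vgg16()
for name, value in net.named_parameters():
    print('name: {}'.format(name))
# split the layers by keyword,
# feature layers: finetune, classifier layers: from scratch
conv_params = []
fc_params = []
for name, params in net.named_parameters():
    if 'features' in name:  # torchvision's vgg16 names its conv layers features.N
        conv_params += [params]
    else:
        fc_params += [params]
# define the optimizer
optimizer = optim.Adam([
    {'params': conv_params, 'lr': 1e-4},
    {'params': fc_params, 'lr': 1e-2}], weight_decay=1e-3)
The model's layers are split into two parts stored in a list. Each part corresponds to one dictionary above, and each dictionary sets its own learning rate. When both parts share some other argument, put it outside the list as a global argument, like 'weight_decay' above. A global learning rate can also be set outside the list: a part whose dictionary sets a local learning rate uses that value, and any other part falls back to the global one:
optimizer = optim.Adam([{'params': conv_params, 'lr': 1e-4}, {'params': fc_params}], lr=1e-2, weight_decay=1e-3)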
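The interplay of local and global settings can be inspected through optimizer.param_groups. This is a minimal sketch on a toy two-layer model (an assumption, standing in for VGG16): the first group sets its own learning rate, the second inherits the global one, and both inherit the global weight decay.

```python
import torch.nn as nn
import torch.optim as optim

net = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))  # toy model

first_params = list(net[0].parameters())
last_params = list(net[2].parameters())

optimizer = optim.Adam(
    [
        {"params": first_params, "lr": 1e-4},  # local lr overrides the global one
        {"params": last_params},               # no local lr: uses the global lr
    ],
    lr=1e-2,
    weight_decay=1e-3,
)

lrs = [group["lr"] for group in optimizer.param_groups]
decays = [group["weight_decay"] for group in optimizer.param_groups]
print(lrs)     # [0.0001, 0.01]
print(decays)  # [0.001, 0.001]
```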
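10. Saving and loading models
During training the model needs to be saved, and to use it the trained model must be loaded. PyTorch offers two main ways to save and load: 1. save and load the whole model; 2. save and load only the model parameters.
1. Basic usage
- Saving and loading the whole model (network structure + parameters; relatively slow)
# save model
torch.save(model, 'net.pkl')
# load model: the model class definition must be available when loading;
# on PyTorch 2.6 and later, where torch.load defaults to weights_only=True, pass weights_only=False here
model = torch.load('net.pkl')
- Saving and loading only the model parameters (fast, small footprint, recommended)
# save model parameters
torch.save(model.state_dict(), 'net_params.pkl')
# load model parameters: build the model first, then load the parameters
model = Net()
state_dict = torch.load('net_params.pkl')
model.load_state_dict(state_dict)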
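2. Saving and loading a custom checkpoint
A checkpoint file is usually a dictionary that contains the following:
a. Network structure: input size, output size and hidden-layer information, so that the model can be rebuilt at load time;
b. Model weights: the learnable parameters of each layer after training, obtained by calling state_dict() on the model instance, e.g. the model.state_dict() used above when saving only the weights;
c. Optimizer state: to resume training after saving, the optimizer's state and hyperparameters must also be saved, obtained by calling state_dict() on the optimizer instance;
d. Other information: sometimes other values such as epoch or batch_size need to be saved.
This lets you customize what is saved, as shown below:
# saving a checkpoint, assuming the network class is named Net
checkpoint = {
    'model': Net(),
    'model_state_dict': model.state_dict(),
    'optimizer_state_dict': optimizer.state_dict(),
    'epoch': epoch
}
torch.save(checkpoint, 'checkpoint.pkl')
# load the model info
def load_checkpoint(filepath):
    checkpoint = torch.load(filepath)
    model = checkpoint['model']  # network structure
    model.load_state_dict(checkpoint['model_state_dict'])  # load the model parameters
    # rebuild the optimizer with the class and hyperparameters used in training (lr here is a placeholder)
    optimizer = optim.SGD(model.parameters(), lr=0.01)
    optimizer.load_state_dict(checkpoint['optimizer_state_dict'])  # load the optimizer state
    for params in model.parameters():
        params.requires_grad = False
    model.eval()
    return model
model = load_checkpoint('checkpoint.pkl')
If the model is loaded for testing, set requires_grad to False on every layer to fix the parameters, and call model.eval() to put the model into evaluation mode. This mainly fixes the behaviour of Dropout and BatchNormalization; otherwise the predictions would differ from run to run. To continue training instead, call model.train() to make sure the network is in training mode.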
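For resuming training, a round trip through a checkpoint file might look like the sketch below. It is a minimal example under stated assumptions: `make_model` is a hypothetical stand-in for the Net class, the hyperparameters are placeholders, and only state dictionaries plus the epoch are stored, which keeps the file independent of pickled model classes.

```python
import os
import tempfile

import torch
import torch.nn as nn
import torch.optim as optim


def make_model():
    return nn.Linear(4, 2)  # toy stand-in for the article's Net class


torch.manual_seed(0)
model = make_model()
optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.9)

# One training step so the optimizer holds state (momentum buffers) worth saving.
model(torch.randn(8, 4)).sum().backward()
optimizer.step()

path = os.path.join(tempfile.mkdtemp(), "checkpoint.pkl")
torch.save(
    {
        "model_state_dict": model.state_dict(),
        "optimizer_state_dict": optimizer.state_dict(),
        "epoch": 3,
    },
    path,
)

# Resume: rebuild the objects first, then restore their state.
checkpoint = torch.load(path, map_location="cpu")
resumed_model = make_model()
resumed_model.load_state_dict(checkpoint["model_state_dict"])
resumed_optimizer = optim.SGD(resumed_model.parameters(), lr=0.01, momentum=0.9)
resumed_optimizer.load_state_dict(checkpoint["optimizer_state_dict"])
start_epoch = checkpoint["epoch"] + 1
resumed_model.train()  # continue training; call .eval() instead for inference
```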
3. Saving and loading across devices
- Trained on GPU, loaded on CPU (Save on GPU, Load on CPU):
device = torch.device('cpu')
model = Net()
# load all tensors onto the CPU device
model.load_state_dict(torch.load('net_params.pkl', map_location=device))
# <===> model.load_state_dict(torch.load('net_params.pkl', map_location='cpu'))
- Trained on GPU, loaded on GPU (Save on GPU, Load on GPU):
device = torch.device('cuda')
model = Net()
model.load_state_dict(torch.load('net_params.pkl'))
model.to(device)
The map_location argument is not needed here; call model.to(torch.device("cuda")) to convert the model to a CUDA-optimized model.
The data fed to the model must also be moved from CPU to GPU with data = data.to(device). Note that my_tensor.to(device) returns a copy of my_tensor on the GPU and does not overwrite my_tensor, so the tensor has to be reassigned manually: my_tensor = my_tensor.to(device)
- Trained on CPU, loaded on GPU (Save on CPU, Load on GPU):
device = torch.device('cuda')
model = Net()
model.load_state_dict(torch.load('net_params.pkl', map_location='cuda:0'))
model.to(device)
11. A few GPU-related functions
# check whether CUDA is available
print(torch.cuda.is_available())
# get the number of GPUs
print(torch.cuda.device_count())
# get the name of a GPU
print(torch.cuda.get_device_name(0))
# get the index of the current GPU device, starting from 0 by default
print(torch.cuda.current_device())
# move the model and data from CPU to GPU
use_cuda = torch.cuda.is_available()
# Method 1
if use_cuda:
    data = data.cuda()
    model.cuda()
# Method 2
device = torch.device('cuda' if use_cuda else 'cpu')
data = data.to(device)
model.to(device)
12. Dumping the feature maps produced during inference
- Wrapping the model (collect the feature maps inside the forward pass):
import os
import cv2
import numpy as np
from PIL import Image
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision.models as models
class FeatureVisualization:
    input_size = 256
    def __init__(self, imgpath='', layers_idx=(1, 2), save_features_dir='/'):
        self.imgpath = imgpath
        self.layers_idx = layers_idx
        self.save_features_dir = save_features_dir
        self.net = models.vgg16()
    @staticmethod
    def preprocess_image(imgpath):
        assert os.path.isfile(imgpath), "The image {%s} must exist!" % imgpath
        img = cv2.imread(imgpath)
        # resize
        size = FeatureVisualization.input_size
        img = cv2.resize(img, (size, size))
        # normalize to [0, 1]
        img = (img / 255.).astype('float32').transpose((2, 0, 1))[np.newaxis, :, :, :]  # (1, 3, 256, 256)
        # <===>
        # img = (img / 255.).astype('float32').swapaxes(1, 2).swapaxes(0, 1)
        # img = np.expand_dims(img, axis=0)
        img = torch.from_numpy(img)
        return img
    def get_features(self):
        """Extract features"""
        features = {}
        inputx = self.preprocess_image(self.imgpath)
        print('inputx shape', inputx.shape)
        model = self.net.eval()
        if torch.cuda.is_available():
            inputx = inputx.cuda()
            model = model.cuda()
        x = inputx
        # run the conv stack layer by layer (named_modules() would also yield the containers)
        with torch.no_grad():
            for index, module in enumerate(model.features):
                x = module(x)
                if index in self.layers_idx:
                    features[str(index)] = x.clone()  # clone: a following in-place ReLU would overwrite x
        return features
    def save_features(self):
        """Save features"""
        features = self.get_features()
        for name, feature in features.items():
            feature = self.process_feature(feature)
            cv2.imwrite(os.path.join(self.save_features_dir, name + '.jpg'), feature)
    @staticmethod
    def process_feature(feature):
        """
        Normalize the feature
        Arguments:
            feature: (type, tensor(b, c, h, w)), normalize to (0, 255)
        Return:
            (type, uint8 array(h, w)), channel-averaged map of the first image
        """
        feature = feature.cpu().detach().numpy()
        # use sigmoid to [0, 1]
        feature = 1.0 / (1 + np.exp(-1 * feature))
        # average over channels so that cv2.imwrite receives a 2-D image
        feature = np.round(feature[0].mean(axis=0) * 255).astype('uint8')
        return feature
if __name__ == '__main__':
    # the arguments are placeholders: point them at your own image and output directory
    featurevisualization = FeatureVisualization(imgpath='test.jpg', save_features_dir='.')
    featurevisualization.save_features()
- Using hooks: PyTorch hooks make it easy to read or modify the values and gradients of intermediate layers without changing the network structure between input and output (the order of the various hooks relative to forward and backward is clearest in the __call__ function of nn.Module). A hook registered with register_forward_hook runs after forward has been called.
import os
import cv2
import numpy as np
from PIL import Image
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision.models as models
class FeatureVisualization:
    input_size = 256
    def __init__(self, imgpath='', layers_idx=(1, 2), save_features_dir='/'):
        self.imgpath = imgpath
        self.layers_idx = layers_idx
        self.save_features_dir = save_features_dir
        self.net = models.vgg16()
    @staticmethod
    def preprocess_image(imgpath):
        assert os.path.isfile(imgpath), "The image {%s} must exist!" % imgpath
        img = cv2.imread(imgpath)
        # resize
        size = FeatureVisualization.input_size
        img = cv2.resize(img, (size, size))
        # normalize to [0, 1]
        img = (img / 255.).astype('float32').transpose((2, 0, 1))[np.newaxis, :, :, :]  # (1, 3, 256, 256)
        # <===>
        # img = (img / 255.).astype('float32').swapaxes(1, 2).swapaxes(0, 1)
        # img = np.expand_dims(img, axis=0)
        img = torch.from_numpy(img)
        return img
    def get_features(self):
        """Extract features"""
        features = {}
        inputx = self.preprocess_image(self.imgpath)
        print('inputx shape', inputx.shape)
        model = self.net.eval()
        if torch.cuda.is_available():
            inputx = inputx.cuda()
            model = model.cuda()
        # closure
        def get_activation(name):
            def hook(module, input, output):
                # clone: a following in-place ReLU would overwrite the stored output
                features[name] = output.detach().clone()
            return hook
        # register hooks on the selected layers of the conv stack
        handles = []
        for layer_idx in self.layers_idx:
            handles.append(model.features[layer_idx].register_forward_hook(get_activation(str(layer_idx))))
        with torch.no_grad():
            outputx = model(inputx)
        # remove every hook once the forward pass is done
        for handle in handles:
            handle.remove()
        return features
    def save_features(self):
        """Save features"""
        features = self.get_features()
        for name, feature in features.items():
            feature = self.process_feature(feature)
            cv2.imwrite(os.path.join(self.save_features_dir, name + '.jpg'), feature)
    @staticmethod
    def process_feature(feature):
        """
        Normalize the feature
        Arguments:
            feature: (type, tensor(b, c, h, w)), normalize to (0, 255)
        Return:
            (type, uint8 array(h, w)), channel-averaged map of the first image
        """
        feature = feature.cpu().detach().numpy()
        # use sigmoid to [0, 1]
        feature = 1.0 / (1 + np.exp(-1 * feature))
        # average over channels so that cv2.imwrite receives a 2-D image
        feature = np.round(feature[0].mean(axis=0) * 255).astype('uint8')
        return feature
if __name__ == '__main__':
    # the arguments are placeholders: point them at your own image and output directory
    featurevisualization = FeatureVisualization(imgpath='test.jpg', save_features_dir='.')
    featurevisualization.save_features()
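The hook mechanism itself can be tried on a tiny network without any image file. This is a minimal sketch; the three-layer nn.Sequential and the random input are assumptions standing in for VGG16 and a real image.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Sequential(
    nn.Conv2d(3, 4, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.Conv2d(4, 2, kernel_size=3, padding=1),
).eval()

features = {}


def get_activation(name):
    def hook(module, inputs, output):
        features[name] = output.detach()  # runs right after the module's forward
    return hook


# Register one forward hook per layer of interest and keep the handles.
handles = [model[i].register_forward_hook(get_activation(str(i))) for i in (0, 1)]
with torch.no_grad():
    model(torch.randn(1, 3, 8, 8))
for handle in handles:
    handle.remove()

shapes = {name: tuple(f.shape) for name, f in features.items()}
print(shapes)  # {'0': (1, 4, 8, 8), '1': (1, 4, 8, 8)}

# Once removed, the hooks no longer fire.
features.clear()
with torch.no_grad():
    model(torch.randn(1, 3, 8, 8))
print(len(features))  # 0
```

Keeping every handle and removing all of them avoids leaving stale hooks on the model when more than one layer is hooked.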
13. Converting between tensor types (three ways)
- Using the dedicated conversion methods:
import torch
import torch.nn as nn
x = torch.randn(3, 5)
print(x)
# convert x to long
x_long = x.long()
# convert x to half
x_half = x.half()
# convert x to int
x_int = x.int()
# convert x to double
x_double = x.double()
# convert x to float
x_float = x.float()
# convert x to char
x_char = x.char()
# convert x to byte
x_byte = x.byte()
# convert x to short
x_short = x.short()
- Using the tensor.type() method:
import torch
import torch.nn as nn
x = torch.randn(3, 5)
x_int = x.type(torch.IntTensor)
print(x_int)
- Using type_as(ano_tensor) to convert a tensor to the type of a given tensor:
import torch
import torch.nn as nn
x = torch.FloatTensor(5)
y = torch.IntTensor([10, 20])
x_int = x.type_as(y)
assert isinstance(x_int, torch.IntTensor)
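This article summarizes some tips I have accumulated while using PyTorch and will be updated over time. If you find any mistakes, corrections are welcome!
Please credit the source when reposting. Link to this article: https://www.uj5u.com/qita/255142.html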
