Super-Resolution Reconstruction Test (ESRGAN)

Repository tested: GitHub - xinntao/BasicSR: Open Source Image and Video Restoration Toolbox for Super-resolution, Denoise, Deblurring, etc. Currently, it includes EDSR, RCAN, SRResNet, SRGAN, ESRGAN, EDVR, BasicVSR, SwinIR, ECBSR, etc. Also support StyleGAN2, DFDNet.

The repository linked above provides many models. Here I briefly test how ESRGAN performs in practice, using 500 face images of 1024*1024 pixels.

1. Data preparation

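The training data is produced mainly by resizing. First, the 1024*1024 images are resized to 512*512 with nearest-neighbor interpolation and used as the high-resolution originals for training; I do this because 1024 is too large for my limited compute. The low-resolution images are also produced by resizing, to 128*128, and Gaussian blur is applied to lower the image quality. Examples: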

High-resolution training image (512*512)

Low-quality, low-resolution simulated training image (128*128)

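Note: artificially degraded images certainly differ from images that are low quality under natural conditions. I have no such real data, so for now this is the only way I can test.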
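The resize-and-blur procedure described above can be sketched as follows. This is a minimal sketch using Pillow, not the script actually used for this post: the function name `make_training_pair`, the bicubic interpolation for the 128*128 image, the blur radius, and the resize-then-blur order are all assumptions, since the post only states the sizes, the nearest-neighbor resize to 512, and the use of Gaussian blur.

```python
from pathlib import Path

from PIL import Image, ImageFilter


def make_training_pair(src_path, gt_dir, lq_dir, gt_size=512, lq_size=128, blur_radius=2.0):
    """Write one GT image and one degraded LQ image under the same file name."""
    src_path, gt_dir, lq_dir = Path(src_path), Path(gt_dir), Path(lq_dir)
    gt_dir.mkdir(parents=True, exist_ok=True)
    lq_dir.mkdir(parents=True, exist_ok=True)

    img = Image.open(src_path).convert("RGB")

    # 1024 -> 512 with nearest-neighbor interpolation, as described in the text.
    gt = img.resize((gt_size, gt_size), Image.NEAREST)

    # 512 -> 128, then Gaussian blur. The interpolation mode, the blur radius
    # and the resize-then-blur order are assumptions; the post does not give them.
    lq = gt.resize((lq_size, lq_size), Image.BICUBIC)
    lq = lq.filter(ImageFilter.GaussianBlur(radius=blur_radius))

    gt_path = gt_dir / src_path.name
    lq_path = lq_dir / src_path.name
    gt.save(gt_path)
    lq.save(lq_path)
    return gt_path, lq_path
```

Saving both images under the same file name keeps the high-quality and low-quality folders aligned, which is what a paired dataset loader needs.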

The dataset comes from here: New dataset

Training data directory

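face_sub holds the 512*512 images and face_X2_sub holds the 128*128 images (despite the X2 in the folder name, the size ratio between the two is 4).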

GT holds 512*512 images and LR holds 128*128 images, the same as above; these are used for validation during training.
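Before training, it can help to confirm that every high-quality image has a low-quality counterpart. The sketch below assumes that both folders use identical file names for a pair (consistent with `filename_tmpl: '{}'` in the config); the helper name `find_unpaired` and the list of image suffixes are illustrative, not part of BasicSR.

```python
from pathlib import Path

# Suffixes treated as images; an assumption for this sketch.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".bmp"}


def find_unpaired(gt_dir, lq_dir):
    """Return (GT names missing from lq_dir, LQ names missing from gt_dir)."""

    def names(folder):
        return {p.name for p in Path(folder).iterdir() if p.suffix.lower() in IMAGE_SUFFIXES}

    gt_names, lq_names = names(gt_dir), names(lq_dir)
    return sorted(gt_names - lq_names), sorted(lq_names - gt_names)
```

Calling `find_unpaired("datasets/train/face_sub", "datasets/train/face_X2_sub")` should return two empty lists when the folders are consistent; the same check applies to the GT and LR validation folders.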

2. Modify the config file

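Since I am using ESRGAN, the corresponding config file is BasicSR/options/train/ESRGAN/train_ESRGAN_x4.yml. Note that the value 4 of the scale parameter in the config corresponds to what the code requires of the training images: each high-quality image must be 4 times the size of its low-quality counterpart (512/128=4).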
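The size requirement can be written down as a small check. This is only an illustration of the arithmetic, with hypothetical helper names; it also assumes that the low-quality crop used during training is `gt_size` divided by `scale`, so that `gt_size: 128` with `scale: 4` would correspond to 32*32 low-quality patches.

```python
def check_pair_shapes(gt_shape, lq_shape, scale=4):
    """Raise ValueError unless the GT image is exactly `scale` times the LQ size."""
    gt_h, gt_w = gt_shape
    lq_h, lq_w = lq_shape
    if (gt_h, gt_w) != (lq_h * scale, lq_w * scale):
        raise ValueError(f"GT {gt_h}x{gt_w} is not {scale}x LQ {lq_h}x{lq_w}")


def lq_patch_size(gt_size, scale=4):
    """LQ crop size that corresponds to a GT crop of gt_size pixels."""
    if gt_size % scale != 0:
        raise ValueError("gt_size must be divisible by scale")
    return gt_size // scale


check_pair_shapes((512, 512), (128, 128), scale=4)  # passes: 512 / 128 = 4
print(lq_patch_size(128, scale=4))  # 32
```

A 512*512 image paired with a 256*256 one would fail this check at scale 4, which is the kind of mismatch the scale setting guards against.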

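The main things to change are the paths: dataroot_gt is the path to the high-quality images, dataroot_lq is the path to the low-quality images, and pretrain_network_g is the pretrained model, which the author provides. My training config is as follows: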

# general settings
name: 052_ESRGAN_x4_f64b23_DIV2K_400k_B16G1_051pretrain_wandb
model_type: ESRGANModel
scale: 4
num_gpu: 1  # set num_gpu: 0 for cpu mode
manual_seed: 0

# dataset and data loader settings
datasets:
  train:
    name: DIV2K
    type: PairedImageDataset
    dataroot_gt: /home/zc/wcs/BasicSR/datasets/train/face_sub
    dataroot_lq: /home/zc/wcs/BasicSR/datasets/train/face_X2_sub
    # (for lmdb)
    # dataroot_gt: datasets/DIV2K/DIV2K_train_HR_sub.lmdb
    # dataroot_lq: datasets/DIV2K/DIV2K_train_LR_bicubic_X4_sub.lmdb
    filename_tmpl: '{}'
    io_backend:
      type: disk
      # (for lmdb)
      # type: lmdb

    gt_size: 128
    use_flip: true
    use_rot: true

    # data loader
    use_shuffle: true
    num_worker_per_gpu: 6
    batch_size_per_gpu: 8
    dataset_enlarge_ratio: 100
    prefetch_mode: ~

  val:
    name: Set14
    type: PairedImageDataset
    dataroot_gt: /home/zc/wcs/BasicSR/datasets/val/GT
    dataroot_lq: /home/zc/wcs/BasicSR/datasets/val/LR
    io_backend:
      type: disk

# network structures
network_g:
  type: RRDBNet
  num_in_ch: 3
  num_out_ch: 3
  num_feat: 64
  num_block: 23

network_d:
  type: VGGStyleDiscriminator128
  num_in_ch: 3
  num_feat: 64

# path
path:
  pretrain_network_g: experiments/ESRGAN_PSNR_SRx4_DF2K_official-150ff491.pth
  strict_load_g: true
  resume_state: ~

# training settings
train:
  ema_decay: 0.999
  optim_g:
    type: Adam
    lr: !!float 1e-4
    weight_decay: 0
    betas: [0.9, 0.99]
  optim_d:
    type: Adam
    lr: !!float 1e-4
    weight_decay: 0
    betas: [0.9, 0.99]

  scheduler:
    type: MultiStepLR
    milestones: [50000, 100000, 200000, 300000]
    gamma: 0.5

  total_iter: 400000
  warmup_iter: -1  # no warm up

  # losses
  pixel_opt:
    type: L1Loss
    loss_weight: !!float 1e-2
    reduction: mean
  perceptual_opt:
    type: PerceptualLoss
    layer_weights:
      'conv5_4': 1  # before relu
    vgg_type: vgg19
    use_input_norm: true
    range_norm: false
    perceptual_weight: 1.0
    style_weight: 0
    criterion: l1
  gan_opt:
    type: GANLoss
    gan_type: vanilla
    real_label_val: 1.0
    fake_label_val: 0.0
    loss_weight: !!float 5e-3

  net_d_iters: 1
  net_d_init_iters: 0

# validation settings
val:
  val_freq: !!float 5e3
  save_img: true

  metrics:
    psnr: # metric name, can be arbitrary
      type: calculate_psnr
      crop_border: 4
      test_y_channel: false

# logging settings
logger:
  print_freq: 100
  save_checkpoint_freq: !!float 5e3
  use_tb_logger: true
  wandb:
    project: ~
    resume_id: ~

# dist training settings
dist_params:
  backend: nccl
  port: 29500
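To see what the scheduler settings in this config imply, the learning rate at a given iteration can be computed by hand. The sketch below assumes plain step decay (the rate is multiplied by `gamma` each time a milestone is reached), using the `lr`, `milestones`, and `gamma` values from the config; `lr_at` is a hypothetical helper, not a BasicSR function.

```python
from bisect import bisect_right


def lr_at(iteration, base_lr=1e-4, milestones=(50000, 100000, 200000, 300000), gamma=0.5):
    """Step-decay learning rate: multiply by gamma for every milestone reached."""
    return base_lr * gamma ** bisect_right(milestones, iteration)


for it in (0, 50000, 100000, 200000, 300000, 399999):
    print(f"iter {it:>6}: lr = {lr_at(it):.3e}")
```

Under this assumption the rate halves four times over the 400,000 iterations, ending at 1e-4 * 0.5^4 = 6.25e-6.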

3. Training

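With the above in place, training can start. Following the author's instructions in BasicSR/TrainTest.md at master · xinntao/BasicSR · GitHub, and since I have only one GPU, I run CUDA_VISIBLE_DEVICES=0 python basicsr/train.py -opt options/train/ESRGAN/train_ESRGAN_x4.yml. By default this runs 400,000 iterations (total_iter in the config). The trained models and the training logs are created automatically in the output folder.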
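If the run is launched from a Python script rather than typed into a shell, the same single-GPU command can be assembled as below. This is a thin wrapper sketch around the command quoted above; the function names are hypothetical, and `run_training` is assumed to be called from the BasicSR repository root.

```python
import os
import subprocess
import sys


def build_train_command(opt_path, gpu_ids="0"):
    """Return (argv, env) equivalent to the single-GPU shell command."""
    argv = [sys.executable, "basicsr/train.py", "-opt", opt_path]
    # Copy the current environment and restrict the visible GPUs.
    env = dict(os.environ, CUDA_VISIBLE_DEVICES=gpu_ids)
    return argv, env


def run_training(opt_path, gpu_ids="0"):
    """Start training and raise CalledProcessError if it exits with an error."""
    argv, env = build_train_command(opt_path, gpu_ids)
    subprocess.run(argv, env=env, check=True)
```

For example, `run_training("options/train/ESRGAN/train_ESRGAN_x4.yml")` mirrors the command used in this post.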

4. Inference

There is a dedicated inference script at inference/inference_esrgan.py. Only the relevant paths need to be changed: the model, the input path, and the output path.
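Choosing the model path usually means picking the most recent generator checkpoint. The helper below is a sketch under the assumption that checkpoints are saved as `net_g_<iteration>.pth` in a models folder inside the experiment directory; verify the actual file names in your own output folder before relying on it. The function name is hypothetical.

```python
import re
from pathlib import Path


def latest_generator_checkpoint(models_dir):
    """Return the net_g_<iter>.pth file with the largest iteration, or None."""
    pattern = re.compile(r"net_g_(\d+)\.pth$")
    best = None
    for path in Path(models_dir).glob("net_g_*.pth"):
        match = pattern.match(path.name)
        if match is None:
            continue  # skip names without a numeric iteration, e.g. net_g_latest.pth
        iteration = int(match.group(1))
        if best is None or iteration > best[0]:
            best = (iteration, path)
    return None if best is None else best[1]
```

The returned path can then be used as the model path in the inference script.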

5. Results

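High-quality original image, 1024*1024 (original)

High-quality test image (512*512)

Low-quality test image with Gaussian blur, 128*128

ESRGAN prediction (512*512)

Low-quality test image with Gaussian blur, resized to 512*512 (enlarging an image by resizing alone cannot improve its sharpness)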
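The resize-only comparison image can be reproduced with plain interpolation. This is a minimal Pillow sketch; the post does not say which interpolation was used for this comparison, so bicubic is an assumption, and `upscale_by_interpolation` is a hypothetical helper name.

```python
from PIL import Image


def upscale_by_interpolation(lq_path, out_path, scale=4):
    """Enlarge an image with bicubic interpolation only (no learned model)."""
    img = Image.open(lq_path).convert("RGB")
    width, height = img.size
    up = img.resize((width * scale, height * scale), Image.BICUBIC)
    up.save(out_path)
    return up.size
```

Interpolation only spreads the existing pixels over a larger grid, so the output has the target size but none of the detail that the blur and downscaling removed.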

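Comparing these, there are still differences in detail. However, this data consists of clean frontal faces, and the simulated degradation is only Gaussian blur. When I tested on a random image from the internet, the result was poor, possibly because it is not a regular frontal portrait like these and the training samples contain nothing similar. This needs further exploration.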

Original image, test image, and result image (from another, similar dataset)

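Testing on portraits of foreign subjects, the improvement is still fairly noticeable, especially in the reconstruction of the eyes.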

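The image above is a screenshot picked at random from the internet, and the reconstruction is clearly poor. I do not think this is the model's fault, though: the training data was only simulated by me, and the model behaves as expected on that simulated dataset. I think collecting images from the internet and training on them, starting from the model trained on synthetic data, should work better. In addition, I did not change any of the training parameters and do not know whether they are suitable, which is another factor. So this still needs further exploration.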

Please credit the source when reposting. Link to this article: https://www.uj5u.com/qita/330449.html
