Accelerating NumPy matrix operations with GPUs, parallel computing, and compiler optimization (a roundup of related material)

v1 (mainly about accelerating NumPy operations), 2020/12/16

Summary: GPU-based NumPy acceleration: cupy and minpy

Compilation-based NumPy acceleration: numba

Parallel-computing-based NumPy acceleration: Mars

Both parallel and GPU-capable: Mars

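Contents

Accelerating NumPy matrix operations with GPUs, parallel computing, and compiler optimization (a roundup of related material)
- cupy
- minpy and MXNet
- mars
- jit and numba
- RAPIDS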

Sites for learning NumPy:

https://numpy.net/

https://www.numpy.org.cn/

http://cs231n.stanford.edu/syllabus.html

cupy

cupy uses the GPU to accelerate NumPy-style computation.

cupy documentation: https://docs.cupy.dev/en/stable/

If CUDA is already installed, installing cupy takes a single command (make sure pip has been upgraded to the latest version first):

$ pip install cupy

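Alternatively, install a prebuilt package:

# Check which CUDA version you have installed and install the matching package directly; in my tests this was faster

# Run the matching install command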
# CUDA 8.0
pip install cupy-cuda80
 
# CUDA 9.0
pip install cupy-cuda90
 
# CUDA 9.1
pip install cupy-cuda91
 
# CUDA 9.2
pip install cupy-cuda92
 
# CUDA 10.0
pip install cupy-cuda100
 
# CUDA 10.1
pip install cupy-cuda101
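
As a rough illustration of what using cupy looks like once it is installed, the sketch below multiplies two matrices and copies the result back to host memory. It assumes an NVIDIA GPU with a working CUDA setup; when cupy cannot be imported or no device is found, it falls back to NumPy so the script still runs. The helper name `matmul_on_device` and the matrix sizes are illustrative choices, not taken from the cupy documentation.

```python
import numpy as np

try:
    import cupy as cp
    cp.cuda.runtime.getDeviceCount()  # raises if no usable CUDA device
    xp = cp
except Exception:
    cp = None
    xp = np  # assumption: fall back to NumPy so the sketch still runs


def matmul_on_device(a, b):
    """Multiply two host arrays with xp and return a NumPy array."""
    a_dev = xp.asarray(a)   # with cupy, this copies host data to the GPU
    b_dev = xp.asarray(b)
    c_dev = a_dev @ b_dev   # runs on the GPU when xp is cupy
    # cp.asnumpy copies the result back to host memory
    return cp.asnumpy(c_dev) if cp is not None else c_dev


rng = np.random.default_rng(0)
a = rng.random((256, 128))
b = rng.random((128, 64))
c = matmul_on_device(a, b)
print(c.shape, "computed with", xp.__name__)
```

Host-to-device copies have a cost, so small arrays like these are unlikely to show a speedup; the point is only that the array code is the same for both libraries.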

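1. Official installation guide: https://docs.cupy.dev/en/stable/install.html#installing-cupy

2. Official GitHub repository: https://github.com/cupy/cupy

The repository includes installation instructions, can be run in Docker, and has code examples of using cupy to speed up computation.

3. cupy API reference: https://docs.cupy.dev/en/stable/reference/index.html

4. Blog posts on using cupy and the speedups it gives:

(a short introductory post in English) https://towardsdatascience.com/heres-how-to-use-cupy-to-make-numpy-700x-faster-4b920dda1f56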

https://www.jiqizhixin.com/articles/2019-08-29-8

https://www.jianshu.com/p/b5a6ee8564df

https://blog.csdn.net/ChenVast/article/details/100140494

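minpy and MXNet

In effect and usage, minpy appears to be similar to cupy, but judging by how much material can be found online, minpy is less popular than cupy.

minpy's interface is the same as NumPy's,

so changing only the import statement is enough to run NumPy computations with GPU acceleration.

Some functions do not support GPU acceleration. In that case minpy automatically runs the function on the CPU, as NumPy would (which helps reduce bugs). I do not know whether cupy offers the same fallback.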

import minpy.numpy as np
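
To make the "only change the import" point concrete, here is a minimal sketch. It assumes minpy may not be installed and falls back to plain NumPy in that case; the `backend` variable and the tiny arrays are illustrative only.

```python
try:
    import minpy.numpy as np  # GPU-backed drop-in, requires MXNet
    backend = "minpy"
except ImportError:
    import numpy as np        # assumption: plain NumPy fallback
    backend = "numpy"

# The array code below is identical for both backends.
a = np.ones((2, 3))
b = np.ones((3, 2))
c = np.dot(a, b)
print(backend, c)
```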

Installation steps:

(MXNet is a deep learning library backed by Amazon. minpy runs on top of it, so MXNet and CUDA must be installed before using minpy.)

Install MXNet (link below; in my test the installation was quick, about 2 minutes)
Install minpy

1. MXNet installation page: https://mxnet.incubator.apache.org/get_started?

2. minpy official documentation: https://minpy.readthedocs.io/en/latest/index.html

3. minpy notes on the NumPy Chinese site: https://www.numpy.org.cn/article/other/minpy-the-numpy-interface-upon-mxnets-backend.html

4. minpy on GitHub (posts and code): https://github.com/dmlc/minpy

MXNet GitHub repository: https://github.com/apache/incubator-mxnet

5. Blog post on using minpy: https://www.sohu.com/a/124626121_465975

Blog post benchmarking minpy's speedup: https://blog.csdn.net/DarrenXf/article/details/86305215

Blog posts introducing the MXNet deep learning framework: https://www.jiqizhixin.com/graph/technologies/c59f79b4-eb36-48f7-842e-aefd4397799a

https://www.jiqizhixin.com/articles/2016-08-10-2

mars

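Mars is a unified, tensor-based framework for large-scale data computation, developed by Qin Xuye, a senior software engineer at Alibaba Cloud, and others.

Mars lets libraries such as NumPy, pandas, and scikit-learn run in parallel and in distributed mode, using multiple cores to shorten a program's run time.

CPU Time: also called process time, it measures the CPU resources a process consumes and is counted in clock ticks. Timing tools usually report it alongside real (elapsed) time, split into user CPU time and system CPU time.

Wall Time: the total elapsed time of a process, from start to finish as measured by a clock. It depends on how many other processes are running at the same time and includes time the process spends blocked or waiting.

With Mars, Wall Time < CPU Time can occur. CPU time is summed over all cores, so this shows that Mars is executing in parallel on a multi-core processor.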
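
The two timings can be compared with the standard library alone. The sketch below does not use Mars; it times a NumPy matrix product, which may itself run on several threads depending on the BLAS library NumPy was built against. The helper name `measure` is an illustrative choice.

```python
import time
import numpy as np


def measure(fn):
    """Run fn() once and return (wall_seconds, cpu_seconds)."""
    wall_start = time.perf_counter()   # elapsed wall-clock time
    cpu_start = time.process_time()    # user + system CPU time of this process
    fn()
    cpu = time.process_time() - cpu_start
    wall = time.perf_counter() - wall_start
    return wall, cpu


a = np.random.rand(1000, 1000)
wall, cpu = measure(lambda: a @ a)
print(f"wall = {wall:.3f}s, cpu = {cpu:.3f}s")
```

If the CPU figure is larger than the wall figure, the work was spread over more than one core. Note that `time.process_time()` only counts the current process, so work done in separate worker processes would not show up here.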

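1. For examples of using Mars to speed up NumPy through parallel execution, see the official Mars documentation: https://docs.mars-project.io/zh_CN/latest/

2. Installing Mars: https://docs.mars-project.io/zh_CN/latest/installation/index.html

3. Deploying Mars on a cluster: https://docs.mars-project.io/zh_CN/latest/installation/deploy.html#deploy

4. Besides multiple cores, Mars can also use GPUs (a single GPU, several GPUs on one machine, or distributed) to accelerate NumPy. Mars tensor depends on cupy, so install cupy first, then pass gpu=True. For details see: https://docs.mars-project.io/zh_CN/latest/getting_started/gpu.html#gpu

5. Mars GitHub repository: https://github.com/mars-project/mars

6. Introductory posts on Mars: https://blog.csdn.net/weixin_42137700/article/details/85274241

https://www.zhihu.com/question/307050812/answer/561528003

There is little material about Mars online. Almost all of the technical information is in the official documentation, which is well written and is usually enough to solve a problem.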
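
For orientation, a minimal sketch of the Mars tensor style follows. It assumes Mars may not be installed and falls back to NumPy in that case; the function name `row_sums` is illustrative, and how results come back from `execute()` differs between Mars releases, so the `fetch()` handling here is a defensive guess rather than the documented pattern for any one version.

```python
import numpy as np

try:
    import mars.tensor as mt
except Exception:
    mt = None  # assumption: fall back to NumPy when Mars is unavailable


def row_sums(n, gpu=False):
    """Return the row sums of (ones + 1) for an n x n matrix."""
    if mt is None:
        return (np.ones((n, n)) + 1).sum(axis=1)
    t = mt.ones((n, n), gpu=gpu)          # lazy tensor; nothing computed yet
    out = (t + 1).sum(axis=1).execute()   # triggers the actual computation
    # Some Mars releases return a handle from execute(); fetch() gets the data
    return out.fetch() if hasattr(out, "fetch") else out


print(row_sums(4))
```

Passing `gpu=True` requires cupy and a CUDA device, as noted in item 4 above.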

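jit and numba

jit and numba go hand in hand.

JIT (just-in-time compilation): a piece of code is compiled right before it is executed, hence the name "just-in-time".

Numba uses LLVM to compile Python functions at run time into optimized machine code, which speeds up the computation. Numerical algorithms written in Python and compiled with Numba can approach the speed of C or FORTRAN. There is no need to replace the Python interpreter; from numba import jit is enough to get started.

According to the official documentation, migrating code to numba also requires a few additions to the code, such as the @jit decorator.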

import numpy as np
import numba
from numba import jit

# numba speeds up for loops very noticeably
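
To show where the @jit decorator goes, here is a small sketch with an explicit double loop, the kind of code the comment above refers to. It assumes numba may not be installed and, in that case, substitutes a no-op decorator so the function still runs as ordinary Python; the function name `sq_diff_sum` is illustrative.

```python
import numpy as np

try:
    from numba import jit
except ImportError:
    # Assumption: without numba, use a no-op stand-in so the code still runs.
    def jit(*args, **kwargs):
        if len(args) == 1 and callable(args[0]) and not kwargs:
            return args[0]           # used as bare @jit
        return lambda func: func     # used as @jit(...)


@jit(nopython=True)
def sq_diff_sum(x):
    """Sum of squared differences over all ordered pairs of elements."""
    n = x.shape[0]
    total = 0.0
    for i in range(n):
        for j in range(n):
            d = x[i] - x[j]
            total += d * d
    return total


x = np.arange(4.0)
print(sq_diff_sum(x))
```

With numba present, the first call includes compilation time, so any timing comparison should be made on a second call.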

1. Posts and documents on JIT: https://blog.csdn.net/shenwansangz/article/details/95601232

https://developer.ibm.com/zh/articles/j-lo-just-in-time/

2. Introductory posts on numba: https://www.cnblogs.com/zhuwjwh/p/11401215.html

https://zhuanlan.zhihu.com/p/68720474 (this one covers LLVM)

https://zhuanlan.zhihu.com/p/60994299

Posts on installing numba: https://blog.csdn.net/marchphy/article/details/52207878

https://www.jianshu.com/p/5341ad607b71

3. numba official documentation: https://numba.readthedocs.io/en/stable/index.html

4. numba website: https://numba.pydata.org/

5. numba GitHub repository: https://github.com/numba/numba

RAPIDS

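RAPIDS seems to be aimed more at GPU acceleration of pandas and scikit-learn workflows, and through them at speeding up machine learning.

Within RAPIDS, cuDF is roughly a GPU-accelerated pandas, and cuML is roughly a GPU-accelerated scikit-learn.

RAPIDS does not appear to offer acceleration for NumPy itself.

1. RAPIDS website: https://rapids.ai/index.html

2. RAPIDS installation page: https://rapids.ai/start.html#get-rapids

3. RAPIDS documentation: https://docs.rapids.ai/

4. RAPIDS on GitHub: https://github.com/rapidsai

5. Introductory posts on RAPIDS: https://www.sohu.com/a/283723350_100007018

https://www.zhihu.com/question/304042299

https://www.datalearner.com/blog/1051562381920769 (this post is a good one and also explains how GPU acceleration works)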
