大家好,我想知道什么
sudo nvidia-smi --gpu-reset -i 0
這個命令是不是只是釋放了 GPU 的記憶體而什么都不做?
uj5u.com熱心網友回復:
重置 GPU 狀態。可用于清除雙位 ECC 錯誤或恢復掛起的 GPU。需要 -i 切換到目標特定設備。僅在 Linux 上可用。 https://developer.download.nvidia.com/compute/DCGM/docs/nvidia-smi-367.38.pdf
uj5u.com熱心網友回復:
從nvidia-smi幫助選單 ( man nvidia-smi):
-r, --gpu-reset
Trigger a reset of one or more GPUs. Can be used to clear GPU HW and SW state in situations that would otherwise require a machine reboot. Typically useful if a double bit ECC
error has occurred. Optional -i switch can be used to target one or more specific devices. Without this option, all GPUs are reset. Requires root. There can't be any appli‐
cations using these devices (e.g. CUDA application, graphics application like X server, monitoring application like other instance of nvidia-smi). There also can't be any com‐
pute applications running on any other GPU in the system.
Starting with the NVIDIA Ampere architecture, GPUs with NVLink connections can be individually reset. On NVSwitch systems, Fabric Manager is required to facilitate reset.
If Fabric Manager is not running, or if any of the GPUs being reset are based on an architecture preceding the NVIDIA Ampere architecture, any GPUs with NVLink connections to a
GPU being reset must also be reset in the same command. This can be done either by omitting the -i switch, or using the -i switch to specify the GPUs to be reset. If the -i
option does not specify a complete set of NVLink GPUs to reset, this command will issue an error identifying the additional GPUs that must be included in the reset command.
GPU reset is not guaranteed to work in all cases. It is not recommended for production environments at this time. In some situations there may be HW components on the board
that fail to revert back to an initial state following the reset request. This is more likely to be seen on Fermi-generation products vs. Kepler, and more likely to be seen if
the reset is being performed on a hung GPU.
Following a reset, it is recommended that the health of each reset GPU be verified before further use. If any GPU is not healthy a complete reset should be instigated by power
cycling the node.
GPU reset operation will not be supported on MIG enabled vGPU guests.
Visit http://developer.nvidia.com/gpu-deployment-kit to download the GDK.
轉載請註明出處,本文鏈接:https://www.uj5u.com/shujuku/358161.html
上一篇:降級Nodejs和Npm
