Multiple CUDA environments & PyTorch & GCC

Published on 2023/07/25

Problem:
To get the best performance, we want to compile PyTorch and TensorFlow from source (with some modifications).

For PyTorch, the current stable version is 2.0.1, which is only pre-built for CUDA 11.7 and 11.8. If you use the latest CUDA 12.1, the only choice is the nightly build.

I use Arch Linux, btw. So I want to set up multiple CUDA environments.

1. Download (cuda, cudnn).sh from NVIDIA (or extract them and copy with rsync -Prva)
2. Install to ~/local/cuda-(version)
3. Link ~/local/cuda-(version) to ~/local/cuda
4. Add the paths and CUDA_HOME to ~/.zshrc
#cuda
export PATH="$HOME/local/cuda/bin${PATH:+:}${PATH}"
export LD_LIBRARY_PATH="$HOME/local/cuda/lib64${LD_LIBRARY_PATH:+:}${LD_LIBRARY_PATH}"
export CPATH="$HOME/local/cuda/include${CPATH:+:}${CPATH}"
#export CUDA_HOME="$HOME/local/cuda"
nvcc -V
System CUDA:
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Tue_Jun_13_19:16:58_PDT_2023
Cuda compilation tools, release 12.2, V12.2.91
Build cuda_12.2.r12.2/compiler.32965470_0

User-local CUDA:
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Wed_Sep_21_10:33:58_PDT_2022
Cuda compilation tools, release 11.8, V11.8.89
Build cuda_11.8.r11.8/compiler.31833905_0

With that, we have multiple CUDA environments side by side.
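
Switching between the installed toolkits then comes down to repointing the symlink from step 3. A minimal sketch, assuming the ~/local/cuda-(version) layout above; the function name `use_cuda` and the `CUDA_PREFIX` variable are made up for illustration and are not part of any CUDA tooling:

```shell
# Hypothetical helper: repoint <prefix>/cuda at one installed toolkit.
# CUDA_PREFIX is an assumed variable name; it defaults to ~/local as above.
use_cuda() {
    local prefix="${CUDA_PREFIX:-$HOME/local}"
    local target="$prefix/cuda-$1"
    if [ ! -d "$target" ]; then
        echo "use_cuda: $target not found" >&2
        return 1
    fi
    # -s: symbolic link, -f: replace the old link,
    # -n: treat the old link itself as the destination, do not follow it
    ln -sfn "$target" "$prefix/cuda"
}
```

For example, `use_cuda 11.8` followed by `nvcc -V` should report release 11.8 in a shell that has the PATH settings above, because PATH points at the link rather than at a versioned directory.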

BUT!

https://gcc.gnu.org/develop.html#timeline

1. Not every build script reads the CUDA_HOME variable.
Some just read the hard-coded path /usr/local/cuda.
(A small problem.)

2. CUDA 11.8 only supports GCC up to version 11.
OMG, judging by the timeline above, GCC 11 appears to be no longer maintained after the "GCC 11.4 release (2023-05-29)".
All the stable PyTorch stuff relies on an unmaintained GCC!
(If you use Arch Linux, install GCC 11.4 with `yay`.)
(You can't compile PyTorch with CUDA 11.8 using GCC 13.1.)
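
For point 1, it helps to know which toolkit root a build is likely to pick up before starting a long compile. The sketch below is only an assumption about a common lookup order (CUDA_HOME first, then the nvcc found on PATH, then /usr/local/cuda); it is not taken from any particular build script, and `find_cuda_root` is a made-up name:

```shell
# Hypothetical helper: print the toolkit root a build script might resolve.
# The lookup order is an assumption, not the behaviour of a specific project.
find_cuda_root() {
    if [ -n "${CUDA_HOME:-}" ]; then
        # An explicit CUDA_HOME wins.
        printf '%s\n' "$CUDA_HOME"
    elif command -v nvcc >/dev/null 2>&1; then
        # Strip the trailing /bin/nvcc from the first nvcc on PATH.
        dirname "$(dirname "$(command -v nvcc)")"
    else
        # The path that some scripts hard-code.
        printf '%s\n' /usr/local/cuda
    fi
}
```

If the printed path is not the toolkit you meant to use, the script you are about to run is probably one of those that ignores your settings.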

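For point 2, one way to keep GCC 13 as the system compiler and still build against CUDA 11.8 is to select the older compiler per shell session. This is a rough sketch under assumptions: the binaries are assumed to be named gcc-11 and g++-11, `gcc_ok_for_cuda` is a made-up helper, and whether a given project honours CC, CXX and CMake's CUDAHOSTCXX variable needs to be checked for that project:

```shell
# Hypothetical helper: succeed only if a GCC version string is not newer
# than the highest major version the chosen toolkit accepts.
gcc_ok_for_cuda() {
    local version="$1" max_major="$2"
    local major="${version%%.*}"   # "11.4.0" -> "11", "11" -> "11"
    [ "$major" -le "$max_major" ]
}

# Assumed binary names gcc-11 / g++-11; adjust to what your package installs.
# CUDAHOSTCXX is the environment variable CMake reads for the CUDA host compiler.
if command -v gcc-11 >/dev/null 2>&1 \
    && gcc_ok_for_cuda "$(gcc-11 -dumpversion)" 11; then
    export CC=gcc-11 CXX=g++-11 CUDAHOSTCXX=g++-11
fi
```

When calling nvcc by hand, its `-ccbin` option does the same job, for example `nvcc -ccbin g++-11 ...`.
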
Conclusion:

I prefer using the latest versions of Python, CUDA, and PyTorch, but some old projects may not work under the latest setup.

So I also prepared a setup like the one an Ubuntu LTS release would use: stable PyTorch (1.x / 2.0) + CUDA 11.8 + Python 3.10.
