cuQuantum on WSL2 (1)

Published on 2022/10/16 (1 comment)

cuQuantum on WSL2 (Installation)

Prerequisites / Intended audience

You are using an NVIDIA GPU.
WSL2 itself is assumed to be installed already.
You want to run cuQuantum in your own local environment (WSL2).

Installing the Linux x86 CUDA Toolkit

This article assumes CUDA 11.8.

First, remove the old GPG key.

> sudo apt-key del 7fa2af80

Next, install the CUDA Toolkit.

> wget https://developer.download.nvidia.com/compute/cuda/repos/wsl-ubuntu/x86_64/cuda-wsl-ubuntu.pin
> sudo mv cuda-wsl-ubuntu.pin /etc/apt/preferences.d/cuda-repository-pin-600
> wget https://developer.download.nvidia.com/compute/cuda/11.8.0/local_installers/cuda-repo-wsl-ubuntu-11-8-local_11.8.0-1_amd64.deb
> sudo dpkg -i cuda-repo-wsl-ubuntu-11-8-local_11.8.0-1_amd64.deb
> sudo cp /var/cuda-repo-wsl-ubuntu-11-8-local/cuda-*-keyring.gpg /usr/share/keyrings/
> sudo apt-get update
> sudo apt-get -y install cuda

Once the installation has finished, nvidia-smi shows the driver and CUDA version information.

> nvidia-smi
Sun Oct 16 17:56:31 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 520.61.03    Driver Version: 522.06       CUDA Version: 11.8     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  On   | 00000000:09:00.0  On |                  N/A |
| 59%   47C    P8    43W / 370W |   2861MiB / 24576MiB |     27%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
(rest omitted)
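
As a quick programmatic check, the header line of the nvidia-smi output can be parsed for the driver and CUDA versions. The following is a minimal sketch: the function name parse_smi_header is hypothetical, and the regular expression assumes the header layout shown above, which may differ between driver releases.

```python
import re

# Hypothetical helper: extract the version fields from the nvidia-smi header line.
_HEADER = re.compile(
    r"NVIDIA-SMI\s+(?P<smi>[\d.]+)\s+"
    r"Driver Version:\s+(?P<driver>[\d.]+)\s+"
    r"CUDA Version:\s+(?P<cuda>[\d.]+)"
)

def parse_smi_header(text):
    """Return a dict with 'smi', 'driver' and 'cuda' versions, or None if absent."""
    match = _HEADER.search(text)
    return match.groupdict() if match else None

# Header line copied from the output above.
sample = "| NVIDIA-SMI 520.61.03    Driver Version: 522.06       CUDA Version: 11.8     |"
print(parse_smi_header(sample))
```

On a real system the text could come from subprocess.run(["nvidia-smi"], capture_output=True, text=True).stdout instead of the pasted sample.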

Installing cuQuantum

A symbolic link is required for libcuda.so, so the steps below create one.

> wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/cuda-keyring_1.0-1_all.deb
> sudo dpkg -i cuda-keyring_1.0-1_all.deb
> sudo ln -s /usr/lib/wsl/lib/libcuda.so.1 /usr/local/cuda/lib64/libcuda.so
> sudo apt-get update
> sudo apt-get -y install cuquantum cuquantum-dev cuquantum-doc
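
To confirm that the symbolic link was created and points at an existing file, something like the following could be used. It is a sketch only: link_status is a hypothetical helper, and the demonstration uses a temporary directory rather than the real /usr/local/cuda/lib64 path so that it runs anywhere.

```python
import os
import tempfile
from pathlib import Path

def link_status(link_path):
    """Describe whether link_path is a symlink and whether its target exists."""
    path = Path(link_path)
    if not path.is_symlink():
        return "not a symlink"
    target = os.readlink(path)
    # Path.exists() follows the link, so it is False for a dangling symlink.
    return f"ok -> {target}" if path.exists() else f"dangling -> {target}"

# Self-contained demonstration with stand-in files in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    real = Path(tmp) / "libcuda.so.1"
    real.write_bytes(b"")
    link = Path(tmp) / "libcuda.so"
    link.symlink_to(real)
    demo_status = link_status(link)
    print(demo_status)
```

For the setup in this article, the call would be link_status("/usr/local/cuda/lib64/libcuda.so").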

Next, append the following path settings to .bashrc and reload it.

export PATH="/usr/local/cuda/bin:$PATH"
export LD_LIBRARY_PATH="/usr/local/cuda/lib64:$LD_LIBRARY_PATH"

export CUQUANTUM_ROOT="/opt/nvidia/cuquantum"
export LD_LIBRARY_PATH=${CUQUANTUM_ROOT}/lib:${LD_LIBRARY_PATH}
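
After reloading .bashrc, it can be worth checking that both library directories actually ended up in LD_LIBRARY_PATH. A minimal sketch, assuming the two directories from the export lines above; missing_library_dirs is a hypothetical helper name.

```python
import os

# Directories taken from the export lines above.
REQUIRED_DIRS = ["/usr/local/cuda/lib64", "/opt/nvidia/cuquantum/lib"]

def missing_library_dirs(ld_library_path, required=REQUIRED_DIRS):
    """Return the required directories that are absent from an LD_LIBRARY_PATH string."""
    # Normalize entries so that a trailing slash does not cause a false miss.
    entries = {os.path.normpath(p) for p in ld_library_path.split(":") if p}
    return [d for d in required if os.path.normpath(d) not in entries]

# An empty list means both directories are present in the current shell.
print(missing_library_dirs(os.environ.get("LD_LIBRARY_PATH", "")))
```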

Installing cuQuantum Python

This article assumes CUDA 11.8.

> pip install cupy-cuda11x
> pip install cuquantum-python
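
Before running the full test suite, a quick way to see whether the two packages are visible to Python is to look up their top-level modules (cupy and cuquantum). The sketch below only locates the modules without importing them, so it does not touch the GPU; installed_modules is a hypothetical helper name.

```python
import importlib.util

def installed_modules(names=("cupy", "cuquantum")):
    """Map each top-level module name to True if Python can locate it."""
    # find_spec locates a module without importing it, so no GPU is initialized.
    return {name: importlib.util.find_spec(name) is not None for name in names}

print(installed_modules())
```

If either entry is False, the corresponding pip install most likely went into a different Python environment.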

Running the tests

Clone the source from the official repository and run the tests.
The packages required to run them are installed along the way.

> mkdir ~/tmp
> cd ~/tmp
> git clone https://github.com/NVIDIA/cuQuantum
> cd cuQuantum/python
> cd tests
> pip install -r requirements.txt
> cd ../
> pytest tests

Qiskit support (bonus)

On WSL2, this forces qiskit-aer to work with cuQuantum.

First, install OpenBLAS.

sudo apt-get update
sudo apt-get install -y libopenblas-dev

Next, build qiskit-aer from source.

git clone https://github.com/Qiskit/qiskit-aer/
cd qiskit-aer
python ./setup.py bdist_wheel -- -DAER_THRUST_BACKEND=CUDA -DCUSTATEVEC_ROOT=$CUQUANTUM_ROOT

Install the wheel with pip. The .whl file name differs from environment to environment.

pip install dist/qiskit_aer-0.12.0-cp**-cp**-linux_x86_64.whl
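
Since the exact wheel name depends on the Python version and platform, it can be looked up rather than typed by hand. A minimal sketch: find_wheels is a hypothetical helper, and the cp310 file name in the demonstration is made up purely for illustration.

```python
import tempfile
from pathlib import Path

def find_wheels(dist_dir="dist", pattern="qiskit_aer-*.whl"):
    """Return the wheel files in dist_dir that match pattern, sorted by name."""
    return sorted(Path(dist_dir).glob(pattern))

# Demonstration against a temporary directory containing a made-up wheel name.
with tempfile.TemporaryDirectory() as tmp:
    fake = Path(tmp) / "qiskit_aer-0.12.0-cp310-cp310-linux_x86_64.whl"
    fake.touch()
    (Path(tmp) / "notes.txt").touch()
    demo_names = [p.name for p in find_wheels(tmp)]
    print(demo_names)
```

Run from the qiskit-aer checkout, find_wheels() would list whatever the build placed in dist/, and that path can then be passed to pip install.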

Running sample code for comparison

I used the sample code below for the comparison.

from qiskit import *
from qiskit.circuit.library import *
from qiskit.providers.aer import *

# Switch cuStateVec_enable between False and True to compare
sim = AerSimulator(method='statevector', device='GPU', cuStateVec_enable=False)

qubits = 29
depth=10
shots = 10

circuit = QuantumVolume(qubits, depth, seed=0)
circuit.measure_all()
circuit = transpile(circuit, sim)
result = sim.run(circuit,sim,shots=shots,seed_simulator=12345).result()

metadata = result.to_dict()['results'][0]['metadata']
if 'cuStateVec_enable' in metadata and metadata['cuStateVec_enable']:
    print("cuStateVector is used for the simulation")
else:
    print("cuStateVector is not used for the simulation")
print("{0} qubits, Time = {1} sec".format(qubits,result.to_dict()['results'][0]['time_taken']))
counts = result.get_counts()
print(counts)
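
Instead of editing the script by hand for each run, the three configurations could be looped over. This is a sketch under assumptions, not the script used for the results in this article: simulator_options and compare_configs are hypothetical names, the CPU entry simply omits cuStateVec_enable, and run is called without the extra sim argument, as suggested in the Discussion.

```python
def simulator_options():
    """The three configurations compared here, as AerSimulator keyword arguments."""
    return [
        {"method": "statevector", "device": "CPU"},
        {"method": "statevector", "device": "GPU", "cuStateVec_enable": False},
        {"method": "statevector", "device": "GPU", "cuStateVec_enable": True},
    ]

def compare_configs(qubits, depth=10, shots=10):
    """Run the same QuantumVolume circuit under each configuration.

    Returns a list of (options, seconds) pairs.
    """
    # Imported lazily so that simulator_options() works without qiskit installed.
    from qiskit import transpile
    from qiskit.circuit.library import QuantumVolume
    from qiskit.providers.aer import AerSimulator

    timings = []
    for options in simulator_options():
        sim = AerSimulator(**options)
        circuit = QuantumVolume(qubits, depth, seed=0)
        circuit.measure_all()
        circuit = transpile(circuit, sim)
        result = sim.run(circuit, shots=shots, seed_simulator=12345).result()
        timings.append((options, result.to_dict()["results"][0]["time_taken"]))
    return timings

# List the configurations; compare_configs itself needs qiskit-aer and a GPU.
for options in simulator_options():
    print(options)
```

Calling compare_configs(29) would mirror the runs below, and smaller qubit counts can be passed to see where the GPU stops paying off.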

With device='CPU'

~/projects/qiskit_cuQuantum$ python hoge.py 
cuStateVector is used for the simulation
29 qubits, Time = 48.710545902 sec
{'10010011010001101001110010000': 1, '01111000011010001010101000100': 1, '10101111101110110110011100101': 1, '00000111001010101011101000111': 1, '00110011110101110000101010010': 1, '01100110100110100101001011110': 1, '10101111100111001111111111111': 1, '01011011010101110011000100100': 1, '10001111101001000110110011000': 1, '00110011111110001111001001100': 1}

With device='GPU', cuStateVec_enable=False

~/projects/qiskit_cuQuantum$ python hoge.py 
cuStateVector is not used for the simulation
29 qubits, Time = 7.498817975 sec
{'10010011010001101001110010000': 1, '01111000011010001010101000100': 1, '10101111101110110110011100101': 1, '00000111001010101011101000111': 1, '00110011110101110000101010010': 1, '01100110100110100101001011110': 1, '10101111100111001111111111111': 1, '01011011010101110011000100100': 1, '10001111101001000110110011000': 1, '00110011111110001111001001100': 1}

With device='GPU', cuStateVec_enable=True

~/projects/qiskit_cuQuantum$ python hoge.py 
cuStateVector is used for the simulation
29 qubits, Time = 5.154102078 sec
{'10010011010001101001110010000': 1, '01111000011010001010101000100': 1, '10101111101110110110011100101': 1, '00000111001010101011101000111': 1, '00110011110101110000101010010': 1, '01100110100110100101001011110': 1, '10101111100111001111111111111': 1, '01011011010101110011000100100': 1, '10001111101001000110110011000': 1, '00110011111110001111001001100': 1}

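Impressions

Both GPU runs are far faster than the CPU run (about 48.7 s on CPU versus 7.5 s and 5.2 s on GPU at 29 qubits).
Compared with the plain GPU backend (the qiskit-aer-gpu equivalent, cuStateVec_enable=False), enabling cuStateVec also appears moderately faster (about 7.5 s versus 5.2 s).
With fewer qubits, cuStateVec_enable=False was faster.
Because this was forced onto WSL2, the internal behavior still needs to be verified.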
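
To make the comparison concrete, the reported times can be converted into speedup ratios. The numbers below are copied from the three runs above; the dictionary keys and the speedup helper are just illustrative names.

```python
# Times in seconds copied from the three runs above (29 qubits).
times = {
    "CPU": 48.710545902,
    "GPU": 7.498817975,
    "GPU+cuStateVec": 5.154102078,
}

def speedup(baseline, candidate):
    """How many times faster candidate is than baseline."""
    return times[baseline] / times[candidate]

for base, cand in [("CPU", "GPU"), ("CPU", "GPU+cuStateVec"), ("GPU", "GPU+cuStateVec")]:
    print(f"{cand} vs {base}: {speedup(base, cand):.2f}x")
```

Only one run per configuration is shown above, so these ratios should be read as indicative rather than as a careful benchmark.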

Discussion

derwind

Thank you for this very helpful article. One small note on this line:

result = sim.run(circuit,sim,shots=shots,seed_simulator=12345).result()

The second argument sim passed to run appears to be unnecessary. According to aerbackend.py#L129-L133, that position is the validate parameter:

validate (bool): validate the Qobj before running (default: False).