Playing with Qiskit (21) — Drawing with QAOA
Objective
The TYTAN SDK tutorials include a recommended exercise 5 (Nonogram). Since it should in principle be solvable with QAOA, I want to try it out.
I have experimented with cuTensorNet before, but it took a long time and I was left wondering when it would ever be solved, so this time I've decided to use cuStateVec.
The following content was calculated using a T4 on Google Colab.
Problem to Solve
Import the necessary modules.
from __future__ import annotations
import time
from collections.abc import Callable, Sequence
import matplotlib.pyplot as plt
from tytan import symbols_list, Compile, sampler, Auto_array
import numpy as np
from scipy.optimize import minimize
from qiskit import QuantumCircuit, transpile
from qiskit.circuit.library import QAOAAnsatz
from qiskit.quantum_info import SparsePauliOp
from qiskit_aer import AerSimulator
from qiskit_aer.primitives import Estimator as AerEstimator
QUBO Formulation using TYTAN
from tytan import *
import numpy as np
n_qubits = 25
q00, q01, q02, q03, q04, \
q05, q06, q07, q08, q09, \
q10, q11, q12, q13, q14, \
q15, q16, q17, q18, q19, \
q20, q21, q22, q23, q24 = symbols_list(n_qubits, "q{}")
H = 0
H += (q00 + q01 + q02 + q03 + q04 - 2)**2
H += (q05 + q06 + q07 + q08 + q09 - 3)**2
H += (q10 + q11 + q12 + q13 + q14 - 3)**2
H += (q15 + q16 + q17 + q18 + q19 - 3)**2
H += (q20 + q21 + q22 + q23 + q24 - 2)**2
H += (q00 + q05 + q10 + q15 + q20 - 3)**2
H += (q01 + q06 + q11 + q16 + q21 - 2)**2
H += (q02 + q07 + q12 + q17 + q22 - 5)**2
H += (q03 + q08 + q13 + q18 + q23 - 2)**2
H += (q04 + q09 + q14 + q19 + q24 - 1)**2
H += -0.1 * (q00 * q01) -0.1 * (q01 * q02) -0.1 * (q02 * q03) -0.1 * (q03 * q04)
H += -0.1 * (q05 * q06) -0.1 * (q06 * q07) -0.1 * (q07 * q08) -0.1 * (q08 * q09)
H += -0.1 * (q10 * q11) -0.1 * (q11 * q12) -0.1 * (q12 * q13) -0.1 * (q13 * q14)
H += -0.1 * (q15 * q16) -0.1 * (q16 * q17) -0.1 * (q17 * q18) -0.1 * (q18 * q19)
H += -0.1 * (q00 * q05) -0.1 * (q05 * q10) -0.1 * (q10 * q15) -0.1 * (q15 * q20)
H += -0.1 * (q01 * q06) -0.1 * (q06 * q11) -0.1 * (q11 * q16) -0.1 * (q16 * q21)
H += -0.1 * (q02 * q07) -0.1 * (q07 * q12) -0.1 * (q12 * q17) -0.1 * (q17 * q22)
H += -0.1 * (q03 * q08) -0.1 * (q08 * q13) -0.1 * (q13 * q18) -0.1 * (q18 * q23)
H += 0.1 * (q20 * q21) + 0.1 * (q21 * q22) + 0.1 * (q22 * q23) + 0.1 * (q23 * q24)
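Each of the ten squared terms above is a count constraint for one row or column, and `Compile(H).get_qubo()` expands them into pairwise coefficients using the binary identity q² = q. A hand expansion of one such term, as a sketch with illustrative integer indices rather than the tytan symbols:

```python
from itertools import combinations

def row_penalty_qubo(indices, target):
    """QUBO coefficients and constant offset for (sum_i q_i - target)**2."""
    row_qubo = {}
    for i in indices:
        # q_i**2 == q_i for binary q_i, so each diagonal entry is 1 - 2*target.
        row_qubo[(i, i)] = 1 - 2 * target
    for i, j in combinations(indices, 2):
        row_qubo[(i, j)] = 2  # cross term 2 * q_i * q_j
    return row_qubo, target**2

row_qubo, row_offset = row_penalty_qubo(range(5), 2)
# Brute-force check: the expansion matches the squared penalty everywhere.
for bits in range(32):
    q = [(bits >> k) & 1 for k in range(5)]
    e = sum(v * q[i] * q[j] for (i, j), v in row_qubo.items()) + row_offset
    assert e == (sum(q) - 2) ** 2
```

The -0.1 terms at the end are small rewards (and, for the last row, penalties) on adjacent cells that bias ties toward the intended picture.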
Solving with Pseudo-Quantum Annealing
First, verify the answer using Simulated Annealing with TYTANSDK.
qubo, offset = Compile(H).get_qubo()
print(f"{offset=}")
solver = sampler.SASampler()
result = solver.run(qubo)
def rename(qname):
    symbol = qname[0]
    params = int(qname[1:])
    return symbol + f"{params // 5}_{params % 5}"

for r in result[:5]:
    print(f'Energy {round(r[1] + offset, 2)}, Occurrence {r[2]}')
    renamed_dict = {
        rename(k): v for k, v in r[0].items()
    }
    arr, subs = Auto_array(renamed_dict).get_ndarray('q{}_{}')
    print(arr)
    img, subs = Auto_array(renamed_dict).get_image('q{}_{}')
    plt.figure(figsize=(2, 2))
    plt.imshow(img, cmap="gray")
    plt.show()
offset=78
Energy -1.50, Occurrence 10
[[0 0 1 1 0]
 [0 0 1 1 1]
 [1 1 1 0 0]
 [1 1 1 0 0]
 [1 0 1 0 0]]

Thus, the goal for this time is to obtain an image that looks like a dog (?).
Road to QAOA
The following is a utility for converting QUBO into Z and ZZ Hamiltonians.
def _calc_key(num_qubits: int, k: tuple[str] | tuple[str, str]) -> int:
    if len(k) == 1:
        left = k[0]
        ln = int(left[1:])
        return num_qubits * ln - 1
    elif len(k) == 2:
        left, right = k
        ln = int(left[1:])
        rn = int(right[1:])
        return num_qubits * num_qubits * ln + num_qubits * rn
    else:
        raise ValueError(f"len(k) = {len(k)} must be one or two.")
def get_ising(
    qubo: dict[tuple[str, str], float], num_qubits: int
) -> tuple[dict[tuple[str] | tuple[str, str], float], float]:
    ising_dict: dict[tuple[str] | tuple[str, str], float] = {}
    offset = 0.0
    for k, v in qubo.items():
        left, right = k
        ln = int(left[1:])
        rn = int(right[1:])
        new_k: tuple[str] | tuple[str, str]
        if rn < ln:
            ln, rn = rn, ln
        if ln == rn:
            new_k = (f"z{ln}",)
            ising_dict.setdefault(new_k, 0.0)
            ising_dict[new_k] += -v / 2
            offset += v / 2
        else:
            new_k = (f"z{ln}", f"z{rn}")
            ising_dict.setdefault(new_k, 0.0)
            ising_dict[new_k] += v / 4
            new_k = (f"z{ln}",)
            ising_dict.setdefault(new_k, 0.0)
            ising_dict[new_k] += -v / 4
            new_k = (f"z{rn}",)
            ising_dict.setdefault(new_k, 0.0)
            ising_dict[new_k] += -v / 4
            offset += v / 4
    ising_dict = {k: v for k, v in ising_dict.items() if not np.isclose(v, 0)}
    ising_dict = dict(
        sorted(ising_dict.items(), key=lambda k_v: _calc_key(num_qubits, k_v[0]))
    )
    return ising_dict, offset
def get_hamiltonian(
    ising_dict: dict[tuple[str] | tuple[str, str], float],
    num_qubits: int
) -> tuple[list[str], np.ndarray]:
    hamiltonian: list[str] = []
    for k in ising_dict.keys():
        ham = ["I"] * num_qubits
        if len(k) == 1:
            left = k[0]
            ln = int(left[1:])
            ham[ln] = "Z"
        elif len(k) == 2:
            left, right = k
            ln = int(left[1:])
            rn = int(right[1:])
            ham[ln] = ham[rn] = "Z"
        else:
            raise ValueError(f"len(k) = {len(k)} must be one or two.")
        hamiltonian.append("".join(ham))
    return hamiltonian, np.array(list(ising_dict.values()))
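The conversion in `get_ising` is the standard substitution q = (1 − z)/2: each linear term v·q_i becomes −v/2 on z_i plus v/2 of offset, and each quadratic term v·q_i·q_j becomes +v/4 on z_i z_j, −v/4 on each z, and v/4 of offset. A brute-force sanity check of that substitution on a toy two-variable QUBO (made-up coefficients, independent of the functions above):

```python
from itertools import product

# Toy 2-variable QUBO (made-up coefficients):
#   E(q) = 1.5*q0 - 0.5*q1 + 2.0*q0*q1
# Substituting q_i = (1 - z_i) / 2 gives the Ising form below.
h = {0: -1.5 / 2 - 2.0 / 4, 1: 0.5 / 2 - 2.0 / 4}  # linear z coefficients
J = {(0, 1): 2.0 / 4}                              # zz coefficient
ising_offset = 1.5 / 2 - 0.5 / 2 + 2.0 / 4

# Both forms agree on every binary assignment.
for q0, q1 in product((0, 1), repeat=2):
    z0, z1 = 1 - 2 * q0, 1 - 2 * q1
    e_qubo = 1.5 * q0 - 0.5 * q1 + 2.0 * q0 * q1
    e_ising = h[0] * z0 + h[1] * z1 + J[(0, 1)] * z0 * z1 + ising_offset
    assert abs(e_qubo - e_ising) < 1e-12
```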
Use the above to obtain the Hamiltonian qubit_op.
ising_dict, additional_offset = get_ising(qubo, n_qubits)
hamiltonian, coefficients = get_hamiltonian(ising_dict, n_qubits)
# Normalize the coefficients to avoid cycling through 2π multiple times.
coefficients /= np.max(abs(coefficients))
qubit_op = SparsePauliOp(
    [ham[::-1] for ham in hamiltonian],  # reverse for Qiskit's little-endian qubit ordering
    coefficients,
)
Running QAOA
It is tedious to implement the ansatz myself, so I will create it using Qiskit. While using the Sampler Primitive would follow the content of Playing with Qiskit (20) — QAOA in Qiskit Optimization, this time I will use cuStateVec to calculate the expectation value from the state vector using the Estimator Primitive. In particular, since I am using a GPU, I will use the Primitive from the Qiskit Aer side.
Defining the Ansatz
I felt that I might not get the desired solution if n_reps was small, so I boldly set it to 10, though it might be possible to reduce it further.
%%time
initial_state_circuit = QuantumCircuit(n_qubits)
initial_state_circuit.h(initial_state_circuit.qregs[0][:])
n_reps = 10
ansatz = QAOAAnsatz(
    cost_operator=qubit_op,
    reps=n_reps,
    initial_state=initial_state_circuit,
    name='QAOA',
    flatten=True,
)
CPU times: user 1.51 ms, sys: 0 ns, total: 1.51 ms
Wall time: 1.52 ms
What is the circuit depth?
ansatz.depth()
1491
It's quite deep... It would probably be impossible to calculate on a NISQ device even with error mitigation (it would effectively become noise), but I won't worry about that this time.
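For intuition about why the circuit grows so deep, each QAOA repetition alternates a cost layer exp(−iγH) with an RX mixer layer. A minimal sketch of one repetition on a toy 2-qubit diagonal cost Hamiltonian H = Z0 Z1 (made-up angles, plain NumPy, not the `QAOAAnsatz` internals):

```python
import numpy as np

def qaoa_layer_state(gamma: float, beta: float) -> np.ndarray:
    """One QAOA layer: mixer(beta) * cost(gamma) applied to |+>|+>."""
    zz = np.array([1, -1, -1, 1], dtype=float)   # diagonal of Z0 Z1
    psi = np.full(4, 0.5, dtype=complex)         # |+>|+>
    psi = np.exp(-1j * gamma * zz) * psi         # cost layer: diagonal phases
    c, s = np.cos(beta), -1j * np.sin(beta)
    rx = np.array([[c, s], [s, c]])              # RX(2*beta) = exp(-i*beta*X)
    return np.kron(rx, rx) @ psi                 # mixer layer on both qubits

psi = qaoa_layer_state(gamma=0.4, beta=0.7)
zz = np.array([1, -1, -1, 1])
energy = np.real(np.vdot(psi, zz * psi))  # <psi|Z0 Z1|psi>
# The state stays normalized; the energy moves away from the |+>|+> value of 0.
```

With 25 qubits, a cost layer per ZZ term, and 10 repetitions, a four-digit depth follows naturally.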
Defining the Cost Function
losses = []
count = 0
# Use the Estimator Primitive from Qiskit Aer.
# Since it creates an AerSimulator internally, set the GPU options to be passed to it.
estimator = AerEstimator(
    backend_options={
        "device": "GPU",
        "cuStateVec_enable": True,
    }
)

def compute_expectation(params, *args):
    global count
    (estimator, ansatz, qubit_op) = args
    time_start = time.time()
    energy = estimator.run([ansatz], [qubit_op], params).result().values[0]
    if count % 10 == 0:
        time_end = time.time()
        print(f"[{count}] {energy} (elapsed={round(time_end - time_start, 3)}s)")
    count += 1
    losses.append(energy)
    return energy
Running the Optimization
Now that we are ready, let's perform the calculation. This time, I will use the Powell method for optimization. I don't have a deep understanding of it, but when I tried optimizing with the COBYLA method, it gave up surprisingly quickly for some reason, so I just went with Powell instead[1].
%%time
rng = np.random.default_rng(42)
# Random initial parameters in [0, π).
init = rng.random(ansatz.num_parameters) * np.pi
result = minimize(
    compute_expectation,
    init,
    args=(estimator, ansatz, qubit_op),
    method="Powell",
    options={
        "maxiter": 1000
    },
)
print(result.message)
print(f"opt value={round(result.fun, 3)}")
...
[1660] -6.324880292338709 (elapsed=3.565s)
[1670] -2.5975932459677407 (elapsed=3.196s)
[1680] -6.224010836693549 (elapsed=3.291s)
[1690] -5.934916834677419 (elapsed=3.224s)
[1700] -6.297379032258063 (elapsed=3.184s)
[1710] -6.196005544354837 (elapsed=3.288s)
Optimization terminated successfully.
opt value=-6.272
CPU times: user 1h 30min 6s, sys: 4min 34s, total: 1h 34min 40s
Wall time: 1h 34min 55s
It seems that compute_expectation is called many times per iteration, so the total number of calls far exceeded maxiter (which limits iterations, not function evaluations). From a quick search, this appears to be normal for certain optimization methods, so I've decided not to worry about it this time.
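The gap between iterations and evaluations can be seen on a toy problem: Powell runs a line search along each direction per iteration, and every line search evaluates the objective repeatedly. A small check with a made-up quadratic objective:

```python
import numpy as np
from scipy.optimize import minimize

calls = 0

def quadratic(x):
    """Toy objective; counts how often the optimizer calls it."""
    global calls
    calls += 1
    return float(np.sum((x - 1.0) ** 2))

res = minimize(quadratic, np.zeros(4), method="Powell", options={"maxiter": 5})
# res.nit counts Powell iterations; res.nfev counts objective evaluations,
# which is far larger because of the per-direction line searches.
print(res.nit, res.nfev)
```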
Checking the Optimal State
I will also use cuStateVec for the simulator. With 25 qubits, we're approaching the limits of the T4's 16GB VRAM, but there is still some room to spare.
%%time
opt_ansatz = ansatz.bind_parameters(result.x)
opt_ansatz.measure_all()
sim = AerSimulator(device="GPU", method="statevector", cuStateVec_enable=True)
t_qc = transpile(opt_ansatz, backend=sim)
counts = sim.run(t_qc, shots=1024*50).result().get_counts()
CPU times: user 6.89 s, sys: 193 ms, total: 7.09 s
Wall time: 7.14 s
I was truly amazed by the speed of cuStateVec this time. It really excels at repeatedly performing parallel calculations on huge matrices...
Visualization
topk = 5
for i, (k, n) in enumerate(sorted(counts.items(), key=lambda k_v: -k_v[1])):
    if n < 2:
        continue
    state = k[::-1]  # reverse bit order
    result_dict = {f"q{i // 5}_{i % 5}": int(v) for i, v in enumerate(state)}
    img, subs = Auto_array(result_dict).get_image('q{}_{}')
    plt.figure(figsize=(2, 2))
    plt.imshow(img, cmap="gray")
    i += 1
    if i >= topk:
        break
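The k[::-1] reversal is needed because Qiskit's get_counts() keys are little-endian: qubit 0 is the rightmost character. Decoding a key back into the 5×5 grid can be sketched as follows (this example key is constructed from the known SA solution for illustration, not taken from an actual measurement):

```python
# get_counts() keys list q24 first and q0 last, hence the reversal.
key = "0010100111001111110001100"      # q24 q23 ... q1 q0
bits = key[::-1]                       # q0 q1 ... q24
grid = [[int(b) for b in bits[r * 5:(r + 1) * 5]] for r in range(5)]
for row in grid:
    print(row)  # reproduces the SA solution shown earlier
```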





As a result, I was able to confirm that the desired image appears at the top.
Cost Transition
Let's also take a look at the movement of the cost function just in case.
plt.figure()
x = np.arange(0, len(losses), 1)
plt.plot(x, losses, color="blue")
plt.show()

I don't know the details of the Powell method, but it seems to call compute_expectation several extra times, which results in a strange-looking plot. However, if you follow only the lower bound, you can see the cost gradually decreasing... so I'm satisfied with that.
Incidentally, when optimizing with the COBYLA method, it gave up at a point like the following, and the optimization just wouldn't proceed any further.

Summary
cuStateVec is truly the strongest. In any case, for up to about 30 qubits, I think cuStateVec should definitely be your first choice without hesitation. When you get to the 50-qubit class, cuTensorNet will likely come into play, but frankly, you might not be able to expect much in terms of calculation speed. I wonder if it would be faster if distributed computing were performed on multi-node GPUs?
Anyway, I'm glad I was able to get the dog (?) image.
[1] I couldn't decipher the SciPy implementation, so I'm just trying things out.