Performance

GPU acceleration

QXel runs gate operations on CPU kernels by default. Beyond roughly 20 qubits the GPU is usually much faster: each extra qubit doubles the size of the statevector (2^n amplitudes for n qubits), and a GPU applies each gate across that vector in parallel. You opt in with the compute_type option.
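To make the doubling concrete, the sketch below estimates the memory footprint of a statevector. It assumes double-precision complex amplitudes (16 bytes each); the precision QXel actually uses is not stated here, and statevector_bytes is a hypothetical helper for illustration, not part of QXel.

```python
def statevector_bytes(num_qubits: int, bytes_per_amplitude: int = 16) -> int:
    """Estimate statevector size: 2**n amplitudes times bytes per amplitude.

    bytes_per_amplitude=16 assumes complex128 amplitudes. This is an
    assumption for illustration, not a documented QXel setting.
    """
    if num_qubits < 0:
        raise ValueError("num_qubits must be non-negative")
    return (2 ** num_qubits) * bytes_per_amplitude


# Each extra qubit doubles the footprint.
for n in (20, 25, 30):
    mib = statevector_bytes(n) / 2**20
    print(f"{n} qubits: {mib:,.0f} MiB")
```

Under that assumption, 20 qubits need 16 MiB, 25 qubits need 512 MiB, and 30 qubits need 16,384 MiB, which is why the work per gate grows so quickly past the 20-qubit mark.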

Choosing a kernel set

compute_type selects which kernels run gate operations:

• 'cpu': CPU kernels (default), works anywhere
• 'cuda': hand-tuned CUDA GPU kernels, usually fastest
• 'cuquantum': NVIDIA cuQuantum library

Running on the GPU

Pass compute_type as a keyword argument to run(). Everything else about your circuit stays the same:

python

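# Create the simulator with the QXel-sv backend.
qxel = LocalSimulator(backend="QXel-sv")
# Same circuit as on CPU; compute_type="cuda" selects the CUDA GPU kernels.
result = qxel.run(circuit, shots=1000, compute_type="cuda").result()
print(result.measurement_counts)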

The numerical result is identical to a CPU run; only the runtime changes. If you are not sure whether the GPU is being used, run nvidia-smi in another terminal during a large circuit and watch GPU utilization rise.
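If you would rather check from Python than watch a second terminal, the following is a minimal sketch that polls nvidia-smi through its --query-gpu option. gpu_utilization is a hypothetical helper, not part of QXel; it returns None when nvidia-smi is missing or fails, so it is safe to call on a CPU-only machine.

```python
import shutil
import subprocess
from typing import List, Optional


def gpu_utilization() -> Optional[List[int]]:
    """Return per-GPU utilization in percent, or None if it cannot be read."""
    if shutil.which("nvidia-smi") is None:
        return None  # no NVIDIA driver tools on this machine
    try:
        out = subprocess.run(
            ["nvidia-smi", "--query-gpu=utilization.gpu",
             "--format=csv,noheader,nounits"],
            capture_output=True, text=True, check=True, timeout=10,
        ).stdout
    except (subprocess.SubprocessError, OSError):
        return None  # nvidia-smi exists but failed or timed out
    values = []
    for line in out.splitlines():
        line = line.strip()
        if line.isdigit():  # skip entries such as "[N/A]"
            values.append(int(line))
    return values


print(gpu_utilization())
```

Calling this from a background thread while a large circuit runs should show the percentage climb when the GPU kernels are in use.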

Note: 'cuda' and 'cuquantum' need an NVIDIA GPU with a compatible driver. On a CPU-only machine they raise an error, so keep the default 'cpu' there. Try 'cuda' first; 'cuquantum' can be faster for some gate patterns, so benchmark both on your workload.
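One way to handle both points in the note is sketched below: fall back to 'cpu' when a GPU kernel set fails, and time each kernel set on your own workload. run_with_fallback and benchmark are hypothetical helpers, not part of QXel. They take any callable that accepts a compute_type string, and they catch Exception broadly because the specific error type raised on a CPU-only machine is not documented here.

```python
import time
from typing import Any, Callable, Dict, Sequence, Tuple


def run_with_fallback(
    run: Callable[[str], Any],
    preferences: Sequence[str] = ("cuda", "cpu"),
) -> Tuple[str, Any]:
    """Try each compute_type in order; return the first that succeeds."""
    last_error = None
    for compute_type in preferences:
        try:
            return compute_type, run(compute_type)
        except Exception as exc:  # error type is an unknown, so catch broadly
            last_error = exc
    raise RuntimeError("no compute_type succeeded") from last_error


def benchmark(
    run: Callable[[str], Any],
    compute_types: Sequence[str],
    repeats: int = 3,
) -> Dict[str, float]:
    """Best wall-clock seconds per compute_type; skip those that fail."""
    timings: Dict[str, float] = {}
    for compute_type in compute_types:
        try:
            best = float("inf")
            for _ in range(repeats):
                start = time.perf_counter()
                run(compute_type)
                best = min(best, time.perf_counter() - start)
            timings[compute_type] = best
        except Exception:
            continue  # e.g. no GPU available for this kernel set
    return timings
```

With the simulator and circuit from the example above, the callable could be lambda ct: qxel.run(circuit, shots=1000, compute_type=ct).result(), passed to benchmark together with ("cpu", "cuda", "cuquantum"). Taking the best of several repeats reduces the influence of one-off startup costs on the comparison.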
