Reference

run() options

QXel adds a few keyword arguments to the standard Braket run() method. They are all optional; the defaults give a correct CPU run. Tune them once a circuit is too slow or too large to fit in memory.

compute_type

Selects which gate kernels run. Start with the default 'cpu' to get a result, then switch to 'cuda' for GPU acceleration once you confirm the circuit is correct.

compute_type: 'cpu' (default) | 'cuda' | 'cuquantum'
• 'cpu'        portable CPU kernels; no GPU required
• 'cuda'       hand-tuned CUDA kernels; needs an NVIDIA GPU
• 'cuquantum'  NVIDIA cuQuantum library; needs an NVIDIA GPU

offload_type and path

offload_type controls where the statevector lives. Change it from 'none' only when the statevector no longer fits in GPU/CPU memory. 'storage' requires path, a list of one or more storage devices; when several are given, they are striped together for bandwidth.

offload_type: 'none' (default) | 'cpu' | 'storage'
• 'none'     keep the statevector in GPU/CPU memory
• 'cpu'      offload to CPU DRAM
• 'storage'  offload to NVMe/HDD; requires path=[...]

path: List[str]
• storage devices for 'storage' offloading
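
To judge whether offloading is needed at all, it helps to estimate the statevector size first. The sketch below assumes a dense statevector with one double-precision complex amplitude (16 bytes) per basis state and a GPU run; QXel's actual precision and working-memory overhead are not stated here, so treat the figures as a lower bound. The helper names and the memory budgets are hypothetical and are not part of QXel.

```python
def statevector_bytes(num_qubits: int, bytes_per_amplitude: int = 16) -> int:
    """Size of a dense statevector holding 2**num_qubits amplitudes.

    Assumption: complex128 (16 bytes per amplitude); pass 8 for complex64.
    """
    if num_qubits < 0:
        raise ValueError("num_qubits must be non-negative")
    return (2 ** num_qubits) * bytes_per_amplitude


def suggest_offload_type(num_qubits: int, gpu_bytes: int, dram_bytes: int) -> str:
    """Pick the least aggressive offload_type whose memory tier holds the state.

    Hypothetical helper: the budgets are caller-supplied estimates, not values
    queried from QXel or from the hardware.
    """
    need = statevector_bytes(num_qubits)
    if need <= gpu_bytes:
        return "none"
    if need <= dram_bytes:
        return "cpu"
    return "storage"


GIB = 2 ** 30
for n in (30, 34, 38):
    size_gib = statevector_bytes(n) / GIB
    choice = suggest_offload_type(n, gpu_bytes=80 * GIB, dram_bytes=512 * GIB)
    print(f"{n} qubits: {size_gib:.0f} GiB -> offload_type={choice!r}")
```

Each added qubit doubles the statevector, so a circuit only a few qubits wider than one that fits comfortably can already need the next tier.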

max_fusion

Sets the maximum number of qubits a fused gate may act on. Fusing adjacent gates means fewer, larger kernel launches and less memory traffic. The default of 1 is safe; raise it (2 to 4 is typical) on deep circuits to trade extra setup time for faster execution.

max_fusion: int (default 1)
• larger values fuse gates more aggressively
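
To make the effect of this limit concrete, the sketch below greedily merges consecutive gates for as long as the merged gate would still act on at most max_fusion qubits. It illustrates the general idea only: QXel's actual fusion strategy is not described here, and fuse_greedy and the tuple-of-qubit-indices gate representation are hypothetical.

```python
from typing import List, Sequence, Set, Tuple


def fuse_greedy(gates: Sequence[Tuple[int, ...]], max_fusion: int) -> List[Set[int]]:
    """Group consecutive gates into fused blocks (illustrative only).

    Each gate is the tuple of qubit indices it acts on. A gate joins the
    current block if the block would still touch at most `max_fusion` qubits;
    otherwise it starts a new block. A gate wider than `max_fusion` ends up
    alone in its own block.
    """
    if max_fusion < 1:
        raise ValueError("max_fusion must be at least 1")
    blocks: List[Set[int]] = []
    for gate in gates:
        qubits = set(gate)
        if blocks and len(blocks[-1] | qubits) <= max_fusion:
            blocks[-1] |= qubits      # merge into the current fused gate
        else:
            blocks.append(qubits)     # start a new fused gate
    return blocks


# Seven gates on three qubits: 1-qubit gates interleaved with 2-qubit gates.
circuit_gates = [(0,), (1,), (0, 1), (1,), (2,), (1, 2), (2,)]

for limit in (1, 2, 3):
    blocks = fuse_greedy(circuit_gates, limit)
    print(f"max_fusion={limit}: {len(blocks)} fused gate(s)")
```

In this sketch the seven gates collapse to two fused gates at a limit of 2 and to one at a limit of 3. A fused gate on k qubits is a 2^k × 2^k matrix, which is where the extra setup time at larger limits comes from.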

A fully tuned run

CUDA kernels, the statevector striped across two NVMe drives, and 2-qubit gate fusion:

```python
result = qxel.run(
    circuit,
    shots=4096,
    compute_type="cuda",
    offload_type="storage",
    max_fusion=2,
    path=["/dev/nvme0n1", "/dev/nvme1n1"],
).result()
```
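
The constraints described in this section can be checked before a run is submitted. The helper below is a hypothetical convenience, not part of QXel. It encodes only the rules stated above, plus the assumption that max_fusion values below 1 are invalid, and returns a dict that could be unpacked into the call as qxel.run(circuit, shots=4096, **options).

```python
from typing import Any, Dict, List, Optional

COMPUTE_TYPES = ("cpu", "cuda", "cuquantum")
OFFLOAD_TYPES = ("none", "cpu", "storage")


def build_run_options(
    compute_type: str = "cpu",
    offload_type: str = "none",
    path: Optional[List[str]] = None,
    max_fusion: int = 1,
) -> Dict[str, Any]:
    """Validate the documented options and return them as keyword arguments."""
    if compute_type not in COMPUTE_TYPES:
        raise ValueError(f"compute_type must be one of {COMPUTE_TYPES}")
    if offload_type not in OFFLOAD_TYPES:
        raise ValueError(f"offload_type must be one of {OFFLOAD_TYPES}")
    if offload_type == "storage" and not path:
        raise ValueError("offload_type='storage' requires a non-empty path list")
    if max_fusion < 1:
        # Assumption: the documented default of 1 is also the minimum.
        raise ValueError("max_fusion must be at least 1")

    options: Dict[str, Any] = {
        "compute_type": compute_type,
        "offload_type": offload_type,
        "max_fusion": max_fusion,
    }
    if path is not None:
        options["path"] = list(path)
    return options


options = build_run_options(
    compute_type="cuda",
    offload_type="storage",
    path=["/dev/nvme0n1", "/dev/nvme1n1"],
    max_fusion=2,
)
print(options)
```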