QXel
The quantum simulator built for HPC.
QXel is QubiStack's high-performance, GPU-accelerated quantum circuit simulator. It pushes a single node past the DRAM limit on state-vector size, runs on the SDKs you already use, and scales from a laptop to a multi-node GPU cluster.
- 42 qubits on a single node
- 3 simulators (state vector, density matrix, stabilizer)
- 3 SDKs (Braket, Qiskit, PennyLane)
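The 42-qubit figure is easier to appreciate once you work out the size of a dense state vector, which holds 2^n complex amplitudes for n qubits. The sketch below assumes 8-byte (complex64) or 16-byte (complex128) amplitudes. The precision QXel actually uses is not stated here.

```python
def statevector_bytes(n_qubits: int, bytes_per_amplitude: int) -> int:
    """Return the size in bytes of a dense n-qubit state vector."""
    # An n-qubit pure state has 2**n complex amplitudes.
    return (2 ** n_qubits) * bytes_per_amplitude


TIB = 2 ** 40  # bytes in one tebibyte

# Assumed amplitude widths: complex64 = 8 bytes, complex128 = 16 bytes.
for label, width in (("complex64", 8), ("complex128", 16)):
    size = statevector_bytes(42, width)
    print(f"42 qubits, {label}: {size // TIB} TiB")
# 42 qubits, complex64: 32 TiB
# 42 qubits, complex128: 64 TiB
```

Each extra qubit doubles the footprint. At 42 qubits, the state vector is far beyond the DRAM of a typical node, which is the gap storage offloading is meant to close.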
What QXel gives you.
How it works.
- Storage offloading: use NVMe or HDD storage instead of expensive DRAM to hold the state vector, so qubit count can grow without memory cost growing with it.
- GPU kernels: hand-tuned CUDA gate kernels and cuQuantum integration for maximum throughput per device.
- Distributed simulation: MPI-based multi-node execution, in single-process-per-node and single-process-per-device modes.
- Gate fusion: merge consecutive gates into wider ones for fewer kernel launches, less memory traffic, and faster runs.
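Gate fusion works because consecutive gates on the same qubits can be multiplied into a single matrix before they touch the state vector. The vector is then swept once instead of once per gate. The pure-Python sketch below shows the idea for two single-qubit gates. It is not QXel's kernel code, and the helper names (`matmul2`, `apply_1q`) are hypothetical.

```python
import cmath
import math

Matrix = list[list[complex]]


def matmul2(a: Matrix, b: Matrix) -> Matrix:
    """Return the 2x2 matrix product a @ b."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def apply_1q(state: list[complex], gate: Matrix, target: int) -> list[complex]:
    """Apply a 2x2 gate to qubit `target` in a single sweep over the state vector.

    Qubit 0 is the least significant bit of the amplitude index
    (a convention chosen for this sketch only).
    """
    out = list(state)
    stride = 1 << target
    for i in range(len(state)):
        if (i & stride) == 0:  # visit each (target=0, target=1) amplitude pair once
            a0, a1 = state[i], state[i | stride]
            out[i] = gate[0][0] * a0 + gate[0][1] * a1
            out[i | stride] = gate[1][0] * a0 + gate[1][1] * a1
    return out


s = 1 / math.sqrt(2)
H: Matrix = [[s, s], [s, -s]]
T: Matrix = [[1, 0], [0, cmath.exp(1j * math.pi / 4)]]

state = [1 + 0j, 0j, 0j, 0j]  # |00> on two qubits

# Unfused: H, then T, costing two sweeps over the state vector.
unfused = apply_1q(apply_1q(state, H, 0), T, 0)

# Fused: applying H then T equals applying the single matrix T @ H in one sweep.
fused = apply_1q(state, matmul2(T, H), 0)

print(all(abs(u - f) < 1e-12 for u, f in zip(unfused, fused)))  # True
```

The same idea extends to wider gates (4x4 matrices for two qubits), which is presumably what the `max_fusion` width controls. The trade-off is more arithmetic per amplitude in exchange for fewer passes over memory.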
ghz.py
from braket.devices import LocalSimulator
from braket.circuits import Circuit
# 3-qubit GHZ state
circuit = Circuit().h(0).cnot(0, 1).cnot(1, 2)
# Request the measurement-probability result type
circuit.probability()
# GPU kernels + statevector offloaded to two NVMe drives
qxel = LocalSimulator(backend="QXel-sv")
result = qxel.run(
    circuit,
    shots=4096,
    compute_type="cuda",
    offload_type="storage",
    max_fusion=2,
    path=["/dev/nvme0n1", "/dev/nvme1n1"],
).result()
print(result.values)

Tune every run.
Pass keyword arguments to run() to choose kernels, offloading, and gate fusion.
- compute_type (cpu · cuda · cuquantum): pick the kernel set for the run.
- offload_type (none · cpu · storage): offload the state vector to host DRAM (cpu) or secondary storage (storage); none disables offloading.
- max_fusion (int, default 1): maximum gate width for fusion; larger values mean fewer kernel launches.
- path (list of strings): storage devices for offloading, striped RAID0-style.
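The `path` argument stripes the offloaded state vector across the listed drives RAID0-style. In the generic RAID0 layout, consecutive fixed-size stripes are dealt round-robin across devices. The sketch below maps an amplitude index to a device and offset under that layout. QXel's actual stripe size and on-disk layout are not documented here, so the stripe size and the `locate_amplitude` helper are hypothetical.

```python
def locate_amplitude(index: int, n_devices: int,
                     stripe_amplitudes: int) -> tuple[int, int]:
    """Map a global amplitude index to (device, offset) under RAID0 striping.

    Stripes of `stripe_amplitudes` consecutive amplitudes are dealt
    round-robin across devices; offsets are counted in amplitudes.
    """
    stripe, within = divmod(index, stripe_amplitudes)
    device = stripe % n_devices
    # Each device stores its share of stripes back to back.
    local_stripe = stripe // n_devices
    return device, local_stripe * stripe_amplitudes + within


paths = ["/dev/nvme0n1", "/dev/nvme1n1"]  # the drives from the ghz.py example
STRIPE = 4  # hypothetical stripe size in amplitudes; real stripes would be far larger

for i in (0, 3, 4, 9):
    dev, off = locate_amplitude(i, len(paths), STRIPE)
    print(f"amplitude {i} -> {paths[dev]} offset {off}")
# amplitude 0 -> /dev/nvme0n1 offset 0
# amplitude 3 -> /dev/nvme0n1 offset 3
# amplitude 4 -> /dev/nvme1n1 offset 0
# amplitude 9 -> /dev/nvme0n1 offset 5
```

Because neighboring stripes live on different drives, a sweep over the state vector reads from all drives at once, so aggregate bandwidth grows with the number of devices.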
Result types
Amplitude · Expectation · Variance · Probability · StateVector · DensityMatrix · Sample
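For the 3-qubit GHZ circuit in ghz.py, the StateVector and Probability result types have simple exact values. Amplitude reads individual entries of the state vector, and Probability is their squared magnitudes. The tiny pure-Python simulation below computes these values as a reference. It is not QXel code, and the gate helpers are hypothetical. With `shots=4096`, the probabilities returned by a run are presumably estimated from samples, so expect values near 0.5 rather than exactly 0.5.

```python
import math


def apply_h(state, q, n):
    """Hadamard on qubit q; qubit 0 is the most significant bit of the index."""
    bit = 1 << (n - 1 - q)
    s = 1 / math.sqrt(2)
    out = list(state)
    for i in range(len(state)):
        if not i & bit:  # visit each (q=0, q=1) amplitude pair once
            a0, a1 = state[i], state[i | bit]
            out[i], out[i | bit] = s * (a0 + a1), s * (a0 - a1)
    return out


def apply_cnot(state, control, target, n):
    """Flip `target` on every basis state where `control` is 1."""
    cbit, tbit = 1 << (n - 1 - control), 1 << (n - 1 - target)
    out = list(state)
    for i in range(len(state)):
        if i & cbit:
            out[i ^ tbit] = state[i]
    return out


n = 3
state = [0j] * (2 ** n)
state[0] = 1 + 0j  # start in |000>

# Same gates as ghz.py: h(0), cnot(0, 1), cnot(1, 2)
state = apply_h(state, 0, n)
state = apply_cnot(state, 0, 1, n)
state = apply_cnot(state, 1, 2, n)

probabilities = [abs(a) ** 2 for a in state]
print([round(p, 6) for p in probabilities])
# [0.5, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.5]
```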
Ready to run on QXel?
Read the docs to install QXel and run your first circuit, or explore the guides and references.