QXel

The quantum simulator built for HPC.

QXel is QubiStack's high-performance, GPU-accelerated quantum circuit simulator. It pushes a single node past the usual memory wall, runs on the SDKs you already use, and scales from a laptop to a multi-node GPU cluster.

42 qubits on a single node

3 simulators (state vector, density matrix, stabilizer)

3 SDKs (Braket, Qiskit, PennyLane)
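
The 42-qubit figure is easier to appreciate with some arithmetic. The sketch below assumes a dense statevector with one complex amplitude per basis state. The precision QXel uses is not stated here, so both common widths are shown, and `statevector_bytes` is a hypothetical helper, not part of QXel.

```python
# Back-of-the-envelope statevector size, assuming dense storage:
# 2**n amplitudes, each 16 bytes (complex128) or 8 bytes (complex64).

TIB = 2 ** 40  # bytes per tebibyte


def statevector_bytes(num_qubits: int, bytes_per_amplitude: int = 16) -> int:
    """Bytes needed to hold all 2**num_qubits amplitudes."""
    return (2 ** num_qubits) * bytes_per_amplitude


for n in (30, 36, 42):
    double = statevector_bytes(n, 16) / TIB  # complex128
    single = statevector_bytes(n, 8) / TIB   # complex64
    print(f"{n} qubits: {double:g} TiB (complex128), {single:g} TiB (complex64)")
```

Under these assumptions a 42-qubit state needs 64 TiB at double precision and 32 TiB at single precision, and every added qubit doubles the requirement.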

How it works.

Storage offloading
Use NVMe or HDD instead of expensive DRAM to scale qubit count without scaling memory cost.
GPU kernels
Hand-tuned CUDA gate kernels and cuQuantum integration for maximum throughput per device.
Distributed simulation
MPI-based multi-node execution, in single-process-per-node and single-process-per-device modes.
Gate fusion
Adjacent gates are merged into a single operation: fewer kernel launches, less memory traffic, faster runs.
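
To make the gate fusion idea concrete, here is a minimal pure-Python sketch. It illustrates the general technique only and is not QXel's kernel code. `matmul2` and `apply_1q` are hypothetical helpers. Two single-qubit gates on the same qubit are multiplied into one matrix, so the statevector is traversed once instead of twice.

```python
import cmath
import math


def matmul2(a, b):
    """Product a @ b of two 2x2 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def apply_1q(state, gate, target):
    """Apply a 2x2 gate to qubit `target` in one pass over the statevector."""
    out = list(state)
    bit = 1 << target
    for i in range(len(state)):
        if i & bit == 0:      # i has the target bit clear, j has it set
            j = i | bit
            a0, a1 = state[i], state[j]
            out[i] = gate[0][0] * a0 + gate[0][1] * a1
            out[j] = gate[1][0] * a0 + gate[1][1] * a1
    return out


s = 1 / math.sqrt(2)
H = [[s, s], [s, -s]]
T = [[1, 0], [0, cmath.exp(1j * math.pi / 4)]]

state = [1, 0, 0, 0]                              # |00>

# Unfused: two passes over the statevector (H first, then T).
unfused = apply_1q(apply_1q(state, H, 0), T, 0)

# Fused: "T after H" is the matrix product T @ H, applied in one pass.
fused = apply_1q(state, matmul2(T, H), 0)

# Largest difference between the two results.
print(max(abs(x - y) for x, y in zip(unfused, fused)))
```

The saving grows with state size: each pass touches every amplitude, so halving the number of passes roughly halves the memory traffic for that pair of gates.
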

ghz.py

from braket.devices import LocalSimulator
from braket.circuits import Circuit

# 3-qubit GHZ state: (|000> + |111>) / sqrt(2)
circuit = Circuit().h(0).cnot(0, 1).cnot(1, 2)
# Attach the Probability result type (all qubits) to the circuit
circuit.probability()

# GPU kernels + statevector offloaded to two NVMe drives
qxel = LocalSimulator(backend="QXel-sv")
result = qxel.run(
    circuit,
    shots=4096,
    compute_type="cuda",
    offload_type="storage",
    max_fusion=2,
    path=["/dev/nvme0n1", "/dev/nvme1n1"],
).result()

# One entry per requested result type; here, the estimated probabilities
print(result.values)
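
As a sanity check on what ghz.py should report, the ideal GHZ probabilities can be worked out with a tiny pure-Python statevector loop. This is an independent sketch, not QXel code. `apply_h` and `apply_cnot` are hypothetical helpers, and qubit q is mapped to bit q of the basis index, which does not matter for this symmetric state.

```python
import math


def apply_h(state, q):
    """Hadamard on qubit q."""
    s = 1 / math.sqrt(2)
    out = list(state)
    bit = 1 << q
    for i in range(len(state)):
        if not i & bit:
            j = i | bit
            out[i] = s * (state[i] + state[j])
            out[j] = s * (state[i] - state[j])
    return out


def apply_cnot(state, control, target):
    """CNOT: swap the target-bit pair wherever the control bit is set."""
    out = list(state)
    c, t = 1 << control, 1 << target
    for i in range(len(state)):
        if i & c and not i & t:
            j = i | t
            out[i], out[j] = state[j], state[i]
    return out


state = [0.0] * 8
state[0] = 1.0                      # start in |000>
state = apply_h(state, 0)
state = apply_cnot(state, 0, 1)
state = apply_cnot(state, 1, 2)

probs = [abs(a) ** 2 for a in state]
for index, p in enumerate(probs):
    print(f"{index:03b}: {p:.3f}")
```

Only 000 and 111 carry weight, about 0.5 each. With shots=4096 the simulator's estimates should land close to those values, with some sampling noise.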

Tune every run.

Pass keyword arguments to run() to choose kernels, offloading, and gate fusion.

compute_type: cpu · cuda · cuquantum

Pick the kernel set per run.

offload_type: none · cpu · storage

Offload the statevector to DRAM or secondary storage.

max_fusion: int (default 1)

Maximum gate width for fusion; wider fused gates mean fewer kernel launches.

path: List<string>

Storage devices for offloading, striped RAID0-style.
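
"RAID0-style" striping can be pictured as round-robin placement of fixed-size chunks across the listed devices. The sketch below shows that general scheme only. QXel's actual on-disk layout and stripe size are not documented here, so `locate` and the 1 MiB stripe unit are hypothetical.

```python
def locate(offset: int, num_devices: int, stripe_bytes: int) -> tuple[int, int]:
    """Map a logical byte offset to (device index, byte offset on that device)."""
    stripe = offset // stripe_bytes         # index of the stripe unit overall
    device = stripe % num_devices           # round-robin across devices
    local_stripe = stripe // num_devices    # stripe units already on that device
    return device, local_stripe * stripe_bytes + offset % stripe_bytes


paths = ["/dev/nvme0n1", "/dev/nvme1n1"]    # device list from the ghz.py example
STRIPE = 1 << 20                            # 1 MiB stripe unit (hypothetical)

for off in (0, STRIPE, 2 * STRIPE, 3 * STRIPE + 512):
    dev, local = locate(off, len(paths), STRIPE)
    print(f"logical {off:>8} -> {paths[dev]} @ {local}")
```

Because consecutive chunks land on different devices, a sequential sweep over the statevector can keep every drive busy at once, which is the point of striping.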

Result types

Amplitude · Expectation · Variance · Probability · StateVector · DensityMatrix · Sample

Ready to run on QXel?

Read the docs to install QXel and run your first circuit, or explore the guides and references.