QXel

The quantum simulator built for HPC.

QXel is QubiStack's high-performance, GPU-accelerated quantum circuit simulator. It pushes single-node simulation past the usual memory wall, works with the SDKs you already use, and scales from a laptop to a multi-node GPU cluster.

Browse the guides

42 qubits on a single node

3 simulators (state vector, density matrix, stabilizer)

3 SDKs (Braket, Qiskit, PennyLane)

What QXel gives you.

42-qubit single node

Conventional simulators max out at 30 to 35 qubits per node. Secondary-storage offloading pushes QXel to 42, at a fraction of the memory cost.
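The 30-to-35-qubit ceiling follows from simple arithmetic: a full state vector holds 2^n complex amplitudes. The sketch below assumes double-precision complex amplitudes (16 bytes each). That is an assumption for illustration, not a statement about QXel's internal format, and the function name is hypothetical.

```python
def statevector_bytes(num_qubits: int, bytes_per_amplitude: int = 16) -> int:
    """Bytes needed to hold all 2**num_qubits amplitudes of a state vector.

    bytes_per_amplitude=16 assumes complex128 (an assumption, not QXel's documented format).
    """
    return (2 ** num_qubits) * bytes_per_amplitude


GIB = 2 ** 30

# Memory footprint at the conventional limits and at 42 qubits
for n in (30, 35, 42):
    print(f"{n} qubits: {statevector_bytes(n) / GIB:,.0f} GiB")
```

Under that assumption, 30 qubits need 16 GiB, 35 qubits need 512 GiB, and 42 qubits need 64 TiB, which is why the state has to live somewhere other than DRAM.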

Multi-framework

Bring your existing code. Qiskit, PennyLane, and the Braket SDK all run on QXel through a single Local backend.

GPU & distributed

CUDA and cuQuantum kernels for GPU acceleration. MPI-based distributed simulation across multiple nodes for the largest circuits.
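One common way to distribute a state vector is to split it into equal contiguous blocks, one per process, so that the top bits of an amplitude index select the owning rank. The sketch below illustrates that general technique only. It is not QXel's documented layout, the function names are hypothetical, and it assumes qubit q corresponds to bit q of the amplitude index.

```python
def local_qubit_count(num_qubits: int, num_ranks: int) -> int:
    """Number of qubits whose amplitude pairs stay inside one rank's block."""
    if num_ranks < 1 or (num_ranks & (num_ranks - 1)) != 0:
        raise ValueError("num_ranks must be a power of two")
    global_bits = num_ranks.bit_length() - 1  # log2(num_ranks)
    if global_bits > num_qubits:
        raise ValueError("more ranks than amplitudes")
    return num_qubits - global_bits


def owner_rank(index: int, num_qubits: int, num_ranks: int) -> int:
    """Rank that stores amplitude `index` under contiguous block partitioning."""
    return index >> local_qubit_count(num_qubits, num_ranks)


def needs_communication(target_qubit: int, num_qubits: int, num_ranks: int) -> bool:
    """True if a single-qubit gate on `target_qubit` pairs amplitudes held by different ranks."""
    return target_qubit >= local_qubit_count(num_qubits, num_ranks)


# 4 qubits over 4 ranks: each rank holds 4 amplitudes; qubits 2 and 3 are "global"
print(owner_rank(5, 4, 4), needs_communication(1, 4, 4), needs_communication(3, 4, 4))
```

In this scheme, gates on low-order qubits run entirely within a rank, while gates on the high-order qubits require exchanging data between ranks.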

Zero-setup cloud

Submit a circuit to a managed GPU worker and get the result back. Job history and results are all handled for you.

How it works.

Storage offloading: Use NVMe or HDD instead of expensive DRAM to scale qubit count without scaling memory cost.
GPU kernels: Hand-tuned CUDA gate kernels and cuQuantum integration for maximum throughput per device.
Distributed simulation: MPI-based multi-node execution, in single-process-per-node and single-process-per-device modes.
Gate fusion: Fewer kernel launches, less memory traffic, faster runs.
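The idea behind gate fusion can be shown with its simplest case: two single-qubit gates on the same qubit are multiplied into one matrix, so the state vector is traversed once instead of twice. This is a pure-Python sketch of the general technique, not QXel's fusion strategy. The helper names are hypothetical, and qubit q is assumed to be bit q of the amplitude index.

```python
import math


def matmul2(a, b):
    """Product a @ b of two 2x2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def apply_1q(state, gate, qubit):
    """Apply a 2x2 gate to `qubit`; each call is one full pass over the state."""
    out = list(state)
    step = 1 << qubit
    for i in range(len(state)):
        if (i & step) == 0:
            a0, a1 = state[i], state[i | step]
            out[i] = gate[0][0] * a0 + gate[0][1] * a1
            out[i | step] = gate[1][0] * a0 + gate[1][1] * a1
    return out


s = 1 / math.sqrt(2)
H = [[s, s], [s, -s]]
Z = [[1, 0], [0, -1]]

state = [1, 0, 0, 0]  # two qubits in |00>

# Unfused: H then Z on qubit 0, two passes over the state
unfused = apply_1q(apply_1q(state, H, 0), Z, 0)

# Fused: "Z after H" is the single matrix Z @ H, one pass over the state
fused = apply_1q(state, matmul2(Z, H), 0)

print(all(abs(u - f) < 1e-12 for u, f in zip(unfused, fused)))
```

Each pass corresponds to a kernel launch and a sweep through memory, so halving the number of passes matters most when the state is very large.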

ghz.py

from braket.devices import LocalSimulator
from braket.circuits import Circuit

# 3-qubit GHZ state: H on qubit 0, then a chain of CNOTs
circuit = Circuit().h(0).cnot(0, 1).cnot(1, 2)
# Request the probability of every computational basis state
circuit.probability()

# GPU kernels + statevector offloaded to two NVMe drives
qxel = LocalSimulator(backend="QXel-sv")
result = qxel.run(
    circuit,
    shots=4096,
    compute_type="cuda",
    offload_type="storage",
    max_fusion=2,
    path=["/dev/nvme0n1", "/dev/nvme1n1"],
).result()

# One entry per requested result type (here, the probabilities)
print(result.values)

Tune every run.

Pass keyword arguments to run() to choose kernels, offloading, and gate fusion.

compute_type: cpu · cuda · cuquantum

Pick the kernel set per run.

offload_type: none · cpu · storage

Offload the statevector to DRAM or secondary storage.

max_fusion: int (default 1)

Maximum width of a fused gate; larger values mean fewer kernel launches.

path: List<string>

Storage devices for offloading, striped RAID0-style.
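RAID0-style striping spreads consecutive chunks of data across devices in round-robin order, so large sequential reads and writes hit all drives at once. The sketch below shows the address mapping in general terms. The function name and the 1 MiB stripe size are hypothetical and do not describe QXel's actual on-disk layout.

```python
def stripe_location(byte_offset: int, num_devices: int, stripe_bytes: int = 1 << 20):
    """Map a logical byte offset to (device_index, offset_on_device).

    Stripes are assigned to devices round-robin, as in RAID0.
    stripe_bytes (1 MiB here) is an illustrative assumption.
    """
    stripe_index, within_stripe = divmod(byte_offset, stripe_bytes)
    device = stripe_index % num_devices
    stripe_on_device = stripe_index // num_devices
    return device, stripe_on_device * stripe_bytes + within_stripe


# With two devices, consecutive 1 MiB stripes alternate between them
for offset in (0, 1 << 20, 2 << 20, 3 << 20):
    print(offset, stripe_location(offset, num_devices=2))
```

With two devices, logical stripes 0 and 2 land on device 0 and stripes 1 and 3 on device 1, so a sequential scan keeps both drives busy.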

Result types

Amplitude · Expectation · Variance · Probability · StateVector · DensityMatrix · Sample

Ready to run on QXel?

Read the docs to install QXel and run your first circuit, or explore the guides and references.

QXel guides · QXel SaaS guides