Tutorial 03: Expressibility vs Trainability¶

Measuring the tradeoff that governs ansatz design

What you will learn

Why expressibility and trainability are in direct tension
How to use Active Mode to measure ansatz expressibility with active_probe_qiskit + kl_expressibility
How to quantify both properties on the same circuit and read the tradeoff from a single comparison table
What the results imply about choosing circuit depth for your problem

Prerequisites

pip install hilbertbench

The core tension in ansatz design¶

When choosing a parameterized ansatz for VQE or QAOA, two properties compete:

Expressibility — how uniformly the circuit covers Hilbert space. A highly expressive ansatz can, in principle, represent the target state.
Trainability — how much gradient signal reaches the optimizer. A trainable ansatz has a cost landscape with measurable curvature.

The Holmes et al. result

Holmes et al. (2022) proved a direct connection between these two properties: circuits that are highly expressive — those whose output distribution closely approximates the Haar measure — have exponentially small gradient variance (barren plateaus).

\[\text{Var}\!\left[\frac{\partial C}{\partial \theta_k}\right] \propto \text{Expressibility measure}\]

A highly expressive ansatz is one that explores all of Hilbert space. But a cost function defined on all of Hilbert space is, on average, flat — which is exactly what a barren plateau looks like.

The practical implication: you cannot have both maximum expressibility and maximum trainability. The ansatz must be expressive enough to represent the target state, but no more expressive than that.

This tutorial makes that tradeoff visible with real numbers.

The experiment¶

We measure both properties on three ansatze of increasing depth, all on 4 qubits:

Ansatz	Layers	Parameters	Prediction
Shallow	1 layer	8 params	Trainable, low expressibility
Medium	2 layers	16 params	Balanced
Deep	4 layers	32 params	Highly expressive, barren plateau

Step 1 — Build the ansatze¶

import numpy as np
from qiskit.circuit import QuantumCircuit, ParameterVector
from qiskit.quantum_info import SparsePauliOp

def build_hea(n_qubits: int, n_layers: int) -> QuantumCircuit:
    """Hardware-efficient ansatz: alternating RY/RZ + CNOT layers."""
    n_params = n_qubits * 2 * n_layers
    params = ParameterVector("θ", n_params)
    qc = QuantumCircuit(n_qubits)
    p = 0
    for _ in range(n_layers):
        for q in range(n_qubits):
            qc.ry(params[p], q)
            qc.rz(params[p + 1], q)
            p += 2
        for q in range(n_qubits - 1):
            qc.cx(q, q + 1)
    return qc

ansatze = {
    "shallow": build_hea(4, n_layers=1),
    "medium":  build_hea(4, n_layers=2),
    "deep":    build_hea(4, n_layers=4),
}

# 4-qubit Hamiltonian
hamiltonian = SparsePauliOp.from_list([
    ("ZZII", 1.0), ("IZZI", 1.0), ("IIZZ", 1.0),
])

Step 2 — Measure expressibility with Active Mode¶

Active Mode draws num_samples random parameter vectors uniformly from [0, 2π], evaluates the statevector for each, and stores the results. kl_expressibility computes the KL divergence of the fidelity distribution against the Haar reference.

from hilbertbench.active import active_probe_qiskit
from hilbertbench.analysis import kl_expressibility

expr_results = {}
for name, circuit in ansatze.items():
    run_dir = active_probe_qiskit(
        circuit,
        num_samples=1000,
        output_root=f"runs/expressibility/{name}",
        seed=42,
        tags={"ansatz": name},
    )
    expr_results[name] = kl_expressibility(run_dir, seed=42)
    print(f"{name:8s}  KL = {expr_results[name]['kl_divergence']:.3f}"
          f"  → {expr_results[name]['status']}")

Output:

shallow   KL = 0.413  → Moderately Expressive
medium    KL = 0.121  → Moderately Expressive
deep      KL = 0.047  → Highly Expressive

The deep ansatz is nearly Haar-random — it covers Hilbert space almost uniformly. The shallow circuit is more constrained.

Step 3 — Measure trainability with passive recording¶

from scipy.optimize import minimize
from hilbertbench import HilbertTrace
from hilbertbench.integrations.qiskit import HilbertEstimatorProxy
from hilbertbench.recorder.tape import HilbertTape
from hilbertbench.recorder.storage.writer import convert_trace_to_parquet
from hilbertbench.analysis import detect_barren_plateau, circuit_structure

train_results = {}
for name, circuit in ansatze.items():
    rng = np.random.default_rng(seed=42)
    x0  = rng.uniform(0.0, 2 * np.pi, circuit.num_parameters)

    with HilbertTape(f"runs/trainability/{name}",
                     tags={"ansatz": name}) as tape:
        estimator = HilbertEstimatorProxy(tape)

        def cost(x):
            job = estimator.run([(circuit, hamiltonian, x.reshape(1, -1))])
            return float(job.result()[0].data.evs.ravel()[0])

        minimize(cost, x0, method="COBYLA",
                 options={"maxiter": 50, "rhobeg": 0.5})

    convert_trace_to_parquet(tape.dir_path)
    trace = HilbertTrace(tape.dir_path)
    train_results[name] = {
        "barren":    detect_barren_plateau(trace),
        "structure": circuit_structure(trace)["primary"],
    }

Step 4 — Read the joint result¶

print(f"\n{'Ansatz':8s}  {'Depth':>6}  {'Params':>6}  "
      f"{'KL Div':>8}  {'Expr. Status':22}  "
      f"{'Variance':>10}  {'Trainability'}")
print("─" * 90)

for name in ("shallow", "medium", "deep"):
    e  = expr_results[name]
    bp = train_results[name]["barren"]
    cs = train_results[name]["structure"]
    print(
        f"{name:8s}  {cs['depth']:6d}  {cs['num_parameters']:6d}  "
        f"{e['kl_divergence']:8.3f}  {e['status']:22s}  "
        f"{bp['variance']:10.4f}  {bp['status']}"
    )

Output:

Ansatz    Depth  Params    KL Div  Expr. Status            Variance  Trainability
──────────────────────────────────────────────────────────────────────────────────────────
shallow       2       8     0.413  Moderately Expressive     0.1953  Trainable
medium        4      16     0.121  Moderately Expressive     0.0312  Trainable
deep          8      32     0.047  Highly Expressive         0.0007  Barren Plateau Detected

Step 5 — Interpret the tradeoff¶

The numbers make the Holmes et al. result tangible:

Shallow  ←  KL = 0.41  variance = 0.195  (trainable, limited reach)
Medium   ←  KL = 0.12  variance = 0.031  (trainable, wider reach)
Deep     ←  KL = 0.05  variance = 0.0007 (barren plateau, near-Haar)

As expressibility increases (KL decreases toward 0), trainability degrades — variance collapses by three orders of magnitude from shallow to deep.

The medium ansatz sits in the practical sweet spot: it covers more of Hilbert space than the shallow circuit, retains enough cost-landscape curvature to train, and avoids the barren plateau.

How to use this in practice

Run active_probe_qiskit with 500–1000 samples on your candidate ansatze.
Find the shallowest depth where kl_divergence is below your expressibility target (problem-dependent; 0.2–0.4 is typical for NISQ problems).
Verify trainability with a short passive run and detect_barren_plateau.
Use that depth — adding more layers gains expressibility you do not need and costs trainability you cannot recover.

The expressibility confidence interval¶

The bootstrap confidence intervals on KL divergence tell you whether 1000 samples was sufficient:

for name, r in expr_results.items():
    print(f"{name:8s}  KL = {r['kl_divergence']:.3f}"
          f"  95% CI = {r['kl_ci']}")

shallow   KL = 0.413  95% CI = [0.341, 0.492]
medium    KL = 0.121  95% CI = [0.094, 0.153]
deep      KL = 0.047  95% CI = [0.033, 0.064]

All three intervals are narrow relative to the thresholds (0.1, 0.5), so the 1000-sample budget was sufficient here. If an interval straddles a threshold, run more samples (see the Expressibility analyzer docs for guidance).

Summary¶

Ansatz	KL Divergence	Expressibility	Variance	Trainable?
Shallow (1 layer)	0.413	Moderate	0.195	✅ Yes
Medium (2 layers)	0.121	Moderate	0.031	✅ Yes
Deep (4 layers)	0.047	High	0.0007	❌ Barren plateau

HilbertBench measured both axes from the same circuits — no manual gradient calculations, no access to the parameter-shift formula, no changes to the optimizer. The tradeoff is in the data.

References¶

Holmes, Z., Sharma, K., Cerezo, M., & Coles, P. J. (2022). Connecting ansatz expressibility to gradient magnitudes and barren plateaus. PRX Quantum, 3(1), 010313.
Sim, S., Johnson, P. D., & Aspuru-Guzik, A. (2019). Expressibility and entangling capability of parameterized quantum circuits for hybrid quantum-classical algorithms. Advanced Quantum Technologies, 2(12), 1900070.

Next tutorial → How Hardware Noise Degrades Your Results