TopQAD's Tools

Compiler

The Compiler automates the compilation of circuits from an intermediate representation (IR) format such as OpenQASM down to the scheduling of lattice surgeries (i.e., stabilization instructions) to be implemented in the core processor. Note that the final machine-level instructions for executing a program on a QPU also include the schedule of lattice surgeries (stabilizations) in modules other than the core processor (e.g., MSFs); the complete lattice surgery schedule is an output of the (lower-level) Assembler.

Compilation in TopQAD begins with circuit synthesis. This is the process of converting¹ all gates in a programmed algorithm into the ISA gate set. During this process, some gates must be replaced by approximations, which incurs synthesis error. The Compiler ensures that the accumulated synthesis error remains under the user-specified target error budget.

For the example of the Pauli product rotations ISA, arbitrary-angle rotation gates in the programmed algorithm are approximated by gates in the Clifford+ $T$ gate set; TopQAD currently employs the Solovay–Kitaev algorithm [6] and its extensions [7] and [8]. This step incurs synthesis error. The resulting Clifford+ $T$ gates are converted into Pauli product rotations as per the ISA without incurring synthesis error.

TopQAD’s Compiler has the ability to optimize a circuit into one that is more compact by commuting Clifford gates to the end of the circuit so they are absorbed by the final qubit measurements. The resulting circuit will contain only non-Clifford gates, thus significantly reducing the number of operations. However, this process makes the algorithm less parallelizable by creating higher-weight and longer-ranged lattice surgeries [9].

Next, qubits defined in the synthesized circuit are allocated on the logical microarchitecture and the Compiler creates a schedule, which defines when and how (e.g., using which bus patches) each operation will be executed. TopQAD solves this scheduling problem by optimizing the resource usage (in terms of space and time) for the operations required to execute the algorithm. In particular, it minimizes the circuit depth by searching for operations that can be performed in parallel, thereby reducing resource requirements.

The Compiler selects a core processor microarchitecture that is designed to leverage the parallelization potential of gates within the synthesized circuit. This parallelization potential is measured as the ratio of the circuit depth, determined by gate commutativity, to the circuit length, which represents the total number of gates in the circuit. TopQAD may select different core processor layouts depending on the parallelizability of the circuit, two examples of which are given in Fig. 2.
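The ratio described above can be illustrated with a minimal sketch. It is not TopQAD's implementation: Pauli operators are represented as plain strings, the helper names `commutes`, `circuit_depth`, and `parallelization_potential` are hypothetical, and the depth is taken to be the longest chain of mutually non-commuting gates, found by a simple greedy layering.

```python
def commutes(p: str, q: str) -> bool:
    """Two Pauli strings commute iff they clash (both non-identity and
    different) on an even number of qubit positions."""
    clashes = sum(1 for a, b in zip(p, q) if a != "I" and b != "I" and a != b)
    return clashes % 2 == 0


def circuit_depth(gates: list[str]) -> int:
    """Place each gate one layer after the latest earlier gate it does
    not commute with; the depth is the highest layer used."""
    layers: list[int] = []
    for j, gate in enumerate(gates):
        layer = 1
        for i in range(j):
            if not commutes(gates[i], gate):
                layer = max(layer, layers[i] + 1)
        layers.append(layer)
    return max(layers, default=0)


def parallelization_potential(gates: list[str]) -> float:
    """Ratio of circuit depth to circuit length (total number of gates)."""
    return circuit_depth(gates) / len(gates)


# Illustrative three-qubit circuit: only the last gate has a dependency.
gates = ["XII", "IZI", "IIY", "ZZI"]
ratio = parallelization_potential(gates)
```

With this definition, a ratio close to 1 means that almost every gate must wait for the previous one, while a small ratio means that many gates share a layer.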

**Figure 2.** Example logical microarchitecture layouts, (a) parallelizable and (b) compact, for the memory zone of the core processor, for the Pauli product rotations ISA implemented using a rotated surface code microarchitecture. Both layouts follow conventions detailed in the legend of Fig. 1. Arrows represent access points to auto-correction zones. The logical microarchitecture in (a) has multiple access points for the auto-correction zone and an expanded quantum bus size that increases the chances of finding feasible paths for the lattice surgeries to be performed in parallel. The correction qubit patches are used for Clifford gates. TopQAD determines the number of required auto-correction units and correction qubit patches for the core processor based on the parallelization potential. This logical microarchitecture is preferable for a circuit with high parallelization potential. The logical microarchitecture in (b) has a reduced quantum bus size to save space. Only one access point to an auto-correction unit is created. This enforces serial execution of the gates and is preferable for circuits with reduced parallelization potential. Clifford gates must have been removed during circuit synthesis for this memory zone to be selected, due to the absence of correction qubit patches.

Example: Scheduling Single Operations

Here, we provide an example of scheduling Pauli product rotations gates on a microarchitecture using the rotated surface code.

A Pauli product rotation is represented as

$$P_{\theta}=\exp(i \theta P),$$

where $P$ is a Pauli operator which represents a tensor product of single-qubit Pauli operators ( $X$ , $Y$ , or $Z$ ) that act on a set of qubits, and $\theta$ is the rotation angle.
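As a short worked derivation, using only the fact that every Pauli product satisfies $P^2 = I$, the power series of the exponential splits into even and odd terms:

$$\exp(i \theta P) = \sum_{k=0}^{\infty} \frac{(i\theta)^k}{k!} P^k = \cos(\theta)\, I + i \sin(\theta)\, P.$$

For example, $P_{\pi/4} = (I + iP)/\sqrt{2}$, and because exponents of the same operator add, $P_{\pi/8}\, P_{\pi/8} = P_{\pi/4}$.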

The gates in the Pauli product rotations gate set correspond to the following operations on microarchitectures using the rotated surface code (see the Quantum Architecture Example for more detail):

- $\boldsymbol{P_{\pi/8}}$ rotation gates: These correspond to Pauli product rotations of $P_{\pi/8}$ in the ISA, which can be implemented by performing two multi-qubit measurements in parallel. The first requires a lattice surgery for a $P \otimes Z$ measurement involving the Pauli operator $P$ acting on the computational qubits and a Pauli-$Z$ operator acting on a magic state storage patch. The second requires a lattice surgery for a $Z \otimes Y$ measurement inside an auto-correction unit, involving a Pauli-$Z$ operator acting on a magic state storage patch and a Pauli-$Y$ operator acting on a correction qubit patch for the auto-correcting operation. An example $P_{\pi/8}$ rotation gate is shown in Fig. 3.
- $\boldsymbol{P_{\pi/4}}$ rotation gates: These correspond to Pauli product rotations of $P_{\pi/4}$ in the ISA, which can be implemented by performing a lattice surgery for a $P \otimes Y$ multi-qubit measurement involving a Pauli-$Y$ operator acting on a correction qubit patch.
- Qubit measurements: These correspond to measurements of the Pauli operator $P$, which can be implemented by performing a measurement involving only the computational qubits being measured. Lattice surgery conducted strictly within the memory zone is sufficient for performing such multi-qubit, and possibly long-range, measurements.

Figure 3. Example schedule of a $P_{\pi/8}$ rotation gate for the Pauli product rotations ISA using a rotated surface code microarchitecture.

The scheduling problem is solved at discrete time steps determined by the logical cycle, which is defined as the time needed to perform a parallel set of logical gates in the algorithm.

A key part of the scheduling problem is determining when gates can be executed in parallel. Parallel operations occur when multiple logical gates are scheduled within the same logical cycle. Parallelization can reduce the overall quantum runtime, as it allows for better utilization of available resources. Additionally, parallelizing operations can help reduce computational errors: idling logical qubits, that is, those not used for operations in a given logical cycle, go through the same stabilization cycles as the qubits involved in the logical gates and therefore accumulate errors while they wait, so a schedule with fewer logical cycles exposes them to fewer opportunities for error.

Example: Scheduling Parallel Operations

Here, we provide an example of scheduling parallel Pauli product rotations gates on a microarchitecture using the rotated surface code.

Figure 4. Example of two $P_{\pi/8}$ rotations performed in parallel in the memory zone. Assuming the Pauli operators commute, parallelization is possible since no bus conflict exists in the lattice surgeries to be performed. The lighter grey line represents a lattice surgery involving 14 bus patches that are used to perform a $P_{\pi/8}$ rotation with a Pauli operator $P=X \otimes Z \otimes X$ involving the computational qubits 3, 6, and 12, respectively. The darker grey line represents a lattice surgery involving 12 bus patches that are used to perform another $P_{\pi/8}$ rotation with a Pauli operator $P=X \otimes Z \otimes Y \otimes X$ involving the computational qubits 1, 4, 8, and 9, respectively. Note that this bus is connected to both $X$- and $Z$-basis edges of the computational qubit 8, thus representing a $Y$ operator.

TopQAD solves the scheduling problem using a decomposition approach. First, the logical relations between the logical gates are determined using a trivial commutation rule and then mapped onto a dependency graph. Similarly, the core processor layout is mapped onto an adjacency graph (see Ref. [9] for more information). The decomposition approach uses an earliest-available-first (EAF) policy, where operations are tentatively scheduled as they become available. See Ref. [9] for further details.
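The two ingredients named above, a dependency graph built from a commutation rule and an earliest-available-first policy, can be sketched as follows. This is an illustration under simplifying assumptions, not the scheduler of Ref. [9]: the function names are hypothetical, and the routing of lattice surgeries on the adjacency graph is replaced by a single hypothetical limit, `max_parallel`, on the number of operations per logical cycle.

```python
def commutes(p: str, q: str) -> bool:
    """Pauli strings commute iff they clash on an even number of positions."""
    clashes = sum(1 for a, b in zip(p, q) if a != "I" and b != "I" and a != b)
    return clashes % 2 == 0


def build_dependencies(gates: list[str]) -> dict[int, set[int]]:
    """Gate j depends on every earlier gate i that it does not commute with."""
    return {
        j: {i for i in range(j) if not commutes(gates[i], gates[j])}
        for j in range(len(gates))
    }


def eaf_schedule(gates: list[str], max_parallel: int) -> list[list[int]]:
    """Earliest-available-first: in each logical cycle, schedule the ready
    gates (all dependencies done) in program order, up to max_parallel."""
    if max_parallel < 1:
        raise ValueError("max_parallel must be at least 1")
    deps = build_dependencies(gates)
    done: set[int] = set()
    schedule: list[list[int]] = []
    while len(done) < len(gates):
        ready = [j for j in range(len(gates)) if j not in done and deps[j] <= done]
        cycle = ready[:max_parallel]  # stand-in for bus-conflict resolution
        schedule.append(cycle)
        done.update(cycle)
    return schedule


gates = ["XII", "IZI", "IIY", "ZZI"]
wide = eaf_schedule(gates, max_parallel=4)    # roomy bus
narrow = eaf_schedule(gates, max_parallel=1)  # serial execution
```

Each inner list holds the indices of the gates executed in one logical cycle. Lowering `max_parallel` mimics a smaller quantum bus and lengthens the schedule.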

By solving the scheduling problem, the Compiler is able to provide a detailed estimate of logical resources used within the core processor when logical gates are being executed, indicating when and which patches of the core processor are active. In addition to the logical resource estimate, the Compiler generates metrics such as the expected number of active logical cycles of the core processor and statistics for the sizes of buses needed to perform each operation. For a more detailed discussion of how to interact with the Compiler, including its inputs and outputs, see the Compiler service page. For a comprehensive explanation of resource requirements that extends beyond the core processor—such as those involving physical resource estimation—please refer to the section on TopQAD’s QRE service.

Assembler

The Assembler is a low-level compilation tool designed to convert a compiled circuit, for example, circuits produced by TopQAD’s Compiler, into sequences of stabilization instructions for execution by QPU controllers and decoders. The assembly process depends on the ISA, the microarchitecture, and the noise profile of the QPUs of the computer. Therefore, TopQAD’s Assembler receives inputs from both the Compiler and the Noise Profiler.

The QPU noise profile, in particular, the performance of various fault-tolerant protocols—e.g., quantum memory, magic state preparation, magic state distillation, code growth, and logical operations—is used by the Assembler to determine how to allocate appropriate physical space for various microarchitecture modules and their interconnects such that the compiled program can be executed within a user-defined error budget. This budget covers errors that might occur during the tasks of running logical operations and producing the states they require. These error rates are predicted using mathematical models or by using data from simulations or experiments, such as those provided by the Noise Profiler.

The Assembler uses specific features of the scheduled program (such as the amount of parallelization or the structure of various segments of the compiled program) to optimize the space–time trade-offs in the execution of the quantum algorithm. With these inputs, the Assembler determines the size of the required microarchitecture modules (e.g., the MSF hierarchy and the core processor) and the optimal QEC settings. In this version of TopQAD, the Assembler provides a time-optimal microarchitecture, that is, the MSF contains enough redundant distillation units to provide a balanced supply of magic states for consumption in the core processor.

Error Modelling

The Assembler uses an algorithm that designs and optimizes a logical microarchitecture such that the quantum algorithm it executes stays within an input error budget. It outputs a logical microarchitecture layout that balances space (the number of physical qubits used) and time (the expected runtime) costs, together with the assembled machine-level (stabilization) instructions for that microarchitecture. The Assembler models this as a bi-objective optimization problem, deciding the number of distillation levels required in the MSF, the number of preparation and distillation units at each level, and the code distances required at each level and in the core processor.

Both the compilation and physical execution of a quantum circuit contribute to the total computational error. The Assembler ensures that the assembled program meets an error budget $E$ according to

$$E_{\text{synth}} + \sum_i E_{\text{mod},i} \leq E,$$

where $E_{\text{synth}}$ is the synthesis error produced by compilation, and each $E_{\text{mod},i}$ is the error generated in the $i$ -th module during execution. The $E_{\text{synth}}$ value can be determined by TopQAD’s Compiler and provided to the Assembler. However, each $E_{\text{mod},i}$ depends on the architecture of the invoked modules (e.g., core processor, MSF hierarchies, and QROM), and is therefore determined by the Assembler itself. The Assembler proposes an initial logical microarchitecture and iterates on its layout until the error falls within the error budget.
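The budget condition and the iteration over layouts can be sketched as below. The function names, the candidate layouts, and all numbers are made up for illustration; the Assembler's actual search is not described here and is certainly more involved than a linear scan.

```python
def total_error(e_synth: float, module_errors: dict[str, float]) -> float:
    """Left-hand side of the budget condition: E_synth + sum_i E_mod,i."""
    return e_synth + sum(module_errors.values())


def first_layout_within_budget(e_synth, candidates, budget):
    """Try candidate layouts in order and return the name of the first one
    whose total error fits the budget, or None if none does."""
    for name, module_errors in candidates:
        if total_error(e_synth, module_errors) <= budget:
            return name
    return None


# Hypothetical per-module errors for two candidate layouts.
candidates = [
    ("layout A", {"core": 0.008, "msf": 0.004}),  # total 0.013 with E_synth
    ("layout B", {"core": 0.003, "msf": 0.002}),  # total 0.006 with E_synth
]
chosen = first_layout_within_budget(e_synth=0.001, candidates=candidates, budget=0.01)
```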

Example: Error Accumulation

In the following example, we consider errors for modules implementing the Pauli product rotations ISA.

Core Processor Errors

The Assembler approximates the total error contribution of the core processor, $E_{\text{core}}$ :

$$E_{\text{core}} = V_{\text{idle}}\, e_{\text{mem, core}} + \sum_{i=1}^{T} e_{\text{surg}, i},$$

where $e_{\text{mem, core}}$ is the error rate associated with protecting the idling logical qubits in the core and $e_{\text{surg}, i}$ are error probabilities for the lattice surgery of each operation $i$ in the circuit. Errors from idling qubits are modelled based on the idling volume $V_{\text{idle}}$ , which represents the total exposure of idling computational qubits to errors. More details about the specific models used for $e_{\text{mem, core}}$ and $e_{\text{surg}, i}$ can be found in Ref. [5].
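As a numerical illustration of the expression for $E_{\text{core}}$, the sketch below evaluates it for made-up inputs. The function name `core_error` is hypothetical, and the idling volume is assumed to be expressed in the same units in which $e_{\text{mem, core}}$ is quoted.

```python
def core_error(v_idle: float, e_mem_core: float, e_surg: list[float]) -> float:
    """E_core = V_idle * e_mem,core + sum of per-operation surgery errors."""
    return v_idle * e_mem_core + sum(e_surg)


# Hypothetical values: 1000 units of idling volume and 50 lattice surgeries.
e_core = core_error(v_idle=1000, e_mem_core=1e-9, e_surg=[1e-8] * 50)
```

In this example the idling term contributes $10^{-6}$ and the lattice surgeries contribute $5 \times 10^{-7}$.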

Magic State Distillation Errors

Here, we consider errors for an MSF with multiple distillation levels. Errors are accumulated in this MSF during magic state preparation, the various levels of distillation, and code growth.

The Assembler will increase the number of distillation levels until the logical magic state error rate $e_{\text{msf}}$ achieves a value below the target threshold, that is, by the following condition being met:

$$e_{\text{msf}} \leq \frac{E - (E_{\text{core}} + E_{\text{synth}})}{T},$$

where $T$ is the number of magic states required for the compiled algorithm.
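The per-state target on the right-hand side of this condition can be computed directly. The sketch below uses a hypothetical helper, `msf_target`, and made-up numbers.

```python
def msf_target(budget: float, e_core: float, e_synth: float, num_magic_states: int) -> float:
    """Largest admissible e_msf: the budget left after core and synthesis
    errors, shared equally among the T magic states."""
    remaining = budget - (e_core + e_synth)
    if remaining <= 0:
        raise ValueError("core and synthesis errors already exhaust the budget")
    return remaining / num_magic_states


# Hypothetical inputs: 1% budget and one million magic states.
target = msf_target(budget=0.01, e_core=0.002, e_synth=0.001, num_magic_states=1_000_000)
```

Here the remaining budget of 0.007 spread over one million magic states gives a target of about $7 \times 10^{-9}$ per state.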

The magic state preparation error rate $e_{\text{prep}}$ is determined by the hardware characteristics, such as the physical qubit fidelity and the protocol used for state preparation. Since the preparation error rate is above the target magic state error rate, prepared magic states are input into at least a first distillation level.

For each distillation level $l$ , the input error rate $e_{\text{in}, l}$ will depend on either $e_{\text{prep}}$ , if $l=1$ , or the output error rate from the previous level, $e_{\text{out}, l-1}$ . In addition, each $e_{\text{in}, l}$ incorporates the $l$ -th level’s growth error rate $e_{\text{grow}, l}$ . Each magic state distillation level may encode qubits with different code distances. Thus, $e_{\text{grow}, l}$ is the error rate resulting when magic states are expanded, from a lower code distance to a higher one, between the distillation levels $l-1$ and $l$ .

The Clifford operations performed for distillation itself generate errors at a rate $e_{\text{cliff}, l}$ . Thus, the error rate of the magic states output from a distillation level $e_{\text{out}, l}$ is a function of $e_{\text{in}, l}$ and an average Clifford error rate $e_{\text{cliff}, l}$ . The model used to predict $e_{\text{cliff}, l}$ is determined by the distillation protocol and the code distance used in the $l$ -th level distillation units.

After $L$ distillation levels, the logical magic state error rate $e_{\text{msf}}$ , determined using the last ( $L$ -th) error rates $e_{\text{out}, L}$ and $e_{\text{grow}, L}$ , must be bounded by the target error rate.

The Assembler tunes code distances, production and consumption rates, and preparation and distillation parameters, optimizing the MSF microarchitecture to meet the error budget while minimizing the number of qubits and runtime. More details about the specific error models used for $e_{\text{prep}}$ , $e_{\text{out}}$ , and $e_{\text{grow}}$ can be found in Ref. [5].
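The level-by-level accumulation described in this example can be sketched with a toy model. Everything specific in it is an assumption rather than the model of Ref. [5]: the output error of a level is taken as $35\, e_{\text{in}}^3 + e_{\text{cliff}}$, where the cubic term is the well-known leading-order behaviour of the 15-to-1 distillation protocol, the growth and Clifford error rates are the same at every level, and a final growth error is added to the last output. The function name `levels_needed` is hypothetical.

```python
def levels_needed(e_prep, target, e_grow=0.0, e_cliff=0.0, max_levels=5):
    """Add distillation levels until the modelled e_msf meets the target.

    Toy model (assumption): e_in = previous output + e_grow,
    e_out = 35 * e_in**3 + e_cliff, and e_msf = e_out + e_grow.
    Returns (number of levels, modelled e_msf).
    """
    e_out = e_prep  # level 1 is fed by the prepared magic states
    for level in range(1, max_levels + 1):
        e_in = e_out + e_grow
        e_out = 35 * e_in**3 + e_cliff
        e_msf = e_out + e_grow
        if e_msf <= target:
            return level, e_msf
    raise ValueError("target not reached within max_levels")


# Hypothetical preparation error of 1e-3 and a per-state target of 7e-9.
levels, e_msf = levels_needed(e_prep=1e-3, target=7e-9)
```

In this toy model, one level brings the error to $3.5 \times 10^{-8}$, which misses the target, so a second level is added. Nonzero `e_grow` or `e_cliff` values set a floor that further levels cannot push below, which is one reason code distances have to be tuned together with the number of levels.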

Noise Profiler

In a fault-tolerant quantum computer, QECCs are used to protect logical quantum states from physical errors. Logical operations on logical states protected in this manner are performed using FTQC protocols. These protocols, consisting of physical operations and classical computation, are designed such that a given logical operation succeeds with high probability. Generally, the probability of failure, or logical error, depends on the distance of the code in which the quantum state is encoded, and also on other protocol parameters such as the number of stabilization cycles. Formally, an FTQC protocol, for a given set of protocol parameters, is described by a quantum circuit operating on physical qubits that possibly includes some classical conditional logic based on measurement outcomes, and a decoder for the underlying QECC. The Noise Profiler includes routines for generating such protocols.

Estimating the performance of a protocol on a quantum computer requires knowledge of the physical noise experienced by the qubits and gates of which it is composed. This information can be obtained experimentally through characterization techniques such as randomized benchmarking, resulting in a set of qubit and gate characterization parameters, as outlined below. To assess the impact of these parameters on the performance of a protocol, the Noise Profiler simulates noisy quantum channels representing the physical circuit of the FTQC protocol. An example noise model supported by the Noise Profiler is the depolarizing noise model, details about which can be found in Appendix E of Ref. [10].

The protocol circuit with added noise can be simulated with the help of a Clifford simulator [10][11] coupled to a fast and accurate decoder [12]. Given the probabilistic nature of errors, Monte Carlo sampling is used to estimate the protocol’s performance metrics, such as the logical error rate, the post-selection rate, or the error-suppression rate.
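The Monte Carlo idea can be illustrated with a deliberately simple stand-in. The sketch below does not simulate a surface code protocol, a Clifford circuit, or the decoders cited above; it assumes a distance-$d$ repetition code under independent bit flips with probability `p`, decoded by majority vote, and the function name is hypothetical.

```python
import random


def estimate_logical_error_rate(distance: int, p: float, shots: int, seed: int = 0) -> float:
    """Sample bit-flip patterns on `distance` qubits and count the shots in
    which a majority of qubits flipped, i.e. majority-vote decoding fails."""
    rng = random.Random(seed)  # fixed seed for reproducibility
    failures = 0
    for _ in range(shots):
        flips = sum(rng.random() < p for _ in range(distance))
        if flips > distance // 2:
            failures += 1
    return failures / shots


# Toy comparison at a physical error rate of 10%.
ler_d3 = estimate_logical_error_rate(distance=3, p=0.1, shots=20_000)
ler_d5 = estimate_logical_error_rate(distance=5, p=0.1, shots=20_000)
```

For this toy code the exact failure probabilities are 2.8% at distance 3 and about 0.86% at distance 5, so the two estimates should show the error suppression that a larger distance provides. The statistical uncertainty of each estimate shrinks as the number of shots grows.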

To model and predict the logical error rates of high-distance fault-tolerant protocols, the Noise Profiler combines Monte Carlo simulations with theoretical models and numerical regressions, fitting a smaller number of model parameters to reflect realistic error behaviour. For example, the logical error rates of the memory protocol can be modelled as

$$\text{LER} = \mu d^2 \Lambda^{\frac{d+1}{2}},$$

where $d$ is the code distance and the two fitted parameters are:
- Error prefactor ( $\mu$ ): captures the baseline error rate based on the physical characteristics of the quantum processor; and
- Error suppression rate ( $\Lambda$ ): describes how quickly error rates decrease as the code distance increases.
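Once the two parameters are known, the model can be evaluated to choose a code distance. The sketch below is illustrative only: the function names and parameter values are made up, and it assumes $\Lambda < 1$, since in the form written above the logical error rate decreases with distance only in that case.

```python
def memory_ler(mu: float, lam: float, d: int) -> float:
    """Memory-protocol model: LER = mu * d^2 * Lambda^((d + 1) / 2)."""
    return mu * d**2 * lam ** ((d + 1) / 2)


def smallest_distance(mu: float, lam: float, target: float, max_d: int = 101):
    """Smallest odd code distance whose modelled LER meets the target,
    or None if no distance up to max_d does."""
    for d in range(3, max_d + 1, 2):
        if memory_ler(mu, lam, d) <= target:
            return d
    return None


# Hypothetical fit: mu = 0.1 and Lambda = 0.1, with a target LER of 1e-9.
d_required = smallest_distance(mu=0.1, lam=0.1, target=1e-9)
```

With these made-up parameters, distance 19 gives roughly $3.6 \times 10^{-9}$ and distance 21 roughly $4.4 \times 10^{-10}$, so the search returns 21.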

In what follows, the collection of all the data input to the Noise Profiler and the logical error rates of fault-tolerant protocols predicted by it is called the noise profile of the QPU.

¹ A quantum algorithm may be described by a sequence of operations, referred to as gates, belonging to a gate set, such as the universal Clifford+ $T$ set. These operations are followed by a final sequence of qubit measurements to extract the results of the computation. For a given QECC, which underlies the microarchitecture, the gates that can be directly applied are limited to the ISA gate set that the microarchitecture implements. Thus, conversion from a programmed quantum algorithm to the ISA gate set is necessary.
