TopQAD1Qbit

TopQAD's Tools

Compiler

The Compiler automates the compilation of circuits from an intermediate representation (IR) format such as OpenQASM down to the scheduling of lattice surgeries (i.e., stabilization instructions) to be implemented in the core processor. Note that the final machine-level instructions for execution of a program on a QPU include also the schedule of lattice surgeries (stabilizations) in modules other than just the core processor (e.g., MSFs); the complete lattice surgery schedule an output of the (lower-level) Assembler.

Compilation in TopQAD begins with circuit synthesis. This is the process of converting1 all gates in a programmed algorithm into the ISA gate set. During this process, some gates must be replaced and approximated. This will incur synthesis error. The Compiler ensures that the accumulated synthesis errors remain under the user-specified target error budget.

For the example of the Pauli product rotations ISA, arbitrary-angle rotation gates in the programmed algorithm are approximated by gates in the Clifford+TT gate set; TopQAD employs the gridsynth algorithm [6] to decompose arbitrary-angle gates. This step incurs synthesis error. The resulting Clifford+TT gates are converted into Pauli product rotations as per the ISA without incurring synthesis error.

TopQAD’s Compiler has the ability to optimize a circuit into one that is more compact by commuting Clifford gates to the end of the circuit so they are absorbed by the final qubit measurements. The resulting circuit will contain only non-Clifford gates, thus significantly reducing the number of operations. However, this process makes the algorithm less parallelizable by creating higher-weight and longer-ranged lattice surgeries [9].

Next, qubits defined in the synthesized circuit are allocated on the logical microarchitecture and the Compiler creates a schedule, which defines when and how (e.g., using which bus patches) each operation will be executed. TopQAD solves this scheduling problem by optimizing the resource usage (in terms of space and time) for the operations required to execute the algorithm. In particular, it minimizes the circuit depth by searching for operations that can be performed in parallel, thereby reducing resource requirements.

The Compiler selects a core processor microarchitecture that is designed to leverage the parallelization potential of gates within the synthesized circuit. This parallelization potential is measured as the ratio of the circuit depth, determined by gate commutativity, to the circuit length, which represents the total number of gates in the circuit. TopQAD may select different core processor layouts depending on the parallelizability of the circuit, two examples of which are given in Fig. 2.

(a) Parallelizable layout

(b) Compact layout

Figure 2. Example logical microarchitecture layouts for the memory zone of the core processor, for the Pauli product rotations ISA implemented using a rotated surface code microarchitecture. Both layouts follow conventions detailed in the legend of Fig. 1. Arrows represent access points to correction zones. The logical microarchitecture in (a) has multiple access points for the correction zone and an expanded quantum bus size that increases the chances of finding feasible paths for the lattice surgeries to be performed in parallel. The correction qubit patches are used for Clifford gates. TopQAD determines the number of required correction units and correction qubit patches for the core processor based on the parallelization potential. This logical microarchitecture is preferable for a circuit with high parallelization potential. The logical microarchitecture in (b) has a reduced quantum bus size to save space. Only one access point to a correction unit is created. This enforces serial execution of the gates and is preferable for circuits with reduced parallelization potential. Clifford gates must have been removed during circuit synthesis for this memory zone to be selected, due to the absence of correction qubit patches.

The scheduling problem is solved over discrete timesteps, where each timestep corresponds to the earliest logical cycle relative to the beginning of the scheduling where the PZP \otimes Z measurement corresponding to the consumption of the magic state can be performed. A logical cycle corresponds to the time required to perform a lattice surgery, or, in case of several operations being performed in parallel, the time needed to perform the parallel set of corresponding PZP \otimes Z measurements required for each one. This abstraction provides an upper bound on the circuit's parallelization potential within the given architecture. In practice, reaction times must also be considered to determine whether scheduling parallelizable operations in the same logical cycle is advantageous.

The Compiler generates a detailed schedule of lattice surgeries, specifying the earliest timestep at which each operation can be executed. A central challenge in this process is identifying opportunities for parallel execution, that is, when multiple logical gates can be scheduled within the same logical cycle. Parallelization can reduce the overall runtime by improving resource utilization while mitigating errors, since idling logical computational qubits, which are not involved in operations during a given cycle, go through stabilization and are thus prone to error.

TopQAD solves the scheduling problem using a decomposition approach. First, the logical relations between the logical gates are determined using a trivial commutation rule and then mapped onto a dependency graph. Similarly, the core processor layout is mapped onto an adjacency graph (see Ref. [9] for more information). The decomposition approach uses an earliest-available-first (EAF) policy, where operations are tentatively scheduled as they become available. See Ref. [9] for further details.

By solving the scheduling problem, the Compiler is able to provide a detailed estimate of logical resources used within the core processor when logical gates are being executed, indicating when and which patches of the core processor are active. In addition to the logical resource estimate, the Compiler generates metrics such as the expected number of active logical cycles of the core processor and statistics for the sizes of buses needed to perform each operation. For a more detailed discussion of how to interact with the Compiler, including its inputs and outputs, see the Compiler service page. For a comprehensive explanation of resource requirements that extends beyond the core processor—such as those involving physical resource estimation—please refer to the section on TopQAD’s QRE service.

Assembler

The Assembler is a low-level compilation tool designed to convert a compiled circuit, for example, circuits produced by TopQAD’s Compiler, into sequences of stabilization instructions for execution by QPU controllers and decoders. The assembly process depends on the ISA, the microarchitecture, and the noise profile of the QPUs of the computer. Therefore, TopQAD’s Assembler receives inputs from both the Compiler and the Noise Profiler.

The QPU noise profile, in particular, the performance of various fault-tolerant protocols—e.g., quantum memory, magic state preparation, magic state distillation, code growth, and lattice surgery—is used by the Assembler to determine how to allocate appropriate physical space for various microarchitecture modules and their interconnects such that the compiled program can be executed within a user-defined error budget. This budget covers errors that might occur during the tasks of running logical operations and producing the states they require. These error rates are predicted using mathematical models or by using data from simulations or experiments, such as those provided by the Noise Profiler.

The Assembler uses specific features of the scheduled program (such as the amount of parallelization or the structure of various segments of the compiled program) to optimize the space–time trade-offs in the execution of the quantum algorithm. With these inputs, the Assembler determines the size of the required microarchitecture modules (e.g., the MSF hierarchy and the core processor) and the optimal QEC settings. It then provides a range of efficient options to minimize space (using fewer physical qubits) and time (by estimating and reducing the wall-clock runtime). This is achieved by adjusting the size of the MSF to allow the core processor to run at a deficit of magic state supply, thus allowing the execution of the quantum algorithm with a reduced number of physical qubits. The Pareto frontier provided by the Assembler allows users to choose an efficient balance for their specific circuit and hardware, enabling adaptable quantum resource management tailored to their needs. In the Quantum Resource Estimation service, a detailed report on the layouts created by the Assembler is provided.

Error Modelling

The Assembler uses an algorithm that designs and optimizes a logical microarchitecture such that the quantum algorithm it executes will be within an input error budget by providing a logical microarchitecture layout that balances space (the number of physical qubits used) and time (the expected runtime) costs, and the assembled machine-level (stabilization) instructions for that microarchitecture. The Assembler models this as a bi-objective optimization problem, making decisions as to the number of distillation levels required in the MSF, the number of preparation and distillation units at each level, and the code distances required in each level and in the core processor.

Both the compilation and physical execution of a quantum circuit contribute to the total computational error. The Assembler ensures that the assembled program meets an error budget EE according to

Esynth+iEmod,iE,E_{\text{synth}} + \sum_i E_{\text{mod},i} \leq E,

where EsynthE_{\text{synth}} is the synthesis error produced by compilation, and each Emod,iE_{\text{mod},i} is the error generated in the ii-th module during execution. The EsynthE_{\text{synth}} value can be determined by TopQAD’s Compiler and provided to the Assembler. However, each Emod,iE_{\text{mod},i} depends on the architecture of the invoked modules (e.g., core processor, MSF hierarchies, and QROM), and is therefore determined by the Assembler itself. The Assembler proposes an initial logical microarchitecture and iterates on its layout until the error falls within the error budget.

Noise Profiler

In a fault-tolerant quantum computer, QECCs are used to protect logical quantum states from physical errors. Logical operations on logical states protected in this manner are performed using FTQC protocols. These protocols, consisting of physical operations and classical computation, are designed such that a given logical operation succeeds with a high probability of success. Generally, the probability of failure or logical error is dependent on the distance of the code in which the quantum state is encoded, but also on other protocol parameters such as the number of stabilization rounds. Formally, an FTQC protocol, for a given set of protocol parameters, is described by a quantum circuit operating on physical qubits that possibly includes some classical conditional logic based on measurement outcomes, and a decoder for the underlying QECC. The Noise Profiler includes routines for generating such protocols.

To estimate the performance of a protocol on a quantum computer requires knowledge of the physical noise experienced by the qubits and gates of which it is composed. This information can be obtained experimentally by characterization techniques, for example, randomized benchmarking, resulting in a set of qubit and gate characterization parameters, as outlined below. To assess the impact of these parameters on the performance of a protocol, the Noise Profiler simulates noisy quantum channels representing the physical circuit of the FTQC protocol. An example noise model supported by the Noise Profiler is the depolarizing noise model, details about which can be found in Appendix E of Ref. 10.

The protocol circuit with added noise can be simulated with the help of a Clifford simulator [11][12] coupled to a fast and accurate decoder [13]. Given the probabilistic nature of errors, Monte Carlo sampling is used to estimate the protocol’s performance metrics, such as the logical error rate, the post-selection rate, or the error-suppression rate.

To model and predict the logical error rates of high-distance fault-tolerant protocols, the Noise Profiler combines Monte Carlo simulations with theoretical models and numerical regressions, fitting a smaller number of model parameters to reflect realistic error behaviour. For example, the logical error rates of the memory protocol can be modelled as

LER=μd2Λd+12,\text{LER} = \mu d^2 \Lambda^{\frac{d+1}{2}},

where the two parameters are:
- Error prefactor (μ\mu): captures the baseline error rate based on the physical characteristics of the quantum processor; and
- Error suppression rate (Λ\Lambda): describes how quickly error rates decrease as the code distance increases.

In what follows, the collection of all the data input to the Noise Profiler and the logical error rates of fault-tolerant protocols predicted by it is called the noise profile of the QPU. A noise profile is specific to the combination of fault-tolerant protocol and noise model that is selected for noise profiling. In the following, we describe the protocols and noise models available in TopQAD.

Protocols

This section explains the protocols currently available for the Noise Profiler. It will be updated as additional models become available.

Quantum Memory

This protocol is used to protect idle qubits from noise. For simulation, a logical basis state of the rotated surface code of distance dd is simultaneously created and stabilized for rr rounds. A logical zero state 0L|0\rangle_L is created by first indiviually preparing all the physical data qubits in the zero state 0|0\rangle and then the rr stabilization rounds are conducted. A plus state +L|+\rangle_L is created similarly, except that the physical data qubits are individually prepared in the plus state +|+\rangle.

To verify that the state was correctly preserved in memory, after r rounds, all data qubits are measured. Then, the value of the logical observable is extracted. For example, to verify that the 0L|0\rangle_L was preserved, the logical ZZ operator is determined by multiplying the value of all data qubits along the top row of the surface code grid. If the value is +1+1, then no logical XX error occured to flip the value of the state.

Magic State Preparation Hook Injection

This protocol has two stages [14]. In the first stage, two things happen simultaneously. First, a distance 2 surface code patch is created. However, in this process an intentional hook error is introduced that rotates the logical state of the code into the TT state. Second, a surface code of distance d1d_1 is created. Consequently, a TT logical state is created in a distance d1d_1 surface code. If any detectors trigger in this stage, the protocol is restarted. Otherwise, the second stage proceeds, in which the magic state is grown to a final distance d2d_2.

In this implementation, an analogous protocol is constructed suitable for simulation on a Clifford simulator. Instead of creating a logical TT state, a logical XX state is created.

Magic State Preparation Rep Code

This protocol was inspired by Ref. [15]. Unlike in Ref. [15], this implementation is on the CSS rotated surface code with odd distances. This protocol has two stages. First a two-qubit repetition code is created and its state is rotated with the aid of a two-qubit rotation gate to create a logical magic state. Next, the repetition code is deformed/expanded into a surface code patch. If, during this expansion, any of the detectors trigger, the protocol is restarted. In the second stage, the logical state is grown to some final desired distance.

In this implementation, an analogous protocol is constructed suitable for simulation on a Clifford simulator. Instead of creating a logical TT state, a logical XX or YY state is created.

Lattice Surgery

Lattice surgery is the method by which entangling operations can be performed on surface code qubits. A lattice surgery is equivalent to measuring a joint Pauli operator of two or more qubits. For example, two logical surface code patches could have their XXXX operator measured. If they are in the a+++ba|++\rangle + b|--\rangle state then the outcome would be +1+1, if they are in a++b+a|+-\rangle + b|-+\rangle state then the outcome would be 1-1, and if they are in any other state, then the outcome would be random.

In this protocol, a lattice surgery is implemented between two surface code patches. Two distinct logical qubits are created in the provided bases. They may be initially stabilized for a few rounds. Then a surface code merge is performed. This takes a logical operator of each of the qubits and joins the two surface code patches along these two operators. This is accomplished with the aid of a bus connecting the two qubit patches. This may entangle the two qubits depending on the bases and operators chosen. The merge code is stabilized for at least O(d)\mathcal{O}(d) rounds. Finally, the bus region is measured. This is called splitting the code. If done correctly, the stabilizer measurements of the bus yield the value of the joint observable of the merge. The two qubits may be stabilized for an additonal number of rounds before they too are measured.

Stability

The stability protocol [16] is used to estimate how well a fault-tolerant system can move logical observables through space or, equivalently, determine the product of a large region of stabilizers. Logical observables are, for example, moved through space in lattice surgery operations.

Noise Models

In this section, we explain the noise models currently available for the Noise Profiler. The section will be updated as additional models become available.

Uniform Depolarizing

A simple uniform depolarizing noise model, in which all noise channels have the same strength pp. The noise channels added before or after every circuit operations are as follows.

  • Preparation and reset are followed by bit or phase flip errors, depending on the basis.
  • Measurements are noisy with probability pp of error, and are followed by a one-qubit depolarizing noise channel.
  • One-qubit gates are followed by a one-qubit depolarizing noise channel.
  • Two-qubit gates are followed by a two-qubit depolarizing noise channel.
  • Idle qubits experience a one-qubit depolarizing noise channel at each tick.

This model was inspired by the model described in Appendix A of Ref. [10].

Physical Depolarizing

A physical depolarizing noise model, based on the model described in Appendix E of Ref. [9].

It consists of three types of errors: gate errors, idling errors, and state preparation and measurement (SPAM) errors.

Gate Errors

Imperfect gates are modelled by adding a depolarizing noise channel at rate pp to the gate qubits after the application of each gate, where the parameterization assumes that 1p1 - p is the probability of leaving the input state unaltered.

The rate pp is determined by utilizing the formula for the depolarizing channel's average gate fidelity,

Fdep,n=1(2n1)2n22n1p,F_{\text{dep}, n} = 1 - \frac{(2^n - 1) 2^n}{2^{2n} - 1} p,

where nn is the number of gate qubits.

Idling Errors

Any qubit that is idling experiences an error, dependent on both the T1T_1 (longitudinal relaxation time) and the T2T_2 (transverse relaxation time) of the qubit, as well as the time tt it takes to apply the gate(s) to the active qubits.

The error is modelled with a one-qubit stochastic Pauli noise channel defined by the parameter vector

(px,py,pz)=(14(1et/T1),14(1et/T1),14(1+et/T12et/T2)),(p_x, p_y, p_z) = \left(\frac{1}{4} (1 - e^{-t/T_1}), \frac{1}{4} (1 - e^{-t/T_1}), \frac{1}{4} (1 + e^{-t/T_1} - 2 e^{-t/T_2})\right),

where px,py,pzp_x, p_y, p_z are the probabilities of X,Y,ZX, Y, Z errors, respectively.

SPAM Errors

The errors in state preparation and reset are captured by assuming that, with rate pp, the orthogonal state is produced, that is, 0|0\rangle is prepared instead of 1|1\rangle and vice versa. Similarly, measurements in the ZZ basis are flipped at rate pp. In all three cases, the fidelity of the operation is

FSPAM=P(00)+P(11)2,F_{\text{SPAM}} = \frac{P(0|0) + P(1|1)}{2},

from which we directly determine the rate p=1FSPAMp = 1 - F_{\text{SPAM}}.


Footnotes

  1. A quantum algorithm may be described by a sequence of operations, referred to as gates, belonging to a gate set, such as the universal Clifford+TT set. These operations are followed by a final sequence of qubit measurements to extract the results of the computation. For a given QECC, which underlies the microarchitecture, the gates that can be directly applied are limited to the ISA gate set that the microarchitecture implements. Thus, conversion from a programmed quantum algorithm to the ISA gate set is necessary.