# Improving Linearity in CMOS Phase Interpolators

Amit Kumar Mishra, *Member, IEEE*, Yifei Li, *Member, IEEE*, Pawan Agarwal, *Member, IEEE*, and Sudip Shekhar, *Senior Member, IEEE*

Abstract-We compare the prior art in phase interpolators (PIs), classifying them as current-mode, voltage-mode, and integrating-mode PIs. Next, we present an integrating-mode PI in which voltage slopes with high phase linearity are generated by integrating phase-shifted, weighted current sources. The constant and variable voltage slopes are generated by current sources/sinks built from stacked devices in a 0.75-V 5-nm finFET technology. This PI technique supports high-speed, low-power operation and achieves dual-edge interpolation with improved duty-cycle distortion characteristics. The PI generates an output clock with 9 bits of resolution and a small peak-to-peak integral nonlinearity (INLpp) and peak-to-peak differential nonlinearity (DNLpp) of 2.4° and 1.4°, respectively, at 13.3 GHz with just quadrature clock inputs. The PI has a 71-fs<sub>rms</sub> random jitter (integrated from 3 MHz to 3 GHz) and occupies an active area of 0.006 mm<sup>2</sup> while consuming 6 mW at 14 GHz. In the dynamic linearity measurements, an integrated rotation spur of -42.6 dBc is achieved for 256-ppm modulation at a 13.3-GHz operating frequency and a 1-GHz update rate.

*Index Terms*—AM–PM, digital-to-phase converter (DPC), dynamic linearity, fractional frequency synthesis, integrating-mode, phase mixer, phase rotator, plesiochronous.

# I. INTRODUCTION

Fast-growing data traffic in data centers requires wireline transceivers to operate at higher data rates. Supporting these rates calls for multilane transceiver implementations, which in turn need low-power and compact clocking. CMOS technology scaling helps, as advanced nodes [1], [2] provide fast-switching transistors. However, the reduced supply voltage that accompanies node scaling favors digital-friendly techniques; consequently, clocking solutions based on current-mode logic (CML) [3] do not scale well.

In a receiver (RX), a global phase-locked loop (PLL) often produces a differential clock which is distributed to multiple lanes. The differential phases are provided to a multiphase generator (MPG) in each lane for generating multiple phase clocks, which are supplied to a phase interpolator (PI) in a clock and data recovery (CDR) system. The PI output is a local clock that samples the data optimally in time for recovering the data with the lowest possible bit error rate (BER). Depending on clock recovery architecture, the PI output clock requires either phase or both phase and frequency offset correction. Only phase deskew is needed for the source-synchronous clocking where the transmitter (TX) clock is sent to the RX [4], [5]. In plesiochronous systems, the RX needs to recover its clock from the data. However, due to a mismatch between the crystal oscillators at the TX and the RX, the global PLL clock frequency at the RX differs from the TX by $\Delta f$. The CDR, therefore, rotates the PI for accumulating $\Delta f$ in a PI-based CDR. Moreover, the PI can also be used to provide the frequency modulation required for spread-spectrum clocking [6].

Manuscript received 20 May 2022; revised 11 October 2022, 24 November 2022, and 10 January 2023; accepted 2 February 2023. This article was approved by Associate Editor Hui Pan. This work was supported in part by Maxlinear, Inc., in part by the Natural Sciences and Engineering Research Council of Canada, and in part by Intel Corporation. (*Corresponding author: Amit Kumar Mishra.*)

Amit Kumar Mishra and Sudip Shekhar are with the Department of Electrical and Computer Engineering, The University of British Columbia, Vancouver, BC V6T1Z4, Canada (e-mail: amit@ece.ubc.ca).

Yifei Li and Pawan Agarwal are with Maxlinear, Inc., Carlsbad, CA 92008 USA.

Color versions of one or more figures in this article are available at https://doi.org/10.1109/JSSC.2023.3243305.

Digital Object Identifier 10.1109/JSSC.2023.3243305

The PI time quantization error, $T_{LSB}$, combined with phase differential nonlinearity (DNL) and phase integral nonlinearity (INL), constitutes the PI deterministic jitter (DJ) [7]. Narrow symbol periods at higher data rates reduce the sampling time margin, demanding small jitter and high linearity from a PI. Moreover, the ability of a PI to work with quadrature inputs greatly relaxes the MPG design and input phase correction concerns. However, PI linearity degrades for greater input phase separation [3], [8]. Therefore, eight or more phases are often required from an MPG, which then consumes significant power, uses complex architectures, and faces challenges in phase accuracy. In [3], a quadrature delay-locked loop (DLL) and an eight-phase injection-locked ring oscillator (ILRO) work in tandem to lower the phase errors for eight-phase generation at the cost of higher power dissipation. An eight-phase delay line with three cascaded eight-phase ILROs is used in [9] to improve the phase accuracy, again at the cost of increased clocking power consumption.

We present a scaling-friendly, low-power, and compact PI [10] that addresses the abovementioned problems by: 1) employing a higher resolution of 9 bits to reduce $T_{LSB}$ and 2) using a high-phase-linearity technique to reduce the phase INL and DNL. Furthermore, the input phase separation requirement is relaxed due to the high phase linearity, enabling the PI to operate with just quadrature phase inputs.

This article is organized as follows. Section II describes the classification and comparative analysis of PIs and reasons for their phase nonlinearity. Section III discusses the conceptual operation, architecture, design considerations, and static and dynamic linearity of the high-speed PI in this work. Measurement results are presented in Section IV and this article is concluded in Section V.

0018-9200 © 2023 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See https://www.ieee.org/publications/rights/index.html for more information.



Fig. 1. Current-mode phase interpolator.

# II. CLASSIFICATION OF PHASE INTERPOLATORS

PIs for high-speed links can be broadly classified into three categories: 1) current-mode PI (CMPI); 2) voltage-mode PI (VMPI); and 3) integrating-mode PI (IMPI).

#### A. Current-Mode Phase Interpolator

As shown in Fig. 1, a CMPI is implemented as an I-Q phase mixer architecture where two orthogonal sinusoids are weighted and summed to produce an interpolated output. The prominent sources of phase nonlinearity for this PI architecture are: 1) high voltage swings [11]; 2) higher harmonics in the input clock signals [11]; 3) implementation of a linear weighting scheme for interpolation [12]; and 4) inadequate harmonic filtering at the output node [13].

The methods employed to mitigate each of the mentioned sources of nonlinearity are as follows.

- 1) The input amplitude is chosen prudently: large enough to steer the current but small enough to avoid hard switching and the generation of higher harmonics. The tunable CMOS-CML conversion in [3] sets an appropriate voltage swing.
- 2) Higher harmonics introduce distortion into the interpolated output, degrading phase linearity. Techniques such as poly-phase filtering [14] and *LC*-based filtering [11] remove higher harmonics at the inputs.
- 3) Ideally, a sinusoidal weight implementation scheme is required, and the *I* and *Q* weights,  $\alpha$  and  $\beta$ , respectively, lie on a circle such that the generated interpolated phases at the load,  $V_X$ , have a constant amplitude (see Fig. 1). A constant amplitude is desired because the swing-dependent delay characteristic of the CML-to-CMOS (C2C) circuit results in AM–PM distortion which eventually manifests as phase nonlinearity.

Some techniques, such as octagonal weight constellation [9], [15], [16], and $45^{\circ}$ offset *I-Q* combination [17] coarsely approach the circular weight constellation for reducing AM at the expense of additional circuitry and implementation complexity. An octagonal constellation CMPI can be implemented more compactly (close to a diamond constellation) than the CMPI with $45^{\circ}$ offset combination technique. However, the latter technique has shown better phase linearity. A linear weighting scheme often employed [11], [12] allows ease of implementation; however, it produces a diamond constellation that causes phase nonlinearity. The INL for the first quadrant (see Fig. 1) is given as

$$\text{INL} = (1 - \alpha)\frac{\pi}{2} - \arctan\left(\frac{1 - \alpha}{\alpha}\right) \tag{1}$$

where $0 \le \alpha \le 1$. Therefore, the INL<sub>pp</sub> contribution just from the diamond constellation weight implementation is 8.14°.

- 4) An *RC* network with tuning is widely used as an output load [3], [7], [11]; however, the harmonic filtering may still be inadequate for various applications. *LC* networks can be used for better filtering [13], [18] at the cost of area and susceptibility to electromagnetic coupling. An active inductor load used in [19] provides bandwidth extension while occupying a small area; however, it requires load tunability to adjust the peaking transfer function with PVT change.

Fig. 2. Voltage-mode phase interpolator.
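The INL<sub>pp</sub> figure quoted for the diamond constellation can be checked numerically from (1). The following is a minimal sketch, assuming an ideal linear weighting with no other error source; the function names and the grid size are illustrative choices, not taken from the paper.

```python
import math


def diamond_inl_deg(alpha: float) -> float:
    """INL in degrees from (1): I-weight alpha, Q-weight (1 - alpha)."""
    ideal_deg = (1.0 - alpha) * 90.0
    # atan2 handles alpha = 0 (a pure Q output) without dividing by zero.
    actual_deg = math.degrees(math.atan2(1.0 - alpha, alpha))
    return ideal_deg - actual_deg


def diamond_inl_pp_deg(points: int = 10001) -> float:
    """Peak-to-peak INL over a uniform sweep of alpha in [0, 1]."""
    inl = [diamond_inl_deg(i / (points - 1)) for i in range(points)]
    return max(inl) - min(inl)


if __name__ == "__main__":
    print(f"INLpp of the diamond constellation: {diamond_inl_pp_deg():.2f} deg")
```

The sweep gives a value just above 8.1°, in line with the 8.14° stated above, and the error is zero at the quadrant edges and at the 45° midpoint, where the diamond and the circle agree in phase.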

#### B. Voltage-Mode Phase Interpolator

Interpolation in the voltage domain with the input signals, $V_{\Phi_{in1}}$ and $V_{\Phi_{in2}}$, as square waves does not yield an interpolated phase output. For square-wave inputs, if the output rise (fall) time of the PI inverter is considerably shorter than the duration of the $[V_{\Phi_{in1}}, V_{\Phi_{in2}}] = [0, 1]$ ([1, 0]) region, the PI output settles at an intermediate voltage level between VDD and GND in that region [20]. These code-dependent voltage levels arise from the voltage division in the PMOS-NMOS (PN) short-circuit path between VDD and GND in those regions, and they substantially degrade the PI phase linearity. Therefore, the input signals used for interpolation need to slew over about three times their input time separation, requiring slew control circuits at the inputs [1], [8], [20]. A larger input slew requires a smaller input phase separation, which implies using more input phases. It is therefore challenging to use only a few input phases, e.g., quadrature inputs, and simultaneously achieve high phase linearity in a VMPI. Some VMPI architectures also use filtering at the output to improve phase linearity [21].

In a slice-based VMPI implementation, (M-K) and K slices are selected for $V_{\Phi_{in1}}$ and $V_{\Phi_{in2}}$, respectively, to achieve an interpolation factor of K/M (see Fig. 2). A slice is usually implemented as a tri-state inverter. The interpolated outputs, however, are variable-slope signals whose slope shape depends on K. These interpolated signals suffer code-dependent delay when evaluated by a comparator, leading to phase nonlinearity [22].



Fig. 3. (a) Generic IMPI architecture. Variable and constant slope generation. (b) Possible implementations for  $I_1$  and  $I_2$ —as current source, sink, or source-sink duo. (c) IMPI prior art-I waveforms. Single-edge interpolation and output DCD as architectural limitations are illustrated.

#### C. Integrating-Mode Phase Interpolator

A generic IMPI architecture, as shown in Fig. 3(a), underpins the IMPI operation in prior works [5], [23], [24], [25] and this work. In this architecture, two input phases, either directly or through a control logic, control the current sources $I_1$ and $I_2$ to charge/discharge the capacitor ($C_O$), producing voltage waveforms, $V_X$. The PI code sets the magnitude of $I_1$ and $I_2$. A discharge signal, $V_{\text{dischrg}}$ (optional, used in some architectures), performs the reset operation and is generated by the control logic [23] and/or PI feedback [5], [24]. The current sources can be implemented either as current sources, sinks, or a current source-sink duo, as shown in Fig. 3(b).

Constant and variable voltage slope regions are integral to IMPI operation. Fig. 3(a) shows the variable and constant slope generation in prior art [5], [23], [24], [25]. In the variable slope region where only $I_1$ is active, the PI code sets the current magnitude giving rise to code-controlled (variable) slopes. Variable slopes essentially create linear-spaced vertical voltage levels at the end of the variable slope region from where the constant slope region starts. For the constant slope, the PI code chooses currents $I_1$ and $I_2$ such that total current $I_1 + I_2$ is constant, and therefore $C_O$ is charged with a constant current resulting in constant slopes. The constant slopes emanating from these linear-spaced voltage steps will be linear-spaced in time in the constant slope region. A comparator evaluates these waveforms in the constant slope region, thus producing linear interpolated outputs.
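The step from linearly spaced voltage levels to linearly spaced output edges can be written out as a short sketch. It assumes ideal current sources and a comparator threshold that is crossed inside the constant slope region. The symbols $V_{\text{ped}}(K)$ (voltage at the start of the constant slope region for code $K$), $\Delta V_{\text{ped}}$, $t_0$, $S$, $V_{\text{th}}$, and $t_{\text{cross}}$ are notation introduced here for illustration only.

$$\begin{aligned}
S &= \frac{I_1 + I_2}{C_O}, \\
V_X(t) &= V_{\text{ped}}(K) + S\,(t - t_0), \qquad t \ge t_0, \\
t_{\text{cross}}(K) &= t_0 + \frac{V_{\text{th}} - V_{\text{ped}}(K)}{S}, \\
t_{\text{cross}}(K) - t_{\text{cross}}(K+1) &= \frac{V_{\text{ped}}(K+1) - V_{\text{ped}}(K)}{S} = \frac{\Delta V_{\text{ped}}}{S}.
\end{aligned}$$

Under these assumptions, a constant pedestal spacing $\Delta V_{\text{ped}}$ and a code-independent slope $S$ give a constant time step between adjacent codes, which is the linearity property described above.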

1) IMPI Prior Art-I: The IMPIs described in [5], [23], and [24] are operationally similar and are labeled here as prior art-I. In [5], variable slopes are generated in region-I from the $(M-K) \cdot I_u$ current, and constant slopes are generated in region-II from the $M \cdot I_u$ current, followed by a reset in region-III, producing voltage $V_X$ [see Fig. 3(c)]. Here, $I_u$ is the current that flows through a unit current source (sink) for charging (discharging) $C_O$. When $V_X$ is evaluated by a comparator, a pulsewidth modulation (PWM) signal $V_O$ is generated. Being operationally similar, all prior art-I architectures produce the same $V_X$ waveforms and share some limitations: 1) they provide only single-edge interpolation; 2) the outputs have duty-cycle distortion (DCD), resulting in PWM outputs; and 3) the DCD depends on the interpolation factor.

Moreover, PWM signal propagation through buffers, sharing of PWM signals between complementary PIs [23], and reset using feedback [24] pose challenges for implementation and frequency scaling. The IMPI in [5] improves on the architectures in [23] and [24] in linearity and power supply sensitivity, uses a differential comparator to remove the inverter (comparator) threshold variation, and employs replica integrators to remove the nonlinearity caused by quiescent currents in [24]. However, the architecture in [5] still faces several challenges.

- 1) Need for Current Calibration: Three cases are shown in Fig. 4 (top) where the current is: i) lower than optimal; ii) optimal; and iii) higher than optimal. Case i) leads to a voltage swing reduction at $V_X$, resulting in a decreased comparison time for the comparator evaluating the $V_X$ waveforms. A shorter comparison time makes high-frequency operation difficult. In case iii), phase nonlinearity occurs as the comparator threshold, $V_{\text{th}}$, crosses the $V_X$ signals in the variable slope region. Therefore, current calibration, such as the digital-DLL-based method in [24], is needed.
- 2) Need for Optimal $V_{\text{th}}$: Consider the scenario where the IMPI is at its optimal current setting, but the $V_{\text{th}}$ changes [see Fig. 4 (bottom)]. If $V_{\text{th}}$ is greater than optimal, as in case 1), the comparison time reduces, affecting the high-frequency operation. If $V_{\text{th}}$ is lower than optimal, as in case 3), then phase nonlinearity occurs.
- 3) Need for Auxiliary Circuit: Since these IMPIs work on a single edge, an edge combiner is used for combining two single-edge PWM outputs to construct differential signals with 50% duty cycle. The improved IMPI in [5] still uses several auxiliary circuits, which include replica integrating cores, differential comparators, buffers, an edge combiner, etc., costing power and area. Furthermore, the comparators still need an optimal $V_{\text{th}}$ and input common-mode voltage, VCM, which were provided externally.
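The sensitivity to the current setting and to $V_{\text{th}}$ can be illustrated with an idealized model of the prior art-I waveform. This is a sketch under stated assumptions, not the circuit in [5]: the slopes are ideal, all quantities are normalized, the reset region is ignored, and the names `crossing_time`, `step_spread`, `full_slope`, and `t_var` are hypothetical.

```python
def crossing_time(k: int, m: int, full_slope: float, t_var: float, v_th: float) -> float:
    """Time at which V_X reaches v_th for code k (idealized prior art-I model).

    Variable-slope region: duration t_var, slope (1 - k/m) * full_slope.
    Constant-slope region: slope full_slope, starting from the pedestal.
    """
    var_slope = (1.0 - k / m) * full_slope
    pedestal = var_slope * t_var
    if pedestal >= v_th:
        # Current too high (or threshold too low): V_X reaches the
        # threshold while still inside the variable-slope region.
        return v_th / var_slope
    return t_var + (v_th - pedestal) / full_slope


def step_spread(m: int, full_slope: float, t_var: float, v_th: float) -> float:
    """Difference between the largest and smallest code-to-code time step."""
    times = [crossing_time(k, m, full_slope, t_var, v_th) for k in range(m)]
    steps = [later - earlier for earlier, later in zip(times, times[1:])]
    return max(steps) - min(steps)


if __name__ == "__main__":
    # Normalized example values: the threshold sits at 0.5.
    print("largest pedestal below threshold:", step_spread(16, 0.4, 1.0, 0.5))
    print("largest pedestal above threshold:", step_spread(16, 0.8, 1.0, 0.5))
```

When the largest pedestal stays below the threshold, every step equals `t_var / m` and the spread is zero to within rounding. Raising the current (or, equivalently, lowering the threshold) pushes the low-K codes into the variable-slope branch, and the steps become unequal. The swing reduction of case i) is not captured by this model.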



Fig. 4. IMPI prior art-I: Cases illustrating the need for current and  $V_{\rm th}$  calibration.

2) IMPI Prior Art-II: Another IMPI architecture, prior art-II [25], shown in Fig. 5(a), is a slice-stack-based implementation that provides dual-edge operation with 50% duty-cycle outputs at a 2-GHz operating frequency. The input clock phases, $V_{\Phi_{in1}}$ and $V_{\Phi_{in2}}$, control the top/bottom PN current sources. The S1-S4 switches control the current flow from the current sources into and out of the $V_X$ node and, therefore, the interpolation factor. Retention cells, structurally similar to the IMPI slice, are required to hold the logic level once the interpolation operation is over, preventing node $V_X$ from floating. In regions I, II, IV, and V, the S signals enable/disable the controlled transistors in the IMPI, and the IMPI effectively assumes the current source-sink structure shown in Fig. 5(c). In these regions, the R signals cut the signal path from the retention cell to node $V_X$, thus isolating the IMPI from the retention cell. In regions III and VI, the S signals disable all the controlled transistors in the IMPI, which would leave node $V_X$ floating. To prevent this condition, the R signals connect node $V_X$ to the retention cell, providing the desired logic voltage level. Thus, the S and R signals keep changing to enable and disable the transistors in the IMPI and the retention cells, respectively, guiding the IMPI from region-I to region-VI within a $T_{\text{period}}$. It is important to note that the control signals, S and R, are not fixed for a PI code but are PWM signals switching close to the clock rate.

- 1) Regions of Operation and Waveforms: In a clock time period ($T_{\text{period}}$), this PI undergoes six regions, as shown in Fig. 5(b). In region-I, $V_{\Phi_{in1}} = 0$ and $V_{\Phi_{in2}} = 1$, and thus (M-K) PMOS current sources are ON, providing variable rising slopes. Both the (M-K) and the K PMOS current sources are ON in region-II, giving rise to constant rising slopes. In region-III, all the *S* switches turn off; therefore, the $V_X$ node is floating. A signal from the retention cell is used to maintain the logic level. A similar operation sequence follows with NMOS current sinks during the falling edge, and the PI goes through regions IV-to-VI.
- 2) Challenges in Scaling Operating Frequency: It is challenging to accommodate two interpolation and two retention regions within $T_{\text{period}}$ as the operating frequency increases and $T_{\text{period}}$ shrinks [see Fig. 5(d)]. Moreover, the generation of *S* and *R* signals uses feedback and complex logic; therefore, they are difficult to generate for a small $T_{\text{period}}$. Thus, this architecture is not amenable to high-frequency operation. For the implementation in [25], a small phase spacing of 22.5° between input signals favorably provided a small time span of $T_{\text{period}}/16$ for the variable slope region, which considerably relaxed the linearity requirements on this IMPI. However, such small phase spacing requires more phases from the MPG.

TABLE I. Comparing High-Speed IMPI With State-of-the-Art PIs

| Feature                                                  | CMPI        | VMPI         | IMPI (prior art) | IMPI (this work) |
|----------------------------------------------------------|-------------|--------------|------------------|------------------|
| Phase linearity                                          | Moderate    | Low-Moderate | Moderate-High    | High             |
| Voltage swing control at input required?                 | Yes         | No           | No               | No               |
| Slew-rate control/harmonic filtering at inputs required? | Yes         | Yes          | No               | No               |
| Linear weighting scheme                                  | Undesirable | Desirable    | Desirable        | Desirable        |
| Scalable to FinFET process                               | No          | Yes          | Yes              | Yes              |
| Chip area                                                | Moderate    | Small        | Small            | Smaller          |
| Harmonic filtering at output required?                   | Yes         | Maybe        | No               | No               |
| High-frequency operation                                 | Yes         | Yes          | No               | Yes              |
| Tuning/calibration required?                             | Yes         | Yes          | Maybe            | No               |

#### D. Comparison of CMPI, VMPI, and IMPI

In contrast to an IMPI, both the CMPI and the VMPI require some or all of these: 1) input slew control; 2) input harmonic filtering; and 3) output harmonic filtering. Even so, residual phase nonlinearity remains in: 1) a CMPI, due to inadequate harmonic filtering and nonsinusoidal weighting, and 2) a VMPI, from the variable-slope signals, which suffer phase distortion from the evaluating comparator.

An IMPI works suitably with linear weights and relieves the abovementioned requirements because of the following reasons.

- Its operation involves switching the current sources ON/OFF, which suits square-wave inputs. Therefore, slew control or harmonic filtering at the inputs is undesirable and not required.
- In an IMPI, the output is evaluated by a comparator in the constant-slope region. Thus, the phase nonlinearity which the comparator otherwise presents for evaluating varying slope signals is eliminated [22].

A comparison is provided in Table I. In summary, a CMPI works at high frequencies and provides superior power supply noise rejection but does not scale well with the process or VDD and requires nonlinear weighting to improve phase linearity. A VMPI implements linear weighting, is compact and scaling compatible, and works at high frequency but provides low-to-moderate linearity. The resolution of both the CMPI and the VMPI is primarily limited by phase nonlinearity. An IMPI has fundamentally better phase linearity. However, its implementation is complex and faces challenges for high-frequency operation, as described in Section III. This work presents an IMPI that works at high speed and achieves a resolution of 9 bits.

# III. HIGH-SPEED IMPI (THIS WORK)

This work aims for an IMPI with high linearity and high-speed operation having: 1) no control logic for the region determination; 2) inherent dual-edge operation with 50% duty cycle outputs; 3) operation at the highest frequency attainable by digital circuits in a technology; 4) no calibration (slew-rate, current magnitude, comparator threshold, etc.); 5) no voltage bias requirements for gate control of current sources; 6) no retention/reset region of operation; and 7) no feedback control.

Fig. 5. IMPI prior art-II. (a) Architecture. (b) Regions of operation. (c) Equivalent architecture with current sources and sinks. (d) Waveforms during different regions of operation.

#### A. Conceptual Operation

The concept for this IMPI is described in Fig. 6(a). A square wave bidirectional current charges/discharges a capacitor periodically to produce a triangular-shaped output voltage at node  $V_X$ . Accordingly, a phase-shifted square wave input current yields a phase-shifted triangular output.

The following idea emerges. When these phase-shifted $V_X$ signals are interpolated, the interpolated output retains the constant slope *S* during the time frame in which the interpolating $V_X$ signals have identical slopes *S*. However, during the time frame in which the $V_X$ signals have opposite slope signs, *S* and -S, the interpolated output slope varies. Furthermore, if the interpolation weights are linear, the slope variation is also linear and a function of *K*. This interpolation, therefore, gives rise to a family of piecewise linear (PWL) signals that are linearly spaced in time in the constant slope regions. When these signals are evaluated by a comparator, linear phase-interpolated outputs result. This interpolation at the output can be conveniently achieved by combining linearly weighted square-wave currents at the inputs, which forms the basis of this IMPI realization.
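The concept can be checked with an idealized numerical sketch: two unit-amplitude triangular waveforms 90° apart are combined with linear weights, and the rising zero crossing of the sum is located for each code. The model assumes ideal triangles and an ideal comparator with a threshold at mid-swing; the function names and the bisection search are illustrative choices, not part of the paper.

```python
def triangle(t: float, period: float, amplitude: float = 1.0) -> float:
    """Ideal triangle wave with a rising zero crossing at t = 0."""
    phase = (t / period) % 1.0
    if phase < 0.25:
        return 4.0 * amplitude * phase
    if phase < 0.75:
        return amplitude * (2.0 - 4.0 * phase)
    return 4.0 * amplitude * (phase - 1.0)


def interpolated(t: float, k: int, m: int, period: float) -> float:
    """Linearly weighted sum of the 0-degree and 90-degree triangles."""
    weight = k / m
    return ((1.0 - weight) * triangle(t, period)
            + weight * triangle(t - period / 4.0, period))


def rising_crossing(k: int, m: int, period: float, iterations: int = 60) -> float:
    """Bisect for the zero crossing inside [0, period / 4].

    Both triangles rise in this window, so the sum has the constant slope S
    and is monotonic there.
    """
    lo, hi = 0.0, period / 4.0
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if interpolated(mid, k, m, period) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    period, m = 1.0, 8
    for k in range(m):
        print(k, round(rising_crossing(k, m, period), 6))
```

In this idealized model the crossing for code K falls at (K/M) of a quarter period, so the output edges are uniformly spaced across the quadrant.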

#### B. Architecture and Operation

Due to its superior phase linearity, the proposed IMPI relaxes the input phase separation requirement, allowing its operation with just quadrature inputs. Not requiring eight input phases reduces power consumption, implementation complexity, and phase mismatch in the MPG. As shown in Fig. 6(b), a DCD and I-Q correction block cleans the phase errors on the quadrature input clocks. For phase selection, a 9-bit code is applied to a decoder, which generates the enable (EN) signals for selecting slices within the PI. Of the 9 bits, 2 bits choose a quadrant, and the remaining 7 bits provide interpolation within the quadrant. The PI core output passes through a series capacitor, $C_{\text{ser}}$, to a C2C converter consisting of a resistor-biased inverter followed by a buffer. Two such PI cores produce differential outputs through proper slice selection by the EN-bits. Cross-coupled inverters are connected between the buffer chains in the complementary C2C paths to keep the outputs differential. The PI core contains two IMPI\_2x blocks, and two IMPIs constitute each IMPI\_2x block. An IMPI contains two tunable current source-sink duos, $I_1$ and $I_2$, which are realized as slice-stacks and receive complementary EN-bits.

1) IMPI\_2x Structure, Quadrant Switching: The IMPI\_2x block thus comprises four slice-stacks connected to quadrature inputs. EN-bits configure these four stacks into two IMPIs, of which one is active and the other is inactive. The active and inactive IMPIs are dynamically reconfigured as per the quadrant location of the expected output phase [see Fig. 6(d)]. In quadrant-I, the active IMPI is formed by stacks connected to 0° and 90°; in quadrant-II, the stacks connected to 90° and 180° are used, and a similar operation follows for the other quadrants. Therefore, the EN-bits are coded to select: 1) appropriate stacks for choosing a quadrant and 2) slices within the stacks for interpolation. This scheme obviates the requirement of a multiplexer (MUX) between quadrature inputs and the PI core. Therefore, the phase nonlinearity originating from the propagation delay mismatch of different phases at the MUX outputs is eliminated. Using a MUX would reduce the core IMPI area, since the slice stacks could be reused for different phases; however, since the presented PI already occupies a small area, phase linearity improvement measures were prioritized over further area reduction.

Fig. 6. High-speed IMPI (this work). (a) Conceptual operation. (b) Overall architecture. (c) Half-slice design comparison. (d) IMPI\_2x structure and its dynamic configuration for quadrant switching.
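The code-to-stack mapping described above can be sketched as a small decoder model. The bit assignment is an assumption made here for illustration: the two MSBs are taken as the quadrant and the seven LSBs as K in plain binary, with K slices enabled on the lagging-phase stack. The actual decoder and EN-bit encoding are not specified in this section, and the names `PiSetting`, `decode`, and `lsb_seconds` are hypothetical.

```python
from typing import NamedTuple

M = 128          # interpolation steps per quadrant (7 bits)
CODES = 4 * M    # 9-bit code space


class PiSetting(NamedTuple):
    quadrant: int             # 0..3 (quadrant-I is 0)
    k: int                    # slices enabled on the lagging-phase stack
    m_minus_k: int            # slices enabled on the leading-phase stack
    active_phases_deg: tuple  # quadrature inputs feeding the active IMPI
    ideal_phase_deg: float


def decode(code: int) -> PiSetting:
    """Split a 9-bit PI code into quadrant and in-quadrant interpolation."""
    if not 0 <= code < CODES:
        raise ValueError("code must be a 9-bit value (0..511)")
    quadrant = code >> 7      # assumed: 2 MSBs select the quadrant
    k = code & 0x7F           # assumed: 7 LSBs give K in binary
    lead = 90 * quadrant
    lag = (lead + 90) % 360
    return PiSetting(quadrant, k, M - k, (lead, lag), code * 360.0 / CODES)


def lsb_seconds(clock_hz: float) -> float:
    """Ideal time step of one code at the given clock frequency."""
    return 1.0 / (clock_hz * CODES)


if __name__ == "__main__":
    print(decode(0))
    print(decode(200))
    print(f"T_LSB at 13.3 GHz: {lsb_seconds(13.3e9) * 1e15:.0f} fs")
```

With 512 codes per period, one LSB corresponds to about 0.7° of phase, or roughly 147 fs at 13.3 GHz.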

An IMPI half-slice forms the unit of these slice stacks. A high output impedance is required for the half-slice to work as a good current source. Two architectures were evaluated for the half-slice design [see Fig. 6(c)]. The first architecture obtains high output impedance by transistor stacking, while the second uses both resistor and transistor stacking. While the first architecture provides better linearity, the second produces less jitter and a smaller INL change over process variation while retaining good linearity, and it was therefore chosen for this IMPI. In a half-slice, the input clocks are connected to transistors $M_1$ and $M_2$, while $M_5$ and $M_6$ are connected to rail-to-rail EN-bits for switching the slice ON/OFF.

The finFET transistors closer to the output node [i.e., $M_5$ and $M_6$ in Fig. 6(c)] operate mostly in saturation. The other transistors, such as $M_1$ and $M_2$, operate in the triode region, providing source degeneration for the devices in saturation and increasing the $r_{\text{out}}$ at the stack output [26]. The highly cascoded architecture-I realizes a larger $r_{\text{out}} \approx [(4r_{\text{deg}}) \cdot g_m r_o]/2$ than the $r_{\text{out}} \approx [R + r_{\text{deg}} \cdot g_m r_o]/2$ in architecture-II, where $r_{\text{deg}}$ is the resistor realized by a transistor providing degeneration (such as $M_1/M_2$), and $g_m$ and $r_o$ represent the transconductance and output resistance of a transistor in saturation (such as $M_5/M_6$). Thus, architecture-I realizes a better current source than architecture-II, which results in higher phase linearity. It should be noted that the transistors in architecture-I need to be appropriately upsized relative to architecture-II to obtain the same $I_u$. However, since architecture-I is an all-transistor-based implementation, it suffers from a larger mismatch between slices, resulting in a higher INL<sub>pp</sub> variation ($1.8 \times$ higher) than architecture-II.



Fig. 7. High-speed IMPI. (a) IMPI slice. (b) Equivalent architecture with current sources and sinks. (c) Input and output waveforms. (d) Regions of operation. (e) Slope as a function of K in variable slope regions.

Two such half-slices connected at their output create an IMPI slice in Fig. 7(a). Conceptually, this IMPI can be seen as two slice-stack-based current source-sink duos whose (M-K) and K slices are enabled through EN-bits to achieve an interpolation factor of K/M [see Fig. 7(b)].

2) Regions of Operation: Similar to other IMPIs, this IMPI also undergoes variable and constant slope regions as described next. In region-I,  $[V_{\Phi_{in1}}, V_{\Phi_{in2}}] = [1, 1]$ , NMOS in both the half-slices are ON, and thus constant output slope is generated. In region-II,  $[V_{\Phi_{in1}}, V_{\Phi_{in2}}] = [0, 1]$ , both the top-left PMOS and the bottom-right NMOS are ON, generating a variable slope output. Similarly, constant slope occurs in region-III followed by variable slope in region-IV. The IMPI configuration in the four regions is shown in Fig. 7(c) and (d). In summary, two constant slope and two variable slope regions are generated in a  $T_{period}$  without using any control signals for region discrimination. Also, no retention or reset regions occur for this IMPI.

3) Slope as a Function of K in Variable Slope Regions: In the variable-slope regions, the slopes are a function of K. As illustrated in Fig. 7(e), a current $(M-2K) \cdot I_u$ flows into the capacitor in region-II. As K is varied linearly, the slope, $S_{\text{code}}$, also changes linearly, by the factor (2K/M). These linear increments in slope lead to linear voltage increments at the end of region-II, forming linearly spaced voltage pedestals. Constant slopes starting from these voltage pedestals in region-III are linearly spaced in time, providing high phase linearity.
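The linear dependence of the region-II slope on K follows directly from the net current, as the short sketch below shows. It assumes ideal unit current sources, with the (M-K)-slice stack sourcing and the K-slice stack sinking current in region-II; $I_{\text{II}}(K)$ and $S = M I_u / C_O$ (the constant-slope magnitude) are notation introduced here for illustration.

$$\begin{aligned}
I_{\text{II}}(K) &= (M-K)\,I_u - K\,I_u = (M-2K)\,I_u, \\
S_{\text{code}}(K) &= \frac{I_{\text{II}}(K)}{C_O} = \frac{M I_u}{C_O}\left(1 - \frac{2K}{M}\right) = S\left(1 - \frac{2K}{M}\right), \\
S_{\text{code}}(K+1) - S_{\text{code}}(K) &= -\frac{2S}{M}.
\end{aligned}$$

The slope therefore changes by the same amount for every code step and reverses sign at $K = M/2$.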

#### C. Design Considerations

This PI is designed such that the triangular output voltage swing is ~60% VDD for linearity considerations [see Fig. 8(a)]. Our IMPI half-slice can be represented as a current $I_u$ that periodically charges and discharges an $R_p || C$ load [unlike the purely capacitive load in Fig. 6(a)], where $R_p$ and C are the slice output resistance and output capacitance, respectively, and $C = C_O/M$. For a specific $R_p C$ product, assume that the output voltage V rises from $0.2I_u R_p$ to $0.8I_u R_p$ in $T_{\text{period}}/2$. Applying Kirchhoff's current law and defining $k = I_u - (V/R_p)$, we can write

$$\int_{0}^{T_{\text{period}}/2} \left(-\frac{1}{R_{p}C}\right) dt = \int_{0.8I_{u}}^{0.2I_{u}} \frac{1}{k}\, dk. \tag{2}$$

On solving (2), we get the following expression:

$$R_p C = \frac{T_{\text{period}}}{2 \cdot \ln 4} \approx 0.36 \cdot T_{\text{period}}. \tag{3}$$

Thus, the  $R_pC$  product should be roughly 36% of the clock time period for attaining 60% VDD swing, e.g., 25.7 ps for a 14-GHz clock.
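The result in (3) and the 14-GHz example can be reproduced with a few lines. This is a minimal sketch of the first-order $R_p || C$ model described above; the function names are illustrative, and voltages are expressed as fractions of $I_u R_p$.

```python
import math


def rp_c_product(clock_hz: float, v_start: float = 0.2, v_end: float = 0.8) -> float:
    """R_p*C in seconds so that V rises from v_start to v_end in half a period.

    v_start and v_end are fractions of the asymptotic level I_u * R_p.
    """
    half_period = 0.5 / clock_hz
    return half_period / math.log((1.0 - v_start) / (1.0 - v_end))


def charged_fraction(rp_c: float, t: float, v_start: float = 0.2) -> float:
    """V(t) / (I_u * R_p) for a constant I_u charging the R_p || C load."""
    return 1.0 - (1.0 - v_start) * math.exp(-t / rp_c)


if __name__ == "__main__":
    f_clk = 14e9
    rp_c = rp_c_product(f_clk)
    print(f"R_p*C = {rp_c * 1e12:.2f} ps ({rp_c * f_clk:.3f} of the period)")
    print(f"V at T/2 = {charged_fraction(rp_c, 0.5 / f_clk):.3f} of I_u*R_p")
```

With the unrounded constant $1/(2\ln 4) \approx 0.3607$, the product evaluates to about 25.8 ps at 14 GHz; using the rounded factor 0.36 gives the 25.7 ps quoted above.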

Fig. 8. High-speed IMPI. (a) Output swing considerations. (b) Reducing DCD generation from the PN strength-mismatch.

The IMPI design methodology is as follows. We aim to construct the triangular output waveform at frequency $f = 1/T_{\text{period}}$ with a swing $V_{\text{pp}} = \Delta V = 60\%$ of VDD. To obtain the lowest power, the smallest current, I, and smallest capacitor, $C_O$, should be used for achieving $\Delta V$ swing where I is the current in the constant slope region; $I = M \cdot I_u$. The IMPI power dissipation, P, is given by

$$P = C_O \cdot \text{VDD} \cdot \Delta V \cdot f. \tag{4}$$

To minimize the power at a particular frequency, the smallest possible  $C_O$  should be used for achieving  $\Delta V$ , which is realized using only the output parasitics of the IMPI slices and the C2C input load. Then the appropriate  $I_u$  value is chosen to achieve the  $\Delta V$  swing using a cascode transistor and then increasing the values of R in Fig. 6(c). Increasing the R decreases  $I_u$ , thus reducing the IMPI output swing. For the R value for which the output swing reaches  $\sim \Delta V$ , the overall PI (IMPI and C2C combination) provides optimally good linearity and is chosen for the IMPI design.
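To make (4) concrete, the sketch below evaluates it for an assumed output capacitance and cross-checks it against a simple charge-balance argument. The value $C_O$ = 50 fF is purely hypothetical (the capacitance is not stated in the text); VDD = 0.75 V, f = 14 GHz, and $\Delta V$ = 60% of VDD are taken from the paper. The cross-check assumes an ideal triangular waveform in which the supply delivers the constant-slope current I for half of each period.

```python
def impi_power(c_o, vdd, delta_v, f):
    """Power from (4): P = C_O * VDD * dV * f."""
    return c_o * vdd * delta_v * f


def constant_slope_current(c_o, delta_v, f):
    """Current needed to ramp C_O by delta_v in half a period (ideal triangle)."""
    return 2.0 * c_o * delta_v * f


VDD = 0.75            # V, from the paper
F_CLK = 14e9          # Hz, from the paper
DELTA_V = 0.6 * VDD   # 60% VDD swing, from the paper
C_O = 50e-15          # F, hypothetical value for illustration only

p = impi_power(C_O, VDD, DELTA_V, F_CLK)
i_const = constant_slope_current(C_O, DELTA_V, F_CLK)
print(f"P = {p * 1e3:.3f} mW, I = {i_const * 1e3:.3f} mA")
# Supply sources I during the charging half-period only: P = VDD * I / 2
print(f"VDD * I / 2 = {VDD * i_const / 2.0 * 1e3:.3f} mW")
```

Both expressions scale linearly with $C_O$, which is why the design minimizes $C_O$ first and then sets $I_u$ to reach the target swing.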

1) PI Output Swing Considerations: A voltage swing control is not implemented. Consequently, the swing changes by  $\pm 15\%$  VDD with the process variation (SS/FF corners), but it does not lead to significant phase linearity degradation, as explained next.

The overall phase linearity is achieved by optimizing phase linearity both from the PI core and the C2C. Large swings degrade PI linearity, and small swings aggravate C2C AM–PM distortion, as elaborated further. As the PI output swing increases, the transistors go deeper into the triode regions, and the voltage slope variation increases, becoming worse at the swing extremes. This slope variation affects the linearity of slopes in the variable-slope regions, which are situated at the swing extremes, producing phase nonlinearity.

The C2C serves as a comparator for PI outputs. For the maximum linearity, the interpolated PI waveforms should have the same shapes within the critical comparison window [22]. Consequently, for the PI swings smaller than the comparison window, amplitude variations (occurring in the variable slope regions) cause AM–PM distortion from C2C, leading to phase nonlinearity. The PI phase linearity improves when the output voltage swing decreases, but the C2C AM–PM nonlinearity worsens. Alternatively, when the PI output swing increases, the PI phase linearity deteriorates, but C2C AM–PM nonlinearity reduces. The C2C AM–PM nonlinearity reduces because, for the same percentage variation in IMPI amplitude swings, a higher IMPI swing results in considerably less C2C AM–PM distortion [3]. Thus, this trade-off between PI and C2C serves as a compensation method. Notably, implementing a swing calibration technique will improve the PI phase linearity with process variations.

The maximum peak-to-peak amplitude variation for ideal PWL signals is half the maximum signal swing value. However, the amplitude variation for the actual PWL signals is comparably lower, and the resultant C2C AM–PM nonlinearity is alleviated because: 1) the pointed waveform tips at the extremes are tapered off due to the limited bandwidth at the IMPI output, reducing the amplitude variation, and 2) the amplitude variation among the fundamental harmonics of the PWL signals is even smaller.
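The statement that ideal PWL signals vary in amplitude by up to half the maximum swing can be reproduced with an idealized four-region model. The sketch below is an illustration under assumptions: ideal current sources, four regions of $T_{period}/4$ each, a region-II current of $(M-2K) \cdot I_u$, and a region-IV current assumed to be its mirror image. The helper name and M = 8 are hypothetical.

```python
from itertools import accumulate


def pwl_swing(K, M, I_u=1.0, C_O=1.0, T_period=1.0):
    """Peak-to-peak swing of the ideal piecewise-linear output for code K."""
    t_region = T_period / 4.0
    currents = [-M * I_u, (M - 2 * K) * I_u, M * I_u, -(M - 2 * K) * I_u]
    dv = [i * t_region / C_O for i in currents]
    # Voltages at the region boundaries; the DC offset does not affect the swing.
    corners = list(accumulate(dv, initial=0.0))
    return max(corners) - min(corners)


M = 8  # hypothetical number of slices
swings = [pwl_swing(K, M) for K in range(M + 1)]
print("swing per code:", swings)
print("variation / max swing:", (max(swings) - min(swings)) / max(swings))
```

The swing is largest at the quadrant edges (K = 0 or K = M) and smallest at mid-code (K = M/2), where the variable-slope regions are flat.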

2) Reducing DCD Generation From PN Strength-Mismatch: In a VMPI, a strength mismatch between PN leads to asymmetrical rise-fall times, resulting in DCD. In contrast, for this IMPI, a stronger (weaker) PMOS than NMOS results in an increase (decrease) of the PI average output level, and the waveform moves up (down), as shown in Fig. 8(b). The AC coupling capacitor,  $C_{ser}$ , discards the average output (and its DC deviation), preventing DCD generation. Likewise, the average output voltage change due to SF (FS) corner is also discarded. SF (FS) corner degrades the linearity slightly for the rising (falling) edge while improving the linearity for the falling (rising) edge. Cross-coupling inverters between the complementary PI chains even out this differential linearity change between differential edges. Moreover, since the resistor-biased inverter evaluates the edges by its own  $V_{\rm th}$ , the process variation of  $V_{\rm th}$  is also not an issue since the average PI output always aligns with inverter  $V_{\text{th}}$  at its input node,  $V_{C_{\text{ser}}}$ .

# D. Dynamic Operation of PI

In plesiochronous clocking, the PI rotates to accumulate the frequency difference  $\Delta f$  between the TX and the RX [see Fig. 9(a)]. To achieve this  $\Delta f$, the codes are updated to increment/decrement the output phase at a frequency of  $\text{Freq}_{\text{PI\_update}}$. For an *N*-bit PI, for a single LSB phase jump/update,  $\Delta f = \text{Freq}_{\text{PI\_update}}/2^{N}$. More LSBs need to jump per PI update to accumulate higher ppm, and  $\Delta f = N_{\text{LSB}} \cdot \text{Freq}_{\text{PI\_update}}/2^{N}$, where  $N_{\text{LSB}}$  is the number of LSBs jumped per update.

Fig. 9. High-speed IMPI. (a) Dynamic PI operation for plesiochronous clocking. (b) Alleviating dynamic nonlinearity from the PN mismatch.
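The relation between the update rate, the resolution, and the accumulated frequency offset can be sketched as follows. The function name is hypothetical; the 9-bit resolution and the 1-GHz update rate are the values reported for this PI, and the example simply evaluates the expression above for one and two LSBs per update.

```python
def rotation_frequency(freq_update_hz, n_bits, n_lsb=1):
    """Frequency offset accumulated by stepping n_lsb LSBs at every PI update."""
    return n_lsb * freq_update_hz / 2 ** n_bits


for n_lsb in (1, 2):
    df = rotation_frequency(1e9, 9, n_lsb)
    print(f"N_LSB={n_lsb}: delta_f = {df / 1e6:.3f} MHz")
```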

1) Linearity Degradation Mechanisms for Static and Dynamic PI Operation and Its Mitigation: Linearity degradation occurs through three prominent mechanisms in a PI. The mechanisms and mitigation techniques employed are as follows.

a) Phase errors in input quadrature signals: These errors are removed by: 1) I-Q correction [27]; 2) DCD correction [28]; and 3) path matching of input phases in layout. Since the quadrature inputs are available, only a small delay tuning range in the Q path is required to mitigate the I-Q phase error in the input clocks (a slight difference from [27]). The I-Q correction loop runs in the background and reduces the quadrature phase error to <150 fs. The PI inputs are rail-to-rail signals having rise/fall times of about 18% of  $T_{period}$. Increasing the rise/fall time to more than 25% of  $T_{period}$  degrades the phase linearity because the constant and variable region boundaries become less distinct, and the region overlap increases.

*b) Phase nonlinearity in the PI:* To address it, a high linearity IMPI technique is implemented in this work.

In an ideal scenario where the IMPI slices work as ideal current sources and all the slices are identical (with no mismatch), the interpolation error will be zero for linear weights. However, some phase nonlinearity does exist for this IMPI. The variation in the slope from the desired linear slope shapes in the variable and constant slope regions is the primary source of its phase nonlinearity. A perfect IMPI operation requires ideal current sources having infinite output resistance,  $r_{out}$, and capable of sustaining arbitrarily large voltage swings. However, imperfect current sources realized with a stacked-transistor implementation suffer from a finite  $r_{out}$, variation in  $r_{out}$  with signal swing, and transition into triode regions for considerably large signal swings. Furthermore, in a stacked implementation, the charging and discharging of all internal source/drain nodes also affect the linear slope shapes to some extent.

The clock inputs at the gates of  $M_1$  and  $M_2$  are isolated from the  $V_X$  through a transistor and a series resistor. Therefore,  $M_1/M_2$   $C_{gd}$  coupling does not affect the PI output linearity. Furthermore, the inactive slice-stacks in the IMPI\_2x are in the opposite phase to the active slice-stacks and cancel some of the coupling from the active stacks' inputs to their common output node  $V_X$ . Also, since the IMPI clock inputs are strongly driven (rail-to-rail) by the inverter buffers, the inputs are largely unaffected by the weak coupling feedback from the PI output.

*c) Phase errors in EN-bits:* These are mitigated by: 1) minimizing EN-bits path length, achieved by placing the decoder close to the PI core; 2) resampling EN-bits; and 3) path matching of the EN-bits in layout.

A flip-flop clocked by the PI update clock resamples the EN-bits before they reach the IMPI core. The flip-flop outputs are therefore cleaned of the transition time mismatch among the EN-bits due to reasons such as circuit-delay mismatch or path-length mismatch. Next, the path lengths of the EN-bits from the flip-flop output to the IMPI slices are matched in the layout to the best extent possible.

The EN-signals are relatively low-speed signals switching at 1 GHz and therefore have fast rise/fall times  $\sim 10\%$  of  $T_{\text{period}}$ . A higher rise/fall time is beneficial for phase linearity if EN-bits are slightly mismatched in their transition time. However, resampling significantly relaxes the EN-bits rise/fall times considerations for the PI dynamic linearity.

While the techniques mentioned in Sections III-D1a and III-D1b alleviate static nonlinearity, the technique in Section III-D1c is also required, together with those in Sections III-D1a and III-D1b, for mitigating dynamic nonlinearity.

2) Alleviating Nonlinearity From PN Mismatch: As discussed, the PN mismatch affects the average PI output. The PI slices are turned ON and OFF when the PI code is updated.



Fig. 10. INL/DNL measurement of the 9-bit IMPI at 13.3 GHz, and the spectrum of the PI output with 256-ppm clock modulation at 1-GHz update showing rotation spurs.

The cumulative PN mismatch of the ON slices determines the average PI output. Three cases (1), (2), and (3) having different PN mismatch profiles of the slice-stacks connected to  $V_{\Phi_{in1}}$  and  $V_{\Phi_{in2}}$  are shown in Fig. 9(b) to explain this effect clearly. As the PI rotates, the PN mismatch manifests as code-dependent average voltage variation, creating an AC signal of small magnitude composed of the harmonics of  $\Delta f$. This low-frequency signal adds to the high-frequency PI output and periodically modulates its threshold-crossing, leading to DCD at the node  $V_{\text{pi\_se}}$. The waveform shape of this low-frequency signal depends on the PN mismatch pattern and is different for each case. Notably, the mismatch pattern consisting of interspersed PMOS-strong and NMOS-strong slices [as in case (3)] predominantly generates higher harmonics of  $\Delta f$  with rotation, albeit of lower amplitude due to the frequent PN mismatch cancellation. The waveform in case (4) represents a random mismatch scenario. All four cases see the same high-pass CR filter created by  $C_{\text{ser}}$  and  $R_{\rm fb}$  (providing a 3-dB bandwidth of ~150 MHz at the node  $V_{C_{\text{ser}}}$), suppressing these low-frequency signals while allowing the high-frequency PI output clock to pass through with minimal attenuation, as illustrated in Fig. 9(b). The R in the CR filter is equal to  $R_{\rm fb}/(1 + A)$, where -A is the open loop voltage gain of the inverter. Thus, dynamic DCD and resultant phase errors in outputs are prevented.
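The suppression provided by the CR filter can be estimated with a first-order high-pass response. The sketch below assumes an ideal single-pole filter with the ~150-MHz corner quoted above and evaluates it at a 3.4-MHz rotation frequency (the offset used in the dynamic measurements) and at the 13.3-GHz clock. The function name is hypothetical, and the actual circuit response may deviate from this idealization.

```python
import math


def highpass_gain_db(f_hz, f_3db_hz):
    """Magnitude response of an ideal first-order high-pass filter, in dB."""
    x = f_hz / f_3db_hz
    return 20.0 * math.log10(x / math.sqrt(1.0 + x * x))


F_3DB = 150e6  # corner frequency from the text
for label, f in (("rotation tone, 3.4 MHz", 3.4e6),
                 ("4th harmonic, 13.6 MHz", 13.6e6),
                 ("PI clock, 13.3 GHz", 13.3e9)):
    print(f"{label}: {highpass_gain_db(f, F_3DB):.2f} dB")
```

Under this assumption, the fundamental of the mismatch-induced signal is attenuated by more than 30 dB while the clock passes essentially unattenuated.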

# **IV. MEASUREMENT RESULTS**

For the INL/DNL measurements, the control code is swept, and the output is measured against a constant reference clock. A DNL<sub>pp</sub> of  $1.4^{\circ}$  and an INL<sub>pp</sub> of  $2.4^{\circ}$  are obtained, as shown in Fig. 10. The static linearity measurements were performed using a Keysight N1000A DCA-X wide-band oscilloscope. The TX output from one lane is provided at the DCA inputs, and another lane's TX output is attached to the trigger input of the DCA. The PI code for the DCA inputs is swept, and its zero-crossing time is measured against the trigger reference kept at a constant phase. An averaging of 256 times per PI code was performed to find the mean zero-crossing time difference. For dynamic nonlinearity measurements, the rotation spurs are measured when the PI is operating at 13.3 GHz. The PI codes are updated at 1 GHz to produce a 256-ppm offset, which corresponds to 3.4 MHz. The fourth harmonic rotation spur at 13.6-MHz offset is -56.21 dBc, which shows excellent IMPI linearity for interpolating within a quadrant. The integrated rotation spur (IRS) in dBc can be found using the following equation:

$$\text{IRS} = 10 \cdot \log_{10} \sum_{n=1}^{N} 10^{\left(\frac{A_n}{10}\right)} + 3 \tag{5}$$

where  $A_n$ 's are the spurs in dBc in one sideband, and N is the number of spurs (see the Appendix). The IRS is -42.6 dBc in the measurements. A maximum frequency offset of 256 ppm was fixed at the CDR architecture level, which limited the maximum ppm to 256 in measurements and is not a fundamental limitation for this IMPI.

Output spurs were measured for eight lanes, showing lane-to-lane variation in Fig. 11(a) and (b). The worst spur spans from -45.5 to -52.3 dBc. The IRS is -39.5 dBc in the worst case and -43 dBc in the best case. The standard deviations of the INL<sub>pp</sub> and DNL<sub>pp</sub> are 50 fs and 15 fs, respectively, in Monte-Carlo simulations of 100 runs. The maximum measured integration spur variation of 120 fs is obtained from the measurements of more than 100 chips. The measured PI phase noise is shown in Fig. 12, and the RMS-integrated jitter from 3-MHz to 3-GHz integration bandwidth is 71 fs. The PI is designed in a 5-nm finFET process, and it occupies an area of 0.006 mm<sup>2</sup>. The die micrograph is shown in Fig. 13.

This PI is designed to work at 14 GHz. It also works at 9 GHz but with some INL degradation (INL<sub>pp</sub> ~ 5°) because the IMPI swing increases at a lower frequency, pushing the IMPI transistors deeper into triode regions and deteriorating its functionality as a good current source. To operate at a lower frequency while maintaining a high phase linearity, a capacitor bank can be implemented at node  $V_X$  to maintain a near-constant value of  $\Delta V$ . However, it was not implemented in this design, as supporting a lower frequency was not required.

JSSC'17 [7] ISSCC'18 [1] ISSCC'19 [2] JSSC'21 [3] ISSCC'22 [17] This work PI architecture CMPI VMPI Injection-locked CMPI CMPI IMPI (Offset combination) phase rotator # of input phase needed 8 8 8 4 Δ 16nm FinFET 65nm CMOS Technology 28nm FDSO 7nm FinFFT 65nm CMOS 5nm FinFET Supply Voltage (V) 1 1.2/0.88 1.2 1.2 1.2 0.75 Resolution (# Bits) 8 7 7 7 7 2.4 @ 13.3 GHz 4.5 1.5 - 2.6 4 @ 11 GHz INL<sub>pp</sub> (°) 5.1 5 4.1 @ 11 GHz DNL<sub>pp</sub> (°) 3.4 3.2 1.4 @ 13.3 GHz 2.3 1.4 - 26 @ 14GHz 7.2 @ 11GHz 24.8 @ 16GHz 11.4 @ 7GHz 7.35 @ 7GHz 9.5 @ 7GHz Power (mW) Frequency (GHz) 2 - 11 4 - 16 2 - 7 5-8 3.5 - 11 9 - 14 254 @ 7GHz Integrated Jitter (fsrms) 139\* @ 8GHz 143 @ 16GHz 83.9 @ 7GHz 58.5 @ 7GHz 71 @ 13.3GHz (10 kHz - 1GHz) (100 kHz - 8GHz) (10 kHz - 1GHz) (3 MHz - 3GHz) (4 MHz - 3GHz) (10 kHz - 1GHz) Integrated rotation spurs N/A N/A -39.4 @ 7GHz -33.9 @ 7GHz -41.7 @ 7GHz -42.6 @ 13.3GHz (dBc) 1301ppm 1000ppm -1429ppm 256ppm 0.022 0.105\*\* 0.0216 0.024 Area (mm<sup>2</sup>) 0.0043 0.006

 TABLE II

 Comparing High-Speed IMPI With State-of-the-Art PIs

\*Only ILRO jitter \*\*Includes LCVCO and ILOSC



Fig. 11. Measurements for different lanes showing (a) worst spurs and (b) integrated spurs.



Fig. 12. Measured phase noise of the PI. RMS-integrated jitter (3-MHz to 3-GHz) is 71 fs @ 13.3 GHz.

The role of resolution and linearity for the DJ of an *N*-bit PI is elucidated. The normalized PI DJ with respect to  $T_{\text{period}}$  is defined as *F* (in degree), which is a constituent of the PI jitter budget. The PI worst case DJ, PI<sub>DJ\_wc</sub>, is a sum of LSB time period,  $T_{\text{LSB}}$ , and the absolute value of peak-to-peak INL, INL<sub>pp</sub> [29]

$$F = \frac{\text{PI}_{\text{DJ}}(\text{fs})}{T_{\text{period}}(\text{fs})} \times 360^{\circ} \tag{6}$$

$$\text{PI}_{\text{DJ\_wc}} = T_{\text{LSB}} + |\text{INL}_{\text{pp}}|$$

$$F_{\text{wc}} = \frac{\text{PI}_{\text{DJ\_wc}}(\text{fs})}{T_{\text{period}}(\text{fs})} \times 360^{\circ} = \left(\frac{T_{\text{LSB}}(\text{fs})}{T_{\text{period}}(\text{fs})} + \frac{|\text{INL}_{\text{pp}}(\text{fs})|}{T_{\text{period}}(\text{fs})}\right) \times 360^{\circ} \tag{7}$$

$$F_{\text{wc}} = \left(\frac{1}{2^N} + \frac{|\text{INL}_{\text{pp}}(\text{fs})|}{T_{\text{period}}(\text{fs})}\right) \times 360^{\circ}. \tag{8}$$



Fig. 13. Die micrograph in 5-nm finFET process.



Fig. 14. Worst case DJ and  $F_{wc}$  versus clock time period.

The worst case normalized jitter,  $F_{wc}$  (in degree), in (8) is formed by two constituent terms. The first term,  $T_{LSB}(fs)/T_{period}(fs)$, indicates the quantization error  $(1/2^N)$, and the second term,  $|INL_{pp}(fs)|/T_{period}(fs)$, signifies the nonlinearity contribution normalized to  $T_{period}$. In this work, the overall worst case normalized jitter is reduced through: 1) the first term, by increasing N and implementing a 9-bit resolution and 2) the second term, by implementing a high linearity IMPI technique. Fig. 14 shows the comparison of the worst case jitter versus time period for the state-of-the-art PIs. This IMPI achieves the smallest PI<sub>DJ\_wc</sub> of 0.66 ps and the lowest  $F_{wc}$  of 3.1°.
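The two contributions in (8) can be evaluated directly once the INL is expressed in degrees, since $|INL_{pp}(fs)|/T_{period}(fs) \times 360^{\circ}$ is simply the INL<sub>pp</sub> in degrees. The sketch below uses the reported 9-bit resolution and 2.4° INL<sub>pp</sub>; the 7-bit case with the same INL is a hypothetical comparison that isolates the effect of resolution. The function name is an assumption.

```python
def worst_case_normalized_dj_deg(n_bits, inl_pp_deg):
    """F_wc per (8): quantization step plus peak-to-peak INL, both in degrees."""
    return 360.0 / 2 ** n_bits + abs(inl_pp_deg)


f_wc_9b = worst_case_normalized_dj_deg(9, 2.4)
f_wc_7b = worst_case_normalized_dj_deg(7, 2.4)  # hypothetical: same INL, fewer bits
print(f"9-bit: F_wc = {f_wc_9b:.2f} deg (quantization {360.0 / 2 ** 9:.2f} deg)")
print(f"7-bit: F_wc = {f_wc_7b:.2f} deg (quantization {360.0 / 2 ** 7:.2f} deg)")
```

With 9 bits, the quantization term is about 0.7°, so the total of roughly 3.1° is dominated by the INL term.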

Table II compares this IMPI with prior works. Briefly, this PI works with only four input phases, provides a higher resolution of 9 bits, and achieves an excellent fractional  $INL_{pp}/DNL_{pp}$, with low noise and low IRS while consuming a small area.

# V. CONCLUSION

Although an IMPI is fundamentally better in phase linearity than a CMPI/VMPI, prior IMPIs face challenges in implementation complexity and high-frequency operation, limiting their widespread adoption. This IMPI overcomes the limitations of prior IMPIs by supporting high-frequency and dual-edge operation, and by eliminating control, calibration, and biasing circuits. Further, it avoids the slew, harmonic filtering, and tuning circuits usually required for a CMPI/VMPI. The presented architecture lends itself to a simple, low-power, high-resolution, and compact implementation while providing improved DCD performance and low jitter, which are essential for a small BER as the sampling time margin shrinks with higher data rates. The superior static and dynamic nonlinearity performance of this IMPI in a 5-nm finFET places it at the forefront for implementation in advanced technology nodes.

# Appendix

# DERIVATION OF IRS

Let S be the sinusoidal carrier signal of amplitude  $B_S$ , and  $S1, S2, \ldots, SN$  are the spurs in one sideband representing sinusoids of amplitude  $B_{S1}, B_{S2}, \ldots, B_{SN}$ , respectively. The power of a sinusoidal carrier signal is given by  $B_S^2/2$ , which in dB can be written as

$$P_{\rm dB_S} = 10 \log_{10} \left( B_S^2 / 2 \right). \tag{9}$$

Similarly, the power in the spur signal, Sn, is given by

$$P_{\rm dB_{Sn}} = 10 \log_{10} \left( B_{\rm Sn}^2 / 2 \right). \tag{10}$$

Power in dBc of the spur Sn is therefore

$$A_n = P_{\rm dB_{Sn}} - P_{\rm dB_S} = 10 \log_{10} \left( B_{\rm Sn}^2 / B_S^2 \right) \tag{11}$$

which can be rearranged to

$$\frac{B_{\rm Sn}^2}{B_{\rm S}^2} = 10^{\left(\frac{A_n}{10}\right)}. \tag{12}$$

The total power of all spurs,  $P_t$ , can be written as

$$P_t = \frac{B_{S1}^2}{2} + \frac{B_{S2}^2}{2} + \dots + \frac{B_{SN}^2}{2} = \sum_{n=1}^N \frac{B_{Sn}^2}{2} \tag{13}$$

which, in dB, is

$$P_{\rm dB_{\it t}} = 10\log_{10}\sum_{n=1}^{N}\frac{B_{\rm Sn}^2}{2} \tag{14}$$

and, in dBc, is

$$P_{\text{dBc}_t} = P_{\text{dB}_t} - P_{\text{dB}_S} = 10 \log_{10} \sum_{n=1}^N \frac{B_{\text{Sn}}^2}{B_S^2}. \tag{15}$$

Substituting (12) in this equation gives

$$P_{\text{dBc}_t} = P_{\text{dB}_t} - P_{\text{dB}_S} = 10 \log_{10} \sum_{n=1}^N 10^{\left(\frac{A_n}{10}\right)}. \tag{16}$$

Assuming the same spur levels in the other sideband, we express IRS as  $P_{\text{dBc}_t} + 3$  dB, therefore, leading to (5).

# ACKNOWLEDGMENT

The authors would like to thank G. Deliyannides, W. Lye, A. Masnadi, P. Masoumi, Y. Meng, and S. Lightbody at MaxLinear, Burnaby, BC, Canada, and Z. Ning at MaxLinear, Carlsbad, CA, USA, for their support for this work. They would also like to thank M. Madiseh at Cisco for guidance, NSERC and Intel for financial support, CMC Microsystems for access to tools, Rohde & Schwarz for Phase Noise Analyzer, and Mentor Graphics for Analog FastSPICE.

#### REFERENCES

- [1] S. Chen et al., "A 4-to-16 GHz inverter-based injection-locked quadrature clock generator with phase interpolators for multi-standard I/Os in 7 nm FinFET," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, Feb. 2018, pp. 390–392.
- [2] Y.-C. Huang and B.-J. Chen, "An 8b injection-locked phase rotator with dynamic multiphase injection for 28/56/112 Gb/s SerDes application," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, Feb. 2019, pp. 486–488.
- [3] Z. Wang, Y. Zhang, Y. Onizuka, and P. R. Kinget, "Multi-phase clock generation for phase interpolation with a multi-phase, injection-locked ring oscillator and a quadrature DLL," *IEEE J. Solid-State Circuits*, vol. 57, no. 6, pp. 1776–1787, Jun. 2022.
- [4] S. Shekhar et al., "Strong injection locking in low-Q LC oscillators: Modeling and application in a forwarded-clock I/O receiver," *IEEE Trans. Circuits Syst. I, Reg. Papers*, vol. 56, no. 8, pp. 1818–1829, Aug. 2009.
- [5] T. O. Dickson et al., "A 1.8 pJ/bit 16×16 Gb/s source-synchronous parallel interface in 32 nm SOI CMOS with receiver redundancy for link recalibration," *IEEE J. Solid-State Circuits*, vol. 51, no. 8, pp. 1744–1755, Aug. 2016.
- [6] M. Pozzoni et al., "A multi-standard 1.5 to 10 Gb/s latch-based 3-tap DFE receiver with a SSC tolerant CDR for serial backplane communication," *IEEE J. Solid-State Circuits*, vol. 44, no. 4, pp. 1306–1315, Apr. 2009.
- [7] E. Monaco, G. Anzalone, G. Albasini, S. Erba, M. Bassi, and A. Mazzanti, "A 2–11 GHz 7-bit high-linearity phase rotator based on wideband injection-locking multi-phase generation for high-speed serial links in 28-nm CMOS FDSOI," *IEEE J. Solid-State Circuits*, vol. 52, no. 7, pp. 1739–1752, Jul. 2017.
- [8] P. K. Hanumolu, V. Kratyuk, G.-Y. Wei, and U.-K. Moon, "A subpicosecond resolution 0.5–1.5 GHz digital-to-phase converter," *IEEE J. Solid-State Circuits*, vol. 43, no. 2, pp. 414–424, Feb. 2008.
- [9] A. Cevrero et al., "A 60 Gb/s 1.9 pJ/bit NRZ optical-receiver with low latency digital CDR in 14 nm CMOS FinFET," in *Proc. IEEE Symp. VLSI Circuits*, Jun. 2017, pp. C320–C321.
- [10] A. K. Mishra, Y. Li, P. Agarwal, and S. Shekhar, "A 9b-linear 14 GHz integrating-mode phase interpolator in 5 nm FinFET process," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, Feb. 2022, pp. 1–3.
- [11] L. Ye, J. Chen, L. Kong, E. Alon, and A. M. Niknejad, "Design considerations for a direct digitally modulated WLAN transmitter with integrated phase path and dynamic impedance modulation," *IEEE J. Solid-State Circuits*, vol. 48, no. 12, pp. 3160–3177, Dec. 2013.
- [12] R. Kreienkamp, U. Langmann, C. Zimmermann, T. Aoyama, and H. Siedhoff, "A 10-Gb/s CMOS clock and data recovery circuit with an analog phase interpolator," *IEEE J. Solid-State Circuits*, vol. 40, no. 3, pp. 736–743, Mar. 2005.
- [13] L.-M. Lee and C.-K. Yang, "Phase correction of a resonant clocking system using resonant interpolators," in *Proc. IEEE Symp. VLSI Circuits*, Jun. 2008, pp. 170–171.
- [14] H. Won et al., "A 0.87 W transceiver IC for 100 gigabit Ethernet in 40 nm CMOS," *IEEE J. Solid-State Circuits*, vol. 50, no. 2, pp. 399–413, Feb. 2015.
- [15] G. R. Gangasani et al., "A 16-Gb/s backplane transceiver with 12-tap current integrating DFE and dynamic adaptation of voltage offset and timing drifts in 45-nm SOI CMOS technology," *IEEE J. Solid-State Circuits*, vol. 47, no. 8, pp. 1828–1841, Aug. 2012.

- [16] P. A. Francese et al., "A 16 Gb/s 3.7 mW/Gb/s 8-tap DFE receiver and baud-rate CDR with 31 kppm tracking bandwidth," *IEEE J. Solid-State Circuits*, vol. 49, no. 11, pp. 2490–2502, Nov. 2014.
- [17] Z. Wang and P. R. Kinget, "A 65 nm CMOS, 3.5-to-11 GHz, less-than-1.45 LSB-INL<sub>pp</sub>, 7b twin phase interpolator with a wideband, low-noise delta quadrature delay-locked loop for high-speed data links," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, Feb. 2022, pp. 292–294.
- [18] C.-F. Liang, S.-C. Hwu, and S.-I. Liu, "A 10 Gbps burst-mode CDR circuit in 0.18 μm CMOS," in *Proc. IEEE Custom Integr. Circuits Conf.* (CICC), Sep. 2006, pp. 599–602.
- [19] M. Erett et al., "A 0.5–16.3 Gbps multi-standard serial transceiver with 219 mW/channel in 16-nm FinFET," *IEEE J. Solid-State Circuits*, vol. 52, no. 7, pp. 1783–1797, Jul. 2017.
- [20] S. Kumaki, A. H. Johari, T. Matsubara, I. Hayashi, and H. Ishikuro, "A 0.5 V 6-bit scalable phase interpolator," in *Proc. IEEE Asia Pacific Conf. Circuits Syst.*, Dec. 2010, pp. 1019–1022.
- [21] M.-S. Chen, A. A. Hafez, and C.-K. K. Yang, "A 0.1–1.5 GHz 8-bit inverter-based digital-to-phase converter using harmonic rejection," *IEEE J. Solid-State Circuits*, vol. 48, no. 11, pp. 2681–2692, Nov. 2013.
- [22] J. Z. Ru, C. Palattella, P. Geraedts, E. Klumperink, and B. Nauta, "A high-linearity digital-to-time converter technique: Constant-slope charging," *IEEE J. Solid-State Circuits*, vol. 50, no. 6, pp. 1412–1423, Jun. 2015.
- [23] A. Agrawal, J. F. Bulzacchelli, T. O. Dickson, Y. Liu, J. A. Tierno, and D. J. Friedman, "A 19-Gb/s serial link receiver with both 4-tap FFE and 5-tap DFE functions in 45-nm SOI CMOS," *IEEE J. Solid-State Circuits*, vol. 47, no. 12, pp. 3220–3231, Dec. 2012.
- [24] T. O. Dickson et al., "A 1.4 pJ/bit, power-scalable 16 × 12 Gb/s source-synchronous I/O with DFE receiver in 32 nm SOI CMOS technology," *IEEE J. Solid-State Circuits*, vol. 50, no. 8, pp. 1917–1931, Aug. 2015.
- [25] S. Sievert et al., "A 2 GHz 244 fs-resolution 1.2 ps-peak-INL edge interpolator-based digital-to-time converter in 28 nm CMOS," *IEEE J. Solid-State Circuits*, vol. 51, no. 12, pp. 2992–3004, Dec. 2016.
- [26] A. L. S. Loke et al., "Analog/mixed-signal design challenges in 7-nm CMOS and beyond," in *Proc. IEEE Custom Integr. Circuits Conf.* (CICC), Apr. 2019, pp. 1–8.
- [27] A. Cevrero et al., "A 100 Gb/s 1.1 pJ/b PAM-4 RX with dual-mode 1-tap PAM-4/3-tap NRZ speculative DFE in 14 nm CMOS FinFET," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, Feb. 2019, pp. 112–114.
- [28] T. Ali et al., "A 180 mW 56 Gb/s DSP-based transceiver for high density IOs in data center switches in 7 nm FinFET technology," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, Feb. 2019, pp. 118–120.
- [29] Z. Wang and P. R. Kinget, "A very high linearity twin phase interpolator with a low-noise and wideband delta quadrature DLL for high-speed data link clocking," *IEEE J. Solid-State Circuits*, early access, Aug. 18, 2022, doi: 10.1109/JSSC.2022.3197061.



**Amit Kumar Mishra** (Member, IEEE) received the B.Tech. degree in electronics and communication engineering from the Indian Institute of Information Technology, Jabalpur, India, in 2009, the M.Tech. degree from the Academy of Scientific and Innovative Research, New Delhi, India, in 2011, and the Ph.D. degree in electrical and computer engineering from The University of British Columbia, Vancouver, BC, Canada, in 2022.

From 2011 to 2015, he was a Scientist with CSIR-CEERI, Pilani, India, and worked on the design of CMOS sensor signal conditioning and RF circuits. In 2019, he was an Intern with MaxLinear, Inc., Burnaby, BC, Canada, where he worked on high-speed clocking circuits. His research interests include analog and mixed-signal circuits for wireless transceivers and high-speed electrical and optical links.



**Yifei Li** (Member, IEEE) received the B.S. degree in microelectronics from the Harbin Institute of Technology, Harbin, China, in 2010, the M.S. degree in electrical engineering from Korea University, Seoul, South Korea, in 2012, with a focus on high-speed I/O, and the Ph.D. degree in electrical engineering from Iowa State University, Ames, IA, USA, in 2017, where he worked on concurrent multi-band RF power amplifiers.

Since 2017, he has been with MaxLinear, Inc., Carlsbad, CA, USA, where he has worked on clock generation/distribution, PLL, TIA, and data converters for optical communication SoCs. His research interests include RF/mixed-signal circuits and systems for high-speed communications.



**Pawan Agarwal** (Member, IEEE) received the B.Tech. and M.Tech. degrees in electrical engineering from IIT Madras, Chennai, India, in 2009, with a thesis on data-converters, and the Ph.D. degree from Washington State University, Pullman, WA, USA, in 2017, with a focus on mm-wave phased arrays for small-cell applications and biomedical implantable systems.

He was with Applied Micro, Pune, India, from 2009 to 2011, and Applied Micro, Sunnyvale, CA, USA, in 2012, where he designed frequency synthesizers and serializers for 100G Ethernet. He designed extreme-performance VCO for mm-wave Backhaul links in 2014 at MaxLinear, Inc., Carlsbad, CA, USA. He is currently developing transmitters for the next-generation cable modems at MaxLinear, Inc., and previously designed clocking, transmitters, drivers, and TIAs for 1.6T/800G Datacenters connectivity and power amplifiers for 5G communication. He has authored assorted IEEE articles.

Dr. Agarwal was a recipient of the IEEE Microwave Theory and Techniques Society (MTT-S) Graduate Fellowship Award, the MTT-S IMS Student Paper Competition Award, the Best Paper and Poster Award(s) from SRC Techcon and CDADIC, and the Washington State University Voiland College of Engineering and Architecture Outstanding Teaching Assistant Award. He humbly serves as a technical reviewer and a committee member for several IEEE journals and conferences.



**Sudip Shekhar** (Senior Member, IEEE) received the B.Tech. degree from IIT Kharagpur, Kharagpur, India, in 2003, and the Ph.D. degree from the University of Washington, Seattle, WA, USA, in 2008.

From 2008 to 2013, he was with the Circuits Research Laboratory, Intel Corporation, Hillsboro, OR, USA, where he worked on high-speed I/O architectures. He is currently an Associate Professor of electrical and computer engineering with The University of British Columbia, Vancouver, BC, Canada. His current research interests include circuits for electrical and optical interfaces, frequency synthesizers, and wireless transceivers.

Dr. Shekhar was a recipient of the 2022 Schmidt Science Polymath Award, the 2022 UBC Killam Teaching Prize, the 2019 Young Alumni Achiever Award by IIT Kharagpur, and the 2010 IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS Darlington Best Paper Award and a co-recipient of the 2015 IEEE Radio frequency IC Symposium Student Paper Award. He serves on the Technical Program Committee of the IEEE International Solid-State Circuits Conference (ISSCC) and served as a Distinguished Lecturer for the IEEE Solid-State Circuits Society from 2021 to 2022.