Kyushu University Institutional Repository

Development of Low-Power and High-Speed On-Chip Clock Distribution System

Ichihashi, Masahiro

https://doi.org/10.15017/4060191

Publication information: Kyushu University, 2019, Doctor of Engineering (course-based doctorate)



Graduate School of Information Science and Electrical Engineering

Kyushu University, Fukuoka, Japan.

### Development of Low-Power and High-Speed On-Chip Clock Distribution System

by

Masahiro Ichihashi

A thesis submitted to the Graduate School of Information Science and Electrical Engineering in partial fulfilment of the requirements for the degree of

Doctor of Engineering

Kyushu University, Fukuoka, Japan.

2020

#### DEPARTMENT OF ELECTRONICS

#### GRADUATE SCHOOL OF INFORMATION SCIENCE AND ELECTRICAL ENGINEERING

#### KYUSHU UNIVERSITY

Fukuoka, Japan

Thesis Title:

Development of Low-Power and High-Speed On-Chip Clock Distribution System

Prepared by: Masahiro Ichihashi

Supervisor: Professor Haruichi Kanaya, Dr. Eng.

Co-supervisor 1: Professor Kuniaki Yoshitomi, Dr. Eng.

Co-supervisor 2: Professor Kazutoshi Kato, Dr. Eng.

Date: March 2020




#### To Whom It May Concern,

We hereby certify that this copy is a typical copy of the original Dr. Eng. (Doctor of Engineering) thesis of

#### Mr. Masahiro Ichihashi

Dissertation Title:

#### DEVELOPMENT OF LOW-POWER AND HIGH-SPEED ON-CHIP CLOCK DISTRIBUTION SYSTEM

Supervisor,

Prof. Haruichi Kanaya, Dr. Eng.
Department of Electronics,
Graduate School of Information Science and Electrical Engineering,
KYUSHU UNIVERSITY
March, 2020.

### Acknowledgements

First, I would like to express my sincere gratitude to my supervisor, Professor Haruichi Kanaya, for his continuous support of my Ph.D. research and for his patience and efforts to provide the best research environment. His guidance helped me throughout the research and the writing of this dissertation. His hospitality will be one of the most memorable experiences of my life.

I would also like to express my greatest appreciation to Masayuki Katakura of Sony LSI Design Inc. for his insightful comments, guidance, and encouragement from various perspectives.

My sincere thanks also go to all my laboratory members and friends, who gave me the opportunity to join their team. Many thanks to Mr. Shogo Harada, who has always supported me in the designs and chip measurements in my research.

This dissertation would never have been completed without the sponsorship of the Grant-in-Aid for Scientific Research from the Japan Society for the Promotion of Science and the support of the VLSI Design and Education Center (VDEC), The University of Tokyo, in collaboration with Cadence Design Systems, Inc., and Keysight Technologies.

Last but not least, I would like to thank my family, my wife, son, and daughter, for supporting me throughout my life. Without their love and persistent help, this dissertation would not have been possible.

### Abstract

Digital products such as smartphones, tablets, laptops, and cameras are vital to modern society. Integrated devices such as microprocessors, memories, transceivers, and image sensors operate on fundamental clocks, so it is no exaggeration to say that these clocks determine the overall performance of digital devices. On-chip clock distribution systems play a vital role because they distribute the fundamental clocks across the whole chip. For instance, a global tree structure distributes the system clock to the whole chip, and a differential signaling structure, which is the main scope of this dissertation, distributes the I/O clock for high-speed serial links. However, these systems are also the most power-hungry blocks: in microprocessors, on-chip clock distribution dissipates a large portion, from 25% to 70%, of the total power. As operating speeds and distribution lengths increase in recent digital systems, low-power, high-speed operation is becoming even more challenging because multiple repeater stages increase both jitter and power. The goal of this dissertation is to propose solutions to these problems of on-chip clock distribution for high-speed serial links. The proposed bufferless *LC* resonant clock architecture directly drives the on-chip clock distribution line without any buffers or repeaters. Thanks to the bufferless structure, the performance of the clock distribution is determined by the *LC* oscillator alone. The proposed architecture comprises three key elements: the inductor, the *LC* oscillator, and the on-chip transmission line design. The proposed inductor maximizes the performance of the *LC* oscillator. The proposed *LC* oscillator mitigates the tradeoff between high-frequency and low-power operation and enables the bufferless architecture. The proposed fully calculation-based on-chip transmission line modeling and optimization can find optimized parameters such as metal width and spacing instantaneously, without any SPICE or EM (electromagnetic) simulations. The proposed bufferless architecture, directly connected to a 10-mm on-chip clock distribution line, was fabricated in a TSMC 0.18-µm 1-poly 6-metal CMOS process. The experimental results show a 2.8-GHz oscillation frequency, 3.3-mA current consumption, and -112.8 dBc/Hz phase noise, which is comparable to other state-of-the-art *LC* oscillators despite the absence of buffers and repeaters.

### Contents

- Acknowledgements
- Abstract
- Contents
- List of Figures
- List of Tables
- List of Abbreviations
- Chapter 1 Introduction
  - 1.1 Background and Motivation
  - 1.2 Research Objectives
  - 1.3 Thesis Outline
- Chapter 2 On-Chip Clock Distribution Systems
  - 2.1 High-Speed Serial Link System
  - 2.2 Inverter Chain
  - 2.3 Small Swing
  - 2.4 Transmission Line Signaling
  - 2.5 *LC* Resonance
  - 2.6 The Proposed Structure
  - 2.7 Chapter Summary
  - References
- Chapter 3 Design of High-Frequency, Low-Coupling Inductor
  - 3.1 Introduction
    - 3.1.1 Background
    - 3.1.2 Previous Work
    - 3.1.3 Objectives and Scope of This Study
  - 3.2 Inductor Design
  - 3.3 Simulation Results
  - 3.4 Measurement Results
    - 3.4.1 Inductance, Quality Factor and Resistance
    - 3.4.2 Coupling Coefficient
  - 3.5 Chapter Summary
  - References
- Chapter 4 On-chip Transmission Line Modeling and Optimization
  - 4.1 Introduction
    - 4.1.1 Background
    - 4.1.2 Previous Work
    - 4.1.3 Objectives and Scope of This Study
  - 4.2 A Simplified, Fully Calculation-based *RLC*-model
    - 4.2.1 Capacitance Estimation
    - 4.2.2 Resistance Estimation
    - 4.2.3 Inductance Estimation
    - 4.2.4 Simplification Methodology
    - 4.2.5 Calculation of Transmission Line Model
  - 4.3 Model Accuracy
    - 4.3.1 Test Bench Setup
    - 4.3.2 Results of Comparison and Analysis
    - 4.3.3 Application to Advanced Processes and Higher Frequencies
  - 4.4 Optimization Methodology
    - 4.4.1 Optimization
    - 4.4.2 Transient Analysis
  - 4.5 Measurement Results
  - 4.6 Chapter Summary
  - References
- Chapter 5
  - 5.1 Introduction
    - 5.1.1 Background
    - 5.1.2 Previous Work
    - 5.1.3 Objectives and Scope of This Study
  - 5.2 Circuit Descriptions and Characteristics
    - 5.2.1 Theory of Proposed *LC* Oscillator
    - 5.2.2 Oscillation Frequency
    - 5.2.3 Frequency Sensitivity and Voltage Swing
    - 5.2.4 Phase Noise
  - 5.3 Test Chip Implementation
  - 5.4 Post Simulation Results
    - 5.4.1 Summary of Post-Layout Simulation Results
    - 5.4.2 Considerations on Post-Layout Simulation
  - 5.5 System Comparison
  - 5.6 Chapter Summary
  - References
- Chapter 6 A Low-Power, High-Speed Bufferless Clock Distribution System
  - 6.1 Objectives and Scope of This Study
  - 6.2 Test Chip Implementation
  - 6.3 Measurement Results and Analysis
    - 6.3.1 Measurement Setup
    - 6.3.2 Measurement and Simulation Results
    - 6.3.3 Analysis
  - 6.4
- Chapter 7 Conclusions and Future Works
  - 7.1 Conclusions
  - 7.2 Future Works

## **List of Figures**

- Fig. 2.1 High-speed serial link system
- Fig. 2.2 Inverter chain architecture
- Fig. 2.3 Small swing architecture
- Fig. 2.4 Transmission line architecture
- Fig. 2.5 *LC* resonance architecture
- Fig. 2.6 The proposed bufferless architecture
- Fig. 2.7 Research objective of $f_{SR}$ (self-resonant-frequency)
- Fig. 2.8 Research objective of magnetic coupling
- Fig. 2.9 Research objective of symmetry for 8-shaped differential inductor
- Fig. 2.10 Research objectives of on-chip transmission line modeling and optimization
- Fig. 2.11 Research objectives of *LC* oscillator
- Fig. 3.1 Differential *LC* oscillator
- Fig. 3.2 The proposed 8-shaped differential inductor
- Fig. 3.3 Current distributions and magnetic fields
- Fig. 3.4 Measurement configurations
- Fig. 3.5 Comparison between Fig. 3.4 (a) and Fig. 3.4 (b)
- Fig. 3.6 Phase difference at Fig. 3.4 (c) configuration
- Fig. 3.7 L, Q results at Fig. 3.4 (b) and Fig. 3.4 (d) configurations
- Fig. 3.8 Photograph of the proposed 8-shaped differential inductor and measurement configurations
- Fig. 3.9 Measurement and EM-simulation results
- Fig. 3.10 Photograph of the four pair of 8-shaped differential inductor and measurement configurations
- Fig. 3.11 Mutual inductance of transformer
- Fig. 3.12 Comparison of *K* between EM-simulation and measurement results
- Fig. 4.1 Border line between *RC* and *RLC*-model
- Fig. 4.2 High-speed I/O clock distribution without repeaters
- Fig. 4.3 Capacitance model
- Fig. 4.4 Conversion process from a 5-wire GSGSG physical model to an equivalent single-ended model
- Fig. 4.5 Comparison results between Fig. 4.4 (b) and Fig. 4.4 (e)
- Fig. 4.6 Proposed equivalent distributed *RLC*-model
- Fig. 4.7 Test bench for characterization of transmission line
- Fig. 4.8 Comparison results of Rall, Leff, Call, Tpd, Zin, Icc, V01, V31
- Fig. 4.9 Optimization flowchart
- Fig. 4.10 Optimization result calculated by VBA programming
- Fig. 4.11 EM-simulation results in Case00 and Case05
- Fig. 4.12 Transient analysis for V31
- Fig. 4.13 Photograph of the fabricated test chip
- Fig. 4.14 Measurement setup
- Fig. 5.1 Conventional repeater-based clock distribution
- Fig. 5.2 Proposed directly driving (i.e. bufferless) clock distribution
- Fig. 5.3 Series to parallel conversion of inductor
- Fig. 5.4 Proposed bufferless *LC* oscillator
- Fig. 5.5 Simulation result of oscillation frequency at Fig. 5.4 (b) condition
- Fig. 5.6 Test circuit to compare conventional and proposed structure
- Fig. 5.7 $\Delta C_L$ dependency of $f_{p1}$ at $\alpha_{TAP}=0.5$
- Fig. 5.8 $\Delta C_L$ dependency of $V_{out}$ at $\alpha_{TAP}=0.5$
- Fig. 5.9 $\alpha_{TAP}$ dependency at $\Delta C_L = 2.0 \text{pF}$
- Fig. 5.10 Phase noise at $\Delta C_L = 2 \text{pF}$
- Fig. 5.11 Block diagram of the test chip
- Fig. 5.12 *LC* oscillator core circuit
- Fig. 5.13 Level shift (LS) circuit
- Fig. 5.14 50Ω-buffer (BUF) circuit
- Fig. 5.15 Chip layout
- Fig. 5.16 Transient waveforms
- Fig. 5.17 Test circuits for system comparison
- Fig. 6.1 Circuit structure of the test chip
- Fig. 6.2 Photograph of the test chip
- Fig. 6.3 Measurement setup
- Fig. 6.4 Probe station
- Fig. 6.5 Plots of the output spectrum and phase noise with LNA
- Fig. 6.6 The S11 smith chart of the 10-mm transmission line (input: 50 $\Omega$, output: open)

## **List of Tables**

- Table 3.1 Design parameters of our proposed inductor
- Table 3.2 Combinations of aggressor and victim in four pair of inductor
- Table 4.1 Design parameters
- Table 4.2 Physical structures
- Table 4.3 Summary of absolute error of Fig. 4.8 (d) – (h)
- Table 4.4 Target specifications and constraints
- Table 4.5 Measurement results of Case 00 and Case 08
- Table 5.1 Theoretical differences between proposed and conventional
- Table 5.2 Design parameters
- Table 5.3 Summary of simulation results
- Table 5.4 Process parameters extracted by ring-oscillator
- Table 5.5 System comparison results
- Table 6.1 Comparison between measurement and simulation results
- Table 6.2 Comparison results for the FoM and FoM<sub>A</sub> from various studies

## **List of Abbreviations**

| Abbreviation | Meaning                                 |
|--------------|-----------------------------------------|
| AC           | Alternating Current                     |
| CMOS         | Complementary Metal-Oxide Semiconductor |
| CMRR         | Common Mode Rejection Ratio             |
| DC           | Direct Current                          |
| DCO          | Digitally Controlled Oscillator         |
| EM           | Electromagnetic                         |
| FO           | Fan Out                                 |
| LO           | Local Oscillator                        |
| PA           | Power Amplifier                         |
| PN           | Phase Noise                             |
| PSRR         | Power Supply Rejection Ratio            |
| RF           | Radio Frequency                         |
| VCO          | Voltage Controlled Oscillator           |

## **Chapter 1**

### Introduction

#### **1.1 Background and Motivation**

This dissertation investigates the development of a low-power, high-speed on-chip clock distribution architecture for VLSI systems. On-chip clock distribution is one of the most important functions in determining overall chip performance. For instance, a global tree structure distributes the system clock to the whole chip, and a differential signaling structure, which is the main scope of this dissertation, distributes the I/O clock for high-speed serial links. However, with the recent increase in bandwidth and chip area, on-chip clock distribution has also become a power-hungry block. According to recent research, it can dissipate a large portion (25% to 70%) of the total power in microprocessors because of the large voltage swing and capacitive load.
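
The link between clock power, voltage swing, and capacitive load follows from the standard dynamic switching power relation $P = C V^2 f$ for a node that is charged and discharged once per cycle. The minimal Python sketch below illustrates it. The function name and the capacitance, swing, and frequency values are hypothetical and are not taken from this dissertation.

```python
def dynamic_power(c_load_f, v_swing_v, freq_hz):
    """Dynamic switching power in watts for a clock node: P = C * V^2 * f.

    Assumes the node is fully charged and discharged once per clock cycle.
    """
    return c_load_f * v_swing_v ** 2 * freq_hz


# Hypothetical example values (assumptions for illustration only).
C_LOAD = 2e-12   # 2 pF total clock load
FREQ = 3e9       # 3 GHz clock

full_swing = dynamic_power(C_LOAD, 1.8, FREQ)   # rail-to-rail swing at 1.8 V
half_swing = dynamic_power(C_LOAD, 0.9, FREQ)   # half the voltage swing

print(f"full swing: {full_swing * 1e3:.2f} mW")
print(f"half swing: {half_swing * 1e3:.2f} mW")
print(f"ratio: {full_swing / half_swing:.1f}")
```

Because power scales with the square of the swing, halving the swing in this sketch cuts the dynamic power by a factor of four, while the dependence on load capacitance and frequency is linear.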

Many approaches to on-chip clock distribution have been investigated, such as the inverter chain, small swing, transmission line, and *LC* resonance structures. Repeater-based structures such as the inverter chain and small swing suffer from large power consumption and jitter as the number of repeater stages increases. The transmission line structure can potentially reduce power and jitter because it has no repeaters. However, its design methodology is complicated because extracting the parasitic inductance requires time-consuming electromagnetic (EM) simulation. *LC* resonance structures can achieve good power efficiency owing to their charge recycling mechanism. However, some buffer stages are still required because of the high frequency sensitivity of conventional *LC* oscillators. Hence, conventional on-chip clock distribution structures have not yet completely solved the fundamental tradeoff between low-power and high-speed operation, and a new structure is desirable.

In this dissertation, we propose a new bufferless on-chip clock distribution architecture that makes full use of both the transmission line and *LC* resonance. The proposed structure can directly drive a 10-mm on-chip clock distribution line at 3 GHz from an *LC* oscillator without any buffers or repeaters. Two major problems must be solved to realize the proposed system. First, the frequency sensitivity of the *LC* oscillator must be reduced so that it can directly drive the large capacitive load of the clock distribution line. Second, a simple methodology for on-chip transmission line modeling and optimization is necessary to find the best interconnect parameters.

#### **1.2 Research Objectives**

The goal of this dissertation is to develop a low-power, high-speed on-chip clock distribution architecture with the following key features:

- (1) A high-frequency, high-symmetry and low-coupling differential inductor
  - The performance of *LC* oscillators is mainly determined by their inductors. Although conventional differential inductors offer high symmetry and low coupling, their major drawback is the difficulty of high-frequency operation. The proposed differential inductor achieves almost twice the self-resonant frequency while keeping the high-symmetry and low-coupling features.
- (2) A simple methodology for on-chip transmission line modeling and optimization
  - A simplified *RLC*-distributed model and an optimization methodology that does not require EM simulation are desirable for on-chip transmission line design. The proposed modeling methodology converts a five-wire GSGSG physical structure into a single-ended *RLC*-distributed model. This simplification makes it possible to apply basic transmission line theory. From given target specifications such as propagation delay and output swing, the proposed optimization methodology finds the smallest metal width, spacing, and structure that achieve the lowest power consumption, which significantly improves design time and quality.
- (3) A low frequency sensitivity *LC* oscillator
  - Conventional *LC* oscillators have a fundamental tradeoff between low-power and high-frequency operation because of their high frequency sensitivity. The proposed *LC* oscillator mitigates this tradeoff owing to the *LC* resonance mode shared between the frequency tuning capacitor and the load of the on-chip clock distribution line. This low frequency sensitivity makes a bufferless on-chip clock distribution system possible.

#### 1.3 Thesis Outline

The dissertation comprises seven chapters that describe the development of a low-power, high-speed on-chip clock distribution system.

Chapter 1 provides the background, motivation and objectives for this research.

Chapter 2 presents a brief introduction of existing on-chip clock distribution line systems for high-speed serial links such as inverter chain, low output swing, transmission line and *LC* resonance.

Chapter 3 proposes a novel high-frequency, low-coupling differential inductor with patterned ground shield. The concrete design methodology and comparison between EM-simulation and experimental results are presented. It also shows the experimental results of far-field magnetic coupling effect.

Chapter 4 proposes a fully calculation-based on-chip transmission line modeling and optimization methodology. The theory behind the simplification process and the accuracy of the calculated model against EM-simulation and experimental results are presented. The optimization algorithm using the proposed simplified model is also introduced.

Chapter 5 proposes the theory of the low frequency sensitivity *LC* oscillator. The theoretical calculation is compared with SPICE simulation. The chapter also compares the conventional repeater-based and the proposed *LC* oscillator-based architectures of the on-chip clock distribution line.

Chapter 6 presents an experimental implementation of the integrated system, in which a bufferless *LC* oscillator is directly connected to a 10-mm on-chip clock distribution line. It also compares the Figure-of-Merit (FoM) with other state-of-the-art *LC* oscillators.

Chapter 7 concludes the research works. The contributions and future works are summarized.

## Chapter 2

### **On-Chip Clock Distribution Systems**

This chapter briefly reviews existing on-chip clock distribution systems for high-speed serial links, namely inverter chain, dynamic small swing, static small swing, transmission line and *LC* resonance. Based on the pros and cons of these systems, it also introduces the basic idea of the proposed on-chip clock distribution system.

#### 2.1 High-Speed Serial Link System

Fig. 2.1 shows an image of a high-speed serial link system. In Fig. 2.1, Tx and Rx represent the transmitter and receiver, respectively. These chips operate based on some fundamental clocks. There are two different kinds of on-chip clock distribution systems. One is global clock distribution, which is used for system clocking as shown in Fig. 2.1 (A), and the other is high-speed I/O clock distribution, which is used for high-speed serial links as shown in Fig. 2.1 (B). Because the number of I/O pins is limited, the parallel data are converted to serial data at Fig. 2.1 (C) and output in differential signaling mode by the transmitter at Fig. 2.1 (D). These differential signals are transmitted through a package and board as shown in Fig. 2.1 (E), and the receiver blocks amplify the received data from small swing to $V_{DD}$ swing at Fig. 2.1 (F). After that, the serial data are converted back to parallel data at Fig. 2.1 (H). Thus, the high-speed I/O clock distribution shown in Fig. 2.1 (B) requires the highest frequency due to the parallel-to-serial conversion. In this dissertation, we focus on this high-speed I/O clock distribution and oscillator design.



Fig. 2.1 High-speed serial link system

#### 2.2 Inverter Chain

Fig. 2.2 shows the inverter chain on-chip clock distribution system [1]. Although this system is very simple, it has several problems. First, since power consumption is proportional to the square of the power supply voltage, $V_{DD}^2$, the large output swing $(0 \sim V_{DD})$ increases power consumption. Second, high-speed operation is difficult due to the large swing. Third, the power efficiency is low since current is wasted during the logic high-to-low transition. Fourth, jitter is large due to the increased number of repeaters under high-speed, long-distance conditions. Fifth, it is vulnerable to both common-mode and power supply noise due to the single-ended structure. Thus, the inverter chain structure is not suitable for high-speed, long-distance on-chip clock distribution.
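As a rough illustration of the first point, the switching power of a full-swing CMOS driver is commonly estimated with the textbook relation $P = \alpha C V_{DD}^2 f$. The following sketch evaluates this relation; the load capacitance, supply voltages and activity factor are illustrative assumptions and are not taken from this work.

```python
def dynamic_power(c_load, v_dd, freq, activity=1.0):
    """Textbook CMOS switching power: P = activity * C * VDD^2 * f."""
    return activity * c_load * v_dd ** 2 * freq

# Hypothetical example values (not from this dissertation):
# 2 pF of total wire and gate load, 3-GHz clock, activity factor 1.
p_full = dynamic_power(2e-12, 1.8, 3e9)   # 1.8-V full swing
p_half = dynamic_power(2e-12, 0.9, 3e9)   # same load at half the supply

print(f"P at 1.8 V: {p_full * 1e3:.2f} mW")
print(f"P at 0.9 V: {p_half * 1e3:.2f} mW")
print(f"ratio: {p_full / p_half:.1f}")
```

Under these assumptions, halving the supply voltage reduces the switching power by a factor of four, which is why the full $0 \sim V_{DD}$ swing is costly.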



Fig. 2.2 Inverter chain architecture

#### 2.3 Small Swing

Fig. 2.3 shows a small swing signaling structure. This structure is typically realized by CML (Current Mode Logic) [1]. Since the circuit uses a differential topology, good CMRR (Common Mode Rejection Ratio) and PSRR (Power Supply Rejection Ratio) can be obtained. In addition, the reduced swing makes high-frequency operation possible. However, the power consumed by the constant current is a major drawback. Thus, this structure is not suitable for long-distance clock distribution. Furthermore, the minimum swing is limited by the sensitivity of the receivers. Therefore, the total power consumption of both repeaters and receivers must be taken into consideration.



Fig. 2.3 Small swing architecture

#### 2.4 Transmission Line Signaling

Fig. 2.4 shows the on-chip clock distribution system using a transmission line [2], [3]. This system can be applied when the length of the on-chip clock distribution line is not negligible compared with the wavelength. With the recent trend toward higher speed and longer distance, this structure is becoming more important. The advantages of a transmission line are much better jitter performance due to the absence of repeaters and the possibility of low-power operation if an appropriate characteristic impedance and termination are selected. However, a time-consuming electro-magnetic (EM) simulation is necessary to extract the parasitic inductance. In addition, on-chip transmission line design requires the optimization of many parameters such as width, space, height and frequency. The design methodology becomes much more complicated than conventional *RC*-based design. Hence, a simple methodology for on-chip transmission line modeling and optimization is necessary.
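The condition on wavelength and line length can be checked with a short calculation. The sketch below is only an estimate under stated assumptions: the effective relative permittivity of 3.9 (SiO2-like dielectric) and the $\lambda/10$ threshold are common rules of thumb and are not values given in this work.

```python
import math

C0 = 299_792_458.0  # speed of light in vacuum, m/s

def wavelength(freq_hz, eps_r_eff):
    """Wavelength in a medium with effective relative permittivity eps_r_eff."""
    return C0 / (freq_hz * math.sqrt(eps_r_eff))

# Assumptions for illustration only: eps_r_eff = 3.9 and a lambda/10
# rule of thumb for an "electrically long" line.
freq = 3e9        # 3-GHz clock
length = 10e-3    # 10-mm clock distribution line
lam = wavelength(freq, 3.9)
ratio = length / lam

print(f"wavelength = {lam * 1e3:.1f} mm, length/wavelength = {ratio:.2f}")
print("treat as transmission line:", ratio > 0.1)
```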



Fig. 2.4 Transmission line architecture

#### 2.5 LC Resonance

Fig. 2.5 shows the on-chip clock distribution system using *LC* resonance [4], [5]. Thanks to charge recycling, the driver only compensates for the resistive loss caused by the inductor and the wire of the on-chip clock distribution line. Therefore, this structure can achieve very high power efficiency. However, since the resonant frequency $f_{osc}$ is determined by (2.1), the required inductance *L* becomes too small when $C_L$ is large. In practice, an accurate small *L* is difficult to design due to parasitic inductance. Therefore, some repeater stages are required in high-frequency, long-distance situations to avoid a high capacitive load. As a result, existing *LC* resonance structures suffer from increased power consumption and jitter due to the increased number of repeaters.

$$f_{osc} = \frac{1}{2\pi\sqrt{LC_L}} \tag{2.1}$$
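To make the trend in (2.1) concrete, the sketch below solves (2.1) for *L* at a fixed oscillation frequency. The load capacitance values are hypothetical and chosen only to show how quickly the required inductance shrinks as $C_L$ grows.

```python
import math

def resonant_frequency(l_henry, c_farad):
    """Eq. (2.1): f_osc = 1 / (2*pi*sqrt(L*C_L))."""
    return 1.0 / (2.0 * math.pi * math.sqrt(l_henry * c_farad))

def required_inductance(f_osc, c_farad):
    """Eq. (2.1) solved for L at a given f_osc and C_L."""
    return 1.0 / ((2.0 * math.pi * f_osc) ** 2 * c_farad)

# Hypothetical load capacitances, chosen only to show the trend.
for c_load in (0.5e-12, 2e-12, 5e-12):
    l_req = required_inductance(3e9, c_load)
    print(f"C_L = {c_load * 1e12:.1f} pF -> L = {l_req * 1e9:.2f} nH")
```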



Fig. 2.5 LC resonance architecture

#### 2.6 The Proposed Structure

Fig. 2.6 shows the proposed on-chip clock distribution architecture. The proposed architecture makes full use of both *LC* resonance and transmission line signaling. The *LC* tank is shared between the oscillation tuning capacitor $C_D$ and the load capacitance $C_L$ of the clock distribution line. The blue dotted line shows the image of the current distribution. The effective load capacitance is reduced owing to the *LC* resonance on the $C_L$ side. Thanks to this reduced effective capacitance, the repeaters on the clock distribution line can be removed, and transmission line signaling can be used. Since the clock distribution line is directly driven by an *LC* oscillator, the waveform is a simple sinusoidal wave. Therefore, the design can be much easier and more robust than a conventional pulse-based design owing to the absence of higher harmonics.



Fig. 2.6 The proposed bufferless architecture

To realize the proposed system, we investigate the following three key designs.

#### 1. Inductor design

The performance of the *LC* oscillator is mainly determined by the inductor. A differential *LC* oscillator needs a differential inductor. Although a conventional differential inductor has good symmetry, it is difficult to achieve high-frequency and low-coupling features. Fig. 2.7 illustrates why high-frequency operation is difficult for a differential inductor. The signals on two adjacent wires are almost in-phase in a single inductor, whereas they are out-of-phase in a differential inductor. Therefore, the parasitic capacitance of a differential inductor is larger than that of a single inductor, which makes high-frequency operation difficult. Next, Fig. 2.8 illustrates why the low-coupling feature is important. For instance, in RF systems, the magnetic fields generated by an *LC* oscillator interfere with other blocks such as the LNA (Low Noise Amplifier) and PA (Power Amplifier), and sometimes cause undesired modulation. To avoid this, the blocks are typically placed far apart to obtain good isolation at the cost of layout area. Thus, we investigate the structure shown in Fig. 2.9. The proposed inductor places two identical single inductors in stepping order. This makes the self-resonant frequency the same as that of a single inductor. In addition, the 8-shaped current distribution tends to cancel the magnetic flux of the two inductors against each other. However, some symmetry may be lost due to the connections needed to form the 8-shaped current distribution. The goal of this study is to investigate whether the proposed inductor can achieve the expected high-frequency, low-coupling and good-symmetry features.



Fig. 2.7 Research objective of  $f_{SR}$  (self-resonant-frequency)



Fig. 2.8 Research objective of magnetic coupling



Fig. 2.9 Research objective of symmetry for 8-shaped differential inductor

#### 2. On-chip transmission line modeling and optimization

The proposed bufferless architecture utilizes the transmission line signaling mode. However, the major problem of transmission line design is its complexity. The following procedure is necessary to perform one transmission line simulation: Step-1, layout operation; Step-2, s-parameter extraction by EM (Electro-Magnetic) simulation; Step-3, transient simulation by SPICE. There are several problems in this procedure. First, since transmission line design has many parameters such as width, space and thickness, the layout operations are time-consuming if parameter optimization is necessary. Second, EM-simulation is usually expensive and takes time if the simulation area is large. Third, convolution errors sometimes occur in transient analysis when the s-parameters are converted to the time domain, and they depend strongly on the SPICE simulator. Thus, we investigate a simple and accurate equivalent transmission line modeling and optimization methodology without EM-simulation, as shown in Fig. 2.10. A 5-wire GSGSG physical structure is converted to an equivalent single-ended *RLC*-distributed model without using EM-simulation. GSGSG is selected because placing GND not only below but also beside the signal lines makes the model accurate. Once a general 5-wire GSGSG structure is modeled, the same methodology can easily be applied to other structures such as GSSG, SGS and SS. The single-ended model makes it possible to apply basic transmission line theory. In addition, although conventional works treat only the thick top metal layer, we extend the available metal layers to multiple layers. This is because the thick top metal layer is commonly used for the power supply mesh, and it is not guaranteed that it can always be used for the on-chip clock distribution line without complicating the layout. Therefore, it is valuable to investigate what happens if other metal layers such as metal-5 and metal-4 are used.

As for optimization, we develop an algorithm that optimizes the delay, the power consumption of the driver and the output swing from the given target specifications using the proposed equivalent model. The delay and power are important for satisfying the allowable timing budget and the power of the driver circuitry. The output swing is an essential factor for the level shift stages after clock distribution. Thus, the goal of this study is to develop a fully calculation-based design methodology for on-chip transmission line modeling and optimization.
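To indicate what basic transmission line theory provides once a single-ended *RLC*-distributed model is available, the following sketch computes the characteristic impedance, phase delay and attenuation of a uniform *RLC* line from the standard telegrapher's relations. It is not the modeling or optimization methodology of this work; the function name and the per-unit-length values are hypothetical, and shunt conductance is assumed to be zero.

```python
import cmath
import math

def line_parameters(r, l, c, freq):
    """Characteristic impedance and propagation constant of an RLC line (G = 0).

    r, l, c are per-unit-length resistance (ohm/m), inductance (H/m)
    and capacitance (F/m).
    """
    w = 2.0 * math.pi * freq
    z_series = complex(r, w * l)             # R + jwL
    y_shunt = complex(0.0, w * c)            # jwC
    z0 = cmath.sqrt(z_series / y_shunt)
    gamma = cmath.sqrt(z_series * y_shunt)   # alpha + j*beta
    return z0, gamma

# Hypothetical per-unit-length values, for illustration only:
# 10 ohm/mm, 0.4 nH/mm, 0.16 pF/mm.
R, L, C = 10e3, 0.4e-6, 0.16e-9
z0, gamma = line_parameters(R, L, C, 3e9)
length = 10e-3
delay = gamma.imag * length / (2.0 * math.pi * 3e9)   # phase delay = beta*len/w
loss = math.exp(-gamma.real * length)                 # amplitude ratio, matched line

print(f"|Z0| = {abs(z0):.1f} ohm, delay = {delay * 1e12:.1f} ps, Vout/Vin = {loss:.2f}")
```

An optimization loop over width, space and metal layer could evaluate quantities of this kind for each candidate structure, provided that *R*, *L* and *C* can be calculated without EM-simulation.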



Fig. 2.10 Research objectives of on-chip transmission line modeling and optimization

#### 3. LC oscillator design

The proposed bufferless architecture shares the *LC* tank between the frequency tuning capacitor $C_D$ and the load capacitance $C_L$, as shown in Fig. 2.6. However, this new architecture has not yet been studied in the literature. For instance, the oscillation frequency of a conventional *LC* oscillator is determined by only one resonant loop, as given by (2.1). In contrast, the proposed oscillator has at least two resonant loops. In addition, the ideal tap location of the inductor is not clear. Thus, the goal of this study is to investigate the basic theory of this new *LC* oscillator, such as the oscillation frequency, output swing and tap location, as shown in Fig. 2.11.



Fig. 2.11 Research objectives of LC oscillator

#### 2.7 Chapter Summary

In this chapter, we briefly introduced the pros and cons of existing on-chip clock distribution systems. Conventional repeater-based on-chip clock distribution systems are not suitable for high-frequency, long-distance systems due to the increase in power consumption and jitter caused by multiple repeater stages. The transmission line approach can reduce the number of repeaters, but the design methodology becomes complicated. This raises the need for a simple methodology for on-chip transmission line modeling and optimization. *LC* resonance can improve the power efficiency owing to the charge recycling mode. However, repeater stages are still required to isolate the capacitive load from the *LC* oscillator. This raises the need for low frequency sensitivity *LC* oscillators. Based on this background, we also briefly introduced the basic idea of our proposed system, which makes full use of both the transmission line and the *LC* resonance mode.

#### References

- [1] K. Hu, et al., "Comparison of on-die global clock distribution methods for parallel serial links," *IEEE International Symposium on Circuits and Systems (ISCAS)*, 2009.
- [2] K. Banerjee and A. Mehrotra, "Analysis of on-chip inductance effects for distributed RLC interconnects," *IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems*, vol. 21, pp. 904-915, 2002.
- [3] P. Heydari, S. Abbaspour, and M. Pedram, "A comprehensive study of energy dissipation in lossy transmission lines driven by CMOS inverters," *Proceedings of the IEEE 2002 Custom Integrated Circuits Conference*, 2002.
- [4] J. Poulton, et al., "A 14-mW 6.25-Gb/s transceiver in 90-nm CMOS," *IEEE Journal of Solid-State Circuits*, vol. 42, pp. 2745-2757, 2007.
- [5] K. Fukuda, et al., "A 12.3-mW 12.5-Gb/s complete transceiver in 65-nm CMOS process," *IEEE Journal of Solid-State Circuits*, vol. 45, pp. 2838-2849, 2010.

## **Chapter 3**

### **Design of High-Frequency, Low-Coupling Inductor**

#### 3.1 Introduction

#### 3.1.1 Background

In recent high-speed digital and RF systems, on-chip inductors play an important role. For instance, a differential *LC* oscillator as shown in Fig. 3.1 is used in high-speed serial links to achieve both high-frequency and low-jitter requirements [1]. A conventional differential *LC* oscillator uses a differential inductor for symmetrical outputs. However, the low self-resonant-frequency ($f_{SR}$) caused by the long wire routing has been a major drawback [2]. In RF transceiver systems, strong magnetic fields from the power amplifier (PA) interfere with the local oscillator (LO), which results in injection locking [3]. Hence, the inductors must be placed some distance apart at the cost of a large chip area. Thus, a differential inductor that has symmetry, high $f_{SR}$ and low coupling has been desirable.



Fig. 3.1 Differential LC oscillator

#### **3.1.2 Previous Work**

Findley et al. [2] introduced a high-$f_{SR}$ differential inductor. Although this inductor has a high $f_{SR}$, the low-coupling and symmetry features have not been achieved. On the other hand, some low-coupling and symmetrical twisted inductors have been introduced in [3], [4], [5]. However, the high-$f_{SR}$ feature has not been achieved due to the long routing. Thus, there is no differential inductor that achieves symmetry, high $f_{SR}$ and low coupling at the same time.

#### 3.1.3 Objectives and Scope of This Study

The goal of this study is to develop a differential inductor that has high-$f_{SR}$, symmetry and low-coupling features for *LC* oscillators. In this chapter, we propose a novel high-frequency, low-coupling 8-shaped differential inductor with a Patterned Ground Shield (PGS) [6]. The PGS structure can maximize the quality factor while reducing the Electro-Magnetic (EM) simulation time owing to the reduced number of nodes [7], [8]. The 8-shaped structure makes symmetry, low coupling and high $f_{SR}$ possible. The test chip is implemented in the TSMC 0.18-µm 1-poly 6-metal CMOS process.

#### **3.2 Inductor Design**

The proposed inductor is based on a single-inductor design. For substrate shielding, we adopted a PGS to improve the quality factor and the accuracy of EM-simulation and to reduce the number of elements. For the metal structure, Metal 6, 5 and 4 are shorted together to reduce the DC resistance while keeping the required $f_{SR}$. We placed vias only at the metal edges to mitigate the skin effect [9]. The proposed inductor is formed by placing two single inductors in stepping order as shown in Fig. 3.2. In Fig. 3.2, *P1-P2*, *P3* and *P4-P5* represent the differential input, center tap and quarter tap, respectively. Fig. 3.3 compares the current distribution and magnetic fields of a conventional and the proposed differential inductor. The magnetic fields of the two inductors induced by the 8-shaped current distribution tend to cancel each other in the far field. Hence, the proposed structure is able to achieve the symmetry, high-$f_{SR}$ and low-coupling features.



Fig. 3.2 The proposed 8-shaped differential inductor

In (c) to (h), the green lines, pink lines and dots represent poly, metal and vias, respectively.



Fig. 3.3 Current distributions and magnetic fields

#### 3.3 Simulation Results

Table 3.1 shows the detailed design parameters of the proposed inductor. We used Keysight "Momentum-RF" as the EM-simulation tool and performed s-parameter analysis in the configurations shown in Fig. 3.4. The differential *L* and *Q* are derived by converting the s-parameters to z-parameters and applying (3.1) to (3.3). Fig. 3.5 compares the single inductor (Fig. 3.4 (a)) and the proposed inductor (Fig. 3.4 (b)). The proposed inductor achieved almost twice the *L* while having the same $f_{SR}$ and *Q* as a single inductor. Fig. 3.6 shows the phase difference in the Fig. 3.4 (c) configuration to check the symmetry of the proposed inductor. The phase difference is only 2.4 degrees at 3-GHz, which is symmetrical enough for differential *LC* oscillator use. Fig. 3.7 compares *L* and *Q* in the Fig. 3.4 (b) and (d) configurations. For Fig. 3.4 (d), the tap location was chosen such that *L* in Fig. 3.4 (d) is half of that in Fig. 3.4 (b), using the approximate equations introduced by Mohan et al. [10]. The simulation results matched our expectations well. Hence, the proposed differential inductor can be designed accurately and simply based on a single inductor design.

| Parameter       | Value                         |
|-----------------|-------------------------------|
| Process         | TSMC 0.18-µm 1-poly, 6-metal  |
| Metal structure | M6 (thick) // M5 // M4        |
| Metal width     | 4.5-µm                        |
| Metal space     | 1.5-µm                        |
| Number of turns | 6                             |
| Shielding       | PGS (Metal1 + poly)           |
| Area            | 130 × 235 µm<sup>2</sup>      |
| *L* at 3-GHz    | 6.22-nH (differential)        |
| *Q* at 3-GHz    | 5.9                           |
| $f_{SR}$        | 13.3-GHz                      |

Table 3.1 Design parameters of our proposed inductor

"//" represents short by vias.



Fig. 3.4 Measurement configurations

$$Z_{diff} = Z_{11} - Z_{12} - Z_{21} + Z_{22} \tag{3.1}$$

$$L = \frac{imag(Z_{diff})}{2\pi f} \tag{3.2}$$

$$Q = \frac{imag(Z_{diff})}{real(Z_{diff})} \tag{3.3}$$

Here, $Z_{diff}$ is the differential impedance, $f$ is the frequency, and *imag* and *real* denote the imaginary part and the real part, respectively.
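The calculation in (3.1) to (3.3) can be sketched as follows for a single frequency point. The function name and the z-parameter values are hypothetical; the values describe an idealized coupled pair and are not simulation or measurement data from this work.

```python
import math

def differential_lq(z11, z12, z21, z22, freq):
    """Apply (3.1)-(3.3) to two-port z-parameters at one frequency."""
    z_diff = z11 - z12 - z21 + z22                       # (3.1)
    inductance = z_diff.imag / (2.0 * math.pi * freq)    # (3.2)
    quality = z_diff.imag / z_diff.real                  # (3.3)
    return z_diff, inductance, quality

# Hypothetical z-parameters at 3 GHz: each port sees 2 ohm and 2 nH,
# and the transfer terms correspond to -1 nH of mutual inductance.
freq = 3e9
w = 2.0 * math.pi * freq
z11 = z22 = complex(2.0, w * 2e-9)
z12 = z21 = complex(0.0, -w * 1e-9)

z_diff, inductance, quality = differential_lq(z11, z12, z21, z22, freq)
print(f"L = {inductance * 1e9:.2f} nH, Q = {quality:.1f}")
```

In practice the z-parameters would first be obtained from the simulated or measured s-parameters, and the calculation would be repeated at each frequency point.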



(a) Inductance





Fig. 3.5 Comparison between Fig. 3.4 (a) and Fig. 3.4 (b)



Fig. 3.6 Phase difference at Fig. 3.4 (c) configuration



(a) Inductance



(b) Quality factor

Fig. 3.7 L, Q results at Fig. 3.4 (b) and Fig. 3.4 (d) configurations

#### 3.4 Measurement Results

#### 3.4.1 Inductance, Quality Factor and Resistance

Fig. 3.8 shows a photograph of the fabricated 8-shaped differential inductor and the measurement configurations. The proposed inductor is measured using a 50-Ω probe station and a Keysight 8722C network analyzer. Fig. 3.9 compares *L*, *Q* and *R* between measurement and EM-simulation. The measurement results showed a good match with the EM-simulation results.



Fig. 3.8 Photograph of the proposed 8-shaped differential inductor and measurement configurations



(a) Inductance







(c) Resistance

Fig. 3.9 Measurement and EM-simulation results

#### 3.4.2 Coupling Coefficient

Fig. 3.10 shows a photograph of the fabricated four pairs of 8-shaped differential inductors and the measurement configurations. Each inductor is separated by a 20.5-µm spacing. The s-parameters are measured using a Keysight E5071 4-port network analyzer. The coupling coefficient of mutual inductance (*K*) can be obtained from transformer basics as shown in Fig. 3.11. In Fig. 3.11, the voltages of port-1 and port-2 are defined as (3.4) and (3.5). In this configuration, if port-1 is the input and port-2 is open, then $i_2 = 0$ and $v_1$, $v_2$ can be expressed as (3.6) and (3.7), respectively. Since $v_2$ and $i_1$ are known values, we can obtain the mutual inductance *M* from (3.7). The same procedure applies when port-1 is open and port-2 is the input. The coupling coefficient *K* is then defined as (3.8). Table 3.2 shows the combinations of *K* in this measurement setup. In Table 3.2, the suffix of *K* represents the labels of the inductors. For instance, $K_{12}$ represents *K* between IND-1 and IND-2. For EM-simulation, we used EMX from Integrand Software Inc. Fig. 3.12 compares *K* between measurement and EM-simulation. From Fig. 3.12 (a), the maximum *K* is as low as 0.03, and the measurement result showed a good match with EM-simulation at $\Delta x=20.5\ \mu m$. Fig. 3.12 (b) shows the $\Delta x$ dependency, which is interpolated by EM-simulation. This result shows that the far-field magnetic coupling is canceled out as expected and that $\Delta x=60\ \mu m$ is sufficient for good isolation. However, there is an error at $\Delta x=261.5\ \mu m$. We believe this error is caused by the accuracy of EM-simulation.

$$v_1 = L_1 \cdot \frac{di_1}{dt} + M \cdot \frac{di_2}{dt} \tag{3.4}$$

$$v_2 = L_2 \cdot \frac{di_2}{dt} + M \cdot \frac{di_1}{dt} \tag{3.5}$$

$$v_1 = L_1 \cdot \frac{di_1}{dt} \tag{3.6}$$

$$v_2 = M \cdot \frac{di_1}{dt} \tag{3.7}$$

$$K = \frac{M}{\sqrt{L_1 \cdot L_2}} \tag{3.8}$$
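In the frequency domain, (3.6) and (3.7) become $v_1 = j\omega L_1 i_1$ and $v_2 = j\omega M i_1$ with port-2 open, so that $L_1$ and $M$ follow from the imaginary parts of $Z_{11}$ and $Z_{21}$. The sketch below applies this together with (3.8). It assumes that two-port z-parameters for one aggressor and victim pair are already available; the function name and the numerical values are hypothetical and are not measured data.

```python
import math

def coupling_coefficient(z11, z21, z22, freq):
    """Coupling coefficient K from open-circuit z-parameters.

    With port-2 open (i2 = 0): L1 = Im(Z11)/w and M = Im(Z21)/w.
    L2 follows in the same way from Z22 with port-1 open.
    """
    w = 2.0 * math.pi * freq
    l1 = z11.imag / w
    l2 = z22.imag / w
    m = z21.imag / w
    return m / math.sqrt(l1 * l2)    # (3.8)

# Hypothetical pair: L1 = L2 = 6 nH, M = 0.18 nH, 4-ohm series loss.
freq = 3e9
w = 2.0 * math.pi * freq
z11 = z22 = complex(4.0, w * 6e-9)
z21 = complex(0.0, w * 0.18e-9)

k = coupling_coefficient(z11, z21, z22, freq)
print(f"K = {k:.3f}")
```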



Fig. 3.10 Photograph of the four pair of 8-shaped differential inductor and measurement configurations



Fig. 3.11 Mutual inductance of transformer

Table 3.2 Combinations of aggressor and victim in four pair of inductor

| Parameter | Aggressor | Victim |
|-----------|-----------|--------|
| $K_{12}$  | IND-1     | IND-2  |
| $K_{13}$  | IND-1     | IND-3  |
| $K_{14}$  | IND-1     | IND-4  |
| $K_{23}$  | IND-2     | IND-3  |
| $K_{24}$  | IND-2     | IND-4  |
| $K_{34}$  | IND-3     | IND-4  |



(a) Comparison of K between EM-simulation and measurement results



(b) Dependency of distance for *K* 

Fig. 3.12 Comparison of K between EM-simulation and measurement results

#### 3.5 Chapter Summary

In this chapter, we proposed a high-$f_{SR}$, symmetrical and low-coupling 8-shaped differential inductor with PGS for differential *LC* oscillators. The 8-shaped structure, formed by two single inductors in stepping order, achieves the high $f_{SR}$ of a single inductor and the symmetry of a differential inductor. Hence, the proposed inductor combines the features of both single and differential inductors. In addition, the proposed inductor can mitigate far-field magnetic coupling owing to the 8-shaped current distribution. This feature not only reduces interference problems such as injection locking but also saves area owing to the compact spacing. The PGS structure can maximize the *Q*-factor while improving the computation time and accuracy of EM-simulation due to the reduced number of elements. The measurement results showed good agreement with EM-simulation.

#### References

- [1] Poulton J, Palmer R, Fuller AM, et al., "A 14-mW 6.25-Gb/s transceiver in 90-nm CMOS," *IEEE Journal of Solid-State Circuits*, vol. 42, pp. 2745-2757, 2007.
- [2] Findley P, Ali Rezvani G, Tao J, "Novel differential inductor design for high self-resonance frequency," *IEDM Technical Digest. IEEE International Electron Devices Meeting*, 2004.
- [3] Nobuyuki Itoh, Hideaki Masuoka, Shin-ichi Fukase, Kenichi Hirashiki, Minoru Nagata, "Twisted Inductor VCO for Suppressing On-chip Interferences," *Asia-Pacific Microwave Conference*, 2007.
- [4] Nathan M. Neihart, David J. Allstot, Matt Miller, Pat Rakers, "Twisted inductors for low coupling mixed-signal and RF applications," *IEEE Custom Integrated Circuits Conference*, 2008.
- [5] Andrew Poon, Andrew Chang, Hirad Samavati, S. Simon Wong, "Reduction of Inductive Crosstalk Using Quadrupole Inductors," *IEEE Journal of Solid-State Circuits*, vol. 44, pp. 1756-1764, 2009.
- [6] Masahiro Ichihashi, Haruichi Kanaya, "A high-frequency, low-coupling 8-shaped differential inductor with patterned ground shield," *Microwave and Optical Technology Letters*, vol. 60, pp. 2704-2707, 2018.
- [7] Wong Y, "On-chip Spiral Inductors With Patterned Ground Shields For Si-based RF IC's," *Symposium on VLSI Circuits*, 1997.
- [8] Yim S, Chen T, K.K. O, "The effects of a ground shield on the characteristics and performance of spiral inductors," *IEEE Journal of Solid-State Circuits*, vol. 37, pp. 237-244, 2002.
- [9] Xiangming X, Li P, Cai M, Han B, "Design of Novel High-Q-factor multipath stacked on-Chip spiral inductors," *IEEE Transactions on Electron Devices*, vol. 59, pp. 2011-2018, 2012.
- [10] Mohan SS, del Mar Hershenson M, Boyd SP, Lee TH, "Simple accurate expressions for planar spiral inductances," *IEEE Journal of Solid-State Circuits*, vol. 34, pp. 1419-1424, 1999.

## **Chapter 4**

### **On-Chip Transmission Line Modeling and Optimization**

With the increase in frequency and chip area in recent digital VLSI systems, the design methodology for on-chip clock distribution lines has been changing from an *RC*-model to an *RLC*-model. Hence, the lines must be treated as transmission lines. However, on-chip transmission line design needs time-consuming Electro-Magnetic (EM) simulation, which is not suitable for the iterative optimization of parameters such as metal width, space, length, thickness and the number of layers within a practical time. Therefore, a new design methodology without EM-simulation is necessary. In this chapter, we introduce a fully calculation-based on-chip transmission line modeling and optimization methodology. The proposed methodology has been verified using a 9-mm on-chip clock distribution line at 3-GHz in the TSMC 0.18-µm 1-poly 6-metal CMOS process. The proposed fully calculation-based model showed good agreement with both EM-simulation and experimental results. We also show that the proposed optimization methodology can find the smallest physical structure that achieves the lowest power consumption for given target specifications such as delay and output swing.

#### 4.1 Introduction

#### 4.1.1 Background

The on-chip clock distribution system is becoming one of the most power-hungry blocks in recent digital VLSI systems such as microprocessors and memories. For instance, in microprocessors it can dissipate a large share of the chip power (from 25% to 70%) [1], [2], [3]. As frequencies and chip areas increase, the effects of parasitic inductance have become prominent for the following reasons. First, a high signal slew rate makes the reactive impedance ($j\omega L$) higher than the resistive impedance (*R*). Second, reflection effects must be taken into consideration due to the relationship between the wavelength ($\lambda$) and the length of the clock distribution line, as shown in Fig. 4.1. Hence, an on-chip clock distribution line must be treated as a transmission line.
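The first reason can be illustrated by the frequency at which the reactance $\omega L$ of a wire equals its resistance *R*. The sketch below is only a rough estimate; the per-unit-length values are hypothetical and are not extracted from the process used in this work.

```python
import math

def crossover_frequency(r_per_m, l_per_m):
    """Frequency at which the reactance w*L equals the resistance R."""
    return r_per_m / (2.0 * math.pi * l_per_m)

# Hypothetical wire: 10 ohm/mm and 0.4 nH/mm (illustrative values only).
f_cross = crossover_frequency(10e3, 0.4e-6)
print(f"wL exceeds R above about {f_cross / 1e9:.2f} GHz")
```

Above this frequency the inductive term dominates the series impedance, so an *RC*-only model no longer describes the line.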



Fig. 4.1 Border line between *RC* and *RLC*-model

However, on-chip transmission line design has the following practical difficulties. (1) Time-consuming EM-simulation is required for parasitic inductance extraction. EM-simulation extracts the s-parameters from a given physical structure. Therefore, every time we run EM-simulation under different conditions, we must prepare a different layout. This iterative process is not suitable for optimization despite recent improvements in computer processing. In addition, EM-simulators are usually expensive. (2) Convergence problems occur in the convolution process that converts the extracted s-parameters to the time domain for SPICE simulation. Since these problems depend strongly on the SPICE simulator, the results can be misinterpreted. (3) There are many parameters to optimize, such as width, space, length, the number of layers and their combinations. For these reasons, an accurate, simplified *RLC*-model and an optimization methodology have been desirable so that circuit designers can estimate the best physical structure and parameters from given target conditions without any EM-simulation.

#### 4.1.2 Previous Work

Many studies have investigated *RLC*-modeling and optimization. For *RLC*-models, some efficient and precise models have been reported [4]–[12]. However, most models rely in part on either EM or SPICE simulations. In addition, the scope of these studies is limited to a thick metal layer even though multiple layers are available in actual fabrication processes. Thus, there are currently no fully calculation-based *RLC*-models for multiple layers. For optimization methodologies, most existing studies optimize only the delay and power consumption, although actual designs also need a large output swing [13]–[18]. A large output swing is important because receivers on the clock distribution line dissipate a large amount of power to amplify the received signal from a small swing to $V_{DD}$ at the level shift stages. Thus, no optimization methodology exists that handles the delay, power and output swing at the same time.

#### 4.1.3 Objectives and Scope of This Study

The goals of this research are the following two: (1) to derive an accurate and simplified *RLC*-model for multiple interconnect layers (6 metal layers for this process) without any EM-simulations; (2) to derive an optimization methodology that uses the proposed *RLC*-model to handle the following three parameters at the same time: delay, power and output swing.

The conditions for this research are as follows. Differential signaling with a five-wire GSGSG configuration in a 9-mm straight line, as shown in Fig. 4.2, is used as the clock distribution structure. This structure is commonly used in high-speed serial links, which require the highest frequency in the chip due to the parallel-to-serial conversion. The length of 9 mm covers existing large chips and allows us to consider the transmission line effects. For the fabrication process, TSMC 0.18-$\mu$m 1-poly 6-metal CMOS is used due to the limited options available in our laboratory. For the target frequency, we chose around 3 GHz because the estimated maximum frequency in the worst condition at Fan Out (FO) = 4 is around 3 to 4 GHz under this 0.18-$\mu$m process. Although neither the frequency nor the process is the most up to date, the results can easily be applied to different specifications since this research is about *RLC*-modeling and optimization methodology. In addition, the relationship between this target frequency and length is within the *RLC* region, as shown in Fig. 4.1. Therefore, we can discuss transmission line effects. Table 4.1 summarizes the design parameters.



Fig. 4.2 High-speed I/O clock distribution without repeaters

| Parameters                        | Value                        |
|-----------------------------------|------------------------------|
| Structure                         | 5-wire GSGSG, 6-metal layers |
| Width                             | 3-µm                         |
| Space                             | 3-µm                         |
| Length                            | 9-mm                         |
| Frequency                         | 3-GHz                        |
| Signaling mode                    | Differential                 |
| Driver $Z_{dr}$                   | 50Ω                          |
| Termination $R_t$                 | 50Ω                          |

Table 4.1 Design parameters

# 4.2 A Simplified, Fully Calculation-based *RLC*-model

## 4.2.1 Capacitance Estimation

Many methods have been developed for calculating the line capacitance [19], [20]. Sakurai et al. gave general capacitance formulae (4.1) - (4.2) for three parallel lines over one GND plane, assuming the same dielectric and wire thickness, as shown in Fig. 4.3 (a) [8]. The effective dielectric constant $\varepsilon_{eff}$ for an *N*-layer stack, as shown in Fig. 4.3 (b), can be expressed as (4.3), and the total capacitance is expressed as (4.4). The impact of dielectric loss is negligible in ULSI interconnects for $w_0$ = 0.15 $\mu$m to 15 $\mu$m at frequencies up to several tens of GHz [21]. Therefore, we ignore dielectric loss in this modeling.

$$C_{gnd} = \varepsilon \left\{ 1.15 \left( \frac{w_0}{h_0} \right) + 2.80 \left( \frac{t_0}{h_0} \right)^{0.222} \right\}$$
(4.1)

$$C_{side} = \varepsilon \left\{ 0.03 \left(\frac{w_0}{h_0}\right) + 0.83 \left(\frac{t_0}{h_0}\right) - 0.07 \left(\frac{t_0}{h_0}\right)^{0.222} \right\} \left(\frac{s_0}{h_0}\right)^{-1.34}$$
(4.2)

$$\varepsilon_{eff} = \sum_{n=1}^{N} d_n / \sum_{n=1}^{N} D_n \quad \left( D_n = \frac{d_n}{\varepsilon_n} \right)$$
(4.3)

$$C_{all} = C_{gnd} + 2C_{side} + C_{tap} \tag{4.4}$$

Here

| $w_0$           | : width of wire                           | $s_0$     | : space of wire                  |
|-----------------|-------------------------------------------|-----------|----------------------------------|
| $t_0$           | : thickness of wire                       | $h_0$     | : height from wire to GND        |
| $\varepsilon_n$ | : dielectric constant of the *n*-th layer | $d_n$     | : height from wire to wire       |
| $C_{all}$       | : total cap.                              | $C_{gnd}$ | : total wire cap. coupled to GND |
| $C_{side}$      | : total wire cap. coupled to side wire    | $C_{tap}$ | : total wire-tap cap.            |
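As a minimal sketch of how (4.1) - (4.4) can be evaluated numerically, the following Python functions implement the formulae directly. The function names and the geometry values are hypothetical and are not TSMC process data. The sketch also assumes that (4.1) and (4.2) are per-unit-length expressions, as in the usual form of these formulae, so the sum is multiplied by the length $l_0$ to obtain a total capacitance.

```python
EPS0 = 8.854e-12  # vacuum permittivity [F/m]


def effective_permittivity(layers):
    """Eq. (4.3): layers is a list of (d_n, eps_n) pairs."""
    total_d = sum(d for d, _ in layers)
    total_big_d = sum(d / eps for d, eps in layers)  # sum of D_n = d_n / eps_n
    return total_d / total_big_d


def c_gnd(eps, w0, t0, h0):
    """Eq. (4.1): wire-to-GND capacitance per unit length [F/m]."""
    return eps * (1.15 * (w0 / h0) + 2.80 * (t0 / h0) ** 0.222)


def c_side(eps, w0, t0, h0, s0):
    """Eq. (4.2): wire-to-side-wire capacitance per unit length [F/m]."""
    shape = 0.03 * (w0 / h0) + 0.83 * (t0 / h0) - 0.07 * (t0 / h0) ** 0.222
    return eps * shape * (s0 / h0) ** -1.34


def c_all(eps, w0, t0, h0, s0, l0, c_tap=0.0):
    """Eq. (4.4); per-unit-length terms are scaled by l0 (assumption)."""
    per_length = c_gnd(eps, w0, t0, h0) + 2.0 * c_side(eps, w0, t0, h0, s0)
    return per_length * l0 + c_tap


# Illustrative (hypothetical) dielectric stack and geometry.
eps_r = effective_permittivity([(1.0e-6, 4.0), (0.5e-6, 7.0)])
total_c = c_all(EPS0 * eps_r, w0=3e-6, t0=2e-6, h0=6e-6, s0=3e-6, l0=9e-3)
print(f"eps_eff = {eps_r:.2f}, C_all = {total_c * 1e12:.2f} pF")
```

Because the shape terms depend only on ratios such as $w_0/h_0$, any consistent length unit can be used for the geometry as long as $l_0$ and $\varepsilon$ use matching units.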



## 4.2.2 Resistance Estimation

DC resistance can be calculated by (4.5). Since the skin depth is larger than the thickness of the thickest metal layer, we ignore the high-frequency effects known as the skin effect and the proximity effect in this modeling [21] - [24].

$$R_{all} = \frac{l_0}{\sigma w_0 t_0} = R_{sh} \frac{l_0}{w_0}$$
(4.5)

Here

| $w_0$    | : width of wire            | $t_0$     | : thickness of wire    |
|----------|----------------------------|-----------|------------------------|
| $l_0$    | : length of wire           | $\sigma$  | : conductivity of wire |
| $R_{sh}$ | : sheet resistance of wire | $R_{all}$ | : total resistance     |
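The following sketch evaluates (4.5) in both of its forms and adds the standard skin-depth expression $\delta = 1/\sqrt{\pi f \mu_0 \sigma}$ so that the skin-depth argument can be checked for a given metal. The conductivity and thickness below are hypothetical values of the order of aluminum wiring, not process data, and the function names are illustrative.

```python
import math

MU0 = 4e-7 * math.pi  # vacuum permeability [H/m]


def r_all_from_conductivity(sigma, l0, w0, t0):
    """Eq. (4.5), first form: R_all = l0 / (sigma * w0 * t0)."""
    return l0 / (sigma * w0 * t0)


def r_all_from_sheet(r_sh, l0, w0):
    """Eq. (4.5), second form: R_all = R_sh * l0 / w0."""
    return r_sh * l0 / w0


def skin_depth(sigma, f):
    """Standard skin depth of a good conductor [m]."""
    return 1.0 / math.sqrt(math.pi * f * MU0 * sigma)


# Hypothetical values for illustration only.
sigma = 3.5e7          # conductivity [S/m]
t0 = 2e-6              # wire thickness [m]
r_sh = 1.0 / (sigma * t0)
print(f"R_sh  = {r_sh * 1e3:.1f} mOhm/sq")
print(f"R_all = {r_all_from_sheet(r_sh, l0=9e-3, w0=3e-6):.1f} Ohm")
print(f"delta = {skin_depth(sigma, f=3e9) * 1e6:.2f} um at 3 GHz")
```

Comparing the printed skin depth with the thickness of the layer in question indicates whether the DC-resistance assumption is reasonable for that layer.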

# 4.2.3 Inductance Estimation

As for self and mutual inductance, accurate empirical equations (4.6) - (4.7) are reported [25], [26]. A coupling coefficient *K* is defined as (4.8).

$$L_{0} = \frac{\mu_{0}l_{0}}{2\pi} \left\{ \ln \frac{2l_{0}}{w_{0} + t_{0}} + \frac{1}{2} + \frac{2}{9} \left( \frac{w_{0} + t_{0}}{l_{0}} \right) \right\}$$
(4.6)

$$M_{0} = \frac{\mu_{0} l_{0}}{2\pi} \left\{ \ln \left( \frac{l_{0}}{d_{0}} + \sqrt{1 + \frac{l_{0}^{2}}{d_{0}^{2}}} \right) - \sqrt{1 + \frac{d_{0}^{2}}{l_{0}^{2}}} + \frac{d_{0}}{l_{0}} \right\} \quad (d_{0} = w_{0} + s_{0})$$
(4.7)

$$K = \frac{M_{0}}{L_{0}}$$
(4.8)

Here

| $w_0$ | : width of wire   | $t_0$   | : thickness of wire    |
|-------|-------------------|---------|------------------------|
| $l_0$ | : length of wire  | $\mu_0$ | : permeability         |
| $L_0$ | : self-inductance | $M_0$   | : mutual-inductance    |
| $d_0$ | : wire pitch      | K       | : coupling coefficient |
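A short sketch of (4.6) - (4.8) follows. The function names and dimensions are hypothetical. The parameter `n_pitch`, which places two wires `n_pitch` pitches apart by using $d_0 = n(w_0 + s_0)$, is an assumption added here for non-adjacent wires; the text above only defines $d_0$ as the wire pitch.

```python
import math

MU0 = 4e-7 * math.pi  # vacuum permeability [H/m]


def self_inductance(l0, w0, t0):
    """Eq. (4.6): self-inductance of a straight rectangular wire [H]."""
    a = w0 + t0
    return MU0 * l0 / (2.0 * math.pi) * (
        math.log(2.0 * l0 / a) + 0.5 + (2.0 / 9.0) * (a / l0)
    )


def mutual_inductance(l0, d0):
    """Eq. (4.7): mutual inductance of two parallel wires at distance d0 [H]."""
    r = l0 / d0
    return MU0 * l0 / (2.0 * math.pi) * (
        math.log(r + math.sqrt(1.0 + r * r)) - math.sqrt(1.0 + 1.0 / (r * r)) + 1.0 / r
    )


def coupling_coefficient(l0, w0, t0, s0, n_pitch=1):
    """Eq. (4.8): K = M0 / L0, with d0 = n_pitch * (w0 + s0) (assumption)."""
    d0 = n_pitch * (w0 + s0)
    return mutual_inductance(l0, d0) / self_inductance(l0, w0, t0)


# Hypothetical geometry for illustration only.
l0, w0, t0, s0 = 9e-3, 3e-6, 2e-6, 3e-6
print(f"L0 = {self_inductance(l0, w0, t0) * 1e9:.2f} nH")
for n in (1, 2, 3, 4):
    print(f"K at {n} pitch(es) = {coupling_coefficient(l0, w0, t0, s0, n):.3f}")
```

The slow decay of `K` with distance in this sketch illustrates why inductive coupling is treated as a long-range effect in the simplification that follows.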

#### 4.2.4 Simplification Methodology

In this section, we explain the conversion methodology from a five-wire GSGSG physical model to an equivalent single-ended *RLC*-model. For capacitance effects, we consider only the two adjacent traces since capacitive coupling is a "short-range" effect. For inductance effects, we consider all traces since inductive coupling is a "long-range" effect [27]. For resistance effects, we consider only the DC resistance, as explained in Section 4.2.2. Therefore, we begin the simplification process with an *RL*-model and then obtain the *RLC*-model by adding the capacitance to the *RL*-model. Fig. 4.4 shows the conversion process for an *RL*-model.

First, a five-wire GSGSG physical model, as shown in Fig. 4.4 (a), is converted to a five-wire equivalent circuit model shown in Fig. 4.4 (b). Here, R, L and K represent the total resistance, the total self-inductance and the coupling coefficient of mutual inductance, respectively. Second, Fig. 4.4 (b) is converted to a five-wire calculation model as shown in Fig. 4.4 (c). Here I and $I_x$ represent the input current and the induced current, respectively. $V_1$ to $V_7$ represent the induced voltages. For instance, $V_1$ is the voltage induced on line 1 by line 2. $V_8$ represents the voltage caused by self-inductance. $V_9$ and $V_{10}$ represent the voltage drops due to the resistive loss of the wire. $V_i$ is the voltage generated at line 2, which is the signal source of this model. In these equations, "*s*" represents the Laplace operator.

As for line 1 in Fig. 4.4 (c), we get (4.9) – (4.11).

$$-V_4 - V_9 + V_3 - V_2 + V_1 = 0 \tag{4.9}$$

$$-sLI_x - RI_x + sK_{15}LI_x - sK_{14}LI + sK_{12}LI = 0 \tag{4.10}$$

$$I_{x} = \frac{sL(K_{12} - K_{14})}{R + sL(1 - K_{15})} \cdot I \tag{4.11}$$

As for line 2 in Fig. 4.4 (c), we get (4.12) – (4.13).

$$V_i + V_5 - V_6 + V_7 - V_{10} - V_8 = 0 \tag{4.12}$$

$$V_i = sLI_x(K_{25} - K_{12}) + I\{R + sL(1 - K_{24})\} \tag{4.13}$$

Substituting (4.11) into (4.13) and solving for $V_i/I$, the total effective inductance $L_{eff}$ on line 2 is obtained by (4.14), where "*imag*" denotes taking the imaginary part and "*f*" is the frequency.

$$L_{eff} = imag\left(\frac{V_i}{I}\right) / (2\pi f) \tag{4.14}$$
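The chain (4.11), (4.13), (4.14) can be sketched numerically as follows. The function name is hypothetical, and the coupling coefficients are passed in as plain numbers rather than derived from a particular geometry.

```python
import math


def effective_inductance(R, L, K12, K14, K15, K24, K25, f):
    """Return L_eff on line 2 from (4.11), (4.13) and (4.14)."""
    s = 1j * 2.0 * math.pi * f
    # (4.11): induced current on line 1, normalized by the input current I
    ix_over_i = s * L * (K12 - K14) / (R + s * L * (1.0 - K15))
    # (4.13): source voltage on line 2, normalized by I
    vi_over_i = s * L * ix_over_i * (K25 - K12) + (R + s * L * (1.0 - K24))
    # (4.14): imaginary part divided by the angular frequency
    return vi_over_i.imag / (2.0 * math.pi * f)


# Hypothetical values for illustration only.
l_eff = effective_inductance(R=40.0, L=15e-9, K12=0.8, K14=0.68,
                             K15=0.65, K24=0.73, K25=0.68, f=3e9)
print(f"L_eff = {l_eff * 1e9:.2f} nH")
```

A useful sanity check is the case $K_{12} = K_{14}$: the induced current $I_x$ vanishes in (4.11), and (4.13) reduces to $L_{eff} = L(1 - K_{24})$.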

Third, thanks to the simplification process above, Fig. 4.4 (c) can be simplified to an equivalent differential model as shown in Fig. 4.4 (d). Fourth, Fig. 4.4 (d) can be further simplified to a single-ended model as shown in Fig. 4.4 (e) owing to the symmetrical structure.

In summary, a five-wire GSGSG physical model shown in Fig. 4.4 (a) can be simplified to an equivalent single-ended model shown in Fig. 4.4 (e). Note that the total number of elements is reduced from 20 [Fig. 4.4 (b)] to only 2 [Fig. 4.4 (e)]. This significant reduction not only shortens the computational time but also allows us to use basic transmission line theory and gives intuitive insight. Fig. 4.5 compares the effective inductance obtained by SPICE simulation for Fig. 4.4 (b) and Fig. 4.4 (e) under a certain condition. The two results match exactly.

Thus, the validity of the proposed simplification methodology has been proved. Finally, a single-ended distributed *RLC*-model can be obtained by adding the wire capacitance to Fig. 4.4 (e) as shown in Fig. 4.6.



(a) 5-wire GSGSG physical model



(e) equivalent single-ended model (half circuit of (d))

Fig. 4.4 Conversion process from a 5-wire GSGSG physical model to an equivalent single-ended model

$$\begin{aligned}
V_{1} &= sK_{12}LI, & V_{2} &= sK_{14}LI, \\
V_{3} &= sK_{15}LI_{x}, & V_{4} &= sLI_{x}, \\
V_{5} &= sK_{24}LI, & V_{6} &= sK_{25}LI_{x}, \\
V_{7} &= sK_{12}LI_{x}, & V_{8} &= sLI, \\
V_{9} &= RI_{x}, & V_{10} &= RI
\end{aligned}$$



Fig. 4.5 Comparison results between Fig. 4.4 (b) and Fig. 4.4 (e)



Fig. 4.6 Proposed equivalent distributed RLC-model

# 4.2.5 Calculation of Transmission Line Model

Thanks to the simplification in Section 4.2.4, we can easily apply basic transmission line theory. It is important to note that the accuracy is limited by that of equations (4.1) - (4.8), and improving them is outside the scope of this study.

A. Characteristic impedance  $Z_0$ 

$Z_0$ defined in (4.15) is the characteristic impedance, where r, $L_{eff}$, $C_{all}$ and s represent the total resistance, total effective inductance, total capacitance and Laplace operator, respectively.

$$Z_0 = \sqrt{\frac{r + sL_{eff}}{sC_{all}}} \tag{4.15}$$

# B. Propagation delay $T_{pd}$ , propagation constant $\gamma$

 $T_{pd}$  defined in (4.16) is called propagation delay and it is expressed by propagation constant  $\gamma$  defined in (4.17) where  $l_0$  represents length of transmission line.

$$T_{pd} = \frac{imag(\gamma l_0)}{2\pi f} \approx \sqrt{L_{eff} \cdot C_{all}}$$
(4.16)

$$\gamma = \sqrt{\frac{\left(r + sL_{eff}\right) \cdot \left(sC_{all}\right)}{l_0^2}} \tag{4.17}$$

# C. Input impedance $Z_{in}$

 $Z_{in}$  defined in (4.18) is called input impedance where  $Z_t$  and x represent termination impedance and x-coordinate respectively. In this case, x becomes  $l_0$ .

$$Z_{in} = \left\{ \frac{Z_t + Z_0}{2} e^{\gamma x} + \frac{Z_t - Z_0}{2} e^{-\gamma x} \right\} / \left\{ \frac{Z_t + Z_0}{2Z_0} e^{\gamma x} - \frac{Z_t - Z_0}{2Z_0} e^{-\gamma x} \right\}$$
(4.18)

#### D. Current consumption $I_{cc}$

 $I_{cc}$  defined in (4.19) means current consumption where  $Z_{dr}$  represents driver impedance.

$$I_{cc} = V_{00} / (Z_{dr} + Z_{in}) \tag{4.19}$$

# E. Voltage swing $V_s(x)$

$V_s(x)$ defined in (4.20) is the standing wave, which is the sum of the forward wave and the backward wave. $V_{01}$ defined in (4.21) represents the voltage obtained by dividing the input signal $V_{00}$ between the driver impedance $Z_{dr}$ and the input impedance $Z_{in}$. $\rho$ defined in (4.22) is the reflection coefficient. $x_f$ and $x_b$ defined in (4.23) and (4.24) are the x-coordinates of the forward wave and the backward wave, respectively, and they are used in (4.20) to calculate the voltage at an arbitrary point. $M_{tap}$ and N represent the tap location and the number of units, respectively.

$$V_s(x) = V_{01} \cdot \frac{e^{-\gamma x_f} + \rho e^{-\gamma (l_0 + x_b)}}{1 + \rho e^{-2\gamma l_0}}$$
(4.20)

$$V_{01} = V_{00} \cdot \frac{Z_{in}}{Z_{dr} + Z_{in}} \tag{4.21}$$

$$\rho = \frac{Z_t - Z_0}{Z_t + Z_0} \tag{4.22}$$

$$x_f = \frac{l_0(M_{tap} - 1)}{N}$$
(4.23)

$$x_b = \frac{l_0 \left( N + 1 - M_{tap} \right)}{N} \tag{4.24}$$
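The quantities of this section can be evaluated together as in the sketch below. The function names, the returned dictionary and the example values are hypothetical. The propagation delay is computed from its definition $imag(\gamma l_0)/(2\pi f)$, and since r, $L_{eff}$ and $C_{all}$ are totals, $\gamma l_0$ follows from (4.17) as $\sqrt{(r + sL_{eff})(sC_{all})}$.

```python
import cmath
import math


def line_parameters(r, l_eff, c_all, f, z_dr, z_t, v00=1.0):
    """Evaluate (4.15)-(4.22) at x = l0 for total r [Ohm], l_eff [H], c_all [F]."""
    s = 1j * 2.0 * math.pi * f
    z0 = cmath.sqrt((r + s * l_eff) / (s * c_all))        # (4.15)
    gamma_l = cmath.sqrt((r + s * l_eff) * (s * c_all))   # gamma * l0 from (4.17)
    t_pd = gamma_l.imag / (2.0 * math.pi * f)             # (4.16), definition
    e_pos, e_neg = cmath.exp(gamma_l), cmath.exp(-gamma_l)
    num = (z_t + z0) / 2.0 * e_pos + (z_t - z0) / 2.0 * e_neg
    den = (z_t + z0) / (2.0 * z0) * e_pos - (z_t - z0) / (2.0 * z0) * e_neg
    z_in = num / den                                      # (4.18)
    i_cc = v00 / (z_dr + z_in)                            # (4.19)
    v01 = v00 * z_in / (z_dr + z_in)                      # (4.21)
    rho = (z_t - z0) / (z_t + z0)                         # (4.22)
    return {"z0": z0, "gamma_l": gamma_l, "t_pd": t_pd, "z_in": z_in,
            "i_cc": i_cc, "v01": v01, "rho": rho}


def swing_at_tap(p, m_tap, n_units):
    """(4.20) with x_f, x_b from (4.23)-(4.24), expressed as fractions of l0."""
    xf = (m_tap - 1) / n_units
    xb = (n_units + 1 - m_tap) / n_units
    g = p["gamma_l"]
    num = cmath.exp(-g * xf) + p["rho"] * cmath.exp(-g * (1.0 + xb))
    return p["v01"] * num / (1.0 + p["rho"] * cmath.exp(-2.0 * g))


# Hypothetical totals for illustration only.
p = line_parameters(r=40.0, l_eff=6e-9, c_all=2e-12, f=3e9, z_dr=50.0, z_t=50.0)
print(f"T_pd = {p['t_pd'] * 1e12:.1f} ps, |Z_in| = {abs(p['z_in']):.1f} Ohm")
print(f"|I_cc| = {abs(p['i_cc']) * 1e3:.2f} mA")
print(f"near end = {abs(swing_at_tap(p, 1, 6)):.3f}, far end = {abs(swing_at_tap(p, 7, 6)):.3f}")
```

With `m_tap = 1` the expression returns $V_{01}$, and with `m_tap = n_units + 1` it returns the far-end voltage, which matches the roles of the near-end and far-end swings used later in the chapter.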

# 4.3 Model Accuracy

#### 4.3.1 Test Bench Setup

Fig. 4.7 shows the test bench used to compare the transmission line characteristics obtained by calculation and by EM-simulation. The test bench assumes a 9-mm length, a 3-GHz frequency and a 1 $V_{0p}$ input with a 50 $\Omega$ driver and a 50 $\Omega$ termination, as shown in Table 4.1. We applied this configuration to 15 different physical structures (Case 00 to Case 14), as shown in Table 4.2. For the calculation-based model, we directly evaluate the equations presented in Section 4.2 based on the Fig. 4.6 model. For the EM-simulation-based model, we use Keysight Technologies "Momentum-RF", and the generated s-parameters are then verified by AC analysis under the Fig. 4.7 condition. Note that the total length is divided into 1.5-mm units to minimize the EM-simulation time.

| Parameters | Value              |
|------------|--------------------|
| Side wire  | 5-wire (GSGSG)     |
| Shielding  | Top: No Bottom: M1 |
| Case 00    | M6                 |
| Case 01    | M6//M5             |
| Case 02    | M6//M5//M4         |
| Case 03    | M6//M5//M4//M3     |
| Case 04    | M6//M5//M4//M3//M2 |
| Case 05    | M5                 |
| Case 06    | M5//M4             |
| Case 07    | M5//M4//M3         |
| Case 08    | M5//M4//M3//M2     |
| Case 09    | M4                 |
| Case 10    | M4//M3             |
| Case 11    | M4//M3//M2         |
| Case 12    | M3                 |
| Case 13    | M3//M2             |
| Case 14    | M2                 |

Table 4.2 Physical structures

- "M" represents Metal layer.

- "//" represents short by vias. It makes the effective resistance lower.

- "M6" is thick metal layer. Others are intermediate layer.



Fig. 4.7 Test bench for characterization of transmission line

#### 4.3.2 Results of Comparison and Analysis

Fig. 4.8 shows the comparison results for total resistance $R_{all}$, total effective inductance $L_{eff}$, total capacitance $C_{all}$, propagation delay time $T_{pd}$, input impedance $Z_{in}$, current consumption of the 50 $\Omega$ driver $I_{cc}$, near-end voltage swing $V_{01}$ and far-end voltage swing $V_{31}$. The x-axis represents the case number, the left y-axis the absolute value and the right y-axis the error against EM-simulation. To obtain better insight, we divided the cases of Fig. 4.8 into four groups ("Low-R", "Low-R, Low-C", "High-R, Low-C", "High-R, High-C"), as shown in Table 4.3. The grouping is based on *R* and *C* because L does not change much under the constant length condition; as we can see from (4.6), L is mainly determined by the length. Therefore, errors in the calculation-based model should be caused by either R or C.

Table 4.3 (a) shows the accuracy of the fully calculation-based model against EM-simulation; the maximum error is 28.3%. Table 4.3 (b) shows the case when the *RLC* errors of the empirical equations are removed from Table 4.3 (a) by using the extracted *RLC* instead of the equation-based *RLC*. In this case, the maximum error is reduced from 28.3% to 17.7%. Therefore, an error of 10.6% comes from the accuracy of the empirical equations. Table 4.3 (c) shows the case when the resistive loss of the shielding layer M1 is taken into consideration. The maximum error is reduced from 17.7% to 8.3%. In addition, if we avoid the M2 layer (i.e. avoid the "High-*C*" cases), the maximum error is reduced from 8.3% to 4.4%. This is because the real part of the propagation constant increases at high frequency due to *LC* resonance.
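The successive reductions of the maximum error quoted above can be written as a simple error budget. The labels are shorthand introduced here for readability, and the decomposition assumes the contributions add linearly:

$$\begin{aligned}
\text{empirical equations:} \quad & 28.3\% - 17.7\% = 10.6\% \\
\text{resistive loss of shielding layer:} \quad & 17.7\% - 8.3\% = 9.4\% \\
\text{high-}C\text{ cases (M2):} \quad & 8.3\% - 4.4\% = 3.9\% \\
\text{remaining:} \quad & 4.4\%
\end{aligned}$$

The four terms sum to the original maximum error of 28.3%.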

To summarize, the following two considerations are important for better accuracy. First, avoid high-capacitance cases such as M2 as a signal layer with M1 as a shielding layer. This condition improves the maximum error of the proposed model to ~15%, which is accurate enough for early design estimation purposes. Second, add the resistive loss of the shielding layer to the unit model for further improvement.

















(a) Total resistance: $R_{all}$ (b) Total effective inductance: $L_{eff}$ (c) Total capacitance: $C_{all}$ (d) Propagation delay: $T_{pd}$ (e) Input impedance: $Z_{in}$ (f) Current consumption: $I_{cc}$ (g) Near-end output swing: $V_{01}$ (h) Far-end output swing: $V_{31}$

Fig. 4.8 Comparison results of $R_{all}$, $L_{eff}$, $C_{all}$, $T_{pd}$, $Z_{in}$, $I_{cc}$, $V_{01}$, $V_{31}$


Table 4.3 Summary of absolute error of Fig. 4.8 (d) – (h)

(a) Calculation / EM-simulation (Proposed model)

| Parameters | Low-R | Low-R, Low-C | High-R, Low-C | High-R, High-C |
|------------|-------|--------------|---------------|----------------|
| $T_{pd}$   | 6.0%  | 6.0%         | 13.4%         | 7.7%           |
| $Z_{in}$   | 4.4%  | 4.4%         | 11.6%         | 4.5%           |
| $I_{cc}$   | 2.3%  | 2.3%         | 8.5%          | 3.5%           |
| $V_{01}$   | 2.2%  | 2.2%         | 3.1%          | 1.4%           |
| $V_{31}$   | 21.6% | 12.1%        | 16.0%         | 28.3%          |

(b) Calculation / EM-simulation (Removed *RLC* error of empirical equations from (a))

| Parameters | Low-R | Low-R, Low-C | High-R, Low-C | High-R, High-C |
|------------|-------|--------------|---------------|----------------|
| $T_{pd}$   | 0.8%  | 0.8%         | 0.6%          | 0.6%           |
| $Z_{in}$   | 5.9%  | 4.0%         | 1.9%          | 3.5%           |
| $I_{cc}$   | 0.9%  | 0.7%         | 1.9%          | 1.1%           |
| $V_{01}$   | 5.1%  | 3.3%         | 1.5%          | 3.2%           |
| $V_{31}$   | 16.3% | 9.5%         | 8.0%          | 17.7%          |

(c) Calculation / EM-simulation (Added resistive-loss of shielding layer M1 to (b))

| Parameters | Low-R | Low-R, Low-C | High-R, Low-C | High-R, High-C |
|------------|-------|--------------|---------------|----------------|
| $T_{pd}$   | 2.6%  | 1.7%         | 1.9%          | 2.4%           |
| $Z_{in}$   | 2.0%  | 2.0%         | 2.6%          | 2.5%           |
| $I_{cc}$   | 1.6%  | 0.9%         | 2.6%          | 2.9%           |
| $V_{01}$   | 2.6%  | 2.6%         | 1.4%          | 1.4%           |
| $V_{31}$   | 8.3%  | 4.4%         | 3.2%          | 8.0%           |

# Definition of each group

| Group          | Case number           | Comment                        |
|----------------|-----------------------|--------------------------------|
| Low-R          | Case 00-04            | M6 is included.                |
| Low-R, Low-C   | Case 00-03            | M2 is not included from Low-R. |
| High-R, Low-C  | Case 05-07, 09-10, 12 | M6 and M2 are not included.    |
| High-R, High-C | Case 08, 11, 13, 14   | M2 is included.                |

Meaning of each parameter is as follows:

 $T_{pd}$ : Propagation delay

$Z_{in}$ : Input impedance

 $I_{cc}$ : Current consumption

 $V_{01}$ : Near-end output voltage

 $V_{31}$ : Far-end output voltage

## 4.3.3 Application to Advanced Processes and Higher Frequencies

Since this research assumed the specific conditions of Table 4.1, we must consider whether the proposed model is applicable to advanced processes and much higher frequencies. The proposed model generates the transmission line model from physical information. Thus, it is applicable to any process once the process parameters are provided. In addition, since the empirical equations introduced in Section 4.2 are composed of relative geometries such as $w_0/s_0$, the same accuracy should be obtained under process scaling. As for higher frequencies, the skin effect, proximity effect and dielectric loss would not be significant problems even at 30 GHz according to [21]. Thus, the proposed model is widely applicable to both advanced processes and higher frequencies.
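The scaling argument can be illustrated with (4.1). Assuming every dimension is multiplied by the same hypothetical factor $k$, the factor cancels in each ratio:

$$C_{gnd}(k w_0, k t_0, k h_0) = \varepsilon \left\{ 1.15 \left( \frac{k w_0}{k h_0} \right) + 2.80 \left( \frac{k t_0}{k h_0} \right)^{0.222} \right\} = C_{gnd}(w_0, t_0, h_0)$$

The same cancellation applies to the ratio terms of (4.2). This only shows that the formulae depend on geometric ratios; it does not by itself quantify the accuracy for a particular scaled process.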

## 4.4 Optimization Methodology

#### 4.4.1 Optimization

In this section, we introduce an optimization methodology that treats the propagation delay, power consumption and output voltage swing at the same time using the proposed *RLC*-model. Fig. 4.9 shows the proposed optimization flowchart. This algorithm searches for the smallest metal width, space and structure that achieve the lowest power consumption within the given target specifications. It works as follows. (1) The designer sets the target specifications and constraints as shown in Table 4.4. In this example, the target specifications are the allowable propagation delay and output voltage swing. The fixed constraints are the input voltage swing, length, frequency, driver impedance and termination impedance. The variable constraints are the ranges of metal width, space and physical structures. (2) The absolute values of *R*, *L* and *C* are calculated with the equations introduced in Sections 4.2.1 to 4.2.3. (3) The transmission line parameters such as $T_{pd}$, $V_{out}$ and $I_{cc}$ are calculated using the proposed model introduced in Sections 4.2.4 to 4.2.5. (4) Verify whether $T_{pd}$, $V_{01}$ and $V_{31}$ meet the target specifications. If any of them is outside the target specifications, go back to (2) and calculate *RLC* with different $w_0$ and $s_0$. (5) If all of them are inside the target specifications, verify whether the power consumption is lower than the lowest value found so far. If No, go back to (2) and calculate *RLC* with different $w_0$ and $s_0$. (6) If Yes, overwrite the $w_0$ and $s_0$ information in the memory, then go back to (2). (7) Iterate (2) – (6) until the final values of $w_0$ and $s_0$ in the sweeping range are verified. After this iterative process, the final $w_0$ and $s_0$ in the memory are the optimized values that achieve the lowest power consumption within the given target specifications.
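The loop described in steps (2) to (7) can be sketched as an exhaustive sweep. This is a Python illustration, not the Excel-VBA tool used in this work. The function `optimize`, its arguments and the callback `toy_evaluate` are hypothetical; in particular, `toy_evaluate` is a stand-in with made-up numbers chosen only to exercise the loop, where a real flow would call the *RLC*-model of Section 4.2.

```python
from itertools import product


def optimize(structures, widths, spaces, evaluate, t_pd_max, v_min):
    """Exhaustive sweep over all candidates; returns None if none is feasible."""
    best = None
    for case, w0, s0 in product(structures, widths, spaces):
        t_pd, v01, v31, i_cc = evaluate(case, w0, s0)       # steps (2)-(3)
        if t_pd >= t_pd_max or min(v01, v31) <= v_min:      # step (4)
            continue
        if best is None or i_cc < best["i_cc"]:             # steps (5)-(6)
            best = {"case": case, "w0": w0, "s0": s0, "i_cc": i_cc}
    return best                                             # after step (7)


def toy_evaluate(case, w0, s0):
    """Hypothetical stand-in for the RLC-model (made-up numbers)."""
    t_pd = 50e-12
    v01 = 0.5
    v31 = 0.1 * w0 * 1e6 - 0.05 * case
    i_cc = 8e-3 + 0.2e-3 * w0 * 1e6 - 0.1e-3 * s0 * 1e6 + 1e-3 * case
    return t_pd, v01, v31, i_cc


widths = [w * 1e-6 for w in range(2, 11, 2)]   # 2 um to 10 um in 2 um steps
spaces = [2e-6, 4e-6]                          # 2 um and 4 um
best = optimize(range(15), widths, spaces, toy_evaluate,
                t_pd_max=167e-12, v_min=0.3)
print(best)
```

Keeping the model behind a single `evaluate` callback keeps the search independent of how the transmission line parameters are obtained, so the same loop could be driven by calculated or simulated values.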

Fig. 4.10 shows the optimization result. The Excel-VBA-based calculation tool computed the optimized result within 1 minute, which is much faster than conventional EM-simulation-based design methodologies. In Fig. 4.10, "0" means that none of the cases achieved the target specifications. In this example, the proposed fully calculation-based optimization methodology predicted that Case 00 with width = 4 $\mu m$ and space = 4 $\mu m$ achieves the lowest power consumption. To verify this optimization result, we swept $w_0$ and $s_0$ for Case 00 (the best structure predicted by the calculation) and Case 05 (for which the calculation predicted no solution) with EM-simulation in the Fig. 4.7 configuration. Fig. 4.11 shows the EM-simulation results for propagation delay $T_{pd}$, far-end voltage swing $V_{31}$ and current consumption $I_{cc}$. For $T_{pd}$, both Case 00 and Case 05 meet the target specifications, as shown in Fig. 4.11 (a). For $V_{31}$, Case 00 meets the target specifications if the width is larger than 2.0 $\mu m$, whereas Case 05 has no solution. For $I_{cc}$, a smaller width and a wider space are preferable. Therefore, the optimized result becomes width = 4 $\mu m$ and space = 4 $\mu m$ in Case 00. This is exactly what the optimization algorithm predicted.

|                       | Parameters                             | Value                              |
|-----------------------|----------------------------------------|------------------------------------|
| Target specifications | Propagation delay ($T_{pd}$)           | < 167 ps (1/2 cycle time)          |
|                       | Min. output swing ($V_{01}$, $V_{31}$) | > 0.3 $V_{0p}$                     |
|                       | Current consumption ($I_{cc}$)         | As small as possible               |
| Constraints (fixed)   | Signaling mode                         | Differential                       |
|                       | Input voltage ($V_{00}$)               | 1.0 $V_{0p}$                       |
|                       | Length ($l_0$)                         | 9 mm                               |
|                       | Frequency ($f$)                        | 3 GHz                              |
|                       | Driver ($Z_{dr}$)                      | 50Ω                                |
|                       | Termination ($R_t$)                    | 50Ω                                |
| Constraints (sweep)   | Physical structures                    | Table 4.2 (15 in total)            |
|                       | Width range ($w_0$)                    | 2 μm – 10 μm, 2 μm step (5 in total) |
|                       | Space range ($s_0$)                    | 2 μm – 4 μm, 2 μm step (2 in total)  |
|                       | # of combinations                      | 150 (= 15 × 5 × 2)                 |

Table 4.4 Target specifications and constraints
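As a quick check of the numbers in Table 4.4, the half-cycle limit at 3 GHz and the size of the sweep follow from:

$$\frac{1}{2f} = \frac{1}{2 \times 3\,\mathrm{GHz}} \approx 166.7\,\mathrm{ps}, \qquad 15 \times 5 \times 2 = 150$$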



Fig. 4.10 Optimization result calculated by VBA programming





Fig. 4.11 EM-simulation results in Case00 and Case05

#### 4.4.2 Transient Analysis

We assumed an ideal sinusoidal wave with 50 $\Omega$ driver impedance for the transmission line modeling. However, an actual driver may have non-linear characteristics. Hence, we compared the far-end voltage swing $V_{31}$ between the calculation-based model and the EM-simulation-based model for the following three cases. For an ideal linear 50 $\Omega$ impedance with a sinusoidal wave (Fig. 4.12 (a)), the two models show a good match. For an ideal linear 50 $\Omega$ impedance with a pulse wave (Fig. 4.12 (b)), the waveform shows some high-frequency ripples. However, these would not be a serious problem for early design estimation purposes. For a non-linear impedance with a pulse wave (Fig. 4.12 (c)), the proposed fully calculation-based transmission line model is applicable to transient analysis as well.



Fig. 4.12 Transient analysis for  $V_{31}$ 

#### 4.5 Measurement Results

Fig. 4.13 shows a photograph of the fabricated test chip. For the physical structures, we selected Case 00 (M6) and Case 08 (M5//M4//M3//M2) from Table 4.2. We set width = 4 $\mu$m and space = 4 $\mu$m to compare with the optimized result shown in Section 4.4.1. The reasons for selecting Case 00 and Case 08 are as follows. (1) Case 00 verifies the optimized result in Fig. 4.10. (2) Case 08 verifies the effects of a large-*RC* case. Note that large-*RC* cases are likely to cause calculation errors against EM-simulation, as stated in Section 4.3. The measurements of the implemented on-chip transmission lines were recorded using a 50 $\Omega$ GSGSG probe station and a Keysight Technologies 4-port network analyzer, as shown in Fig. 4.14. We simulated the generated s-parameters with Keysight Technologies ADS to obtain transmission line parameters such as input impedance $Z_{in}$ and current consumption $I_{cc}$. Table 4.5 shows the evaluated results. From Table 4.5 (a), the measurement results for Case 00 show a good match with EM-simulation. From Table 4.5 (b), the comparison between calculation and measurement results for Case 08 shows a large error for $V_{31}$; however, this is the same trend as in Fig. 4.8. Therefore, the validity of the proposed model is confirmed.



Fig. 4.13 Photograph of the fabricated test chip



Fig. 4.14 Measurement setup

Table 4.5 Measurement results of Case 00 and Case 08

(a) Case 00

Absolute value

| Parameters          | Measurement | EM-simulation | Calculation |
|---------------------|-------------|---------------|-------------|
| $T_{pd}$ [ps]       | 40.7        | 38.0          | 36.9        |
| $Z_{in}$ [$\Omega$] | 63.7        | 63.2          | 70.1        |
| $I_{cc}$ [mA]       | 8.9         | 8.7           | 8.3         |
| $V_{01}$ [$V_{0p}$] | 0.57        | 0.57          | 0.58        |
| $V_{31}$ [$V_{0p}$] | 0.38        | 0.40          | 0.41        |

Error against measurement results

| Parameters | Measurement | EM-simulation | Calculation |
|------------|-------------|---------------|-------------|
| $T_{pd}$   | 0.0%        | -6.5%         | -9.3%       |
| $Z_{in}$   | 0.0%        | -0.8%         | 10.1%       |
| $I_{cc}$   | 0.0%        | -1.6%         | -5.9%       |
| $V_{01}$   | 0.0%        | 1.0%          | 3.1%        |
| $V_{31}$   | 0.0%        | 3.3%          | 6.7%        |

(b) Case 08

Absolute value

| Parameters          | Measurement | EM-simulation | Calculation |
|---------------------|-------------|---------------|-------------|
| $T_{pd}$ [ps]       | 65.2        | 60.3          | 58.2        |
| $Z_{in}$ [$\Omega$] | 36.3        | 37.2          | 41.5        |
| $I_{cc}$ [mA]       | 12.0        | 11.6          | 11.2        |
| $V_{01}$ [$V_{0p}$] | 0.44        | 0.46          | 0.47        |
| $V_{31}$ [$V_{0p}$] | 0.27        | 0.30          | 0.37        |

Error against measurement results

| Parameters | Measurement | EM-simulation | Calculation |
|------------|-------------|---------------|-------------|
| $T_{pd}$   | 0.0%        | -7.4%         | -10.7%      |
| $Z_{in}$   | 0.0%        | 2.5%          | 14.4%       |
| $I_{cc}$   | 0.0%        | -3.5%         | -6.8%       |
| $V_{01}$   | 0.0%        | 4.6%          | 6.7%        |
| $V_{31}$   | 0.0%        | 11.8%         | 38.6%       |

# 4.6 Chapter Summary

In this chapter, we have introduced a novel on-chip transmission line modeling and optimization methodology for high-speed serial links. For transmission line modeling, we derived a fully calculation-based, simplified *RLC*-model for multiple interconnect layers. A five-wire GSGSG physical structure is converted to a simplified, equivalent single-ended *RLC*-model. Thanks to this simplification, basic transmission line theory can be easily applied. In addition, this simplified model not only shortens the computational time but also provides intuitive insight. The maximum error of the proposed model is about 13.5% for propagation delay, 11.5% for input impedance, 8.5% for current consumption, 3% for near-end voltage swing and about 28% for far-end voltage swing at 9-mm length and 3-GHz frequency in the TSMC 0.18-µm 1-poly 6-metal CMOS fabrication process. The large error (28%) of the far-end voltage swing consists of contributions from the empirical equations (10.5%), the resistive loss of the shielding layer (9.5%) and the *LC*-resonance effect caused by large capacitance (4%). The remaining error (4%) is due to the accumulation of small effects such as the skin effect, proximity effect, dielectric loss and the limited number of components in our model. The easiest way to improve the accuracy is to avoid high-capacitance cases. This condition reduces the maximum error from 28% to 15%, which is accurate enough for early design estimation purposes. As for the optimization methodology, the proposed algorithm searches for the smallest physical parameters such as metal width, space and structure while achieving the lowest power consumption under the given target specifications and constraints. These two contributions allow us to design a high-speed clock distribution line quickly without using EM-simulations. As a result, the proposed on-chip transmission line modeling and optimization methodology can contribute to dramatic cost reduction and improvement of design quality.

# References

- [1] P. J. Restle et al., "A clock distribution network for microprocessors," *IEEE Journal of Solid-State Circuits*, vol. 36, pp. 792-799, 2001.
- [2] A. J. Drake, K. J. Nowka, T. Y. Nguyen, J. L. Burns, and R. B. Brown, "Resonant clocking using distributed parasitic capacitance," *IEEE Journal of Solid-State Circuits*, vol. 39, pp. 1520-1528, 2004.
- [3] P. E. Gronowski, W. J. Bowhill, R. P. Preston, M. K. Gowan, and R. L. Allmon, "High-performance microprocessor design," *IEEE Journal of Solid-State Circuits*, vol. 33, pp. 676-686, 1998.
- [4] M. Ichihashi and H. Kanaya, "A simple methodology of on-chip transmission line modeling for high speed clock distribution," *Extended Abstracts of the 2018 Int. Conf. on Solid State Devices and Materials*, p. 927, 2018.
- [5] P. Heydari, S. Abbaspour, and M. Pedram, "A comprehensive study of energy dissipation in lossy transmission lines driven by CMOS inverters," *Proceedings of the IEEE 2002 Custom Integrated Circuits Conference*, 2002.
- [6] M. A. Azadpour and T. S. Kalkur, "A clock interconnect extractor for multigigahertz frequencies incorporating inductance effect," *IEEE Transactions on Very Large Scale Integration (VLSI) Systems*, vol. 11, pp. 1143-1146, 2003.
- [7] M. Mondal and Y. Massoud, "Reducing pessimism in RLC delay estimation using an accurate analytical frequency dependent model for inductance," *ICCAD-2005*. *IEEE/ACM International Conference on Computer-Aided Design*, 2005.
- [8] A. B. Kahng and S. Muddu, "An analytical delay model for RLC interconnects," *IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems*, vol. 16, pp. 1507-1514, 1997.

- [9] J. Zheng, V. K. Tripathi, and A. Weisshaar, "Characterization and modeling of multiple coupled on-chip interconnects on silicon substrate," *IEEE 9th Topical Meeting on Electrical Performance of Electronic Packaging*, 2000.
- [10] R. Venkatesan, J. A. Davis, and J. D. Meindl, "Compact distributed RLC interconnect models—part IV unified models for time delay, crosstalk, and repeater insertion," *IEEE Transactions on Electron Devices*, vol. 50, pp. 1094-1102, 2003.
- [11] S.-C. Wong, G.-Y. Lee, and D.-J. Ma, "Modeling of interconnect capacitance, delay, and crosstalk in VLSI," *IEEE Transactions on Semiconductor Manufacturing*, vol. 13, pp. 108-111, 2000.
- [12] T. Sakurai, "Closed-form expressions for interconnection delay, coupling, and crosstalk in VLSIs," *IEEE Transactions on Electron Devices*, vol. 40, pp. 118-124, 1993.
- [13] X. Huang, P. Restle, T. Bucelot, Y. Cao, T.-J. King, and C. Hu, "Loop-based interconnect modeling and optimization approach for multigigahertz clock network design," *IEEE Journal of Solid-State Circuits*, vol. 38, pp. 457-463, 2003.
- [14] T.-C. Chen, S.-R. Pan, and Y.-W. Chang, "Performance optimization by wire and buffer sizing under the transmission line model," *Proceedings 2001 IEEE International Conference on Computer Design: VLSI in Computers and Processors. ICCD 2001*, 2001.
- [15] K. Banerjee and A. Mehrotra, "Analysis of on-chip inductance effects for distributed RLC interconnects," *IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems*, pp. 904-915, 2002.
- [16] Y. I. Ismail, E. G. Friedman, and J. L. Neves, "Equivalent Elmore delay for RLC trees," *IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems*, vol. 19, pp. 83-97, 2000.

- [17] M. A. El-Moursy and E. G. Friedman, "Power characteristics of inductive interconnect," *IEEE Transactions on Very Large Scale Integration (VLSI) Systems*, vol. 12, pp. 1295-1306, 2004.
- [18] Q. Zhu and W. W. M. Dai, "High-speed clock network sizing optimization based on distributed RC and lossy RLC interconnect models," *IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems*, vol. 15, pp. 1106-1118, 1996.
- [19] T. Sakurai, K. Tamaru, "Simple formulas for two-and three-dimensional capacitances," *IEEE Transactions on Electron Devices*, vol. 30, pp. 183-185, 1983.
- [20] J.-H. Chern, J. Huang, L. Arledge, P.-C. Li, and P. Yang, "Multilevel metal capacitance models for CAD design synthesis systems," *IEEE Electron Device Letters*, vol. 13, pp. 32-34, 1992.
- [21] M. Yao, X. Zhang, and C. Zhao, "Impact of skin effect, resistive and dielectric losses on the input voltage waveforms of current estimation for ULSI interconnects," *International Conference on Communications, Circuits and Systems*, 2013.
- [22] T. Asada, Y. Baba, N. Nagaoka, A. Ametani, J. Mahseredjian, and K. Yamamoto, "A study on basic characteristics of the proximity effect on conductors," *IEEE Transactions on Power Delivery*, vol. 32, pp. 1790-1799, 2017.
- [23] S. Mei and Y. I. Ismail, "Modeling skin and proximity effects with reduced realizable RL circuits," *IEEE Transactions on Very Large Scale Integration (VLSI) Systems*, vol. 12, pp. 437-447, 2004.
- [24] R.-J. Chan and J.-C. Guo, "Analysis and modeling of skin and proximity effects for millimeter-wave inductors design in nanoscale Si CMOS," 2014 9th European Microwave Integrated Circuit Conference, 2014.

- [25] M. Mondal, Y. Massoud, and Y. I. Ismail, "Accurate loop self inductance bound for efficient inductance screening," *IEEE Transactions on Very Large Scale Integration (VLSI) Systems*, vol. 14, pp. 1393-1397, 2006.
- [26] G. Zhong and C. K. Koh, "Exact closed form formula for partial mutual inductances of on-chip interconnects," *Proceedings. IEEE International Conference on Computer Design: VLSI in Computers and Processors*, 2002.
- [27] L. He, N. Chang, S. Lin, and O. S. Nakagawa, "An efficient inductance modeling for on-chip interconnects," *Proceedings of the IEEE 1999 Custom Integrated Circuits Conference*, p. 457, 1999.
## Design of LC Oscillator

A conventional *LC* oscillator-based on-chip clock distribution design requires a buffer stage because of its high frequency sensitivity. Therefore, the power consumption and jitter performance are limited by the buffer and repeater stages. In this chapter, we introduce the theory of a bufferless *LC* oscillator with low frequency sensitivity that can directly drive a 10-mm on-chip clock distribution line in the TSMC 0.18-$\mu$m 1-poly 6-metal CMOS fabrication process.

## 5.1 Introduction

#### 5.1.1 Background

High-speed clock distribution design is one of the most difficult and challenging tasks in recent digital VLSI systems. A large amount of power is dissipated in microprocessors, memories, etc. because of the high swing rate and large capacitance that accompany increasing system frequency and chip size [1], [2], [3]. There are two different types of clock distribution structures. A global tree structure determines the system frequency. A differential signaling structure determines the bandwidth of high-speed serial links, which generally require the highest frequency in the chip because of parallel-to-serial conversion [4] - [7], as shown in Fig. 5.1. In this research, we focus on the structure of Fig. 5.1.

With the increasing frequencies and chip area of recent VLSI systems, the effects of inverter-based repeaters on the clock distribution line have become critical for the following reasons. First, the power consumption is fundamentally high because of the large voltage swing (usually 0 to  $V_{dd}$ ). Second, the slew rate limitation caused by the large swing makes high-frequency operation difficult. Third, the high  $V_{dd}$  sensitivity degrades the jitter performance. A bufferless clock distribution architecture, as shown in Fig. 5.2, is an attractive solution to these problems. However, the theoretical tradeoff between power consumption and oscillation frequency defined by Eq. (5.1) - (5.4) makes such a bufferless structure difficult if a conventional *LC* oscillator is used to drive a large capacitive load. Adding load capacitance lowers *f* through Eq. (5.1), which lowers *Q* and *R*<sub>p</sub> through Eq. (5.3) - (5.4) and therefore raises the required current through Eq. (5.2). In Eq. (5.1) - (5.4), *f*, *C*<sub>D</sub>, *L*<sub>s</sub>, *R*<sub>s</sub>, and *R*<sub>p</sub> represent the oscillation frequency, frequency tuning capacitor, series inductance, series resistance, and equivalent parallel resistance, respectively. Fig. 5.3 illustrates the series-to-parallel conversion behind Eq. (5.3). Note that these approximations are valid only within a limited, narrow bandwidth.



Fig. 5.1 Conventional repeater-based clock distribution



Fig. 5.2 Proposed directly driving (i.e. bufferless) clock distribution

$$f = \frac{1}{2\pi\sqrt{LC}} \tag{5.1}$$

$$I \propto \frac{1}{R_p} \tag{5.2}$$

$$R_p = \omega L_p Q \tag{5.3}$$

$$Q = \frac{\omega L_s}{R_s} \tag{5.4}$$

Here,

| Symbol   | Definition                     | Symbol | Definition                     |
|----------|--------------------------------|--------|--------------------------------|
| $f$      | oscillation frequency          | $L$    | inductance of DCO              |
| $C$      | capacitance                    | $I$    | current consumption            |
| $\omega$ | $2\pi f$                       | $L_s$  | series inductance              |
| $Q$      | quality factor                 | $R_s$  | series resistance              |
| $R_p$    | equivalent parallel resistance | $L_p$  | equivalent parallel inductance |



Fig. 5.3 Series to parallel conversion of inductor
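As a rough numerical illustration of the tradeoff expressed by Eq. (5.1) - (5.4), the following sketch evaluates *f*, *Q* and *R*<sub>p</sub> for a conventional tank as extra load capacitance is added. The component values (`L_S`, `R_S`, `C_D`), the load steps and the function name are assumptions chosen only for illustration. The sketch also assumes $L_p \approx L_s$ (reasonable only for $Q \gg 1$) and reads Eq. (5.2) as the current needed to keep a fixed output swing.

```python
import math

def tank_metrics(l_s, r_s, c_total):
    """Evaluate Eq. (5.1), (5.3), (5.4) for a tank, assuming L_p ~= L_s."""
    f = 1.0 / (2.0 * math.pi * math.sqrt(l_s * c_total))  # Eq. (5.1)
    omega = 2.0 * math.pi * f
    q = omega * l_s / r_s                                 # Eq. (5.4)
    r_p = omega * l_s * q                                 # Eq. (5.3)
    return f, q, r_p

# Illustrative values only (assumptions, not extracted design data).
L_S = 2.73e-9   # series inductance [H]
R_S = 5.0       # series resistance [ohm]
C_D = 1.0e-12   # tuning capacitance [F]

_, _, r_p_ref = tank_metrics(L_S, R_S, C_D)
results = []
for c_load in (0.0, 1.0e-12, 2.0e-12):
    f, q, r_p = tank_metrics(L_S, R_S, C_D + c_load)
    relative_current = r_p_ref / r_p   # Eq. (5.2): I is proportional to 1/R_p
    results.append((c_load, f, q, r_p, relative_current))
    print(f"C_load = {c_load * 1e12:.1f} pF: f = {f / 1e9:.2f} GHz, "
          f"Q = {q:.1f}, R_p = {r_p:.0f} ohm, relative I = {relative_current:.2f}")
```

Under these assumptions $R_p = L_s / (C R_s)$, so the required current grows in proportion to the total capacitance while the frequency falls with its square root.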

#### 5.1.2 Previous Work

Many low-power and high-speed resonant clock distribution structures have been investigated, as shown in [8] - [17]. For instance, a 1.5-GHz bufferless *LC* oscillator introduced by Mesgarzadeh et al. [8], [9] achieved about 57% lower power consumption than conventional repeater-based clock distribution architectures. However, the theoretical tradeoff between power consumption and oscillation frequency explained in Section 5.1.1 has not been resolved, because the circuit topology is based on the conventional *LC* oscillator shown in Fig. 5.3.

#### 5.1.3 Objectives and Scope of This Study

The goal of this study is to develop an *LC* oscillator with low frequency sensitivity that overcomes the tradeoff between oscillation frequency and power consumption. In this section, we propose the theory of a novel bufferless *LC* oscillator with low frequency sensitivity under the following conditions. A 10-mm differential signaling configuration is used for the clock distribution line structure, as shown in Fig. 5.2. The 10-mm line is designed as a meander line because of the limited chip area. The TSMC 0.18-$\mu$m 1-poly, 6-metal CMOS process is used as the simulation environment because of the limited fabrication options available in our laboratory. We choose a target frequency of 3 GHz, since the maximum frequency under the worst-case simulation condition at FO = 4 (fan-out) is about 3 GHz for this fabrication process.

## 5.2 Circuit Descriptions and Characteristics

#### 5.2.1 Theory of Proposed LC Oscillator

Fig. 5.4 (a) shows the proposed bufferless *LC* oscillator. The *LC* tank is shared between the frequency tuning capacitor  $C_D$  and the load capacitance  $C_L$  through the inductor tap, as indicated by the red arrow. Fig. 5.4 (b) and Fig. 5.4 (c) show the equivalent half-circuit models used to calculate the resonant frequencies and the output voltage swing. In Fig. 5.4 (b), the parasitic resistance and the mutual inductance are excluded to keep the equations simple. In Fig. 5.4 (c), the series-to-parallel conversion shown in Fig. 5.3 is applied to make the calculation intuitive. From Fig. 5.4 (b), the impedance seen from the *LC* oscillator side can be expressed as Eq. (5.5). Setting the numerator of Eq. (5.5) to 0 gives the series resonant frequency  $f_s$ , and setting the denominator to 0 gives the two parallel resonant frequencies  $f_{p1}$  and  $f_{p2}$ . From Fig. 5.4 (c), we obtain the equivalent parallel resistance and *Q* in the same manner as Eq. (5.3) - (5.4). The output voltage swing  $V_{out}$  is reduced according to the ratio of  $R_{pall}$  and  $R_{p1}$ . Table 5.1 summarizes the theoretical differences between the conventional and proposed topologies.

$$Z_{DCO} = \frac{j\omega(L_1 + L_2 - \omega^2 L_1 L_2 C_L)}{1 - \omega^2 L_1 C_L - \omega^2 C_D (L_1 + L_2 - \omega^2 L_1 L_2 C_L)} \tag{5.5}$$



(a) Proposed LC oscillator



(b) Equivalent half circuit model for oscillation frequency calculation



(c) Equivalent half circuit model for Q and  $V_{out}$  calculation

Fig. 5.4 Proposed bufferless LC oscillator

| Parameters    | Conventional                                  | Proposed                                                                   |
|---------------|-----------------------------------------------|----------------------------------------------------------------------------|
| $f_s$         | -                                             | $\gamma_s \cdot f_0$                                                       |
| $f_{p1}$      | $\gamma_{p1} \cdot f_0$                       | $\gamma_{p1} \cdot f_0$                                                    |
| $f_{p2}$      | -                                             | $\gamma_{p2} \cdot f_0$                                                    |
| $f_0$         | $\dfrac{1}{2\pi\sqrt{L_{all}C_D}}$            | $\dfrac{1}{2\pi\sqrt{L_{all}C_D}}$                                         |
| $\gamma_s$    | -                                             | $\dfrac{1}{\sqrt{\alpha_{TAP}(1-\alpha_{TAP})\cdot C_L/C_D}}$              |
| $\gamma_{p1}$ | $\dfrac{1}{\sqrt{1+\frac{C_L}{C_D}}}$         | $\dfrac{1}{\sqrt{1+\alpha_{TAP}\cdot \frac{C_L}{C_D}}}$                    |
| $\gamma_{p2}$ | -                                             | $\dfrac{f_s}{f_{p1}}$                                                      |
| $Q_{all}$     | $\dfrac{2\pi f_{p1}L_{all}}{R_{all}}$         | $\dfrac{2\pi f_{p1}\{L_{all} - C_L(\alpha_{TAP}R_{all})^2\}}{R_{all}}$     |
| $R_{p.all}$   | $Q_{all}^2 \cdot R_{all}$                     | $Q_{all}^2 \cdot R_{all}$                                                  |
| $V_{out}$     | $\dfrac{4}{\pi}\cdot I\cdot R_{p.all}$        | $\dfrac{4}{\pi}\cdot I\cdot R_{p.all}\cdot \alpha_{TAP}$                   |

Table 5.1 Theoretical differences between the proposed and conventional topologies

Here,

$$L_{all} = L_1 + L_2 \qquad L_1 = \alpha_{TAP} \cdot L_{all}$$
$$L_2 = (1 - \alpha_{TAP}) \cdot L_{all} \qquad R_{all} = R_1 + R_2$$
$$R_1 = \alpha_{TAP} \cdot R_{all} \qquad R_2 = (1 - \alpha_{TAP}) \cdot R_{all}$$

#### 5.2.2 Oscillation Frequency

Fig. 5.5 shows the simulated and calculated values of  $f_s$ ,  $f_{p1}$  and  $f_{p2}$  for  $C_L$ =370fF,  $C_D$ =1.57pF,  $L_{all}$ =2.73nH and  $\alpha_{TAP}$ =0.5 in the circuit of Fig. 5.4 (b). The calculated and simulated results are in good agreement. Note that, in this example, the oscillation frequency is determined by  $f_{p1}$  since the impedance at  $f_{p1}$  is much higher than that at  $f_{p2}$ .



|            | $f_s$    | $f_{p1}$ | $f_{p2}$ |
|------------|----------|----------|----------|
| Simulation | 10.0 GHz | 2.36 GHz | 10.4 GHz |
| Theory     | 10.0 GHz | 2.30 GHz | 10.6 GHz |

Fig. 5.5 Simulation result of oscillation frequency at Fig. 5.4 (b) condition
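The "Theory" row of Fig. 5.5 can be reproduced from the closed-form expressions in Table 5.1. The sketch below also solves the denominator of Eq. (5.5) exactly, which becomes a quadratic in $\omega^2$; this exact solution and the function names are additions for illustration and are not part of the original derivation. Both calculations use the lossless model of Fig. 5.4 (b), with no parasitic resistance or mutual inductance.

```python
import math

# Conditions of Fig. 5.5
C_L = 370e-15     # load capacitance [F]
C_D = 1.57e-12    # tuning capacitance [F]
L_ALL = 2.73e-9   # total inductance [H]
ALPHA = 0.5       # tap ratio

def approx_resonances(c_l, c_d, l_all, alpha):
    """Closed-form f_s, f_p1, f_p2 following Table 5.1."""
    f0 = 1.0 / (2.0 * math.pi * math.sqrt(l_all * c_d))
    gamma_s = 1.0 / math.sqrt(alpha * (1.0 - alpha) * c_l / c_d)
    gamma_p1 = 1.0 / math.sqrt(1.0 + alpha * c_l / c_d)
    f_s = gamma_s * f0
    f_p1 = gamma_p1 * f0
    gamma_p2 = f_s / f_p1
    return f_s, f_p1, gamma_p2 * f0

def exact_parallel_resonances(c_l, c_d, l_all, alpha):
    """Roots of the denominator of Eq. (5.5): a*x^2 - b*x + 1 = 0, x = omega^2."""
    l1 = alpha * l_all
    l2 = (1.0 - alpha) * l_all
    a = l1 * l2 * c_l * c_d
    b = l1 * c_l + c_d * (l1 + l2)
    root = math.sqrt(b * b - 4.0 * a)
    x_low = (b - root) / (2.0 * a)
    x_high = (b + root) / (2.0 * a)
    return math.sqrt(x_low) / (2.0 * math.pi), math.sqrt(x_high) / (2.0 * math.pi)

f_s, f_p1, f_p2 = approx_resonances(C_L, C_D, L_ALL, ALPHA)
f_p1_exact, f_p2_exact = exact_parallel_resonances(C_L, C_D, L_ALL, ALPHA)

print(f"Table 5.1 : f_s = {f_s / 1e9:.2f} GHz, "
      f"f_p1 = {f_p1 / 1e9:.2f} GHz, f_p2 = {f_p2 / 1e9:.2f} GHz")
print(f"Exact     : f_p1 = {f_p1_exact / 1e9:.2f} GHz, "
      f"f_p2 = {f_p2_exact / 1e9:.2f} GHz")
```

With these values the closed-form results match the "Theory" row, while the exact roots come out at approximately 2.36 GHz and 10.3 GHz, close to the "Simulation" row. This suggests that the small gap between the two rows comes mainly from the approximation in $\gamma_{p1}$.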

#### 5.2.3 Frequency Sensitivity and Voltage Swing

Fig. 5.6 shows the test circuit used to compare the frequency sensitivity and the voltage swing of the conventional and proposed *LC* oscillators. In Fig. 5.6,  $C_D$  and  $C_L$  are adjusted such that Fig. 5.6 (a) and Fig. 5.6 (b) have the same oscillation frequency at  $\Delta C_L=0$. For the inductor, we used an S-parameter model extracted with Keysight Technologies "Momentum-RF".



Fig. 5.6 Test circuit to compare conventional and proposed structure

Here,

| Symbol         | Value                                                 | Symbol   | Value                       |
|----------------|-------------------------------------------------------|----------|-----------------------------|
| $\alpha_{TAP}$ | TAP ratio = 0.5                                       | $I_{DC}$ | fixed current source of 2mA |
| $L$            | 2.73nH (single-ended)                                 | $C_D$    | 1pF in (a), 1.26pF in (b)   |
| $M_N$          | 60µm/0.18µm                                           | $C_L$    | 370fF (fixed)               |
| $\Delta C_L$   | variable output loading, 0 to 2.5pF, 0.5pF step size  |          |                             |

Fig. 5.7 and Fig. 5.8 compare theory and simulation for  $f_{p1}$  and  $V_{out}$  when  $\Delta C_L$  is swept from 0 to 2.5pF in 0.5pF steps. Fig. 5.7 shows that the proposed structure maintains a higher frequency than the conventional one, which means that the frequency sensitivity of the proposed structure is lower. Fig. 5.8 (a) shows that the proposed structure has a higher  $V_{out}$  than the conventional one when  $\Delta C_L$  is more than 2.5pF. This is because the proposed structure attains a higher Q owing to the low frequency sensitivity of  $f_{p1}$ , even though  $V_{out}$  is divided by the inductor tap ratio  $\alpha_{TAP}$  as shown in Table 5.1. Fig. 5.8 (b) shows the normalized  $V_{out}$  to confirm this effect. The loss of voltage swing of the proposed structure at  $\Delta C_L = 2.5$ pF is only 40%, whereas that of the conventional one is 75%. This means that, as  $\Delta C_L$  increases, the proposed structure provides a higher voltage swing than the conventional one, and the voltage division caused by  $\alpha_{TAP}$  is no longer a drawback.







(a) Absolute value of Vout



(b) Normalized value of Vout

Fig. 5.8  $\Delta C_L$  dependency of  $V_{out}$  at  $\alpha_{TAP}=0.5$ 
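The frequency sensitivity trend of Fig. 5.7 can be sketched from the $\gamma_{p1}$ expressions in Table 5.1 with the Fig. 5.6 component values. This is a simplified lumped estimate, not the simulation setup of the thesis: it ignores the inductor parasitics and mutual inductance contained in the S-parameter model, and it assumes that the same 2.73nH inductance applies to both circuits. The conventional case is written as a tap ratio of 1, which reproduces its $\gamma_{p1}$ in Table 5.1.

```python
import math

L_ALL = 2.73e-9      # inductance, assumed identical in both circuits [H]
C_L0 = 370e-15       # fixed load capacitance [F]
ALPHA = 0.5          # tap ratio of the proposed circuit
C_D_CONV = 1.0e-12   # tuning capacitor, Fig. 5.6 (a) [F]
C_D_PROP = 1.26e-12  # tuning capacitor, Fig. 5.6 (b) [F]

def f_p1(c_d, c_l, alpha):
    """f_p1 = gamma_p1 * f0 following Table 5.1 (alpha = 1 gives the conventional case)."""
    f0 = 1.0 / (2.0 * math.pi * math.sqrt(L_ALL * c_d))
    return f0 / math.sqrt(1.0 + alpha * c_l / c_d)

delta_steps = [k * 0.5e-12 for k in range(6)]   # 0 to 2.5 pF in 0.5 pF steps
conv = [f_p1(C_D_CONV, C_L0 + d, 1.0) for d in delta_steps]
prop = [f_p1(C_D_PROP, C_L0 + d, ALPHA) for d in delta_steps]

for d, fc, fp in zip(delta_steps, conv, prop):
    print(f"dC_L = {d * 1e12:.1f} pF: conventional {fc / 1e9:.2f} GHz "
          f"({fc / conv[0]:.2f} of initial), proposed {fp / 1e9:.2f} GHz "
          f"({fp / prop[0]:.2f} of initial)")
```

In this estimate the added load appears in the tank scaled by $\alpha_{TAP}$, so the proposed circuit retains a larger fraction of its initial frequency at every step of the sweep.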

Fig. 5.9 shows the  $\alpha_{TAP}$  dependency derived from the equations in Table 5.1. From Fig. 5.9 (a), as  $\alpha_{TAP}$  increases,  $f_{p1}$  decreases whereas  $V_{out}$  increases. Therefore, a fundamental tradeoff exists between  $f_{p1}$  and  $V_{out}$. From Fig. 5.9 (b), the sensitivity of  $f_s$  and  $f_{p2}$  appears to be of little concern since these frequencies are always higher than  $f_{p1}$. However, the magnitude of  $Z_{DCO}$  at  $f_{p2}$  must be checked carefully. For instance, for  $\alpha_{TAP}=0.7$,  $Z_{DCO}$  at  $f_{p2}$  exceeds that for  $\alpha_{TAP}=0.5$, as shown in Fig. 5.9 (c). This indicates that the oscillation frequency may be determined by  $f_{p2}$  rather than  $f_{p1}$ , resulting in an undesired oscillation mode. In practical design,  $\alpha_{TAP}$  should be as large as possible, so that the level shifting operation at the receiver stages works with reasonable power consumption, while taking the above tradeoffs into account.



(b)  $f_s$  and  $f_{p2}$ 



(c) Magnitude of  $Z_{DCO}$ 

Fig. 5.9  $\alpha_{TAP}$  dependency at  $\Delta C_L = 2.0 \text{pF}$ 
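A minimal sketch of the frequency part of this $\alpha_{TAP}$ dependency, using only the Table 5.1 expressions at  $\Delta C_L = 2.0$pF. The values of $L_{all}$, $C_D$ and the fixed $C_L$ are taken from Fig. 5.6 (b); whether Fig. 5.9 was generated with exactly these values is an assumption. The magnitude of $Z_{DCO}$ in Fig. 5.9 (c) depends on losses that this lossless sketch does not model.

```python
import math

L_ALL = 2.73e-9            # [H], assumed from Fig. 5.6
C_D = 1.26e-12             # [F], assumed from Fig. 5.6 (b)
C_L = 370e-15 + 2.0e-12    # fixed load plus Delta C_L = 2.0 pF [F]

def table_5_1_frequencies(alpha):
    """Return (f_s, f_p1, f_p2) from the Table 5.1 expressions."""
    f0 = 1.0 / (2.0 * math.pi * math.sqrt(L_ALL * C_D))
    f_s = f0 / math.sqrt(alpha * (1.0 - alpha) * C_L / C_D)
    f_p1 = f0 / math.sqrt(1.0 + alpha * C_L / C_D)
    f_p2 = (f_s / f_p1) * f0
    return f_s, f_p1, f_p2

alphas = [0.1 * k for k in range(1, 10)]   # tap ratios 0.1 to 0.9
sweep = [(a,) + table_5_1_frequencies(a) for a in alphas]

for a, f_s, f_p1, f_p2 in sweep:
    print(f"alpha = {a:.1f}: f_p1 = {f_p1 / 1e9:.2f} GHz, "
          f"f_s = {f_s / 1e9:.2f} GHz, f_p2 = {f_p2 / 1e9:.2f} GHz")
```

The sweep shows $f_{p1}$ falling monotonically as $\alpha_{TAP}$ grows, with $f_s$ and $f_{p2}$ staying above $f_{p1}$ over the whole range.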

#### 5.2.4 Phase Noise

Fig. 5.10 shows the phase noise simulated with Cadence Spectre-RF for the test circuit of Fig. 5.6 at  $\Delta C_L$ =2pF. The proposed structure achieves about 2dB better phase noise than the conventional one at a 1MHz offset frequency. This is because the proposed structure attains a higher Q owing to its low frequency sensitivity, as shown in Table 5.1. The proposed structure gains a further ~2dB if the difference in oscillation frequency is taken into consideration, so the total improvement is about 4dB at a 1MHz offset.



Fig. 5.10 Phase noise at  $\Delta C_L=2pF$ 
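The additional ~2dB can be understood from the usual Leeson-type scaling, in which phase noise at a fixed offset grows as $20\log_{10}(f_0)$ with the carrier frequency, so that comparing both oscillators at the same carrier credits the higher-frequency one by $20\log_{10}(f_{prop}/f_{conv})$. The sketch below estimates the two carrier frequencies from the lossless lumped model of Fig. 5.4 (b) with the Fig. 5.6 values, rather than from the simulated frequencies of Fig. 5.7; this substitution and the function names are assumptions made for illustration.

```python
import math

L_ALL = 2.73e-9            # [H]
ALPHA = 0.5                # tap ratio
C_L = 370e-15 + 2.0e-12    # load at Delta C_L = 2 pF [F]
C_D_CONV = 1.0e-12         # Fig. 5.6 (a) [F]
C_D_PROP = 1.26e-12        # Fig. 5.6 (b) [F]

def f_conventional(l_all, c_d, c_l):
    """Conventional tank: the full inductor sees C_D + C_L."""
    return 1.0 / (2.0 * math.pi * math.sqrt(l_all * (c_d + c_l)))

def f_proposed(l_all, alpha, c_d, c_l):
    """Lower parallel resonance of Eq. (5.5), from its denominator set to 0."""
    l1 = alpha * l_all
    l2 = (1.0 - alpha) * l_all
    a = l1 * l2 * c_l * c_d
    b = l1 * c_l + c_d * (l1 + l2)
    x_low = (b - math.sqrt(b * b - 4.0 * a)) / (2.0 * a)   # x = omega^2
    return math.sqrt(x_low) / (2.0 * math.pi)

f_conv = f_conventional(L_ALL, C_D_CONV, C_L)
f_prop = f_proposed(L_ALL, ALPHA, C_D_PROP, C_L)
delta_db = 20.0 * math.log10(f_prop / f_conv)

print(f"f_conv = {f_conv / 1e9:.2f} GHz, f_prop = {f_prop / 1e9:.2f} GHz")
print(f"frequency normalization credit = {delta_db:.1f} dB")
```

With these lumped values the normalization term comes out close to 2dB, consistent with the figure quoted in the text.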

## 5.3 Test Chip Implementation

Fig. 5.11 shows the block diagram of the test chip. It consists of two blocks, the *LC* oscillator core and the output buffer stage. The purpose of the output buffer stage is to check the power consumption of the level shifting stages. The *LC* oscillator core directly drives a 10-mm on-chip clock distribution line. Since the generated differential output swing is severely reduced by the resistive loss, it is amplified from a few hundred mV<sub>pp</sub> to  $V_{DD}$ =1.8V by the level shift (LS) stage. After that, the differential signal is converted to a single-ended signal, which drives the output load through the 50Ω-buffer (BUF) stage. Fig. 5.12 shows the core circuit of the proposed *LC* oscillator. It consists of a cross-coupled NMOS pair, a 720fF MIM-capacitor, a current DAC with a 40µA step and the 5.7nH 8-shaped differential inductor introduced in Chapter 3. Note that a capacitor tuning function has not been implemented, since the main purpose of this study is to investigate the drivability of the proposed bufferless *LC* oscillator. Fig. 5.13, Fig. 5.14 and Fig. 5.15 show the LS, the BUF and the chip layout, respectively. The 10-mm on-chip clock distribution line is laid out as a meander line because of the limited chip area. The core area of the *LC* oscillator is only 270 x 280 µm<sup>2</sup>. The detailed design parameters are shown in Table 5.2.



Fig. 5.11 Block diagram of the test chip



Fig. 5.12 *LC* oscillator core circuit







Fig. 5.14 50 $\Omega$ -buffer (BUF) circuit



Fig. 5.15 Chip layout

Table 5.2 Design parameters

| Parameter                 | Value           | Parameter | Value                   |
|---------------------------|-----------------|-----------|-------------------------|
| LC oscillator core        |                 |           |                         |
| $M_{N1}$                  | 2µm/0.72µm      | $M_{N2}$  | 2µm/0.72µm × 15 LSB     |
| $M_{N3}$                  | 60µm/0.18µm     | $M_{N4}$  | 60µm/0.18µm             |
| $M_{P1}$                  | 6µm/0.36µm      | $M_{P2}$  | 36µm/0.36µm             |
| $R_1$                     | 27kΩ            | $R_2$     | 0.1kΩ                   |
| $L$                       | 5.7nH (diff.)   | $C_D$     | MIM-capacitor 720fF     |
| Clock distribution line   |                 |           |                         |
| length                    | 10-mm           | layer     | Metal-6 (top metal)     |
| width                     | 2µm             | space     | 2µm                     |
| $R$                       | 89Ω             | $C$       | 2.36pF (extracted)      |
| Level shift stage (LS)    |                 |           |                         |
| $M_{P1}$                  | 16µm/0.18µm     | $M_{P2}$  | 16µm/0.18µm             |
| $M_{N1}$                  | 8µm/0.18µm      | $M_{N2}$  | 8µm/0.18µm              |
| 50Ω-buffer stage (BUF)    |                 |           |                         |
| $M_{P1}$                  | 280µm/0.18µm    | $M_{N1}$  | 180µm/0.18µm            |
| $M_{P2}$                  | 224µm/0.18µm    | $M_{N2}$  | 112µm/0.18µm            |
| $M_{P3}$                  | 120µm/0.18µm    | $M_{N3}$  | 60µm/0.18µm             |
| $M_{P4}$                  | 64µm/0.18µm     | $M_{N4}$  | 32µm/0.18µm             |
| $M_{P5}$                  | 32µm/0.18µm     | $M_{N5}$  | 16µm/0.18µm             |
| $M_{P6}$                  | 16µm/0.18µm     | $M_{N6}$  | 8µm/0.18µm              |
| $M_{P7}$                  | 16µm/0.18µm     | $M_{N7}$  | 8µm/0.18µm              |
| $R_{P1}$                  | 20Ω             | $R_{N1}$  | 20Ω                     |
| Output loading            |                 |           |                         |
| $Z_t$                     | 50Ω             | $C_o$     | 2pF                     |

## 5.4 Post Simulation Results

#### 5.4.1 Summary of Post-Layout Simulation Results

The full-chip post-layout simulation was performed with Cadence Spectre-RF on the configuration of Fig. 5.15. Table 5.3 and Fig. 5.16 show the summary of the simulation results and the transient waveforms, respectively.

| Item                        | Value                             |
|-----------------------------|-----------------------------------|
| Process                     | TSMC 1-Poly, 6-Metal CMOS         |
| Area                        | 270 x 280 µm<sup>2</sup>          |
| $V_{DD}$                    | 1.8 V                             |
| Frequency                   | 2.54 GHz                          |
| Phase noise                 | -123 dBc/Hz at $\Delta f = 1$ MHz |
| Current (I-DAC = 9d)        |                                   |
| LC-DCO core                 | 2.2 mA                            |
| LS                          | 5.4 mA                            |
| BUF                         | 26.3 mA                           |
| Output voltage (I-DAC = 9d) |                                   |
| $V_{OUT1}$                  | 0.31 V<sub>pp</sub>.single        |
| $V_{OUT2}$                  | 0.20 V<sub>pp</sub>.single        |

Table 5.3 Summary of simulation results



(a) *V*<sub>OUT1</sub> outputs (*LC* oscillator outputs)



(b) *V*<sub>OUT2</sub> outputs (TAP outputs)



Fig. 5.16 Transient waveforms

#### 5.4.2 Considerations on Post-Layout Simulation

This section discusses the accuracy of the simulation results. First, we believe the RC extraction is reasonably accurate since it was performed on the full-chip layout. However, the transient simulation results in Fig. 5.16 and the phase noise result in Table 5.3 may contain some errors for the following reasons. In the transient simulation, the convolution process that converts the S-parameter model to the time domain depends strongly on the simulator. This may cause an error in the output swing of the *LC* oscillator and affect the level shift operation. In the phase noise analysis, a transient-based periodic analysis is performed before the noise analysis. Therefore, the same convolution issue exists and may cause a phase noise error. These two concerns must be checked against the experimental results, which will be discussed in Chapter 6.

## 5.5 System Comparison

From Section 5.2 to Section 5.4, we characterized the proposed bufferless clock distribution system. However, its advantage from the system point of view is not yet clear. Hence, we compare overall characteristics such as current, jitter and area between the conventional (Fig. 5.1) and proposed (Fig. 5.2) structures. In this study, we make the following assumptions. For the *LC* oscillator, the clock distribution line and the level shifter, we use the same conditions as in Table 5.2 and Table 5.3. The IO pitch is set to 500µm, which gives 20 IOs along the 10-mm length. For the repeaters, we design an FO = 2 inverter chain based on the process parameters shown in Table 5.4, which were extracted from a 9-stage ring oscillator. Fig. 5.17 shows an example in which the 10-mm on-chip clock distribution line is divided by  $N_{div} = 4$. We compare the overall performance of Fig. 5.17 (a) and Fig. 5.17 (b) while sweeping  $N_{div}$. For the jitter analysis, we assume a  $V_{DD}$  noise of 10 mVrms.

Table 5.5 shows the comparison results. The proposed structure not only requires no repeaters but also gives layout design flexibility. In current consumption and area, the conventional structure is 13.1% and 6.7% better than the proposed one, respectively. In deterministic jitter, the proposed structure is 13 times better than the conventional one when  $N_{div} = 4$. Although the jitter performance of the proposed structure is much better, the absolute jitter of the conventional structure is not a significant problem at this frequency. Therefore, the proposed bufferless structure has no strong advantage in this 0.18µm process unless the current consumption of the level shifters is reduced.

This situation changes at much higher frequencies and in more advanced (deep-submicron) processes, as follows. If the operating frequency becomes higher, Q becomes higher owing to Eq. (5.2) - (5.4). As a result, the voltage swing of the *LC* oscillator outputs becomes higher, which reduces the current consumption of the level shifting stages. Also, if a more advanced process is used, the current consumption of the level shifting stages is reduced by the process scaling factor. For instance, 90nm technology can achieve almost half the current of 0.18µm technology. As a reference, Ref. [4] achieved 1.75mW for the level converters and buffers at 3.125 GHz and 0.6 Vpp.single in 90nm technology.

| Parameter    | Value        | Description                               |
|--------------|--------------|-------------------------------------------|
| $C_{g.um}$   | 1.74 fF/µm   | Gate capacitance per 1µm/0.18µm           |
| $C_{m.um}$   | 0.20 fF/µm   | Wire capacitance per 1µm/0.18µm           |
| $T_{pd.1tg}$ | 42.7 ps/gate | Propagation delay of 1-inverter           |
| $T_{pd.VDD}$ | 23.1 ps/V    | V<sub>DD</sub> sensitivity of 1-inverter  |

Table 5.4 Process parameters extracted by ring-oscillator



(a) Conventional repeater based structure ( $N_{div}=4$ )



(b) Proposed bufferless structure

Fig. 5.17 Test circuits for system comparison

Here,

- $N_{div}$ : the number of divisions of the clock distribution line
- $N_{rpt.global}$ : the number of global repeaters
- $N_{rpt.local}$ : the number of local repeaters
- $N_{rpt.total}$ : the total number of repeaters ( $N_{rpt.global} * N_{rpt.local} * 2$ )
- $N_{LS.total}$ : the total number of LS
- $l_{IO}$ : IO-pitch distance [µm]

Table 5.5 System comparison results

(a) The number of repeaters

|                  | Conventional repeater based |    |    | Proposed |
|------------------|-----------------------------|----|----|----------|
| $N_{div}$        | 2                           | 3  | 4  | -        |
| $N_{LS.total}$   | 1                           | 1  | 1  | 20       |
| $N_{rpt.global}$ | 2                           | 3  | 4  | 0        |
| $N_{rpt.local}$  | 7                           | 6  | 6  | 0        |
| $N_{rpt.total}$  | 28                          | 36 | 48 | 0        |

(b) Current consumption [mA]

|               | Conventional repeater based |      |      | Proposed |
|---------------|-----------------------------|------|------|----------|
| LC oscillator | 2.2                         | 2.2  | 2.2  | 2.2      |
| LS            | 5.4                         | 5.4  | 5.4  | 108.2    |
| Repeater      | 66.7                        | 66.7 | 66.7 | 0        |
| Clock line    | 21.6                        | 21.6 | 21.6 | 0        |
| Total         | 95.9                        | 95.9 | 95.9 | 110.4    |

(c) Deterministic jitter when $\Delta V_{DD}$ = 10mVrms [ps]

|            | Conventional repeater based |       |       | Proposed |
|------------|-----------------------------|-------|-------|----------|
| LS         | 0.462                       | 0.462 | 0.462 | 0.462    |
| Repeater   | 3.234                       | 4.158 | 5.544 | 0        |
| Total      | 3.696                       | 4.620 | 6.006 | 0.462    |
| U.I. ratio | 0.019                       | 0.023 | 0.031 | 0.002    |

U.I.: Unit Interval, 1 / (2*f*<sub>0</sub>)

(d) Area $[\mu m^2]$

|               | Conventional repeater based |       |       | Proposed |
|---------------|-----------------------------|-------|-------|----------|
| LC oscillator | 75600                       | 75600 | 75600 | 75600    |
| LS            | 999                         | 999   | 999   | 19980    |
| Repeater      | 12689                       | 12589 | 12589 | 0        |
| Total         | 89288                       | 89188 | 89188 | 95580    |

Area is estimated based on actual layout.

LS (Fig. 5.13): 37 x 27 [µm<sup>2</sup>]

Repeater (48µm/0.18µm): 13.5 x 13.5 [µm<sup>2</sup>]
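The jitter entries of Table 5.5 (c) and the percentages quoted in the text can be re-derived from Table 5.4 and the totals in Table 5.5. The sketch below is a consistency check under two assumptions inferred from the table values rather than stated in the text: the clock passes through $N_{rpt.global} \cdot N_{rpt.local}$ inverters in series (the factor 2 in $N_{rpt.total}$ being the differential pair), and the LS counts as two inverter-equivalent stages. The current and area comparison uses the $N_{div} = 4$ column.

```python
T_PD_VDD_PS_PER_V = 23.1   # V_DD sensitivity of one inverter, Table 5.4
DELTA_VDD_V = 0.010        # assumed supply noise, 10 mVrms
F0_HZ = 2.54e9             # oscillation frequency, Table 5.3
LS_STAGES = 2              # assumption inferred from 0.462 ps in Table 5.5 (c)

# N_div -> (N_rpt.global, N_rpt.local), Table 5.5 (a)
REPEATERS = {2: (2, 7), 3: (3, 6), 4: (4, 6)}

jitter_per_stage_ps = T_PD_VDD_PS_PER_V * DELTA_VDD_V   # 0.231 ps per inverter
unit_interval_ps = 1e12 / (2.0 * F0_HZ)                 # U.I. = 1 / (2 f0)

jitter_ls_ps = LS_STAGES * jitter_per_stage_ps
jitter_total_ps = {}
for n_div, (n_global, n_local) in REPEATERS.items():
    series_stages = n_global * n_local                  # assumed series path length
    jitter_total_ps[n_div] = series_stages * jitter_per_stage_ps + jitter_ls_ps
    print(f"N_div = {n_div}: total = {jitter_total_ps[n_div]:.3f} ps, "
          f"U.I. ratio = {jitter_total_ps[n_div] / unit_interval_ps:.3f}")

jitter_improvement = jitter_total_ps[4] / jitter_ls_ps  # proposed has LS jitter only

# Totals copied from Table 5.5 (b) and (d), N_div = 4 column
current_conv_ma, current_prop_ma = 95.9, 110.4
area_conv_um2, area_prop_um2 = 89188, 95580
current_penalty = (current_prop_ma - current_conv_ma) / current_prop_ma
area_penalty = (area_prop_um2 - area_conv_um2) / area_prop_um2

print(f"jitter improvement = {jitter_improvement:.1f}x, "
      f"current penalty = {current_penalty:.1%}, area penalty = {area_penalty:.1%}")
```

Under these assumptions the sketch reproduces every jitter value in Table 5.5 (c), as well as the factor of 13 and the 13.1% and 6.7% figures quoted in the text.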

## 5.6 Chapter Summary

In this chapter, we introduced the theory of a novel bufferless *LC* oscillator that directly drives a 10-mm on-chip clock distribution line for high-speed serial links. Sharing the *LC* tank between the frequency tuning capacitor and the capacitive load of the clock distribution line mitigates the frequency sensitivity. This feature makes a bufferless configuration possible. The layout was implemented in the TSMC 0.18-$\mu$m, 1-poly, 6-metal CMOS process, and the core area of the *LC* oscillator is only 270 x 280  $\mu$m<sup>2</sup>. The full-chip post-layout simulation showed an oscillation frequency of 2.54 GHz, a current consumption of 2.2 mA and a phase noise of -123 dBc/Hz at a 1MHz offset. We also showed that higher GHz-band operating frequencies and an advanced fabrication process of 90nm or below are preferable to make full use of the proposed bufferless structure, because they reduce the current consumption of the level shifting stages.

#### References

- [1] P. J. Restle et al., "A clock distribution network for microprocessors," *IEEE Journal of Solid-State Circuits*, vol. 36, pp. 792-799, 2001.
- [2] P. E. Gronowski, W. J. Bowhill, R. P. Preston, M. K. Gowan, and R. L. Allmon, "High-performance microprocessor design," *IEEE Journal of Solid-State Circuits*, vol. 33, pp. 676-686, 1998.
- [3] A. J. Drake, K. J. Nowka, T. Y. Nguyen, J. L. Burns, and R. B. Brown, "Resonant clocking using distributed parasitic capacitance," *IEEE Journal of Solid-State Circuits*, vol. 39, pp. 1520-1528, 2004.
- [4] J. Poulton, et al., "A 14-mW 6.25-Gb/s transceiver in 90-nm CMOS," *IEEE Journal of Solid-State Circuits*, vol. 42, pp. 2745-2757, 2007.
- [5] K. Fukuda, et al., "A 12.3-mW 12.5-Gb/s complete transceiver in 65-nm CMOS process," *IEEE Journal of Solid-State Circuits*, vol. 45, pp. 2838-2849, 2010.
- [6] T. Shibasaki, et al., "18-GHz clock distribution using a coupled VCO array," *IEICE Transactions on Electronics*, vol. E90-C, pp. 811-822, 2007.
- [7] K. Hu, et al., "Comparison of on-die global clock distribution methods for parallel serial links," *International Symposium on Circuits and Systems*, 2009.
- [8] B. Mesgarzadeh, et al., "Low-power bufferless resonant clock distribution networks," *50th Midwest Symposium on Circuits and Systems*, 2007.
- [9] P.-Y. Lin, et al., "LC resonant clock resource minimization using compensation capacitance," *IEEE International Symposium on Circuits and Systems (ISCAS)*, 2015.
- [10] B. Mesgarzadeh, et al., "Jitter characteristic in charge recovery resonant clock distribution," *IEEE Journal of Solid-State Circuits*, vol. 42, pp. 1618-1625, 2007.

- [11] M. Ichihashi and H. Kanaya, "A low-power and GHz-band LCDCO directly drives 10mm on-chip clock distribution line in 0.18µm CMOS," *IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences*, vol. E101-A, pp. 1907-1914, 2018.
- [12] F. O'Mahony, et al., "A low-jitter PLL and repeaterless clock distribution network for a 20Gb/s link," *Symposium on VLSI Circuits, Digest of Technical Papers*, 2006.
- [13] Y. Xu, et al., "Bufferless resonant clocking with low skew and high variation-tolerant," *IEEE 54th International Midwest Symposium on Circuits and Systems (MWSCAS)*, 2011.
- [14] Y. Xu and S.-M. Chen, "Research of transformer based bufferless resonant clock network," *10th IEEE International Conference on Solid-State and Integrated Circuit Technology*, 2010.
- [15] F. O'Mahony, et al., "A 10-GHz global clock distribution using coupled standing-wave oscillators," *IEEE Journal of Solid-State Circuits*, vol. 38, pp. 1813-1820, 2003.
- [16] S. Chan, et al., "A resonant global clock distribution for the cell broadband engine processor," *IEEE International Solid-State Circuits Conference - Digest of Technical Papers*, 2008.
- [17] M. R. Guthaus and B. Taskin, "High-performance, low-power resonant clocking," *IEEE/ACM International Conference on Computer-Aided Design (ICCAD)*, 2012.

# A Low-Power, High-Speed Bufferless Clock Distribution System

In the previous chapters, we discussed the key design elements, namely the inductor, the on-chip transmission line, and the *LC* oscillator, needed to realize a low-power, high-speed on-chip bufferless clock distribution system. In this chapter, we present an experimental implementation of the integrated system. A low-frequency sensitivity bufferless *LC* oscillator that is directly connected to a 10-mm on-chip clock distribution line was fabricated in TSMC 0.18-µm 1-poly 6-metal CMOS technology. The core area of the *LC* oscillator is only 270 × 280 µm<sup>2</sup>. The measurement results show a 2.8-GHz oscillation frequency, 3.3-mA current consumption, and -112.8 dBc/Hz phase noise at a 1-MHz offset.

## 6.1 Objectives and Scope of This Study

To realize a low-power, high-frequency bufferless clock distribution system, we have discussed the following three key features: (1) a high-frequency, low-coupling 8-shaped differential inductor with PGS in Chapter 3 [1]; (2) a simple methodology for on-chip transmission line modeling and optimization in Chapter 4 [2], [3]; and (3) a theory of a low-frequency sensitivity *LC* oscillator in Chapter 5 [4]. In this chapter, we present an experimental implementation of the integrated system under the following conditions.

A differential signaling configuration on a 10-mm meander line is used for the clock distribution line because of the limited chip area. A length of 10 mm allows the transmission line effects to be evaluated. Note that the proposed optimization methodology is not applied to this transmission line, since our modeling methodology assumes a straight line. The test chip was fabricated in the TSMC 0.18-µm 1-poly, 6-metal CMOS process because of the limited options available in our laboratory. A target frequency of 2.8 GHz was selected because the maximum frequency of this fabrication process at FO = 4 under the worst-case simulation condition is around 3 GHz. These experimental results not only demonstrate the low-frequency sensitivity feature but also reveal the effects of the transmission line on the *LC* oscillator [5]. In addition, the convolution problems that arise when s-parameters are converted to the time domain, introduced in Section 5.4.2, can be clarified.

## 6.2 Test Chip Implementation

Fig. 6.1 and Fig. 6.2 show the structure of the fabricated circuit and a photograph of the chip, respectively. The circuit directly drives a 10-mm on-chip clock distribution line with the proposed low-frequency sensitivity *LC* oscillator introduced in Chapter 5. The *LC* oscillator uses the 8-shaped differential inductor introduced in Chapter 3 with *L* = 2.86 nH (single-ended value) and $\alpha_{TAP}$ = 0.5. The core area of the *LC* oscillator is only 270 × 280 µm<sup>2</sup>. The 10-mm on-chip transmission line is designed as a meander line using a thick metal layer (M6). A metal width and spacing of 2 µm each were selected in view of the limited chip area and the resistive loss. The far-end differential signal is amplified and buffered by a 5-stage current mode logic (CML) chain with a fan-out (FO) of 2 and is finally output to the pad with an impedance of 50 Ω for measurement.



Fig. 6.1 Circuit structure of the test chip



Fig. 6.2 Photograph of the test chip

## 6.3 Measurement Results and Analysis

## 6.3.1 Measurement Setup

Fig. 6.3 shows the measurement setup. The measurements of the proposed bufferless *LC* oscillator are recorded using a 50-Ω probe station (Fig. 6.4) and a signal source analyzer (SSA) connected via semi-rigid cables. The loss of this measurement setup comes mainly from the cables (2.6 dB), since the open-circuit input capacitance of the GSGSG probe is only 5.4 fF. Note that we used a 39.5-dB low-noise amplifier (LNA) for the phase noise measurements because the output level was lower than expected. This problem is discussed further in Section 6.3.3.



Fig. 6.3 Measurement setup



Fig. 6.4 Probe station

## 6.3.2 Measurement and Simulation Results

Table 6.1 shows a comparison between the simulation and measurement results for the oscillation frequency $f_{p1}$ and the single-ended output swing $V_{OUTP}$. We compared the measurement with simulations based on both the *RC* and *RLC* models to investigate whether the effects of the transmission line should be taken into account. In this research, we used Cadence Assura-RCX and Integrand Software, Inc. EMX for the *RC* and *RLC* extraction, respectively. Fig. 6.5 shows the output spectrum and phase noise of $V_{OUTP}$.

Table 6.1 Comparison between measurement and simulation results

| Parameters        | Measurement | Simulation( <i>RC</i> ) | Simulation( <i>RLC</i> ) |
|-------------------|-------------|-------------------------|--------------------------|
| $f_{p1}$          | 2.83 GHz    | 2.54 GHz                | 2.74 GHz                 |
| $V_{OUTP}$        | -61.7 dBm   | 3.7 dBm                 | -20.6 dBm                |

Condition:  $V_{dd1} = V_{dd2} = 1.8V$ 

The gain of the LNA (39.5 dB) has been deducted from $V_{OUTP}$.
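To relate the power levels in Table 6.1 to voltage swing, the following Python sketch converts dBm into the amplitude of an equivalent sinusoid. It assumes that all powers are referred to a 50-Ω load, in line with the 50-Ω output and probe used here; the function names are illustrative and are not part of the original measurement flow.

```python
import math

def dbm_to_vrms(p_dbm, r_ohm=50.0):
    """RMS voltage of a sinusoid delivering p_dbm into r_ohm (50 ohm assumed)."""
    p_watt = 1e-3 * 10.0 ** (p_dbm / 10.0)
    return math.sqrt(p_watt * r_ohm)

def dbm_to_vpeak(p_dbm, r_ohm=50.0):
    """Peak amplitude of the same sinusoid."""
    return math.sqrt(2.0) * dbm_to_vrms(p_dbm, r_ohm)

# Single-ended output levels taken from Table 6.1 [dBm]
TABLE_6_1 = {
    "measurement": -61.7,
    "simulation_RC": 3.7,
    "simulation_RLC": -20.6,
}

if __name__ == "__main__":
    for name, p_dbm in TABLE_6_1.items():
        v_peak_mv = dbm_to_vpeak(p_dbm) * 1e3
        print(f"{name:15s} {p_dbm:7.1f} dBm -> {v_peak_mv:10.4f} mV peak")
    gap_db = TABLE_6_1["simulation_RLC"] - TABLE_6_1["measurement"]
    print(f"RLC simulation minus measurement: {gap_db:.1f} dB")
```

Under this assumption, the 41.1-dB gap between the *RLC* simulation and the measurement corresponds to a voltage ratio of more than 100.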



Fig. 6.5 Plots of the output spectrum and phase noise with LNA

## 6.3.3 Analysis

This section analyzes the measurement and simulation results. First, the oscillation frequency of the *RLC* model matches the measurement better than that of the *RC* model, which indicates that the transmission line effects must be considered at this frequency and length. The *RLC* model shows a higher oscillation frequency than the *RC* model because the parasitic capacitances are canceled out to a certain extent by the *LC* resonance mechanism. Second, the *RLC* model also shows a better match for the output swing than the *RC* model. As in the case of the oscillation frequency, this indicates that the transmission line effects should be taken into account [6], [7], [8]. The difference in output swing between the *RLC* and *RC* models can be explained intuitively as follows. Because the far end is not terminated (open), a reflected wave is produced at an oscillation frequency of 3 GHz and a length of 10 mm. If the relative dielectric constant is $\varepsilon_r = 4$, the reflected wave returns to the near end with a phase shift of ~144 degrees. This can also be understood from the S11 Smith chart of the 10-mm transmission line shown in Fig. 6.6. The waveform synthesized from the forward and backward (reflected) waves is called a standing wave, and under this condition the swing of the *RLC* model becomes much smaller than that of the *RC* model. However, even with the *RLC*-based model, there is a significant error compared with the measurement results. We believe that the convolution process from the s-parameters (frequency domain) to the time domain is the source of this error, as it is strongly dependent on the simulator.

Next, we evaluate the figure of merit (FoM) expressed by Eqs. (6.1) and (6.2). Table 6.2 shows the comparison results. The FoM<sub>A</sub> of [9] and the FoM of [11] are much better than those of this work, which might be due to the following reasons. First, the major difference with respect to the FoM<sub>A</sub> of [9] results from the inductor area: we use a simple 2D structure, whereas [9] has a stacked 3D structure. Second, there are two major differences with respect to the FoM of [11], namely the circuit structure and the capacitive load. The proposed *LC* oscillator uses only Nch transistors for negative transconductance generation, whereas [11] uses both Pch and Nch transistors, which improves the transconductance by 3 dB and therefore the power consumption by 3 dB. In addition, the capacitive load in this work is around 2.5 pF while that of [11] is only 0.5 pF, which results in an oscillation frequency difference of ~6 dB. Thus, the FoM of [11] is fundamentally 9 dB better than that of this work.

However, a further improvement of at least 10 dB is necessary for the proposed structure to be comparable with other state-of-the-art structures. Possible solutions are as follows: (1) the use of both Pch and Nch transistors for improved transconductance; (2) the selection of an appropriate transmission line length and frequency to obtain a higher output swing; and (3) the selection of even higher frequencies to improve the quality factor of the inductor. However, higher frequencies may also require an advanced technology (90 nm or more advanced) to improve the operating speed of the internal logic circuits.
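The 9-dB estimate in the comparison with [11] can be checked with a short calculation. The frequency term uses the oscillation frequencies listed in Table 6.2 for [11] and for this work; the power term assumes, as an idealization, that the complementary Pch/Nch pair provides the same negative transconductance at half the bias current:

$$
\begin{aligned}
\Delta_{f} &= 20\log_{10}\left(\frac{5.29\ \mathrm{GHz}}{2.54\ \mathrm{GHz}}\right) \approx 6.4\ \mathrm{dB},\\
\Delta_{P} &= 10\log_{10}(2) \approx 3.0\ \mathrm{dB},\\
\Delta_{f} + \Delta_{P} &\approx 9.4\ \mathrm{dB}.
\end{aligned}
$$

These values are consistent with the ~6-dB and 3-dB contributions quoted in the text.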

$$FoM = \mathcal{L}(\Delta f) - 20\log\left(\frac{f_0}{\Delta f}\right) + 10\log\left(\frac{P_{dc}}{1mW}\right)$$
(6.1)

$$FoM_A = FoM + 10\log\left(\frac{Area}{1mm^2}\right)$$
(6.2)

Here,

| Symbol                  | Definition                |
|-------------------------|---------------------------|
| $f_0$                   | oscillation frequency     |
| $\Delta f$              | offset frequency          |
| $P_{dc}$                | power consumption         |
| $\mathcal{L}(\Delta f)$ | phase noise at $\Delta f$ |

|                                  | [9]    | [10]    | [11]   | [12]   | This work |
|----------------------------------|--------|---------|--------|--------|-----------|
| Tech [nm]                        | 65     | 180     | 130    | 90     | 180       |
| Area [µm<sup>2</sup>]            | 484    | 1260460 | 695520 | 1739   | 75600     |
| $\mathcal{L}(\Delta f)$ [dBc/Hz] | -110   | -116    | -120.6 | -93    | -123      |
| $f_0$ [GHz]                      | 21     | 5.32    | 5.29   | 5.35   | 2.54      |
| $\Delta f$ [MHz]                 | 10     | 1       | 1      | 1      | 1         |
| Current [mA]                     | 3.20   | 3.17    | 1.32   | 1.00   | 2.21      |
| $V_{DD}$ [V]                     | 0.6    | 1.8     | 1.5    | 1.0    | 1.8       |
| $P_{dc}$ [mW]                    | 1.92   | 5.71    | 1.98   | 1.00   | 3.98      |
| FoM [dBc/Hz]                     | -173.6 | -183.0  | -192.1 | -167.6 | -185.1    |
| FoM<sub>A</sub> [dBc/Hz]         | -206.8 | -181.9  | -193.7 | -195.2 | -196.3    |

Table 6.2 Comparison results for the FoM and FoM<sub>A</sub> from various studies
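The FoM and FoM<sub>A</sub> rows of Table 6.2 can be reproduced from Eqs. (6.1) and (6.2) with a short Python sketch. It assumes that $P_{dc}$ is the product of the listed current and $V_{DD}$ and that the area is converted from µm<sup>2</sup> to mm<sup>2</sup>; the function and variable names are illustrative.

```python
import math

def fom(phase_noise_dbc_hz, f0_hz, offset_hz, p_dc_mw):
    """Eq. (6.1): L(df) - 20*log10(f0/df) + 10*log10(P_dc / 1 mW)."""
    return (phase_noise_dbc_hz
            - 20.0 * math.log10(f0_hz / offset_hz)
            + 10.0 * math.log10(p_dc_mw))

def fom_area(fom_db, area_um2):
    """Eq. (6.2): FoM + 10*log10(Area / 1 mm^2), with the area given in um^2."""
    return fom_db + 10.0 * math.log10(area_um2 * 1e-6)

# Rows of Table 6.2:
# label: (area [um^2], L(df) [dBc/Hz], f0 [GHz], df [MHz], current [mA], VDD [V])
TABLE_6_2 = {
    "[9]":       (484,     -110.0, 21.0, 10.0, 3.20, 0.6),
    "[10]":      (1260460, -116.0, 5.32, 1.0,  3.17, 1.8),
    "[11]":      (695520,  -120.6, 5.29, 1.0,  1.32, 1.5),
    "[12]":      (1739,    -93.0,  5.35, 1.0,  1.00, 1.0),
    "This work": (75600,   -123.0, 2.54, 1.0,  2.21, 1.8),
}

def evaluate(table=TABLE_6_2):
    """Return {label: (FoM, FoM_A)} in dBc/Hz for every row of the table."""
    results = {}
    for label, (area, pn, f0_ghz, df_mhz, i_ma, vdd) in table.items():
        p_dc_mw = i_ma * vdd  # assumed: P_dc = current x VDD
        f = fom(pn, f0_ghz * 1e9, df_mhz * 1e6, p_dc_mw)
        results[label] = (f, fom_area(f, area))
    return results

if __name__ == "__main__":
    for label, (f, fa) in evaluate().items():
        print(f"{label:10s} FoM = {f:7.1f} dBc/Hz   FoM_A = {fa:7.1f} dBc/Hz")
```

The area term is what moves [9] and [12] ahead in FoM<sub>A</sub>: their areas are far below 1 mm<sup>2</sup>, so the logarithm in Eq. (6.2) contributes a large negative value.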



Fig. 6.6 The S11 Smith chart of the 10-mm transmission line (input: 50  $\Omega$ , output: open)
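The ~144-degree phase shift quoted in the analysis can be reproduced with the minimal Python sketch below. It assumes a lossless line on which the wave propagates at $c/\sqrt{\varepsilon_r}$ with $\varepsilon_r = 4$, and it counts the round trip from the near end to the open far end and back; the function names are illustrative.

```python
import math

C0 = 299_792_458.0  # speed of light in vacuum [m/s]

def phase_velocity(eps_r):
    """Phase velocity on a lossless line with relative dielectric constant eps_r."""
    return C0 / math.sqrt(eps_r)

def round_trip_phase_deg(freq_hz, length_m, eps_r):
    """Phase accumulated by a wave travelling to the far end and back."""
    wavelength = phase_velocity(eps_r) / freq_hz
    return 360.0 * (2.0 * length_m) / wavelength

if __name__ == "__main__":
    phi = round_trip_phase_deg(3e9, 10e-3, 4.0)
    print(f"round-trip phase shift: {phi:.1f} degrees")
```

Under the same assumptions, the round-trip phase reaches 180 degrees (a quarter-wavelength line) at a length of about 12.5 mm, so the 10-mm line is already a substantial fraction of that length at 3 GHz.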

## 6.4 Chapter Summary

In this chapter, we have introduced a low-frequency sensitivity bufferless *LC* oscillator that is directly connected to a 10-mm on-chip clock distribution line, fabricated in the TSMC 0.18-µm 1-poly, 6-metal CMOS technology. The core area of the *LC* oscillator is only  $270 \times 280 \ \mu\text{m}^2$ . The measurement results showed relatively good agreement with the simulation results except for the output swing. The output swing error might be caused by the convolution process, which is strongly dependent on the simulation tools. These experimental results not only demonstrate the low-frequency sensitivity feature but also reveal the effects of the transmission line on the *LC* oscillator. A standing wave occurs in this open-termination architecture; hence, an appropriate combination of transmission line length and frequency is vital to obtain a high output swing. The measurement results showed a 2.83-GHz oscillation frequency, 3.3-mA current consumption, and -112.8 dBc/Hz phase noise at a 1-MHz offset, which is comparable to other state-of-the-art *LC* oscillators.

#### References

- [1] M. Ichihashi and H. Kanaya, "A high-frequency, low-coupling 8-shaped differential inductor with patterned ground shield," *Microwave and Optical Technology Letters*, vol. 60, pp. 2704-2707, 2018.
- [2] M. Ichihashi and H. Kanaya, "A simple methodology of on-chip transmission line modeling for high speed clock distribution," *Extended Abstracts of the 2018 International Conference on Solid State Devices and Materials*, pp. 927-928, 2018.
- [3] M. Ichihashi and H. Kanaya, "A simple methodology for on-chip transmission line modeling and optimization for high-speed clock distribution," *Japanese Journal of Applied Physics*, vol. 58, no. SB, 2019.
- [4] M. Ichihashi and H. Kanaya, "A low-power and GHz-band LCDCO directly drives 10mm on-chip clock distribution line in 0.18µm CMOS," *IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences*, vol. E101-A, pp. 1907-1914, 2018.
- [5] M. Ichihashi and H. Kanaya, "3.3-mA 2.8-GHz bufferless LC oscillator directly driving a 10-mm on-chip clock distribution line," *IEICE Electronics Express*, vol. 16, pp. 1-5, 2019.
- [6] K. Banerjee and A. Mehrotra, "Analysis of on-chip inductance effects for distributed RLC interconnects," *IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems*, vol. 21, pp. 904-915, 2002.
- [7] M. A. El-Moursy and E. G. Friedman, "Power characteristics of inductive interconnect," *IEEE Transactions on Very Large Scale Integration (VLSI) Systems*, vol. 12, pp. 1295-1306, 2004.

- [8] P. Heydari, et al., "A comprehensive study of energy dissipation in lossy transmission lines driven by CMOS inverters," *Proceedings of the IEEE 2002 Custom Integrated Circuits Conference*, 2002.
- [9] R. Murakami, et al., "A 484-µm<sup>2</sup> 21-GHz LC-VCO beneath a stacked-spiral inductor," *Microwave Conference (EuMC)*, 2010.
- [10] A. El Oualkadi, "5-GHz low phase noise CMOS LC-VCO with PGS inductor suitable for ultra-low power applications," *Microwave Symposium (MMS)*, 2009.
- [11] Y. Shin, et al., "A low phase noise fully integrated CMOS LC VCO using a large-gate length pMOS current source and bias filtering technique for 5-GHz WLAN," *Signals, Systems and Electronics*, 2007.
- [12] L. Iotti, et al., "Insights into phase-noise scaling in switch-coupled multi-core LC-VCOs for E-band adaptive modulation links," *IEEE Journal of Solid-State Circuits*, vol. 52, pp. 1703-1718, 2017.

## **Conclusions and Future Works**

## 7.1 Conclusions

This dissertation presents the development of a low-power, high-frequency on-chip clock distribution system for high-speed serial links. The proposed architecture can directly drive a 10-mm on-chip transmission line using a novel shared *LC* resonance mode without any buffers or repeaters. This simplified and highly efficient design makes low-power, high-frequency, and area-saving operation possible compared with other state-of-the-art structures. The significant contributions of this dissertation can be summarized as follows:

A. A high-frequency, low-coupling 8-shaped differential inductor with patterned ground shield

Inductors are among the most important components in determining the performance of *LC* oscillators. The proposed inductor achieves almost twice the self-resonant frequency of conventional differential inductors while maintaining high symmetry and low coupling. The PGS structure maximizes the Q-factor and shortens the EM-simulation time owing to the reduced number of components. The experimental results showed a good match with the EM simulation.

B. A simple methodology for on-chip transmission line modeling and optimization for high-speed clock distribution

The clock distribution line of the proposed architecture must be treated as a transmission line because of the long wire length that results from the absence of buffers and repeaters. The proposed fully calculation-based design methodology for on-chip transmission lines converts a five-wire GSGSG physical model into a single-ended *RLC*-distributed model. Thanks to this simplification, general transmission line theory can be readily applied. The experimental results for the proposed model showed a good match with both calculation and EM simulation. The proposed optimization algorithm finds the smallest metal width, spacing, and structure that achieve the lowest power consumption for given target specifications such as propagation delay and output swing, which significantly improves design time and quality.

C. A theory of low-frequency sensitivity *LC* oscillator

The frequency sensitivity of *LC* oscillators must be reduced to realize the proposed bufferless on-chip clock distribution line system. The proposed *LC* oscillator achieves a low frequency sensitivity owing to its unique shared *LC* tank structure. The proposed theory showed a good match with SPICE simulation. A system-level comparison between the conventional repeater-based structure and the proposed structure was also discussed. Advanced processes (90 nm or more advanced) are preferable for the proposed bufferless structure to reduce the power consumption at the level-shift stages.

D. A low-power, high-speed on-chip clock distribution line system

The test chip of the proposed bufferless on-chip clock distribution line system was implemented in the TSMC 0.18-µm 1-poly 6-metal CMOS fabrication process. The measurement results showed a 2.8-GHz oscillation frequency, 3.3-mA current consumption, and -112.8 dBc/Hz phase noise at a 1-MHz offset, which is comparable to other state-of-the-art systems.

## 7.2 Future Works

In this dissertation, the on-chip clock distribution line was designed as a meander line because of the limited chip area. For this reason, the optimization methodology for the on-chip transmission line was not applied to the test chip design. The following work would further improve the power efficiency:

(1) To incorporate the optimization methodology for on-chip clock distribution line design

Selecting an appropriate frequency and transmission line length makes low-power operation possible.

(2) The study of standing wave mode signaling

As the transmission line becomes longer, the output voltage swing becomes smaller because of the resistive loss of the wire. Placing inductors at appropriate locations can mitigate this degradation.