# Kyushu University Institutional Repository

# Reduction of Coupling Effects by Optimizing the 3-D Configuration of the Routing Grid

Sakai, Atsushi

The Materials and Devices Development Center BU, Sanyo Electric Co., Ltd.

Yamada, Takashi

The Materials and Devices Development Center BU, Sanyo Electric Co., Ltd.

Matsushita, Yoshifumi

The Materials and Devices Development Center BU, Sanyo Electric Co., Ltd.

Yasuura, Hiroto

The Graduate School of Engineering Sciences, Kyushu University

https://hdl.handle.net/2324/6794468

Publication information: IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 11 (5), pp. 951-954, 2003-10. Institute of Electrical and Electronics Engineers




# Reduction of Coupling Effects by Optimizing the 3-D Configuration of the Routing Grid

Atsushi Sakai, Takashi Yamada, Yoshifumi Matsushita, and Hiroto Yasuura

Abstract—In this brief, we propose a new physical design technique for a subquarter micrometer system-on-a-chip (SoC). By optimizing the individual layer's routing grid space, coupling effects such as crosstalk noise, crosstalk-induced delay variations, and coupling power consumption are almost eliminated with little runtime penalty. Experiments are performed on the design of an image processing circuit using a subquarter micron CMOS process with multilayer interconnects. Simply by employing our proposed technique, the maximum delay and the power consumption can be decreased simultaneously by up to 15% and 10%, respectively, without any other process improvements.

Index Terms—Analysis, delay, design, digital-CMOS, performance.

#### I. INTRODUCTION

In systems-on-a-chip (SoCs) fabricated in subquarter micron or nanometer-scale technologies, on-chip interconnection is an important performance-limiting factor [1]. With technology scaling, the transistor size, the line width, and the line-to-line spacing have all been scaled. However, the line height has not been scaled, in order to avoid the increase in wiring resistance that such scaling would cause. Therefore, the proportion of horizontal line-to-line capacitance with respect to the overall capacitance increases. This large horizontal line-to-line capacitance results in unwanted coupling effects such as an increase in crosstalk noise [2], [3], crosstalk-induced delay variations [4], [5], and coupling power consumption [6], [7].

Crosstalk noise is a voltage change induced by cross-coupling capacitance in adjacent lines. It causes logical malfunctions because of the distortion of signals by noise [8].

Crosstalk-induced delay variations arise from the large variation in effective line-to-line capacitance [3], [9]. If opposite-direction switching occurs in adjacent lines, the effective line-to-line capacitance will be doubled. By contrast, if same-direction switching occurs, the line-to-line capacitance is considered to be nonexistent. Because it is practically impossible to identify the state of the switching activity occurring between adjacent lines, both opposite- and same-direction switching must be considered to satisfy timing constraints. Therefore, timing closure becomes even more difficult if significant crosstalk noise occurs.

Coupling power consumption is the energy dissipation that charges and discharges the coupling capacitance between wires. The power also varies with the switching activity of the signals in adjacent lines [7]. Therefore, power constraints should be satisfied in worst-case switching conditions.

To reduce these unwanted coupling effects, various circuit and routing design techniques have been proposed. Among the circuit techniques, crosstalk noise has been reduced by inserting repeaters, which improves the charge-retaining capability of the victim nets [10], [11]. In addition, in [12], the variations in effective line-to-line capacitance caused by switching activity were mitigated by inserting repeaters (inverters) between adjacent lines at staggered points. Such changes to the circuitry have a direct effect on timing or power and might increase the chip area. Although the insertion of repeaters at staggered points is effective in a handcrafted or semi-custom design, it is difficult to locate the most suitable insertion points using the automatic place-and-route methodology that is common in today's large-scale SoC design. Furthermore, an increase in the number of repeaters leads directly to an increase in power consumption.

Manuscript received October 9, 2002; revised February 27, 2003.

A. Sakai, T. Yamada and Y. Matsushita are with the Materials and Devices Development Center BU, Sanyo Electric Co., Ltd., Gifu 503-0195, Japan.

H. Yasuura is with the Graduate School of Engineering Sciences, Kyushu University, Fukuoka 816-8580, Japan.

Digital Object Identifier 10.1109/TVLSI.2003.817126

Routing techniques have also been proposed to reduce the coupling effects. The first method changes the line width and space while keeping the line pitch given by the design rules [12], [13]. With this method, the line capacitance or line resistance changes considerably if the line widths are widened to any great extent, and as a result the timing is unfavorably affected. The second method increases the distance between adjacent lines for selected critical nets [14]. With this method, congestion among the unselected nets increases, and as a result the overall crosstalk noise on the chip cannot be decreased. The third method places a power line or other lines that have no electric-potential variation on either side of a noise-sensitive line to shield it [12], [15], [16]. This method requires many additional routing resources for the shielding and increases the design complexity. Moreover, additional coupling capacitance remains between the signal and the shielding lines.

To address these problems, we propose new physical design techniques for large-scale SoCs that optimize the three-dimensional (3-D) configuration of the routing grid within the automatic place-and-route methodology. The correlations between the routing grid settings and crosstalk noise, timing, and power are also investigated in developing our techniques. Post-layout simulation shows that coupling effects can be reduced without area penalty.

#### II. TECHNIQUES TO REDUCE COUPLING EFFECTS

In the automatic place-and-route methodology for large-scale SoC layouts, the grid-based model and the reserved-layer model are generally used [17]. Since many routing resources are needed to avoid short circuits between wires, the routing grid pitch is calculated according to the design rules and the smallest value is generally adopted.

Line-to-line coupling capacitance between signal wires is the main cause of coupling effects such as crosstalk noise, crosstalk-induced delay variations, and coupling power consumption. This capacitance can be separated into two components: the horizontal line-to-line capacitance ($C_H$) and the vertical line-to-line capacitance ($C_V$). In deep subquarter micron SoCs, $C_H$ is between 2.4 and 5.6 times larger than $C_V$ [18]. Therefore, reducing $C_H$ by increasing the horizontal line-to-line space is considered to be the most effective approach to reduce coupling effects.

To increase the horizontal line-to-line space, widening the routing grid pitch on the same layer seems to be the most attractive approach. However, if this approach is employed in isolation, a shortage of routing resources occurs, resulting in an extraordinary increase in runtime at the detailed routing stage.

The objective of this study is to address the tradeoff between the reduction of coupling effects and the shortage of routing resources. We propose a 3-D optimization technique that can be used to tune the routing grid configuration, both in the horizontal and vertical directions. We elaborate on the details of this technique in the subsequent sections.

##### A. Optimizing the 3-D Configuration of Routing Grid

An effective way to decrease the coupling effects and maintain the routing resources is to focus on the layers that have many long wires.



Fig. 1. Three-dimensional (3-D) routing grid optimization technique.

For this purpose, we propose a new routing technique that has two principal features as follows:

- 1) Increase the routing grid pitch in an individual layer incrementally to reduce $C_H$, thereby reducing the coupling effects.
- 2) In addition to 1), choose appropriate layers for widening the routing grid pitch by considering the tradeoff between the reduction of coupling effects and the available routing resources.

These concepts are illustrated in Fig. 1. In this four-metal layer example, Ma and Mc run horizontally, and Mb and Md run vertically. We can control the routing grid pitch in Ma, Mb, Mc, and Md independently.

We respecify the minimum routing pitch for various layers from their original values, then reroute, re-extract, and perform timing, noise, power, and runtime analyses. These results are compared to the original design to find the optimal wiring pitches by evaluating the cost function (COST)

$$\mathrm{COST} = \mathrm{CN}^{\alpha}\,\mathrm{PD}^{\beta}\,\mathrm{PW}^{\gamma}\,\mathrm{RT}^{\delta}. \tag{1}$$

Here, CN, PD, PW, and RT are the number of crosstalk-violated nets, the maximum delay, the net switching power, and the routing runtime, respectively, and $\alpha$, $\beta$, $\gamma$, and $\delta$ are weighting factors for these four parameters. A factor is given a large weight if its parameter is important and is set to zero if its parameter is negligible. The correlation between these parameters can be checked for various routing grid settings to find the one that gives the minimum COST value. Some "interconnect rich" designs may be difficult to route when the routing grid pitch is at its minimum. In this case, we should change the placement to ease the congestion by retuning the cell utilization ratio.
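The selection step described by (1) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `GridResult` container, the function names, and all metric values are hypothetical (they are not measurements from this work), and the metrics are assumed to be normalized to the minimum-pitch layout. The weights in the usage example are the timing-versus-runtime weights quoted in Section III.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GridResult:
    """Post-layout metrics for one routing grid setting (hypothetical container)."""
    label: str
    cn: float  # number of crosstalk-violated nets
    pd: float  # maximum delay
    pw: float  # net switching power
    rt: float  # routing runtime


def cost(r: GridResult, alpha: float, beta: float, gamma: float, delta: float) -> float:
    """Evaluate COST = CN^alpha * PD^beta * PW^gamma * RT^delta, as in (1).

    A weight of zero removes its metric from the product, because x ** 0 == 1
    in Python (including 0 ** 0).
    """
    return (r.cn ** alpha) * (r.pd ** beta) * (r.pw ** gamma) * (r.rt ** delta)


def best_setting(results, alpha, beta, gamma, delta) -> GridResult:
    """Return the grid setting with the minimum COST."""
    return min(results, key=lambda r: cost(r, alpha, beta, gamma, delta))


if __name__ == "__main__":
    # Illustrative values only, normalized to the minimum-pitch layout.
    results = [
        GridResult("pitch 1.00", cn=120, pd=1.00, pw=1.00, rt=1.0),
        GridResult("pitch 1.50", cn=40, pd=0.90, pw=0.92, rt=3.0),
        GridResult("pitch 2.00", cn=35, pd=0.89, pw=0.92, rt=12.0),
    ]
    winner = best_setting(results, alpha=0, beta=3, gamma=0, delta=0.25)
    print(winner.label)
```

Because the cost is a product of powers, only the ratios between settings matter, so the metrics can be normalized to any convenient baseline without changing which setting is selected.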

To shorten the overall design period, our technique should be applied at the global routing stage instead of running detailed routing for all settings. In this case, RT can be replaced by the number of violations (or by other parameters that express routing congestion).

##### B. Timing Analysis

After the routing has been completed, gate-level timing analysis is performed to estimate the maximum delay. To increase the accuracy of analysis, actual layout information is used. The parasitic components of wires including the coupling capacitance are extracted by using a 3-D field solver.

Timing analysis is performed taking the crosstalk noise into consideration. We convert the coupling capacitance into an effective capacitance. To consider the best- and worst-case switching activity in adjacent lines, the effective switching factor is set between 0 and 2. Coupling from all adjacent lines is summed up to estimate the effective capacitance of each net.
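The conversion described above can be illustrated with a short Python sketch. It assumes, for illustration only, that the effective capacitance of a net is its non-coupling capacitance plus the switching factor times the sum of its coupling capacitances, with one switching factor applied to every neighbor. The function and parameter names are hypothetical and do not correspond to any particular timing analyzer.

```python
def effective_capacitance(c_other, couplings, switching_factor):
    """Effective net capacitance for a given switching factor.

    c_other          -- capacitance not due to coupling with adjacent lines
    couplings        -- coupling capacitances to each adjacent line
    switching_factor -- 0 (same-direction switching) to 2 (opposite-direction)

    Assumption: the same switching factor is applied to every neighbor.
    """
    if not 0.0 <= switching_factor <= 2.0:
        raise ValueError("switching factor must lie between 0 and 2")
    return c_other + switching_factor * sum(couplings)


def capacitance_bounds(c_other, couplings):
    """Best-case (factor 0) and worst-case (factor 2) effective capacitance."""
    return (
        effective_capacitance(c_other, couplings, 0.0),
        effective_capacitance(c_other, couplings, 2.0),
    )


if __name__ == "__main__":
    # Arbitrary example values in arbitrary units.
    best, worst = capacitance_bounds(10.0, [2.0, 3.0])
    print(best, worst)
```

The gap between the two bounds grows with the total coupling capacitance, which is why reducing $C_H$ narrows the delay window that timing closure has to cover.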

TABLE I
WIRE DIMENSIONS FOR EACH ROUTING GRID SETTING

| Routing grid pitch [a.u.] | Width [a.u.] | Space [a.u.] |
|---------------------------|--------------|--------------|
| 1.00                      | 0.44         | 0.56         |
| 1.25                      | 0.44         | 0.81         |
| 1.50                      | 0.44         | 1.06         |
| 1.75                      | 0.44         | 1.31         |
| 2.00                      | 0.44         | 1.56         |
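The entries of Table I follow from the relation space = pitch - width with the wire width held at 0.44. The Python sketch below reproduces that arithmetic; the function and constant names are illustrative and are not part of any tool used in this work.

```python
WIRE_WIDTH = 0.44                                 # wire width from Table I [a.u.]
GRID_PITCHES = [1.00, 1.25, 1.50, 1.75, 2.00]     # routing grid pitches [a.u.]


def wire_space(pitch, width=WIRE_WIDTH):
    """Line-to-line space for a given routing grid pitch (space = pitch - width)."""
    return round(pitch - width, 2)


def spacing_table(pitches=GRID_PITCHES, width=WIRE_WIDTH):
    """Return (pitch, width, space) rows, one per routing grid setting."""
    return [(p, width, wire_space(p, width)) for p in pitches]


if __name__ == "__main__":
    for pitch, width, space in spacing_table():
        print(f"{pitch:.2f}  {width:.2f}  {space:.2f}")
```

Since the width is fixed, every increment of the pitch goes entirely into the line-to-line space, so doubling the pitch raises the space from 0.56 to 1.56.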

##### C. Power Analysis

Gate-level power analysis is also performed to estimate the power consumption. Due to the limitations of the analyzer, the effective switching factor is fixed at 1 for all conditions.

Power libraries are characterized by SPICE simulation, taking the operating mode into consideration. For example, RAMs are characterized separately in read and write modes. The overall power consumption is the sum of three components: 1) the dynamic cell internal power, 2) the net switching power, and 3) the cell-leakage power [19]. When estimating 1) or 2) with a static approach, the switching activity ratio must be considered. Since this ratio is difficult to estimate, we use 0.2 for all nets, a value that is frequently used with reasonable accuracy [3].
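As a rough illustration of the three-component sum, the Python sketch below uses the common textbook expression $0.5\,a\,C\,V_{dd}^{2}\,f$ for net switching power, where $a$ is the switching activity ratio taken as the expected number of transitions per clock cycle. This expression and all names are assumptions made for illustration; the formula used by the power analyzer in this work is not stated here.

```python
def net_switching_power(c_net, vdd, f_clk, activity=0.2):
    """Net switching power, assuming P = 0.5 * a * C * Vdd^2 * f.

    c_net    -- net capacitance in farads
    vdd      -- supply voltage in volts
    f_clk    -- clock frequency in hertz
    activity -- switching activity ratio (0.2 is the value used for all nets)

    Assumption: textbook expression, not necessarily the analyzer's model.
    """
    return 0.5 * activity * c_net * vdd ** 2 * f_clk


def total_power(cell_internal, net_switching, cell_leakage):
    """Overall power as the sum of the three components listed above."""
    return cell_internal + net_switching + cell_leakage


if __name__ == "__main__":
    # Arbitrary example: a 1 pF net at 1.0 V and 100 MHz.
    print(net_switching_power(1e-12, 1.0, 100e6))
```

Under this model the net switching power scales linearly with the net capacitance, so any reduction in coupling capacitance translates directly into a proportional reduction of that component.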

#### III. EXPERIMENTAL RESULTS

The methods proposed in Section II are used to lay out an image processing circuit of the 100 000-instance scale, which is part of an SoC fabricated in a 0.13-$\mu$m CMOS technology with six copper layers. Experiments were performed on this circuit to see whether the delay and power could be decreased with little sacrifice of runtime. In this design, the cell utilization ratio was kept at around 70%, at which there are no severe congestion spots for routing. If this ratio exceeds 70%, it is difficult to complete the detailed routing, or the maximum delay increases due to the increase of $C_H$.

The routing grid pitch was increased in five steps, from the value specified by the minimum rule to a value twice as large, as listed in Table I. For this design, we defined two cases of layer selection for widening the routing grid pitch. In the first, we chose the third to sixth layers (case 1), and in the second, we chose the fourth to sixth layers (case 2).
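The experimental sweep can be enumerated with a short Python sketch. It assumes that layers outside the selected set stay at the minimum pitch of 1.00, and all names are hypothetical. Each generated setting would then be rerouted, re-extracted, and analyzed as described in Section II.

```python
from itertools import product

NUM_LAYERS = 6                                    # six copper layers
MIN_PITCH = 1.00                                  # minimum-rule pitch [a.u.]
PITCH_STEPS = [1.00, 1.25, 1.50, 1.75, 2.00]      # the five pitch steps
WIDENED_LAYERS = {
    "case 1": (3, 4, 5, 6),                       # third to sixth layers
    "case 2": (4, 5, 6),                          # fourth to sixth layers
}


def grid_settings():
    """List (case, pitch, per-layer pitch map) for every case and pitch step.

    Assumption: layers that are not selected keep the minimum pitch.
    """
    settings = []
    for (case, widened), pitch in product(WIDENED_LAYERS.items(), PITCH_STEPS):
        per_layer = {
            layer: pitch if layer in widened else MIN_PITCH
            for layer in range(1, NUM_LAYERS + 1)
        }
        settings.append((case, pitch, per_layer))
    return settings


if __name__ == "__main__":
    for case, pitch, per_layer in grid_settings():
        print(case, pitch, per_layer)
```

The pitch 1.00 entry is the same layout in both cases and serves as the baseline against which the improvements are measured.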

The timing improvements achieved in cases 1 and 2 are shown in Figs. 2 and 3, respectively. Clock 1 was operated at 72 MHz and Clock 2 at 114 MHz. In case 1, the maximum delays of Clock 1 and Clock 2 decreased as the routing grid pitch was increased, both when crosstalk noise was considered (switching factor = 2) and when it was not (switching factor = 1). This indicates that not only reducing the crosstalk noise but also simply reducing $C_H$ is effective for reducing the maximum delay. We estimate that the maximum delay improvements for case 1 are 4–8% for Clock 1 and 5–15% for Clock 2. By contrast, little timing improvement is observed in case 2.

The net switching power improvements achieved in cases 1 and 2 are shown in Fig. 4. As the routing grid pitch increased, the power decreased until the pitch reached 1.5; widening beyond 1.5 left the power essentially unchanged. As a result, the maximum improvements in net switching power were 9% in case 1 and 5% in case 2.

To find the optimum settings for the configuration of the routing grid, the cost function is employed. The cost is evaluated based on (1). Here, to evaluate the tradeoff between timing improvement and runtime, we set $\alpha=\gamma=0$, $\beta=3$, and $\delta=0.25$. For PD, the average improvement for Clock 1 and Clock 2 is considered. The switching factor is assumed to be 2. The results of the cost calculation are displayed in Fig. 5, which shows that a pitch of 1.5 for case 1 is the best solution for achieving minimum cost. If the tradeoff between power improvement and runtime is evaluated instead by setting $\alpha = \beta = 0$, $\gamma = 4$, and $\delta = 0.2$, the same solution is obtained.

Fig. 2. Timing improvement in case 1: (a) Clock 1 (72 MHz) and (b) Clock 2 (114 MHz).

Fig. 3. Timing improvement in case 2: (a) Clock 1 (72 MHz) and (b) Clock 2 (114 MHz).

Fig. 4. Power improvement in cases 1 and 2.

Fig. 5. Cost evaluation ($\alpha = \gamma = 0$, $\beta = 3$, and $\delta = 0.25$).

TABLE II
SUMMARY OF DELAY AND POWER IMPROVEMENT

|                              |         | Timing [%] | Power [%] |
|------------------------------|---------|------------|-----------|
| Case 1 (optimizing M3,4,5,6) | Clock 1 | 4.3 - 8.4  | 8.7       |
|                              | Clock 2 | 5.1 - 14.7 |           |
| Case 2 (optimizing M4,5,6)   | Clock 1 | 0.6 - 1.6  | 3.5       |
|                              | Clock 2 | 1.5 - 6.9  |           |

For this design, a pitch of 1.5 for case 1 is the best solution for achieving the maximum improvements in delay (15%) and power (9%) with little runtime penalty (within fourfold). If the routing grid pitch is widened beyond 1.5, the improvement seems to saturate. One reason for this could be the increase in detours or off-grid routing due to the shortage of routing resources.

The delay and power improvements are summarized in Table II. This technique can achieve dramatic improvements once the optimum routing grid settings are obtained.

#### IV. CONCLUSION

In this paper, we demonstrated that coupling effects such as crosstalk noise, crosstalk-induced delay variations, and coupling power consumption can be reduced by optimizing the 3-D configuration of the routing grid without any process improvements or area penalty. Experimental results showed that the maximum delay and the net switching power consumption were reduced by up to 15% and 10%, respectively.

#### REFERENCES

- [1] J. Cong, L. He, K.-Y. Khoo, and Z. Pan, "Interconnect design for deep submicron IC's," in *Proc. ICCAD*, 1997, pp. 478–485.
- [2] L. Gal, "On-chip cross talk," in *Proc. CICC*, 1995, pp. 251–254.
- [3] D. Sylvester and K. Keutzer, "Getting to the bottom of deep submicron," in *Proc. ICCAD*, 1998, pp. 203–211.
- [4] F. Dartu and L. T. Pileggi, "Calculating worst-case gate delays due to dominant capacitance coupling," in *Proc. DAC'97*, 1997, pp. 46–51.
- [5] A. B. Kahng, S. Muddu, and D. Vidhani, "Noise and delay uncertainty studies for coupled RC interconnects," in *Proc. IEEE Int. ASIC/SOC Conf.*, 1999, pp. 3–8.
- [6] K.-W. Kim, K.-H. Baek, and N. Shanbhag, "Coupling-driven signal encoding scheme for low-power interface design," in *Proc. ICCAD*, 2000, pp. 318–321.
- [7] C. N. Taylor, S. Dey, and Y. Zhao, "Modeling and minimization of interconnect energy dissipation in nanometer technologies," in *Proc. DAC*, 2001, pp. 754–757.
- [8] P. Larsson and C. Svensson, "Noise in digital dynamic CMOS circuits," *IEEE J. Solid-State Circuits*, vol. 29, pp. 655–662, June 1994.
- [9] K. Hirose and H. Yasuura, "A bus delay reduction technique considering crosstalk," in *Proc. DATE*, 2000, pp. 441–445.
- [10] D. Li, A. Pua, P. Srivastava, and U. Ko, "A repeater optimization methodology for deep sub-micron high-performance processors," in *Proc. ICCD*, 1997, pp. 726–731.
- [11] C. J. Alpert, A. Devgan, and S. T. Quay, "Buffer insertion for noise and delay optimization," in *Proc. DAC*, 1998, pp. 362–367.
- [12] A. B. Kahng, S. Muddu, E. Sarto, and R. Sharma, "Interconnect tuning strategies for high-performance IC's," in *Proc. DATE*, 1998, pp. 471–478.
- [13] J. Cong, L. He, C.-K. Koh, and Z. Pan, "Global interconnect sizing and spacing with consideration of coupling capacitance," in *Proc. ICCAD*, 1997, pp. 628–633.
- [14] C. Nicoletta, J. Alvarez, E. Barkin, C.-C. Chao, B. R. Johnson, F. M. Lassandro, P. Patel, D. Reid, H. Sanchez, J. Siegel, M. Snyder, S. Sullivan, S. A. Taylor, and M. Vo, "A 450-MHz RISC microprocessor with enhanced instruction set and copper interconnect," *IEEE J. Solid-State Circuits*, vol. 34, pp. 1478–1491, Nov. 1999.
- [15] S. P. Khatri, A. Mehrotra, R. K. Brayton, A. Sangiovanni-Vincentelli, and R. H. J. M. Otten, "A novel VLSI layout fabric for deep sub-micron applications," in *Proc. DAC*, 1999, pp. 491–496.
- [16] J. D. Z. Ma and L. He, "Formulae and applications of interconnect estimation considering shield insertion and net ordering," in *Proc. ICCAD*, 2001, pp. 327–332.
- [17] N. Sherwani, *Algorithms for VLSI Physical Design Automation*, 2nd ed. Norwell, MA: Kluwer, 1995.
- [18] J.-S. Yim and C.-M. Kyung, "Reducing cross-coupling among interconnect wires in deep-submicron datapath design," in *Proc. DAC*, 1999, pp. 485–490.
- [19] M. Pedram, "Power minimization in IC design: Principles and applications," *ACM Trans. Design Automation Electron. Syst.*, vol. 1, no. 1, pp. 3–56, 1996.