A Replacement Strategy for Canary Flip-Flops

Kunitake, Yuji
Kyushu University

Sato, Toshinori
Fukuoka University

Yasuura, Hiroto
Kyushu University

https://hdl.handle.net/2324/18814

Version:
Rights: (c) IEEE.
A Replacement Strategy for Canary Flip-Flops

Yuji Kunitake
Kyushu University
y-kunitake@soc.aist.kyushu-u.ac.jp

Toshinori Sato
Fukuoka University
toshinori.sato@computer.org

Hiroto Yasuura
Kyushu University
yasuura@c.csce.kyushu-u.ac.jp

Abstract—Deep submicron semiconductor technologies increase parameter variations, which in turn demand excessive design margins that seriously degrade performance and power consumption. To eliminate this excessive margin, we are investigating the canary flip-flop (FF). A canary FF requires additional circuitry consisting of an FF and a comparator, and thus suffers from a large area overhead. To reduce this overhead, this paper proposes and evaluates a selective replacement method for canary FFs. In the case of Renesas’s M32R processor, an area overhead of 2% is achieved.

Keywords: VLSIs; deep submicron technologies; variations

I. INTRODUCTION

Due to aggressive integration in semiconductor technologies, parameter variations are becoming a serious problem [1]. Process variations, IR drop, and supply voltage fluctuation due to ground bounce cause variability in device parameters. As a result, chips differ in performance even though they are identical in both design and process. In other words, timing yield is diminished. Conventional design methodologies consider the worst case and rely on timing margin to keep timing yield high. Unfortunately, the timing margin grows as advanced technologies increase the variability, and a larger timing margin means larger power consumption. Hence, it becomes very difficult to achieve both high yield and low power consumption under worst-case design methodologies.

To attack this difficult problem, combining a dynamic voltage scaling (DVS) system with timing-error-detecting FFs has been studied as a way to reduce the overestimated timing margin [2-4]. These FFs dynamically detect timing errors in a microprocessor and help the DVS system manage the supply voltage without any timing errors. Because the overestimated timing margin is eliminated, power consumption is significantly reduced.

Canary FF [4] has an area overhead because it essentially relies on space redundancy. In addition, other circuits are necessary for error gathering, for the DVS system, and for error correction, and they also increase the area of the LSI device. The increase in area leads to larger power consumption and higher manufacturing costs. This paper considers the problem of area overhead, proposes a solution, and evaluates it.

II. CANARY FLIP-FLOP

A canary FF is a main FF augmented with a delay element and a redundant FF called the shadow FF, as shown in Figure 1. Like a canary in a coal mine, these additions help detect that a timing error is about to occur. Timing errors are predicted by comparing the main FF value with that of the shadow FF, which runs into a timing error slightly before the main FF does. The resulting alert signal triggers voltage or frequency control. The additional circuits (the shadow FF, the delay element, and the comparator) have a large impact on the circuit area of the entire microprocessor. We designed the layout of a canary FF using a 65nm standard cell library provided by VDEC and found that its area is 2.65 times larger than that of a conventional FF.

Figure 1. Canary FF
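The prediction mechanism described in this section can be illustrated with a small behavioral model. This is only a sketch under simplifying assumptions: setup and hold times are ignored, the class name `CanaryFF` and its parameters are hypothetical, and the 0.5 ns canary delay is an arbitrary illustrative value rather than a figure from the paper (only the 5.5 ns cycle time comes from the evaluation).

```python
from dataclasses import dataclass


@dataclass
class CanaryFF:
    """Behavioral sketch of a canary FF (setup/hold times are ignored)."""
    cycle_time: float    # clock period in ns
    canary_delay: float  # hypothetical delay-element value in ns

    def sample(self, arrival: float, old: int, new: int):
        """Model one capturing clock edge.

        arrival: time (ns after the launching edge) at which the data
                 input settles from `old` to `new`.
        Returns (main_value, shadow_value, alert).
        """
        # The main FF captures the new value only if it arrives in time.
        main = new if arrival <= self.cycle_time else old
        # The shadow FF sees the same data through the delay element,
        # so it misses the new value earlier than the main FF does.
        shadow = new if arrival + self.canary_delay <= self.cycle_time else old
        # The comparator raises the alert when the two values disagree.
        alert = main != shadow
        return main, shadow, alert


ff = CanaryFF(cycle_time=5.5, canary_delay=0.5)
print(ff.sample(arrival=4.0, old=0, new=1))  # (1, 1, False): ample slack
print(ff.sample(arrival=5.2, old=0, new=1))  # (1, 0, True): slack below the canary delay
```

In this model the alert is raised only while the remaining slack is smaller than the canary delay. Once the arrival time exceeds the cycle time, both FFs miss the new value and no alert is produced, which is why the voltage or frequency control has to react to the alert before that point.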

III. REPLACEMENT STRATEGY

We reduce the number of replaced FFs by considering the distribution of path delays. The delays of paths in a circuit differ from one another because of logic depth, wire length, and so on. Paths with small delay will not cause a timing error even if the supply voltage is lowered by the DVS system, so the FFs at their outputs need not be replaced by canary FFs. We identify the paths where timing errors might occur and replace the FFs at their outputs with canary FFs. We call these timing-violating FFs. This replacement policy reduces the area overhead.

Figure 2. Delay Distributions

Using Figure 2, we explain the replacement policy. First, the circuit is logic-synthesized under the best-case scenario and the target cycle time is determined. Next, static timing analysis is performed on the synthesized netlist. The paths that do not satisfy the target cycle time are identified, and their output FFs should be replaced by canary FFs. In other words, they are timing-violating FFs. The other FFs will not cause timing errors.
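The selection step of this policy can be sketched in a few lines. The sketch assumes that the static timing analysis results are already available as (endpoint FF, path delay) pairs. The function name, the input format, and the register names in the example are hypothetical, and the delay values are made up for illustration.

```python
def timing_violating_ffs(path_delays, target_cycle_time):
    """Return the FFs that terminate at least one path slower than the target.

    path_delays: iterable of (endpoint_ff_name, delay_ns) pairs, as might be
                 extracted from a static timing analysis report.
    target_cycle_time: target clock period in ns.
    """
    worst = {}
    for ff, delay in path_delays:
        # Keep only the slowest path that ends at each FF.
        worst[ff] = max(delay, worst.get(ff, 0.0))
    # An FF is timing-violating if its slowest path misses the target.
    return sorted(ff for ff, delay in worst.items() if delay > target_cycle_time)


# Hypothetical paths; only the 5.5 ns target is taken from the evaluation.
paths = [
    ("cpumac/acc_reg[0]", 5.9),
    ("cpumac/acc_reg[0]", 4.1),
    ("mmu/tlb_hit_reg", 5.2),
    ("cpudp_pc/pc_reg[3]", 5.6),
]
print(timing_violating_ffs(paths, target_cycle_time=5.5))
# ['cpudp_pc/pc_reg[3]', 'cpumac/acc_reg[0]']
```

Only the FFs returned by the function would be replaced by canary FFs. All other FFs stay as conventional FFs.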

IV. EVALUATIONS

A. Methodology

Our target processor is Renesas Electronics’s M32R. We use the 65nm standard cell library whose conditions are shown in Table 1. Synopsys’s Design Compiler logic-synthesizes the processor using the cell library at 1.3V. Table 2 summarizes the target processor specifications based on the logic synthesis results.

<table>
<thead>
<tr>
<th>Vdd (V)</th>
<th>Temperature (°C)</th>
<th>Process</th>
<th>Synthesis</th>
</tr>
</thead>
<tbody>
<tr>
<td>1.3</td>
<td>125</td>
<td>Fast</td>
<td></td>
</tr>
<tr>
<td>1.2</td>
<td>25</td>
<td>Typical</td>
<td></td>
</tr>
<tr>
<td>1.1</td>
<td>125</td>
<td>Slow</td>
<td></td>
</tr>
<tr>
<td>1.05</td>
<td>25</td>
<td>Slow</td>
<td></td>
</tr>
</tbody>
</table>

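<table>
<tbody>
<tr>
<td>Clock cycle time (ns)</td>
<td>5.5</td>
</tr>
<tr>
<td># of FFs</td>
<td>10,585</td>
</tr>
<tr>
<td>Area (μm²)</td>
<td>8,028,585</td>
</tr>
<tr>
<td>Pipeline stages</td>
<td>5</td>
</tr>
</tbody>
</table>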

B. Results

We vary the supply voltage between 1.05V and 1.2V. Figure 3 summarizes the number of timing-violating FFs for every macro block. Only blocks that have timing-violating FFs between 1.05V and 1.2V are shown in the figure. The vertical axis shows the block names and the horizontal axis indicates the associated numbers. At 1.2V, no timing error is detected. As the supply voltage is lowered to 1.1V and then to 1.05V, timing errors are detected, especially in the multiply-add unit (cpumac) and in the instruction address data path (cpudp_pc). In the memory management unit (mmu), no timing errors are found when the supply voltage is higher than 1.1V. However, at 1.05V, it has the largest number of timing-violating FFs.

Figure 4 presents the area overhead when the timing-violating FFs are replaced by canary FFs. The bar graph indicates the number of replaced FFs and the line graph indicates the percentage area overhead. Although we found that the canary FF is 2.65 times larger in area than the conventional FF, we assume it is 3 times larger in this evaluation to account for other overheads such as wire routing. In the graph, the values at 1.3V show the area overhead and the number of replaced FFs when all FFs are replaced by canary FFs. The values at the other supply voltages show them when only timing-violating FFs are replaced. The processor has no timing-violating FFs at 1.2V. Even when the supply voltage is lowered to 1.05V, it has only 756 timing-violating FFs. In other words, only 7% of the FFs are timing-violating FFs. Hence, the area overhead is as small as 2%. These observations confirm that the selective replacement method is useful for mitigating the area overhead. The method is also applicable to other error-detecting FFs such as Razor [2].
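The relationship between the number of replaced FFs and the area overhead can be checked with a short calculation. The sketch below assumes that all FFs have the same area and that the overhead grows linearly with the number of replaced FFs. The share of chip area occupied by FFs is not stated in the paper. The value 0.125 used here is an inference, obtained by reading the 25% figure quoted in the conclusions as the overhead for replacing every FF and "3 times larger" as a canary FF occupying three times the area of a conventional FF. The function names are hypothetical.

```python
TOTAL_FFS = 10_585        # number of FFs, from Table 2
CANARY_AREA_RATIO = 3.0   # canary FF area / conventional FF area (evaluation assumption)


def fraction_replaced(n_replaced, total=TOTAL_FFS):
    """Fraction of all FFs that are replaced by canary FFs."""
    return n_replaced / total


def area_overhead(n_replaced, ff_area_fraction, ratio=CANARY_AREA_RATIO):
    """Relative chip-area increase caused by the replaced FFs.

    ff_area_fraction: share of the original chip area occupied by all
                      conventional FFs (an inferred value, not from the paper).
    """
    # Each replaced FF adds (ratio - 1) times its original area.
    return fraction_replaced(n_replaced) * ff_area_fraction * (ratio - 1.0)


FF_AREA_FRACTION = 0.125  # inferred: 0.25 overhead / (3 - 1) for full replacement

print(f"{fraction_replaced(756):.1%}")                      # 7.1%
print(f"{area_overhead(TOTAL_FFS, FF_AREA_FRACTION):.1%}")  # 25.0%
print(f"{area_overhead(756, FF_AREA_FRACTION):.1%}")        # 1.8%
```

Under these assumptions, replacing only the 756 timing-violating FFs gives an overhead of about 1.8%, which is consistent with the reported value of roughly 2%.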

V. CONCLUSIONS

This paper proposed a selective replacement method for timing-error-detecting FFs in order to mitigate the area overhead. The evaluations showed that, in the case of the M32R processor, the chip area increases by 25% when all FFs are replaced by canary FFs. The selective replacement method identifies timing-error-prone FFs and replaces only them with canary FFs. Using the method, the area overhead is reduced to only 2%.

ACKNOWLEDGMENT

This work is partially supported by the CREST (Core Research for Evolutional Science and Technology) program of Japan Science and Technology Agency (JST), by Grant-in-Aid for Scientific Research (B) #20300019, and by Grant-in-Aid for JSPS Fellows #22-2357. The logic synthesis of the motif circuits was performed in collaboration with STARC, e-Shuttle, Fujitsu, Renesas Electronics, Synopsys through Toshiba and VDEC, the University of Tokyo.

REFERENCES