# Studies on Low Power Technologies for Battery-Operated Semiconductor Random Access Memories

山内, 寛行

https://doi.org/10.11501/3130936

Publication information: Kyushu University, 1997, Doctor of Engineering (doctorate by dissertation)

# CHAPTER-4 Data Retention Power Saving for DRAM's

### Abstract

A 16M self-refresh DRAM achieving a data-retention current of less than 0.5µA per megabyte has been developed. Several techniques to achieve low retention current are described, including a relaxed junction biasing (RJB) scheme, a plate-floating leakage-monitoring (PFM) system, and a V<sub>BB</sub> pull-down word-line driver (PDWD). The data-retention time has been extended three-fold and the refresh timer period 30-fold over previously reported self-refresh DRAMs. This reduces the AC refresh current to less than 0.4µA per megabyte. Furthermore, the addition of a Gate-Received V<sub>BB</sub> Detector (GRD) reduces the DC retention current to less than 0.1µA per megabyte. This allows a 20-megabyte RAM disk to retain data for 2.5 years when powered by a single button-shaped 190-mAh lithium battery.

### **4-1 Introduction**

In recent years, the demand for DRAM has been growing rapidly, driven primarily by the personal computer (PC) market. The average DRAM capacity of PCs is expected to climb to 16 megabytes by 1996[1]. DRAM has the advantages of a lower cost per bit than SRAM and faster random read/write access than flash memory. For DRAMs in battery-operated portable equipment, a data-retention current as small as the SRAM standby current is expected to be needed in order to reduce the size and weight allocated to batteries. Self-refresh DRAMs with small data-retention current using cell-leak and temperature monitoring schemes were previously reported [2][3], but the achieved data-retention current was higher than 6µA per megabyte<sup>[2]</sup>. This value is not sufficient to replace SRAMs, which consume less than 0.5µA per megabyte of data-retention current, as shown in Fig.4-1, in applications such as memory cards and RAM disks for PDA (personal digital assistant) equipment supporting portable multimedia access. The DRAM data-retention current is governed by the dynamic refresh current, which depends strongly on the refresh cycle. Therefore, a breakthrough in extending the retention time is a prerequisite for the development of an ultra-low-power data-retention DRAM<sup>[4]</sup> that can substitute for SRAM.



Fig.4-1. DRAM data retention current trends.

In this paper, a circuit technology is presented that realizes a self-refresh 16M-bit DRAM with a data-retention current below 0.5µA per megabyte, allowing a 20-megabyte RAM disk to retain data for 2.5 years with a single button-shaped 190-mAh lithium battery. Part of the new design technology is a Relaxed Junction Biasing (RJB) scheme that shifts the storage-node voltage to a lower potential to relax the junction bias and, in turn, suppresses the junction leakage to 1/3 of the conventional leakage. One key concern is to ensure that a low data value on the storage node is not lost when the cell-plate is pulled down from 1/2Vcc to Vss. To avoid loss of data, a V<sub>BB</sub> Pull Down Word-line Driver (PDWD) scheme was added to shift the low data value to a negative potential zone (between V<sub>BB</sub> and Vss). Details of these schemes are discussed in the next section. Supplementing the biasing scheme is a leakage-monitoring circuit with a Plate-Floating leakage-Monitoring (PFM) scheme, which helps compensate for the difference in charge-decline speed between the few short-retention cells, which govern the retention time of the whole chip, and the normal retention cells, which constitute over 99.99% of the dummy cells. This scheme is proposed in Section 4-3. In Section 4-4, the Gate-Received V<sub>BB</sub> level Detector (GRD) scheme, which makes it possible to avoid the DC idling current, is described. The combined result of these improvements is summarized, and the contribution of each proposed scheme to the achieved data is clarified, in Section 4-5. This is followed by the conclusion in Section 4-6.

### 4-2 Extending DRAM Data Retention Time

### 4-2-1 Background

When the measured retention-time data of a conventional 16M-bit DRAM<sup>[5]</sup>, shown in Fig.4-2(a), are examined, the retention characteristic curve is found to have a "hump", which is caused by a small number of short-retention cells and governs the retention time of the whole chip. The leakage characteristics thus divide the cells of the chip into two groups, the "bad" cells and the "normal" cells.

According to the retention characteristics measured at various junction bias conditions, the leakage of the bad cells depends strongly on the junction bias between the storage-node  $V_N$  and the substrate  $V_{BB}$ . For example, the estimated leakage at  $V_N=3.6V$  is 3 times larger than at  $V_N=1.8V$ , as shown in Fig.4-2(b). The leakage has been estimated from the storage-node potential at each retention time. This potential is obtained by measuring the sensing margin with cell-plate bump tests<sup>[6]</sup>, in which the amount of cell-plate bumping is controlled at each retention time so as to estimate the remaining storage-node potential.



Fig.4-2. (a) Measured retention characteristics, and (b) Estimated storage-node junction leakage.


### 4-2-2 Relaxed Junction Biasing (RJB) Scheme

Thus, to extend the retention time, we propose the Relaxed Junction Biasing (RJB) scheme shown in Fig.4-3(a). It shifts the storage-node voltage to a lower-potential comfort zone below 1/2Vcc (e.g., 1.8V), in which the junction bias is reduced so that the leakage is suppressed to 1/3 of that in the conventional case shown in Fig.4-3(b).

When the burst refresh period is finished in self-refresh mode, the retention period starts, and the storage-node shifts down by 1/2Vcc and rests at the comfort zone. After a retention period, the storage-node goes back to a higher detectable zone ( $\geq V_{limit}$ ), in which fast and stable sense-amplifier operation can be realized, as shown in Fig.4-3(a). The shift down/up of the storage-node V<sub>N</sub> can be realized by controlling the pull down/up of the cell-plate voltage, respectively, as shown in Fig.4-3(a).

### 4-2-3 Comparison with Boosted-GND Scheme

To reduce the junction bias, a boosted-GND scheme with zero-biased V<sub>BB</sub> (V<sub>BB</sub>=0V) was previously reported<sup>[7]</sup>. However, when the two are compared in terms of how effectively they relax the junction bias (see Fig.4-4 and Table 4-1), the RJB scheme has the following two advantages: 1) The junction leakage of the storage-node can be reduced to 68% of that in the boosted-GND case<sup>[7]</sup>, owing to a reduction of the junction bias by 0.5V to 0.8V. The RJB scheme also provides a 0.3V wider variable range of  $V_N$  than the boosted-GND case<sup>[7]</sup>. As a result, the retention time can be about 1.9 times longer. 2) V<sub>BB</sub> can still be maintained at a negative voltage (-1.3V) to overcome the I/O undershoot injection problem and other problems, without any additional costly process steps such as forming a triple well.
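The 1.9-times figure can be checked directly from the entries of Table 4-1. The following is a minimal sketch, assuming (as the last row of the table does) that the expected retention time scales as the usable storage-node swing V<sub>EFF</sub> divided by the average junction-leak ratio I<sub>JA</sub>; the function and variable names are illustrative only.

```python
# Sketch: reproduce the "Expected Retention Time Ratio" row of Table 4-1.
# Assumption: retention time ~ V_EFF / I_JA (usable swing over average leakage).

def retention_figure(v_high, v_limit, leak_ratio):
    """Return (V_EFF in volts, V_EFF / I_JA) for one biasing scheme."""
    v_eff = v_high - v_limit
    return v_eff, v_eff / leak_ratio

# Detectable ranges and leak ratios taken from Table 4-1.
boosted_veff, boosted_fig = retention_figure(3.6, 2.6, 1.0)
rjb_veff, rjb_fig = retention_figure(1.8, 0.5, 0.68)
retention_gain = rjb_fig / boosted_fig

print(f"Boosted GND: V_EFF = {boosted_veff:.1f} V, figure = {boosted_fig:.2f}")
print(f"RJB:         V_EFF = {rjb_veff:.1f} V, figure = {rjb_fig:.2f}")
print(f"Retention-time gain of RJB: {retention_gain:.1f}x")
```

The gain comes from two factors acting together: a 30% wider usable swing (1.3V against 1.0V) and a 32% lower average leakage.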

Fig.4-3. Conceptual comparisons between (a) Relaxed Junction Biasing (RJB) and (b) conventional schemes.

Table 4-1. Comparisons between Boosted GND and RJB schemes.

|                                                                 | Boosted GND        | RJB scheme                              |
|-----------------------------------------------------------------|--------------------|-----------------------------------------|
| Detectable range<br>(VH ~ VLMT)                                 | 3.6 ~ 2.6 V        | 1.8 ~ 0.5 V<br>(3.6 ~ 2.3 V)            |
| VREF<br>(half BL precharge)                                     | 2.1 V              | 1.8 V                                   |
| Width of variable range<br>of storage-node VN<br>(VEFF=VH-VLMT) | 3.6 - 2.6 = 1.0 V  | 1.8 - 0.5 = 1.3 V<br>(3.6 - 2.3 = 1.3 V) |
| VBB                                                             | 0.0 V              | -1.3 V                                  |
| Junction bias<br>(VN-VBB) range                                 | 3.6 ~ 2.6 V        | 3.1 ~ 1.8 V                             |
| Average junction leak (IJA) ratio<br>in range of VN             | 1                  | 0.68                                    |
| Expected retention time ratio<br>(VEFF/IJA)                     | 1<br>(1.0/1)       | 1.9<br>(1.3/0.68)                       |

### 4-2-4 V<sub>BB</sub> Pull Down Word-line Driver (PDWD) Scheme

The most important design issue in realizing the Relaxed Junction Biasing (RJB) scheme is preventing the "Low" data from being compressed between the cell-plate and the storage-node when the cell-plate is pulled down from 1/2Vcc to Vss. For example, if the storage-node holding "Low" data is clamped at Vss at that time, the stored potential difference between the storage-node and the cell-plate is lost.

To overcome this problem, the V<sub>BB</sub> Pull Down Word-line Driver (PDWD) scheme has been developed. It allows the "Low" data to shift down to a negative potential zone ( $V_{BB} = -1.3V \le V_L \le V_{SS}$ ) when the cell-plate is pulled down. The PDWD scheme pulls the unselected word-lines (WLs) down to  $V_{BB}$  by using a level-shifter (Vss to  $V_{BB}$ ). The negatively biased WL allows the cell transistor to remain turned off and keeps the storage-node floating even if the storage-node shifts down to the negative potential zone, as shown in Fig.4-3(a).
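The effect of the word-line level on the stored signal can be illustrated with a deliberately idealized model. The sketch below assumes that a floating storage node follows the cell-plate swing one-to-one and that the node cannot fall below the level of the unselected word line (the threshold drop of the cell transistor is ignored). Both are simplifying assumptions, and the supply value of 2.4V is a hypothetical choice inside the 1.8V to 3.6V operating range; only V<sub>BB</sub>=-1.3V is taken from the text.

```python
# Idealized sketch of the "Low"-data compression problem and the PDWD remedy.
VSS = 0.0
VBB = -1.3   # substrate bias from the text
VCC = 2.4    # hypothetical supply inside the 1.8 V to 3.6 V range

def node_after_plate_pulldown(v_node, plate_swing, wl_level):
    """Node follows the plate but is clamped at the unselected word-line
    level (assumed clamp; the cell-transistor threshold drop is ignored)."""
    return max(v_node - plate_swing, wl_level)

def signal_after_pulldown(wl_level):
    """Return (high, low, high - low) after the plate drops by 1/2 Vcc."""
    swing = VCC / 2
    high = node_after_plate_pulldown(VCC, swing, wl_level)
    low = node_after_plate_pulldown(VSS, swing, wl_level)
    return high, low, high - low

conventional = signal_after_pulldown(wl_level=VSS)  # word line held at Vss
pdwd = signal_after_pulldown(wl_level=VBB)          # word line pulled to VBB

print("WL at Vss: high=%.1f V, low=%.1f V, signal=%.1f V" % conventional)
print("WL at VBB: high=%.1f V, low=%.1f V, signal=%.1f V" % pdwd)
```

In this model, holding the word line at Vss halves the stored high/low difference, whereas pulling it to V<sub>BB</sub> lets the "Low" level settle inside the negative zone and preserves the full difference.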

The circuit configuration and the simulated operating waveforms of the PDWD are shown in Figs. 4-5(a) and 4-5(b). The PDWD has the following three features: 1) V<sub>BB</sub> is supplied to the WL and to the selected gate electrode (VA) of the WL driver transistor (Q1). 2) The WL pull-down signal WDn and the inserted MOSFETs (Q4, Q5) assist MOSFETs Q3 and Q2 in pulling down the WL and node VA, respectively, so as to reduce the discharge current to V<sub>BB</sub> through a two-step WL pull-down operation (i.e., to Vss and then to  $V_{BB}$ ). 3) High-V<sub>T</sub> transistors Q2 and Q3 are employed in the Vss-to- $V_{BB}$  level-shifter<sup>[8]</sup>. The high V<sub>T</sub> is designed to be 2.0V at zero back bias, which is 0.7V larger than the absolute value of  $V_{BB}$  (-1.3V), eliminating leakage current to the substrate. The gate length (Lg) of the High-V<sub>T</sub> transistors is designed to be  $1.0\mu m$  to suppress V<sub>T</sub> lowering due to the short-channel effect. The boosted voltage V<sub>PP</sub> (V<sub>CC</sub>+1.5V) can fully turn on the High-V<sub>T</sub> transistors, even at the minimum supply voltage of Vcc=1.8V.
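The two voltage margins quoted for the High-V<sub>T</sub> transistors follow from simple arithmetic on the stated design values. The sketch below only restates those numbers; the variable names are illustrative, and the comparison of V<sub>PP</sub> against V<sub>T</sub> assumes zero back bias, as in the text.

```python
# Design values quoted in the text.
VBB = -1.3        # V, substrate bias
VT_HIGH = 2.0     # V, High-VT threshold at zero back bias
VCC_MIN = 1.8     # V, minimum supply voltage
VPP_BOOST = 1.5   # V, VPP = VCC + 1.5 V

# Margin of the threshold over |VBB|, which keeps the device off
# when its gate sits at Vss while the source is at VBB.
vt_margin = VT_HIGH - abs(VBB)

# Lowest boosted voltage and its headroom over the high threshold.
vpp_min = VCC_MIN + VPP_BOOST
turn_on_headroom = vpp_min - VT_HIGH

print(f"VT margin over |VBB|: {vt_margin:.1f} V")
print(f"VPP at minimum Vcc:   {vpp_min:.1f} V")
print(f"Headroom over VT:     {turn_on_headroom:.1f} V")
```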

### 4-2-5 Results and Discussions

To verify the effectiveness of the proposed Relaxed Junction Biasing (RJB) scheme coupled with the Pull-Down Word-line Driver (PDWD) scheme, the pause time of the developed 16M-bit DRAM chip was measured, as shown in Fig.4-6. The pause time can be extended to 2.7s at Vcc=3.6V and Ta=75°C while maintaining V<sub>BB</sub>=-1.3V. This value is about 3 times longer than that of the conventional schemes. An interesting point shown in Fig.4-6(b) is that, unlike the conventional case, the retention time for the RJB scheme no longer decreases even when Vcc is larger than 3.0V. This is because the comfort zone of the storage-node in the RJB scheme, which is in the vicinity of 0V, is independent of Vcc.

Fig.4-6. (a) Comparison of retention time characteristics with and without the RJB scheme, and (b) pause time characteristics as a function of Vcc.

### 4-3 Extension of DRAM Refresh Interval

### 4-3-1 Background

Another important design requirement in realizing ultra-low AC refresh current is to control the pause period of the self-refresh timer by monitoring the actual retention time of the chip, which depends strongly on temperature and junction bias. To achieve this, a cell-leakage monitoring scheme with 1K-bit dummy cells was previously reported<sup>[2]</sup>. However, even if the dummy cells have a structure identical to the actual cells, the leakage of the bad cells can never be monitored. This is because the bad cells make up less than 0.01% of the whole chip at most, which implies that, by the same proportion, only 0.01% of the dummy cells are bad cells available for leakage monitoring. It is important to monitor the leakage of bad cells because, according to the measured data shown in Fig.4-2(b) and Fig.4-8(a), the leakage of bad cells is over 40 times that of normal cells, and hence their data retention is 40 times poorer.
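A short calculation shows why a 1K-bit dummy array is unlikely to contain any bad cell at all. The sketch below assumes that "1K" means 1024 cells, that the bad-cell fraction is at its stated upper bound of 0.01%, and that bad cells occur independently of one another; these are illustrative assumptions rather than measured statistics.

```python
def prob_at_least_one_bad(n_cells, bad_fraction):
    """Probability that an array of n_cells holds one or more bad cells,
    assuming each cell is bad independently with probability bad_fraction."""
    return 1.0 - (1.0 - bad_fraction) ** n_cells

N_DUMMY = 1024        # assumed size of the 1K-bit dummy array
BAD_FRACTION = 1e-4   # 0.01 %, the upper bound quoted in the text

expected_bad = N_DUMMY * BAD_FRACTION
p_any_bad = prob_at_least_one_bad(N_DUMMY, BAD_FRACTION)

print(f"Expected bad cells in the dummy array: {expected_bad:.2f}")
print(f"Chance of at least one bad dummy cell: {p_any_bad:.1%}")
```

Under these assumptions roughly nine out of ten chips would have a dummy array consisting only of normal cells, so a monitor that simply mirrors the dummy cells would track normal-cell leakage rather than the bad-cell leakage that limits the chip.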

### 4-3-2 Plate-Floating Leakage Monitoring (PFM) Scheme

To monitor the retention time more accurately, we have developed the Plate-Floating leakage Monitoring (PFM) scheme. It accelerates the falling speed of the monitored storage node, which may belong to a normal cell, to nearly the same level as that of a bad cell, while maintaining the same dependency on temperature and junction bias.

The PFM scheme shown in Fig.4-7(a) operates as follows: 1) While the storage-node V<sub>N</sub> is being monitored, the plate node of the 1K dummy cells is kept floating so that the effective capacitance (C) of the storage node is reduced and the falling speed of V<sub>N</sub> is accelerated, while the same dependency on temperature and junction bias is maintained, as shown in Fig.4-7(c). In fact, when the plate is floating, the capacitance (C) of the cell capacitor is reduced to 1/20 of that with a fixed plate voltage, because the parasitic and junction capacitances are connected in series with it. As a result, the falling speed of the storage node is accelerated by a factor of 20 compared to the fixed-plate case, as shown in Fig.4-8(a). 2) When V<sub>N</sub> drops to the reference level V<sub>REF</sub>, the plate potential of the whole chip is reset by the VPLD signal (i.e., the plate goes back from Vss to 1/2Vcc) and the burst refresh is started by the internal RAS, as shown in Fig.4-7(c).
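The factor-of-20 acceleration can be reproduced with a series-capacitance model. The sketch below is an illustration under stated assumptions: the cell capacitance and leakage current are hypothetical placeholder values, the leakage is treated as constant over the voltage drop, and the parasitic capacitance is chosen as 1/19 of the cell capacitance so that the series combination equals the 1/20 ratio quoted in the text.

```python
def series_capacitance(c_cell, c_parasitic):
    """Capacitance seen at the storage node when the plate floats:
    the cell capacitor in series with the plate's parasitic capacitance."""
    return c_cell * c_parasitic / (c_cell + c_parasitic)

def fall_time(capacitance, delta_v, leak_current):
    """Time for a constant leakage current to discharge the node by delta_v."""
    return capacitance * delta_v / leak_current

C_CELL = 30e-15        # F, hypothetical cell capacitance
C_PAR = C_CELL / 19    # F, chosen so the series value is C_CELL / 20
I_LEAK = 1e-15         # A, hypothetical constant leakage current
DELTA_V = 0.5          # V, hypothetical monitored voltage drop

c_floating = series_capacitance(C_CELL, C_PAR)
t_fixed = fall_time(C_CELL, DELTA_V, I_LEAK)
t_floating = fall_time(c_floating, DELTA_V, I_LEAK)
speedup = t_fixed / t_floating

print(f"Effective capacitance ratio: {c_floating / C_CELL:.3f}")
print(f"Fall-time speedup with floating plate: {speedup:.1f}x")
```

Because the same leakage current discharges both configurations, the temperature and bias dependence of the fall time is unchanged; only its scale shrinks.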

When the detectable margin of the potential difference between V<sub>N</sub> and V<sub>REF</sub> is considered in a practical chip design, a margin of  $\pm 100$  mV is required because of process fluctuation and noise. Thus, the measuring range of V<sub>N</sub> becomes a very important factor in determining the pause period  $(T_P)$  (see Fig.4-7(c)).

A comparison of T<sub>P</sub> for the proposed PFM scheme with that of the conventional fixed-plate scheme is shown in Fig.4-8(b). For the same value of T<sub>P</sub>, the PFM scheme provides a margin more than 10 times larger than the conventional case. For example, even if V<sub>REF</sub> fluctuates by ±100mV at Ta=75°C, T<sub>P</sub> varies by only ±0.16s. In the conventional case, on the other hand, the falling speed of  $V_N$  is too slow to distinguish the drop of  $V_N$  from the  $V_{REF}$  fluctuation ( $\pm 100$  mV), and the T<sub>P</sub> fluctuation reaches  $\pm 2.7$ s. This is too large compared to the actual pause time of 2.7s at Vcc=3.6V and Ta=75°C.
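The relation between the V<sub>REF</sub> fluctuation and the timer error can be made explicit. The sketch below assumes that V<sub>N</sub> falls approximately linearly near the detection point, so that the timing error equals the voltage error divided by the falling slope; the slopes are back-calculated from the ±0.16s and ±2.7s figures quoted above and are not measured values.

```python
DELTA_VREF = 0.1     # V, the +/-100 mV reference fluctuation from the text
PAUSE_TIME = 2.7     # s, actual pause time at Vcc = 3.6 V, Ta = 75 C

def implied_slope(delta_v, delta_t):
    """Falling slope of V_N (V/s) implied by a voltage and timing error,
    assuming a locally linear decay."""
    return delta_v / delta_t

def relative_timer_error(delta_t, pause_time):
    """Timing error as a fraction of the intended pause period."""
    return delta_t / pause_time

slope_pfm = implied_slope(DELTA_VREF, 0.16)    # floating plate
slope_fixed = implied_slope(DELTA_VREF, 2.7)   # fixed plate
margin_gain = slope_pfm / slope_fixed

print(f"Implied V_N slope, PFM:         {slope_pfm:.3f} V/s")
print(f"Implied V_N slope, fixed plate: {slope_fixed:.3f} V/s")
print(f"Margin gain of PFM:             {margin_gain:.1f}x")
print(f"Relative timer error, PFM:         {relative_timer_error(0.16, PAUSE_TIME):.0%}")
print(f"Relative timer error, fixed plate: {relative_timer_error(2.7, PAUSE_TIME):.0%}")
```

The quoted figures therefore correspond to a margin gain of roughly 17, consistent with the "more than 10 times" statement, and to a fixed-plate timer error as large as the pause period itself.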



Fig.4-7. Comparisons of cell-leak monitoring schemes. (a) Proposed Plate-Floating leakage Monitoring (PFM) scheme, (b) conventional fixed-plate scheme, (c) timing diagram of PFM and conventional schemes.

Fig.4-8(b). Pause period TP in timer vs. reference voltage VREF.

Fig.4-9. Dependence of pause period TP on (a) Vcc and (b) temperature Ta.

### 4-3-3 Results and Discussions

The dependence of the designed and the measured values of T<sub>P</sub> on Vcc and on Ta is shown in Figs. 4-9(a) and 4-9(b), respectively. The proposed PFM scheme extends T<sub>P</sub> to over 3 times that of the conventional scheme in the range of Vcc=1.8V ~ 3.8V and Ta=20°C ~ 75°C.

To verify the effectiveness of the proposed Relaxed Junction Biasing (RJB) scheme combined with the PFM scheme, the dynamic refresh current (excluding the DC standby current (I<sub>SB</sub>)) of the 16M-bit DRAM was measured. According to the data measured at Ta=75°C, the PFM scheme reduces the refresh current I<sub>RF</sub> to 1/3 of that in the fixed-plate scheme, and that value corresponds to 1/30 of that with a constant pause-period scheme (0.7s), as shown in Fig.4-10(a). The refresh current I<sub>RF</sub> at Ta=25°C can be reduced to ≤0.5µA and 0.7µA at Vcc=1.8V and 3.6V, respectively. Even at Ta=55°C, I<sub>RF</sub> can be contained within 4µA, as shown in Fig.4-10(b).

One key concern for the RJB scheme is to ensure that the cell-plate driver dissipates less operating current than the refresh current. The current consumption of the cell-plate driver has been estimated to be less than 1/350 of I<sub>RF</sub>, as shown in Fig.4-11. This is because the plate capacitance is drastically reduced by the junction capacitances connected in series with the memory-cell capacitors when all of the WLs are turned off. The RC delay (rise time or fall time) of the potential transition of the cell-plate can be suppressed to less than 2µs with a driving current of 5mA.

### 4-4 DC Retention Current

### 4-4-1 V<sub>BB</sub> Level Detector

Another important design requirement in realizing an ultra-low retention current of less than half a µA per megabyte is to reduce the DC retention current, which is dominated by the current consumed by the intermittent operation of the V<sub>BB</sub> generator, to less than 0.04µA per megabyte. To meet this requirement, the frequency of the intermittent operation of the V<sub>BB</sub> generator, which depends strongly on the amount of injected substrate current I<sub>BB</sub> shown in Fig.4-12(a), must be reduced.

In order to eliminate I<sub>BB</sub>, which is caused mainly by the DC idling current from Vcc to  $V_{BB}$  within the conventional  $V_{BB}$  level detector circuit shown in Fig.4-12(a), we investigated the following two V<sub>BB</sub> level detector circuits: 1) monitoring the V<sub>T</sub> difference between the V<sub>BB</sub> and Vss well-biased n-MOSFETs (Q1 and Q2) shown in Fig.4-12(b) (called the Well-Received scheme), and 2) monitoring the gate-to-source voltage V<sub>GS</sub> of the p-MOSFET (Q3) shown in Fig.4-12(c) (called the Gate-Received V<sub>BB</sub> level Detector (GRD) scheme).

Fig.4-11. Comparisons of current consumption between AC refresh current and plate-driving current.

Fig.4-12. Comparisons of three types of V<sub>BB</sub> level detectors.

Comparing the current consumption of the two, the GRD scheme has the following two advantages: 1) In the  $V_{BB}$  and Vcc ranges of -1.8V ~ -0.8V and 1.8V ~ 3.8V, respectively, the GRD scheme reduces the current consumption to  $\leq 0.04 \mu A$  per megabyte, which is 1/10 of the consumption in the Well-Received scheme, as shown in Figs.4-13(b) and 4-13(c). This is because the Gate-Received transistor Q3 provides higher sensitivity and no longer requires a larger DC current (Iss) to maintain the same sensitivity to  $V_{BB}$ -level changes, as compared to the Well-Received type. 2) The GRD scheme no longer requires the costly triple-well process to isolate the differently well-biased MOSFETs Q1 and Q2 shown in Fig.4-12(b).

### 4-4-2 Other DC Current

Another key concern in holding the DC retention current to less than 0.1µA per megabyte is to ensure that all other DC currents (e.g., those of the 1/2Vcc reference generator and the V<sub>PP</sub> reference generator), apart from that of the V<sub>BB</sub> level detector, are suppressed to 0.06µA per megabyte.

To meet this requirement, we employed the dynamically-controlled reference generator (DCRG) [9] for the 1/2Vcc and V<sub>PP</sub> reference generators. The DCRG scheme can reduce the DC idling current to less than 0.06µA by dynamically controlling the DC current path within the reference generators (including the voltage divider and the differential amplifier). This current is cut off during the pause period T<sub>P</sub> in the self-refresh mode.

Since the pause period T<sub>P</sub> of the self-refresh timer with PFM depends on the junction temperature (e.g., T<sub>P</sub>=20 seconds at 25°C and T<sub>P</sub>=1 second at 75°C), the DC current in the reference generators is 0.06µA per megabyte at 25°C, whereas at 75°C it is 1.2µA per megabyte.
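The two DC figures quoted for the reference generators are consistent with an average current that is inversely proportional to the pause period, as would be expected if the generators draw current for a fixed time in each refresh cycle and are cut off for the rest. The sketch below encodes that proportionality as an assumption and also adds up the 25°C DC budget stated in this section; the function name is illustrative.

```python
def reference_dc_current(tp_seconds, ref_current=0.06, ref_tp=20.0):
    """Average reference-generator current in uA per megabyte, assuming it
    scales as 1 / T_P from the quoted 25 C point (0.06 uA/MB at T_P = 20 s)."""
    return ref_current * ref_tp / tp_seconds

i_ref_25c = reference_dc_current(20.0)   # T_P = 20 s at 25 C
i_ref_75c = reference_dc_current(1.0)    # T_P = 1 s at 75 C

# DC budget at 25 C: VBB level detector (0.04) plus reference generators.
VBB_DETECTOR_CURRENT = 0.04
dc_total_25c = VBB_DETECTOR_CURRENT + i_ref_25c

print(f"Reference generators at 25 C: {i_ref_25c:.2f} uA/MB")
print(f"Reference generators at 75 C: {i_ref_75c:.2f} uA/MB")
print(f"Total DC retention current at 25 C: {dc_total_25c:.2f} uA/MB")
```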

### 4-4-3 Results and Discussions

To verify the effectiveness of the GRD and DCRG schemes, we designed and fabricated test devices of the 16M-bit DRAM in which the memory-cell refresh operation can be stopped in the self-refresh mode, so that the dynamic refresh current is excluded from the total self-refresh current. The measured standby current (I<sub>SB</sub>) of the 16M-bit DRAM was less than 0.2µA at Vcc=3.6V and Ta=25°C.





Fig.4-13. Comparisons of current consumption among three types of V<sub>BB</sub> level detectors. (c) Current characteristics vs. Vcc level.

### 4-5 Low Power Performance

### 4-5-1 Combined Results and Discussions

As a result of all the proposed improvements, the measured total data-retention current I<sub>RC</sub> (I<sub>RF</sub>+I<sub>SB</sub>) of the fabricated 16M-bit DRAM has been suppressed to less than 0.9µA at Vcc=3.6V and Ta=25°C. Accordingly, a two-megabyte RAM disk using this DRAM can retain its data with a supply current of less than 0.9µA. That is, the data-retention current per megabyte of a RAM disk using this DRAM is less than 0.5µA.
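The battery-life figure quoted for a 20-megabyte RAM disk can be related to the per-megabyte current with a simple capacity calculation. The sketch below is a rough estimate only: it assumes an ideal 190-mAh battery with no self-discharge, no regulator or controller overhead, a constant retention current, and 8760 hours per year. The 0.45µA-per-megabyte input is the upper bound implied by "less than 0.9µA" for two megabytes.

```python
HOURS_PER_YEAR = 24 * 365

def retention_years(capacity_mah, disk_mb, current_ua_per_mb):
    """Years of data retention from an ideal battery (no self-discharge)."""
    total_current_ma = disk_mb * current_ua_per_mb * 1e-3
    return capacity_mah / total_current_ma / HOURS_PER_YEAR

def current_budget_ua_per_mb(target_years, capacity_mah, disk_mb):
    """Per-megabyte current (uA) that would just reach target_years."""
    total_current_ma = capacity_mah / (target_years * HOURS_PER_YEAR)
    return total_current_ma / disk_mb * 1e3

CAPACITY_MAH = 190   # button-shaped lithium battery from the text
DISK_MB = 20         # RAM disk size from the text

years_at_bound = retention_years(CAPACITY_MAH, DISK_MB, 0.45)
budget = current_budget_ua_per_mb(2.5, CAPACITY_MAH, DISK_MB)

print(f"Retention at 0.45 uA/MB: {years_at_bound:.2f} years")
print(f"Current budget for 2.5 years: {budget:.3f} uA/MB")
```

Under these idealized assumptions, the 2.5-year figure corresponds to an average retention current of about 0.43µA per megabyte, which lies within the "less than 0.5µA per megabyte" result reported here.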

To clarify the contribution of the proposed schemes, the data-retention currents of four different types of DRAMs are compared in Fig.4-14, which shows the following three points: 1) the RJB scheme in combination with the PDWD scheme reduces the dynamic refresh current to about 1/3 of that of the conventional DRAM, 2) the PFM scheme, which enables better monitoring of the cell node, further reduces the dynamic refresh current to about 1/3 of that of the fixed-plate scheme, and 3) the DC retention current has been reduced by an order of magnitude by introducing the GRD and the DCRG schemes.

### 4-5-2 Features of 16M-bit DRAM

A photomicrograph of the developed 16M-bit DRAM chip is shown in Fig.4-15. The plate drivers for the RJB scheme are distributed among the four 4M-bit arrays and the 1K-dummy cell block used in the PFM scheme is placed at the side of one 4M-bit array. The PDWD blocks are arranged at the middle zone of the chip. The GRD and the PFM controllers are laid out at the periphery of the chip.

Figure 4-16 shows the internal operating waveforms in the self-refresh mode of the fabricated 16M-bit DRAM. According to these, the pause period Tp and the burst refresh period TB, are 1.2s and 0.2s, respectively, at Vcc=3.6V and Ta=75°C.

A 0.9µA data-retention current has been accomplished at Vcc=3.6V and Ta=25°C. The RAS access time (t<sub>RAC</sub>) is 27ns at Vcc=2.7V, Ta=75°C, and C<sub>LOAD</sub>=50pF. Even at the minimum Vcc=1.8V, a fast access time (t<sub>RAC</sub>=41ns) has been obtained.

These access times in the normal mode (not the self-refresh mode) are not degraded at all compared with the conventional 16M-bit DRAM[5]. This is because the proposed schemes come into effect only during the self-refresh mode.

The process and performance of this 16M-bit DRAM are summarized in Table 4-2.



Fig.4-15. Microphotograph of the sub-µA data-retention 16Mb DRAM chip (6.5mm x 15.9mm).

Table 4-2. Process and performance of the 16M-bit DRAM.

| Item                | Value                                                                                             |
|---------------------|---------------------------------------------------------------------------------------------------|
| Organization        | 16M words × 1 bit<br>4M words × 4 bit                                                             |
| Technology          | 0.5µm twin-well CMOS (P-substrate)<br>e poly Si / single polycide / double metal                  |
| Gate length         | Lp/Ln = 0.7µm/0.55µm                                                                              |
| Memory cell         | stacked type capacitor                                                                            |
| Cell size           | 1.16µm × 2.62µm = 3.04 µm<sup>2</sup>                                                             |
| Chip size           | 6.5mm × 15.9mm = 103.7 mm<sup>2</sup>                                                             |
| Supply voltage      | 1.8V ~ 3.6V                                                                                       |
| Access Time         | tRAC = 27 ns (2.7V, 75°C)<br>= 41 ns (1.8V, 75°C)                                                 |
| Current Consumption | Icc7 = 0.9µA (3.6V, 25°C) (self-refresh current)<br>Icc2 = 0.2µA (3.6V, 25°C) (stand-by current)  |
| Function            | Fast-page / Hyper-page<br>Self-refresh                                                            |
| Package             | 300-mil 26/24 pin SOJ/TSOP (Type-II)                                                              |

### 4-6 Conclusion

With the aim of replacing the SRAM in PDA equipment and memory cards with a new DRAM, a self-refresh 16M-bit DRAM with an ultra-low data-retention current of less than half a µA per megabyte has been developed, based mainly on circuit technology, while still maintaining the high-speed characteristics required in battery-based devices. We believe that the proposed circuit technology makes this DRAM the most attractive candidate for PDA equipment and memory cards, making it possible to support portable multimedia access. This is because the proposed circuit technology would allow a 20-megabyte RAM disk to retain data for 2.5 years when powered by a single button-shaped 190-mAh lithium battery.

### References

[1] L. Geppert, "Solid State," IEEE Spectrum, pp. 35-39, Jan. 1995.

[2] K. Sato et al., "A 4Mb Pseudo SRAM Operating at 2.6±1V with 3µA Data Retention Current," ISSCC Digest of Technical Papers, pp. 268-269, Feb. 1991.

[3] Y. Kagenishi et al., "Low Power Self Refresh Mode DRAM with Temperature Detecting Circuit," Symposium on VLSI Circuits Digest of Technical Papers, pp. 43-44, Jun. 1993.

[4] H. Yamauchi et al., "A Sub-0.5µA/MB Data-Retention DRAM," ISSCC Digest of Technical Papers, pp. 244-245, Feb. 1995.

[5] H. Yamauchi et al., "A 20ns Battery-operated 16Mb CMOS DRAM," ISSCC Digest of Technical Papers, pp. 44-45, Feb. 1993.

[6] T. Iwata et al., "A Evaluation of Memory-Cell Leakage at 16Mbit DRAM," Proceedings of the 1995 IEICE General Conference, C-637, Mar. 1995.

[7] M. Asakura et al., "A 34ns 256Mb DRAM with Boosted Sense-Ground scheme," ISSCC Digest of Technical Papers, pp. 140-141, Feb. 1994.

[8] D. Galbi et al., "A 33ns 64-Mb DRAM with Master-Wordline Architecture," ESSCIRC Digest of Technical Papers, pp. 131-134, Sep. 1992.

[9] H. Tanaka et al., "Sub-1-µA Dynamic Reference Voltage Generator for Battery-Operated DRAMs," Symposium on VLSI Circuits Digest of Technical Papers, pp. 87-88, Jun. 1993.

# CHAPTER-5 Circuit Technology for High-Speed Battery-Operated DRAMs

### Abstract

A battery-operated 16-Mb CMOS DRAM with address multiplexing has been developed using an existing 0.5µm CMOS technology. It can access data in just 36ns when powered from a 1.8-V battery source, and in 20ns at 3.3V. Nevertheless, it requires a mere 57mA of operating current for an 80ns cycle time and only 5µA of standby current at 3.3V. To achieve both high-speed and low-power operation, the following four circuit techniques have been developed: 1) a Parallel Column Access Redundancy scheme coupled with a Current Sensing Address Comparator, 2) an N&PMOS Cross-coupled read bus Amplifier, 3) a Gate Isolated Sense Amplifier with a low  $V_T$ , and 4) a layout that minimizes the length of the signal path by employing the LOC assembly technique.

### 5-1 Introduction

Targeting low-power, portable applications, a battery-operated 16-Mb DRAM was previously developed [1], but its access time was not sufficiently fast over the Vcc range of 1.8V to 3.6V. Figure 5-1 illustrates the background and the target of our work. When Vcc is reduced from 3.6V to 1.8V, the access time of the conventional DRAM [1] roughly doubles, to just over 60ns, as shown by the upper curve. A sub-40ns access time is indispensable in high-speed battery-operated applications such as portable and palm-top computers. For example, even at 1.8V, which corresponds to the voltage of two Ni-Cd batteries, faster DRAMs are required in order to avoid wait states without adding the complexity of a memory hierarchy. Therefore, we focused on realizing the high-speed operation shown by the lower curve.

This paper describes an address-multiplexed 16-Mb CMOS DRAM with a RAS access time of 20ns at 3.3V and 36ns even at 1.8V<sup>[2]</sup>. These access times are the fastest reported so far among 16-Mb DRAMs. Nevertheless, the chip requires a mere 57mA of operating current for an 80ns cycle time and only 5µA of standby current at 3.3V. The chip measures 6.52 x 15.9 mm<sup>2</sup> and can be assembled in a 26-pin 300-mil SOJ package. To achieve the fast access time even with an existing  $0.5\mu m$  CMOS technology, four circuit techniques have been developed<sup>[2]</sup>, as follows: 1) a Parallel Column Access Redundancy (PCAR) scheme coupled with a Current Sensing Address Comparator (CSAC), 2) a quasi-static N&PMOS Cross-coupled data bus Amplifier (NPCA), 3) a Gate Isolated Sense Amplifier (GISA) with low threshold voltage, and 4) a layout that minimizes the length of the signal path by employing the LOC assembly technique. Access-speed degradation and charging and discharging currents have been minimized even at the minimum Vcc of 1.8V.

In the following section, the Parallel Column Access Redundancy (PCAR) scheme coupled with a Current Sensing Address Comparator (CSAC) is discussed. In Section 5-3, a quasi-static N&PMOS Cross-coupled data bus Amplifier (NPCA) is explained. The Gate Isolated Sense Amplifier (GISA) with low threshold voltage is described in Section 5-4. The chip architecture, device features, and chip performance are presented in Section 5-5. Conclusions are given in Section 5-6.

### 5-2 Redundancy Architecture

### 5-2-1 Parallel Column Access Redundancy (PCAR) Scheme

In the conventional redundancy scheme, an access-time penalty of a few nanoseconds is inevitable, because a defective sense-amplifier (S/A) and a spare S/A still connect simultaneously to the same I/O bus until the spare column line (SClm) replaces the defective normal column line (NCLn) completely.

The Parallel Column Access Redundancy (PCAR) scheme shown in Figure 5-2 solves this problem [3]. When the column address (Y0 - Y11) corresponds to the redundant address, that is, in the redundant operation, SPY9 is the inverse of the Y9 signal. In the normal operation, on the other hand, SPY9 is the same as Y9.

In the case of little redundancy, the PCAR technique gives no access time penalty owing to the following two reasons: 1) The I/O buses, which are connected respectively to the normal and the redundant S/A, are separated, and 2) The delay time (Td) to switch the I/O buses by the SPY9 is usually less than the time (Tdc) to transmit the S/A's data to the read-bus-amplifier.

No data-collision between the normal and the redundant S/A has been experimentally demonstrated using the PCAR technique, as shown in Figure 5-3. A 3ns delay reduction of column access has been obtained compared with the conventional redundant scheme.

Fig.5-2. Parallel Column Access Redundancy (PCAR) scheme.

Fig.5-3. Comparison of I/O bus operating waveforms between conventional and PCAR schemes.

### 5-2-2 Current Sensing Address Comparator

To achieve both a longer refresh period and a higher manufacturing yield by repairing several defective and shorter-retention cells, a large amount of redundancy is required in battery-operated high-density DRAMs.

The load (Cx) of the VRN node in Figure 5-4(a) becomes heavier and heavier drastically with increased the redundancy. Figure 5- 4(a) shows the conventional SPY9 generator combined with the dynamic NOR type address comparator. The VRN voltage conversion requires longer delay time and larger idling current through QP and QN during low-state of the ATD pulse. This delay directly affects the Td, but has no effect on Tdc. Consequently, the Td becomes longer than the Tdc, and the Td increases access time.

To overcome this problem, the Current Sensing Address Comparator (CSAC) has been developed, as shown in Figure 5-4(b). The CSAC consists of a ground-fixed address comparator and a current-mode S/A [4]. The comparator functions as a differential current source controlled by the input address. The differential current is transmitted to the current-mode S/A through complementary current conveyor lines that are held at around the ground level. Therefore, the voltage conversion of the VRN node is no longer necessary. The S/A responds very quickly because its output nodes are not loaded with the large capacitance of Cx and the current conveyor lines.

### 5-2-3 Delay Comparisons

Figure 5-5 shows the simulated SPY9 delay of the CSAC and the conventional scheme as a function of Cx, current consumption, and Vcc, respectively. The CSAC scheme reduces the SPY9 delay by 1.7ns and the current consumption by 2.0mA at a Cx of 1.2pF and a Vcc of 3.0V, as shown in Figures 5-5(a) and 5-5(b). In the conventional scheme, the delay depends strongly on Vcc, as shown in Figure 5-5(b), because the drive capability of QP degrades drastically as Vcc is reduced. The CSAC scheme, on the other hand, almost eliminates the charging and discharging of Cx, which weakens the dependence of the delay on both Cx and Vcc. The delay reduction is 4.8ns at the minimum Vcc of 1.8V, as shown in Figure 5-5(c).

Moreover, this CSAC scheme is used in the row redundant circuitry and reduces delay to about 1.5ns at the Vcc of 3.3V.

## 5-3 A Quasi-static Signal Sensing Amplifier

### 5-3-1 A P&PMOS Cross-coupled Amplifier (PPCA)

Fig. 5-4. Redundant address SPY9 generator. (a) conventional scheme (b) CSAC scheme

Fig. 5-5. Comparisons of SPY9 generator delay. (a) Cx dependence of SPY9 delay (b) Current consumption versus SPY9 delay (c) Vcc dependence of SPY9 delay

The conventional P&PMOS cross-coupled amplifier (PPCA) [3] shown in Figure 5-6(a) is inadequate for high-speed operation and low power consumption, for two reasons: 1) the first-stage amplifier operates too slowly because its input voltage swing is small and slow (that is, the voltage difference between the DB and XDB lines develops at only about 100mV/ns), as shown in Figure 5-6(a), and 2) the DC idling current ISA1 is too large, because the SA signal that controls the amplifier must be kept at the high level except during the equalizing period in order to prevent the DBII node from floating.

### 5-3-2 A N&PMOS Cross-coupled Amplifier (NPCA)

The N&PMOS cross-coupled amplifier (NPCA) combined with the charge-transfer devices (Q<sub>N1</sub>, Q<sub>N2</sub>) solves the above-mentioned problems. Figure 5-6(b) shows the NPCA. It obtains a much larger effective input voltage swing (that is, the voltage difference between the DBI node and the XDBI node) than the PPCA. This is because the input signal charge is transferred to the DBI and XDBI nodes, whose capacitance is much lower than that of the DB and XDB lines. The larger effective input voltage swing allows the NMOS (QN3, QN4) and PMOS (QP1, QP2) transistors to drive the DBII line quickly, resulting in high-speed amplification, as shown in Figure 5-6(b). Once either QP1 or QP2 is turned on, QN3 and QN4 can be cut off to eliminate the DC idling current ISA1. Since one of the input PMOS transistors, QP1 or QP2, is always turned off even if the input voltage does not reach the full CMOS level, no DC idling current flows.

### 5-3-3 Comparison of Sensing Delay vs. Current Consumption

The simulated sensing delay and current consumption are shown in Figure 5-7. Compared with the PPCA, the NPCA reduces the delay by 1.7ns at an average current of 1.6mA (Tcyc=15ns, Vcc=3.0V) and reduces the current consumption by 2.4mA at a sensing delay of 2.4ns.

There are two reasons for this: 1) a much larger effective input voltage swing is obtained between the DBI and XDBI nodes than in the PPCA, and 2) unlike in the PPCA, no DC idling current flows even if the input voltage does not reach the full CMOS level, as mentioned before.

To demonstrate the high-speed operation of the NPCA, the internal operating waveforms have been observed with a pico-probe, as shown in Figure 5-8. Here, Td is defined as the delay from the time when the voltage difference between DB and XDB is 0V to the time when the voltage difference between RDB and XRDB reaches 500mV. A Td of less than 2.0ns is achieved at a Vcc of 3.0V. IOD and XOD in Figure 5-8 denote the complementary data bus just before the output buffer.



Fig. 5-6. Comparison of the schematic diagrams and the current waveforms of the read-bus-amplifiers. (a) P&PMOS cross-coupled read-bus-amp. (PPCA) (b) N&PMOS cross-coupled read-bus-amp. (NPCA)







Fig. 5-8. Measured operating waveforms of the Y, DB, RDB, and IOD internal signals. Y: column-decode line, DB: I/O bus, RDB: read-data bus, IOD: data bus just before the output buffer

## 5-4 Gate Isolated Sense Amplifier (GISA) with Low Threshold Voltage

### 5-4-1 Sensing Delay vs. Threshold Voltage

In this DRAM, in order to limit the total subthreshold current in the CMOS circuitry (to less than 1 $\mu$A), the NMOS threshold voltage (V<sub>T</sub>) is designed to be 0.6V for the gate length (Lg) of 0.6µm in the peripheral circuits. On the other hand, the Lg of the transistor pairs in the NMOS S/A is designed to be 0.9µm to prevent an increase in the V<sub>T</sub> asymmetry between the transistor pairs<sup>[5]</sup>. As a result, the V<sub>T</sub> in the NMOS S/A rises to 0.75V owing to the short-channel effects, as shown in Figure 5-9. In low-voltage operation at 1.8V, the sensing delay of some weak columns then becomes an obstacle to high-speed access. Previously reported techniques (for example, the meshed power line merged with distributed S/A drivers [6]) cannot overcome the intolerable sensing delay at the higher V<sub>T</sub> of 0.75V in the NMOS S/A. For example, even with nine or seventeen distributed S/A drivers, the sensing delay exceeds 25ns at a Vcc of 1.8V, as shown in Figure 5-10(a). Yet a bit-line sensing delay of less than 10ns is required in order to achieve a sub-40ns access time.

### 5-4-2 Concept of GISA with Low Threshold Voltage

This problem has been solved by introducing a localized low-V<sub>T</sub> process and a Gate-Isolated S/A (GISA) that require no additional processing such as counter channel doping. The localized low-V<sub>T</sub> process is characterized by the elimination of the LOCOS channel stopper in the NMOS S/A area. Since the doping for the LOCOS channel stopper is shared with the doping that controls the V<sub>T</sub> of the gate channel, eliminating the LOCOS channel-stopper doping reduces the gate-channel dose and gives a low V<sub>T</sub>. The GISA has no LOCOS isolation between the adjacent transistors in an S/A or between adjacent S/A's; substantial isolation is instead realized by circle-gate pair transistors, as shown in Figure 5-11. Therefore, the LOCOS channel stopper is no longer necessary in the NMOS S/A area. The GISA technique makes it possible to lower the V<sub>T</sub> in the S/A without facing any difficulty in submicron LOCOS isolation. A 15ns reduction in the bit-line latch time has been achieved by using this technique, as shown in Figure 5-10(b). The voltage dependence shows that the GISA is suitable for a high-speed battery-operated DRAM in the Vcc range of 1.8 to 3.6V. The sensing delay time is kept below 8ns even at a Vcc of 1.8V.













## 5-5 0.5µm CMOS 16Mbit DRAM Chip Features

### 5-5-1 Chip Architecture

The chip is divided into 64 sub-arrays of 256Kbits, as shown in Figure 5-12. The chip measures 6.52 x 15.9 mm<sup>2</sup> and can be assembled in a 300-mil SOJ package. To reduce the word-line (WL) rising delay and the bit-line (BL) sensing delay, the number of WL shunt areas is doubled (17 WL shunt areas per sub-array). The S/A drivers are located at the WL shunt areas. The WL rising and BL sensing delays are thereby reduced by 1.5ns and 2.5ns, respectively. The floor plan is optimized to shorten the signal paths that control the address and data circuits. The pads for the external strobe signals (RAS, CAS, etc.) and the addresses, together with their associated circuits, are located at the center of the chip by employing the LOC assembly technique.

### 5-5-2 Measured Access Time

Figure 5-13(a) shows the measured output waveforms in the random access mode. A 20ns RAS access time has been achieved under typical conditions (Vcc=3.3V, Ta=25°C, Cload=50pF). The delay between the low-going activation of RAS and that of CAS is 11ns (TRCD=11ns), and the time from RAS to the column address is 8ns (TRAD=8ns). Figure 5-13(b) shows a shmoo plot of the RAS access time (TRAC). Even at the minimum Vcc of 1.8V, a fast access time (TRAC=36ns) has been obtained. This DRAM thus brings high-speed operation to battery-operated applications running at less than 2.0V, without relying on any additional process or device technologies still under development.

### 5-5-3 Comparisons of Access Time Components

The access time components of three types of 16-Mb DRAM's are compared in Figure 5-14: a conventional 0.6µm DRAM without a centralized layout for the peripheral circuits, a 0.5µm DRAM using conventional circuits combined with a centralized layout for the peripheral circuits, and this work. The measured internal operating waveforms of this work are shown in Figure 5-15. The measured delay from the activation of RAS to that of the word-line is 9ns, and a further 4ns is needed to activate the column select line. The measured delay from the activation of the column line to the data output is 7ns.

Fig.5-12. Micrograph of 16-Mb CMOS DRAM

Fig. 5-13. (a) Measured output waveforms in the random access mode (Vcc=3.3V, Ta=25°C; 2V/div, 5ns/div). (b) Shmoo plot of the RAS access time TRAC (Ta=25°C).

Fig. 5-14. Comparisons of access time components for various 16-Mb DRAM's

| Organization        | 16MWords×1b / 4MWords×4b                                                                   |
|---------------------|--------------------------------------------------------------------------------------------|
| Process             | 0.5 µm twin-well CMOS<br>(P substrate)<br>Triple poly Si / single polycide / double metal   |
| Transistor          | Lp / Ln = 0.7 µm / 0.60 µm                                                                  |
| Memory cell         | Stacked type capacitor<br>1.16 × 2.62 = 3.04 µm<sup>2</sup>                                 |
| Chip size           | 6.52 × 15.89 = 103.6 mm<sup>2</sup>                                                         |
| Supply voltage      | 1.8V ~ 3.6V                                                                                 |
| Access time         | tRAC = 20 nsec (3.3V, R.T.)<br>tRAC = 36 nsec (1.8V, R.T.)<br>tAA = 10 nsec (3.3V, R.T.)    |
| Current consumption | Icc1 = 57 mA (tRC = 80 nsec)<br>Icc2 = 5 µA                                                 |
| Function            | Fast page / Static column<br>Self refresh<br>16 bit parallel test                           |
| Refresh cycle       | 4096 cycle                                                                                  |
| Redundancy          | 320 rows × 96 columns                                                                       |
| Package             | 300 mil 26 pin SOJ                                                                          |

A 14ns access time improvement in this work results from the following three newly developed circuit techniques: 1) a row access time reduction of 3ns by introducing the CSAC scheme and doubling the number of WL shunts, 2) a bit-line sensing time reduction of 5ns by introducing the low-V<sub>T</sub> GISA and doubling the number of distributed S/A drivers, and 3) a column access time reduction of 6ns, mainly by introducing the PCAR coupled with the CSAC and the NPCA schemes. In addition, a 16ns access time reduction is achieved by the 0.5 $\mu$m device technology, the chip architecture that reduces the charging and discharging current, and the optimized layout that minimizes the signal path length by employing the new LOC assembly technique.
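As a quick consistency check, the measured delay components and the three circuit-level reductions quoted in this section can be summed. The following Python sketch only restates numbers from the text; the dictionary and variable names are illustrative and are not part of the original work.

```python
# Measured delay components of this work, as quoted in the text (ns).
measured_components_ns = {
    "RAS activation to word-line": 9,
    "word-line to column select line": 4,
    "column line to data out": 7,
}

# Access-time reductions attributed to the three circuit techniques (ns).
circuit_reductions_ns = {
    "row access (CSAC, doubled WL shunts)": 3,
    "bit-line sensing (low-VT GISA, doubled S/A drivers)": 5,
    "column access (PCAR with CSAC and NPCA)": 6,
}

t_rac_ns = sum(measured_components_ns.values())
circuit_gain_ns = sum(circuit_reductions_ns.values())

print(f"sum of measured components: {t_rac_ns} ns")
print(f"sum of circuit-level reductions: {circuit_gain_ns} ns")
```

The first sum reproduces the 20ns RAS access time at 3.3V, and the second reproduces the 14ns improvement attributed to the circuit techniques.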

### 5-5-4 Process and Other Performance of This Chip

The process and other performance figures of this chip are summarized in Table 5-1.

This chip not only achieves a 20ns RAS access time, but also provides a low current consumption of less than 57mA for an 80ns cycle time at a Vcc of 3.3V.

To overcome the problems of low-voltage operation, such as the decrease in the stored charge, a sufficient memory cell capacitance Cs of 30fF has been realized by using the stacked capacitor. Moreover, to repair not only the defective cells but also several shorter-retention cells, a large amount of redundancy has been employed, namely 320 rows and 96 bit-line S/As.

### 5-6 Conclusion

A battery-operated 16-Mb CMOS DRAM with address multiplexing has been developed using an existing $0.5\mu$m CMOS technology. This DRAM can access data in just 36ns when powered from a 1.8-V battery source, and in 20ns at 3.3V. Nevertheless, it requires a mere 57mA of operating current for an 80ns cycle time and only 5µA of standby current at 3.3V. This high-speed, low-power performance is achieved mainly through the four newly developed circuit techniques. This DRAM is capable of much higher performance, allowing DRAM to move into a new range of high-speed battery-operated applications.

### References

[1] S. M. Yoo et al., "Variable Vcc Design Techniques for Battery Operated DRAMs", Symposium on VLSI Circuits Digest of Technical Papers, pp. 110 - 111, Jun. 1992.

[2] H. Yamauchi et al., "A 20ns Battery-Operated 16Mb CMOS DRAM", ISSCC Digest of Technical Papers, pp. 44 - 45, Feb. 1993.

[3] K. Sasaki et al., "A 9ns 1Mb CMOS SRAM", ISSCC Digest of Technical Papers, pp. 34 - 35, Feb. 1989.

[4] T. N. Blalock et al., "A High-Speed Sensing Scheme for 1T Dynamic RAM's Utilizing the Clamped Bit-Line Sense Amplifier", IEEE J. Solid-State Circuits, Vol. 27, pp. 618 - 625, Apr. 1992.

[5] H. Yamauchi et al., "A Circuit Design to Suppress Asymmetrical Characteristics in High-Density DRAM Sense Amplifiers", IEEE J. Solid-State Circuits, Vol. 25, pp. 36 - 41, Feb. 1990.

[6] T. Yamada et al., "A 64-Mb DRAM with Meshed Power Line", IEEE J. Solid-State Circuits, Vol. 26, pp. 1506 - 151, Nov. 1991.

# CHAPTER-6 Circuit Technology for High-Speed Battery-Operated SRAMs

### Abstract

This paper proposes a 0.5V / 100MHz / sub-5mW-operated 1-Mbit SRAM cell architecture which uses a Boosted and Offset-Grounded data Storage (BOGS) scheme. The key target of BOGS is to minimize the amount of charge supplied from the embedded charge pump circuits, which are required to boost the effective gate-to-source voltage (V<sub>O</sub>=V<sub>GS</sub>-V<sub>T</sub>) up to the 0.8V necessary to achieve 100MHz operation even at a 0.5V single power supply. Thus, the key low-power strategy of BOGS is "putting the right (higher efficiency) boosted power-supply from the charge pump circuit into the right position (less power consumed transistor) in an SRAM cell". This paper focuses on why BOGS can save more of the charge supplied from the boosted power line and can reduce the power dissipation to $\leq 1/30.4$ and $\leq 1/3.9$ of that of the previously reported negative source-line drive (NSD) scheme<sup>[1]</sup> and negative word-line drive (NWD) scheme<sup>[2]</sup>, respectively, while achieving 0.5V / 100MHz operation.

### **6-1** Introduction

In the near future, the demand for solar-battery-operated, highly miniaturized portable equipment, such as a wrist-watch type computer, could increase and create a new personal multimedia market. One of the key issues in realizing such a computer is reducing the operating supply voltage to a solar-cell voltage of 0.5V or less. This means that an SRAM operating at 100MHz from a single 0.5V power supply is necessary to enable the computer system to execute read and/or write operations between the central processing unit (CPU) and the cache memory or frame buffer memory.

In lowering the SRAM supply to 0.5V, it is important to maintain sufficient $V_{GS}-V_T$ for 100MHz operation while still avoiding subthreshold leakage. The relationship between the supply voltage V<sub>CC</sub> and the threshold voltage V<sub>T</sub> needed to keep 100MHz SRAM operation is shown in Fig. 6-1(a). This graph shows that V<sub>T</sub> should be changed from 0.6V down to -0.3V when the number of solar cells is reduced from three to one, in order to keep a large enough $V_{GS}-V_T$. However, Fig.6-1(b) shows that the subthreshold leakage would then increase by 9 orders of magnitude, that is, surge up to 10A for a 1M-bit SRAM. This implies that 100MHz operation with a single solar-battery supply can no longer be realized by a conventional V<sub>T</sub> reduction technique alone.
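The nine-orders-of-magnitude figure can be illustrated with the usual exponential dependence of subthreshold current on V<sub>T</sub>. The sketch below is a rough estimate and not the author's model: it assumes a subthreshold swing of 100 mV/decade, a hypothetical value chosen here because it reproduces the nine decades quoted from Fig. 6-1(b), and the function name is illustrative.

```python
def leakage_increase_decades(vt_from_mv: int, vt_to_mv: int,
                             swing_mv_per_decade: int = 100) -> float:
    """Decades by which subthreshold leakage grows when V_T is lowered.

    Uses I_leak proportional to 10 ** (-V_T / S), where S is an ASSUMED
    subthreshold swing (100 mV/decade by default).
    """
    return (vt_from_mv - vt_to_mv) / swing_mv_per_decade


decades = leakage_increase_decades(600, -300)  # V_T: 0.6 V -> -0.3 V
factor = 10 ** decades

# Back-calculated here, not stated in the text: the chip leakage before the
# V_T reduction that would correspond to 10 A after it.
implied_baseline_a = 10.0 / factor

print(f"{decades:.0f} decades, factor {factor:.0e}")
print(f"implied baseline leakage: {implied_baseline_a:.0e} A")
```

Under this assumption the 0.9V reduction in V<sub>T</sub> corresponds to nine decades, so the 10A figure would imply a total leakage on the order of 10nA for the 1M-bit array at V<sub>T</sub>=0.6V.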

In addition, another problem in the SRAM cell is briefly described before this target is discussed. Figure 6-2(a) shows the concept of the power-down scheme. Its feature is to cut off the power switch P1 during the sleep mode in order to suppress the leakage IDC flowing from the power line to the ground line through the low-threshold-voltage peripheral circuits. However, power-down must not be applied to the SRAM memory cell, because the cell would lose its data without the power supply. Therefore, the $V_T = -0.3V$, which causes an intolerably large leakage, cannot be used in the SRAM cell, unlike in the peripheral circuits. As a result, this V<sub>T</sub> constraint increases the delay time in the cell portion, as shown in Fig.6-2(b).

Thus, the target is to meet the following two conflicting requirements simultaneously: 1) to obtain a strongly ON state (e.g. $V_{GS}-V_T \ge 0.8V$) in the cell transistors in order to keep 100MHz operation, and 2) to obtain a strongly OFF state (e.g. $V_{GS}-V_T \leq -0.5V$) in order to realize low-power data retention without any leakage problem.

Figure 6-3 shows the bit-line (BL) access delay time ($t_{WL-BL}$), defined as the time from the rise of the word-line (WL) to the point where the BL potential difference has developed to 100mV, as a function of the effective gate-to-source voltage ($V_O = V_{GS} - V_T$) of the access transistors and drive transistors. Assuming that the BL access delay time ($t_{WL-BL}$) is limited to $\leq 1/4$ of the cycle time ($\leq 2.5$ns at 100MHz operation), a V<sub>O</sub> of 0.8V is necessary to keep 100MHz operation. Thus, hereinafter, the V<sub>O</sub> required to keep 100MHz operation is defined as 0.8V, unless otherwise noted.
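The 2.5ns budget follows directly from the clock period. A minimal sketch of that arithmetic, with an illustrative function name and the one-quarter fraction taken from the text:

```python
def bl_delay_budget_ns(clock_mhz: float, fraction: float = 0.25) -> float:
    """Allowed WL-to-BL delay: a fixed fraction of the clock period."""
    cycle_ns = 1000.0 / clock_mhz  # period in ns for a clock given in MHz
    return cycle_ns * fraction


budget_ns = bl_delay_budget_ns(100)  # 100 MHz operation
print(f"BL access delay budget: {budget_ns} ns")
```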

In recent years, the negative source driving (NSD) scheme<sup>[1]</sup> has been reported as the centerpiece of sub-1V high-speed SRAM cell architectures. This is because negative driving of the virtual source-line (0V -> -0.75V) can be exploited to increase $V_{GS}-V_T$ of the access and drive transistors up to 0.8V while maintaining $V_T=0.6V$, as shown in Fig.6-4. This enables a considerable reduction in the BL access delay ($t_{WL-BL}$) to $\leq 2.5$ns, while preventing the subthreshold leakage current from surging. For example, if $V_T = -0.3V$ were used instead to achieve $V_O=0.8V$ at $V_{CC}=0.5V$, the leakage would increase by 9 orders of magnitude, as shown in Fig.6-4.

Fig.6-4 Concept comparison of 0.5V single power-supply operated SRAM cell architectures.

However, even when exploiting the NSD scheme[1], an increase in power dissipation inevitably arises in the negative power-supply circuit based on the charge-pump circuit (-Pump-C shown in Fig.6-5(b)). This is because of an intolerable degradation in the supply efficiency ($\eta_C$) of -0.75V in the charge-pump circuit at V<sub>CC</sub>=0.5V. This results from the increase in the number of pumping stages[3] necessary to achieve the target voltage (-0.75V), as shown in Table 6-I. Note that the negative power supply of -0.75V is needed to discharge the heavy loads of the BL and the virtual source-line (SL) at V<sub>CC</sub>=0.5V.

To solve this, the negative word-line drive (NWD) scheme[2] has recently been proposed.

The features of the NWD scheme are as follows: 1) a negative $V_T$=-0.3V is used in the access transistor in order to achieve (V<sub>GS</sub>-V<sub>T</sub>)=0.8V, and 2) the WL is pulled down to -0.9V in order to avoid standby leakage through the access transistor despite the $V_T=-0.3V$. Unfortunately, NWD introduces a new problem: leakage flows from the boosted storage node (N<sub>H</sub>) to the BL when the WL goes high, unlike when the WL remains at -0.9V. This leakage causes an intolerably large power dissipation in the charge pump circuit +Pump-E, which is needed to charge N<sub>H</sub> up to 1.4V (shown in Fig.6-5(c)).

Thus, to solve the above problems while targeting a 0.5V / 100MHz / sub-5mW-operated 1-Mbit SRAM cell, we have developed a Boosted and Offset-Grounded Data Storage (BOGS) scheme<sup>[5]</sup>. It is based on our recently proposed offset-source driving (OSD) scheme<sup>[4]</sup>, which was optimized only for 0.8V single-battery-operated SRAMs.

The key target of the BOGS scheme is to minimize the amount of charge supplied from the embedded charge pump circuits, which are required to increase $V_{GS}-V_T$ up to the 0.8V necessary to achieve 100MHz operation even at V<sub>CC</sub>=0.5V. Why BOGS can save more of the charge supplied from the boosted power line is the focus of this paper.

We discuss why BOGS can reduce the power dissipation to $\leq 1/30.4$ and $\leq 1/3.9$ of that of the previously reported NSD scheme[1] and NWD scheme[2], respectively, while achieving 0.5V / 100MHz operation.

In Section 6-2, the following details of the BOGS scheme are given in comparison with the NSD and/or NWD schemes: 1) the concept of BOGS, 2) how to minimize the charge consumed for SL driving, 3) why BOGS, unlike the conventional schemes, can reduce the cell leakage from the boosted N<sub>H</sub> to the BL, and 4) the charge-recycling virtual SL driving scheme. The power consumption of the proposed BOGS and the conventional schemes is compared in Section 6-3, followed by the conclusion.







Fig. 6-5 Concept comparisons of the charge-pump power-supply schemes and their charge-pump current paths, (a) This Work (BOGS), (b) Negative source drive (NSD), (c) Negative word-line drive (NWD).

### 6-2 Deep Sub-1V High-Speed SRAM Cell Strategy

### 6-2-1 Concept of Boosted and Offset-Grounded Data Storage (BOGS) Scheme

As shown in Figs. 6-4 and 6-5(a), the key features of the BOGS scheme are as follows:

1) "potential-shifting" of the "high"/"low" data storage-node pair $N_H/N_L$ (0.5V/0V -> 1.3V/0.65V) for the unselected cells. Because GND can then be used for BL discharging, the charge pump circuits (Pump-A and/or Pump-B) are freed from supplying the charge needed to drive the virtual SL. How the power for the virtual source-line ($V_{VPL}$=0.65V shown in Fig. 6-5(a)) is generated is described in Section 6-2-5.

2) "equalizing the boosted level of power-supply" between BL precharging and WL driving, which is needed to avoid the cell leakage flowing from the boosted $N_H$ to the BL.

3) virtual SL driving (0.65V -> 0V), which increases the V<sub>O</sub> of the access and drive transistors up to 0.8V when the cell is accessed. During the standby state, on the other hand, the virtual SL at the potential V<sub>VPL</sub> (=0.65V) suppresses the leakage through the access transistor with V<sub>T</sub>=0V by applying a negative V<sub>GS</sub> to the access transistors.

4) charge-recycling virtual SL control, which saves power when the potential of the virtual SL is reset ($0V \rightarrow 0.65V$), and

5) column-decoded virtual SL drive in the WL direction, which realizes a pseudo cross-point access.
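The bias conditions quoted for the NWD and BOGS schemes can be checked against the two requirements stated in the introduction ($V_{GS}-V_T \ge 0.8V$ when ON, $V_{GS}-V_T \leq -0.5V$ when OFF). The sketch below is illustrative only. It looks at the access transistor alone, works in millivolts to keep the arithmetic exact, and relies on assumptions that the text does not state explicitly: the NWD word-line high level is taken as V<sub>CC</sub>=0.5V with the source at 0V, and the BOGS word-line is taken as 0.8V when selected and 0V in standby.

```python
def overdrive_mv(gate_mv: int, source_mv: int, vt_mv: int) -> int:
    """Effective gate drive V_O = V_GS - V_T, in millivolts."""
    return (gate_mv - source_mv) - vt_mv


ON_MIN_MV = 800    # strongly ON requirement from the text
OFF_MAX_MV = -500  # strongly OFF requirement from the text

# (gate, source, V_T) of the access transistor. Values marked "assumed"
# are NOT stated explicitly in the text.
schemes = {
    "NWD": {
        "access": (500, 0, -300),    # WL = VCC (assumed), source at 0 V (assumed)
        "standby": (-900, 0, -300),  # WL pulled down to -0.9 V
    },
    "BOGS": {
        "access": (800, 0, 0),       # WL = 0.8 V (assumed), SL driven to 0 V
        "standby": (0, 650, 0),      # WL = 0 V (assumed), SL at V_VPL = 0.65 V
    },
}

results = {}
for name, bias in schemes.items():
    v_on = overdrive_mv(*bias["access"])
    v_off = overdrive_mv(*bias["standby"])
    results[name] = (v_on, v_off)
    ok = v_on >= ON_MIN_MV and v_off <= OFF_MAX_MV
    print(f"{name}: V_O(on) = {v_on} mV, V_O(off) = {v_off} mV, meets targets: {ok}")
```

Under these assumptions both schemes satisfy the ON and OFF targets at the access transistor, so this check does not separate them; the difference discussed in the text lies in the charge that the pump circuits must supply.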

BOGS can meet the following requirements simultaneously:

1) minimizing the charge amount  $Q_{Total}$  supplied from pumping circuit (-Pump-C or +Pump-B) required for charging and discharging the heavy loads of BL-capacitance  $C_{BL}$  and source-node capacitance  $C_{SN}$ , as shown in Fig.6-7, and

2) avoiding the leakage  $I_{LK}$  from the boosted storage node  $N_H$  to BL, which causes a large amount of idle charge dissipation in the pumping circuit (+Pump-E or +Pump-A), as shown in Figs. 6-8(a) and 6-8(b).

In addition, another attractive advantage of BOGS over NSD is SRAM cell stability at $V_{CC}=0.5V$. This results from the larger potential differences of 0.65V and 1.3V between the storage node pair (N<sub>H</sub> /N<sub>L</sub>) for unselected and selected cells, respectively, at $V_{CC}=0.5V$, as shown in Fig.6-4. This hardens the cell against the influence of the cell offset voltage due to V<sub>T</sub> mismatch.



Fig.6-6. (a) Timing diagram of proposed (BOGS) scheme, (b) bit-line write signal swing needed to invert the storage data vs. offset source potential level.

### 6-2-2 BOGS read/write operation

Before the power saving of the Boosted and Offset-Grounded Data Storage (BOGS) scheme is discussed, its read/write operation is briefly described. Figure 6-6 shows the timing diagram of the BOGS scheme.

### 1) Read Cycle

In the read cycle, the potential of the storage node N<sub>L</sub> of the selected cell is pulled down to GND by driving the virtual SL when the WL goes high. This raises the drain current of the access and drive transistors, and thus the BL discharging speed, owing to both a larger V<sub>GS</sub> and a lower V<sub>T</sub>. The lower V<sub>T</sub> is caused by relaxing the body effect, which depends on the potential difference between the source and substrate electrodes. Thus, the BOGS scheme provides faster BL discharging.

### 2) Write Cycle

In the write cycle, only the selected SL, at the potential VVPL, is separated from the others and kept floating for the first half of the cycle; it is then connected to the other, unselected SLs for the second half of the cycle. A smaller BL voltage swing is sufficient for writing, for two reasons: 1) the floating potential (=VVPL) of the virtual SL weakens the latch capability of the drive transistor pair, and 2) using $V_T = 0V$ for the access transistors gives a larger $V_{GS}-V_T$ without a deeper BL write level. Here, the BL write level corresponds to VS. NSD<sup>[1]</sup>, on the other hand, chooses half-V<sub>CC</sub> BL precharging to realize a larger V<sub>GS</sub>-V<sub>T</sub> in the access transistors. This helps to reduce the BL swing required to invert the stored data. However, half-V<sub>CC</sub> BL precharging tends to worsen the static noise margin in the read operation<sup>[2]</sup>. This is the reason why BOGS uses $V_T = 0V$ for the access transistors instead of half-V<sub>CC</sub> BL precharging.

Figure 6-6(b) shows the BL write signal swing required to invert the stored data as a function of VVPL. BOGS with VVPL=0.65V reduces the BL swing required for writing ($\Delta V_{WR}$) to 0.2V, which is 1/3 of that of the conventional scheme with a ground-fixed SL. This means that BOGS saves 2/3 of the BL charging for the write operation compared with the conventional scheme.



Fig.6-7 Concept comparison of dissipated charge amounts supplied from charge-pump B and C, between this work (BOGS) and negative source drive (NSD), (a) This Work (BOGS), (b) Negative source drive (NSD).

$$Q_{SD}/\eta_C + Q_{PR} = Q_{PR} \cdot (22/\eta_C + 1)$$

### 6-2-3 Minimizing charge amount for source over-driving

The key to minimizing the charge amount Q<sub>Total</sub> necessary for virtual SL driving at $V_{CC}=0.5V$ is to use GND instead of a negative power supply (= -0.75V) as in NSD[1]. This is because the supply efficiency of 0V through GND is 100%, while that of the negative power (-0.75V) supplied from the charge pump circuit is 10% at most, as shown in Table 6-I.

Furthermore, the charge amount for the negative SL driving (QSD) is huge compared with that for BL precharging (QPR), as shown in Fig.6-7(b). Note that QSD consists of 1) the charge (QSL) for the 0.75V swing on the total source-node capacitance (C<sub>SN</sub>) of the 256 cells connected in common and 2) the charge for the $\Delta V=0.15V$ swing on the capacitance ($C_{BL}$) of the discharged BL.

In order to compare the charge amount Q<sub>Total</sub> of the BOGS and NSD schemes, the following expressions are introduced. The supply efficiency of each charge pump circuit (+Pump-A, +Pump-B, -Pump-C, -Pump-D, and +Pump-E) is assumed as shown in Table 6-I.

Note that each supply efficiency is based on simulated data for Dickson's charge pump circuit<sup>[3]</sup>. The $V_T$ of the charge-transfer MOSFET is assumed to be 0.3V. Thus, unless otherwise noted, each supply efficiency ($\eta_A$, $\eta_B$, $\eta_C$, $\eta_D$, and $\eta_E$) is taken from Table 6-I.

In BOGS, the charge amount ($Q_{SL\_BOGS}$) required to reset the potential of SL (GND -> 0.65V) is estimated by

$Q_{SL\_BOGS} = C_{SN\_BOGS} \cdot 0.65V$ .....(1)

where $C_{SN\_BOGS}$ represents the capacitance of the SL connected to the source electrodes of the drive transistor pair of 1 cell, and 0.65V is the voltage swing of SL, as shown in Fig. 6-7(a). Note that $C_{SN\_BOGS}$ includes the capacitance of the storage node $N_L$ of 1 cell in addition to the wiring capacitance and the junction capacitance of other transistors. Next, the charge amount ($Q_{PR\_BOGS}$) needed for BL precharging is given by

$Q_{PR\_BOGS} = C_{BL} \cdot \Delta V$ .....(2)

where $C_{BL}$ represents the capacitance of BL and $\Delta V$ denotes the voltage swing of BL. Therefore, the total charge amount ($Q_{Total\_BOGS}$) of BOGS is as follows:

$Q_{Total\_BOGS} = Q_{PR\_BOGS} / \eta_B + Q_{SL\_BOGS} = C_{BL} \cdot \Delta V / \eta_B + C_{SN\_BOGS} \cdot 0.65V$ .....(3)

where $\eta_B$ represents the supply efficiency of the charge pump circuit (+Pump-B). On the other hand, in the NSD scheme the charge amount ($Q_{SL\_NSD}$) needed for SL driving is given by

$Q_{SL\_NSD} = C_{SN\_NSD} \cdot 0.75V$ .....(4)

where $C_{SN\_NSD}$ represents the capacitance of the SL connected to the source electrodes of the drive transistor pairs of 256 cells, and 0.75V is the voltage swing of SL, as shown in Fig.6-7(b).

Since $C_{SN\_NSD}$ includes the gate-to-source channel capacitance and junction capacitance of the N-MOSFET drive transistor pairs and P-MOSFET load transistor pairs, it is 4.2 times larger than the BL capacitance $C_{BL}$, which consists only of the junction capacitance of the access transistors, as shown in Fig.6-7(b). Therefore, $C_{SN\_NSD}$ is given by

$C_{SN\_NSD} = 4.2 \cdot C_{BL}$ .....(5)

Then, in the NSD scheme, since the negative power supply must discharge both the BL capacitance $C_{BL}$ and the SL capacitance, the total charge amount needed for SL driving is estimated by

$Q_{SD\_NSD} = C_{BL} \cdot \Delta V + Q_{SL\_NSD}$ .....(6)

Then, (6) can be rewritten by using (4) and (5), as follows:

$Q_{SD\_NSD} = C_{BL} \cdot \Delta V \cdot (1 + 4.2 \cdot 0.75V / \Delta V)$ .....(7)

Assuming that $\Delta V = 0.15V$ and $Q_{PR\_NSD} = C_{BL} \cdot \Delta V$, $Q_{SD\_NSD}$ is given by

$Q_{SD\_NSD} = 22 \cdot Q_{PR\_NSD}$ .....(8)

This expression means that the charge amount ($Q_{SD\_NSD}$) needed for SL driving is 22 times larger than the charge amount ($Q_{PR\_NSD}$) needed for BL precharging. The main reasons are: 1) $C_{SN\_NSD}$ is 4.2 times larger than $C_{BL}$, and 2) the swing of SL (0.75V) is 5 times larger than that of BL ($\Delta V = 0.15V$).
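As a quick numerical check of expressions (4) through (8), the following sketch normalizes the BL capacitance to 1 and evaluates the charges with the values quoted in the text (BL swing 0.15V, SL swing 0.75V, capacitance ratio 4.2). The function and variable names are illustrative and are not taken from the thesis.

```python
# Sanity check of expressions (4)-(8) with C_BL normalized to 1.
# All names are illustrative; the numeric values are those quoted in the text.

def nsd_sl_drive_charge(c_bl=1.0, delta_v=0.15, sl_swing=0.75, cap_ratio=4.2):
    """Return (Q_PR_NSD, Q_SL_NSD, Q_SD_NSD) in units of C_BL * volt."""
    c_sn_nsd = cap_ratio * c_bl          # expression (5)
    q_sl_nsd = c_sn_nsd * sl_swing       # expression (4)
    q_pr_nsd = c_bl * delta_v            # BL precharge charge
    q_sd_nsd = q_pr_nsd + q_sl_nsd       # expression (6)
    return q_pr_nsd, q_sl_nsd, q_sd_nsd

q_pr, q_sl, q_sd = nsd_sl_drive_charge()
ratio = q_sd / q_pr                      # expression (8): about 22
print(f"Q_SD_NSD / Q_PR_NSD = {ratio:.1f}")
```

The SL term alone contributes 4.2 x 5 = 21 units, and the BL term adds one more, which gives the factor of 22.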

Therefore, the total charge amount ($Q_{Total\_NSD}$) of the NSD scheme is as follows:

$$Q_{Total\_NSD} = Q_{SD\_NSD} / \eta_C + Q_{PR\_NSD}$$
$$= C_{BL} \cdot \Delta V \cdot (22 / \eta_C + 1) \dots (9)$$

where $\eta_C$ represents the supply efficiency of the charge pump circuit (-Pump-C). As a result, the ratio (RT) of the consumed charge amount between BOGS and NSD can be expressed as follows:

$RT = (22 / \eta_C + 1) / (1 / \eta_B + G)$ .....(10)

where RT represents ($Q_{Total\_NSD}$ / $Q_{Total\_BOGS}$) and G represents ($Q_{SL\_BOGS}$ / $Q_{PR\_BOGS}$).

Since G is less than 0.1 and is therefore small compared with $1/\eta_B$, (10) can be approximated as follows:

$RT \approx 22 \cdot \eta_B / \eta_C + \eta_B$ .....(11)

In the BOGS scheme, shifting the potential of $N_L$ up to 0.65V permits the use of GND for SL driving, as shown in Fig. 6-7(b). This is the reason why BOGS saves power dissipation in SL driving compared to NSD. In other words, BOGS no longer requires a negative power supply when discharging the BL, as shown in Fig. 6-7(b). Thus, unlike NSD, BOGS never consumes $Q_{SD\_NSD} / \eta_C$. Furthermore, for the BOGS scheme, the charge amount Q<sub>Total BOGS</sub> is reduced to $\leq$ 1/33 that of NSD (Q<sub>Total NSD</sub>), as estimated by expression (11), despite using the boosted power (V<sub>UPB</sub>=0.8V) supplied from the charge pump (+Pump-B) to precharge the BL capacitance (C<sub>BL</sub>). The reasons are as follows: 1) $Q_{PR\_BOGS}$ is less than 1/22 of $Q_{SD\_NSD}$, and 2) the supply efficiency ($\eta_B$) for V<sub>UPB</sub>=0.8V is 15%, while that ($\eta_C$) for V<sub>DPC</sub>=-0.75V is 10%.
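The 1/33 figure can be reproduced from expressions (10) and (11) with the V<sub>CC</sub>=0.5V efficiencies quoted above (15% for +Pump-B, 10% for -Pump-C). The sketch below is only an arithmetic check; the function names are illustrative, and G = 0.1 is used as the upper bound stated in the text rather than a measured value.

```python
# Evaluate expressions (10) and (11) with the V_CC = 0.5 V efficiencies
# quoted in the text (eta_B = 15%, eta_C = 10%). Names are illustrative.

def charge_ratio_exact(eta_b, eta_c, g, k=22.0):
    """Expression (10): RT = (k/eta_C + 1) / (1/eta_B + G)."""
    return (k / eta_c + 1.0) / (1.0 / eta_b + g)

def charge_ratio_approx(eta_b, eta_c, k=22.0):
    """Expression (11): RT with G neglected against 1/eta_B."""
    return k * eta_b / eta_c + eta_b

rt_approx = charge_ratio_approx(0.15, 0.10)
rt_exact = charge_ratio_exact(0.15, 0.10, g=0.1)   # G at its upper bound
print(f"RT (11) = {rt_approx:.2f}, RT (10) with G=0.1 = {rt_exact:.2f}")
```

Expression (11) gives a ratio slightly above 33, and expression (10) approaches the same value as G goes to zero.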

### 6-2-4 Eliminating Leakage from Boosted Storage Node to Bit-Line

Our target SRAM must meet the following requirements for the access and drive transistors composing the SRAM cell: 1) V<sub>O</sub>=0.8V, necessary to achieve 100MHz operation when selected, and 2) $V_O = -0.6V$, necessary to suppress the subthreshold leakage current to 0.2µA even for a 1M-bit cell array when unselected.

To meet these requirements, the BOGS scheme employs the following: 1) SL potential-shifting up to 0.65V, 2) using $V_T=0V$ in the access transistor, 3) boosting storage node $N_H$ up to 1.3V, and 4) equalizing the boosted potential level between BL precharging and WL driving, as shown in Fig.6-8(b).

Fig. 6-8 Concept comparison of cell leakage I<sub>LK</sub> supplied from charge pumps A and E, between (a) negative word-line drive (NWD) and (b) this work (BOGS).

In these voltage relations, 1) the boosted potential of $N_H$ (=1.3V) achieves $V_O=0.8V$ in the drive transistor even with $V_T=0.5V$ when selected, and 2) the virtual SL potential $V_{VPL}$ (=0.65V) allows the use of $V_T=0V$ in the access transistor without any leakage problem, because $V_O=-0.75V$ is kept when unselected. This results from $V_{GS}=-0.65V$ and $V_T'=0.1V$, where $V_T$ and $V_T'$ represent the threshold voltage at $V_{BS}=0V$ and =0.6V, respectively. $V_{BS}$ is the potential difference between substrate and source.
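The two bias points above can be checked with a few lines of arithmetic, taking $V_O$ as the effective gate overdrive $V_{GS}-V_T$. The helper name is illustrative, and the node voltages used to obtain $V_{GS}=-0.65V$ (gate at 0V, source at 0.65V) are an assumed reading of the text, not a statement of the actual circuit.

```python
# Effective gate overdrive V_O = V_GS - V_T for the bias points quoted above.
# The helper name is illustrative only.

def overdrive(v_gate, v_source, v_t):
    """Return V_O = (V_gate - V_source) - V_T in volts."""
    return (v_gate - v_source) - v_t

# Drive transistor of a selected cell: gate at N_H = 1.3 V, SL at GND, V_T = 0.5 V.
v_o_drive_selected = overdrive(v_gate=1.3, v_source=0.0, v_t=0.5)

# Access transistor of an unselected cell: V_GS = -0.65 V (assumed as gate at
# 0 V and source at 0.65 V), body-biased threshold V_T' = 0.1 V.
v_o_access_unselected = overdrive(v_gate=0.0, v_source=0.65, v_t=0.1)

print(round(v_o_drive_selected, 2), round(v_o_access_unselected, 2))
```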

On the other hand, to meet these requirements, the NWD scheme must introduce the following: 1) a negative $V_T$=-0.3V in the access transistor, 2) a storage node N<sub>H</sub> boosted up to 1.4V, and 3) negative WL driving down to -0.9V, as shown in Fig.6-8(a).

Unfortunately, however, NWD suppresses the leakage only during the standby period. When WL is activated, NWD induces a new problem: leakage from the storage node $N_H$ to BL when WL goes high, unlike when WL remains at a negative level, as shown in Fig.6-8(a). This problem causes a large power dissipation in +Pump-E, which charges $N_H$ up to 1.4V (shown in Fig.6-5(c)).

On the other hand, the BOGS scheme solves the above problem by equalizing the boosted potential level between BL precharging and WL driving. This makes it possible to achieve $V_O=0.8V$ in the access transistor when WL goes high, even with $V_T=0V$ instead of $V_T=-0.3V$, while keeping $V_O=-0.15V$ when WL goes down, as shown in Fig.6-8(b). Thus, the leakage I<sub>O</sub> when WL goes high can be reduced to $1/10^{3.5}$ of that of NWD at 100MHz operation, resulting from the reduction in $V_O$ from 0.2V to -0.15V. As a result, BOGS never requires a large power dissipation in +Pump-A, which is needed to charge N<sub>H</sub> up to 1.3V.
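The quoted reduction of $1/10^{3.5}$ for a 0.35V drop in $V_O$ is consistent with a simple exponential leakage model. The sketch below assumes a subthreshold swing of 100mV/decade, which is the value implied by those two numbers and is not stated explicitly in the text; the function name is hypothetical.

```python
# Subthreshold leakage ratio implied by a change in overdrive V_O.
# ASSUMPTION: a subthreshold swing of 100 mV/decade, which is the value that
# makes the quoted 0.35 V reduction correspond to 3.5 decades.

def leakage_ratio(v_o_new, v_o_old, swing_v_per_decade=0.1):
    """Return I(v_o_new) / I(v_o_old) for an exponential subthreshold model."""
    return 10.0 ** ((v_o_new - v_o_old) / swing_v_per_decade)

ratio = leakage_ratio(v_o_new=-0.15, v_o_old=0.2)
print(f"leakage ratio = {ratio:.2e}")   # about 10**-3.5
```

A different swing value would change the exponent proportionally, so the result is only as good as that assumption.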

When N<sub>H</sub> is charged up to V<sub>UPA</sub> (=1.3V) during write access, the charge dissipation in +Pump-A can be neglected compared to that of BL precharging for BOGS or that of SL driving for NSD. This is because, despite a supply efficiency ($\eta_A$) of $\leq 10\%$, as low as that of -Pump-C (-0.75V), charging N<sub>H</sub> requires only 0.4% of the charge that NSD requires for SL driving. This is due to the reduction in the number of storage nodes N<sub>H</sub> to be charged/discharged, resulting from the column-decoded virtual SL drive.

Furthermore, BOGS gives an advantage in access speed (a 9% reduction in the delay $t_{WL-BL}$, as shown in Fig.6-9), resulting from the reduced junction capacitance C<sub>j</sub>, which dominates C<sub>BL</sub>, and from the increased drain-source current I<sub>DS</sub>. These are due to the increased junction bias and the increased V<sub>DS</sub>, respectively, both obtained by boosting the BL-precharging level.

Fig.6-9 BL delay time comparison between this work (BOGS) and negative word-line drive (NWD). BOGS shows a 9% delay reduction due to the reduced C<sub>J</sub> and increased I<sub>DS</sub>, resulting from the increased junction bias and V<sub>DS</sub> in the access transistor, respectively.

### 6-2-5 Charge-Recycle Over Supply Voltage (V<sub>CC</sub>) Virtual SL Over-Driving Scheme

BOGS also features column-decoded virtual source-line (SL<sub>m</sub>, m=0-3) driving in the WL direction, enabling a pseudo cross-point access. Each SL<sub>m</sub> is connected to the common source node of the drive transistor pairs every four cells in the WL direction, as shown in Fig.6-10. This SL driving scheme reduces the current consumption needed for SL driving to 1/8 compared to the previously reported BL-direction SL driving scheme<sup>[6]</sup>, as shown in Fig.6-7. These SL<sub>m</sub> are laid out over the cell using the 3rd metal layer without cell-area penalty. This suppresses the increase in charge dissipation, caused by supplying the BL precharge from the charge pump (+Pump-B), to 1/4 of that without this scheme. This is because BOGS reduces the number of cells connected in common to each SL<sub>m</sub> that is discharged to ground when WL goes high, compared to the BL-direction drive scheme<sup>[6]</sup>.

BOGS reduces the BL-swing of an unselected cell ( $\Delta V_{BL}$ ) to 1/256 that of the selected cell ( $\Delta V_{BI}$ ). This is because the SL potential of unselected cells is V<sub>VPL</sub>=0.65V, while that of selected cells is GND, as shown in Fig.6-13. This is the reason why BOGS can realize the pseudo cross-point access.

Another attractive point of BOGS is its charge-recycling virtual SL control, which saves power when resetting the SL at t=t1, as shown in Fig.6-12.

$Q_0$ and $Q_1$ shown in Fig. 6-12 are defined as follows: 1) $Q_0$ is the total charge discharged from unselected BLs through virtual SLs when WL goes high, while the virtual SLs remain at V<sub>VPL</sub>=0.65V, and 2) $Q_1$ is the charge required to recover the potential of the virtual SL from GND to V<sub>VPL</sub>=0.65V. The charge $Q_0$ dumped to the virtual SL with capacitance C<sub>VPL</sub> is almost equal to the charge $Q_1$ required to pull up the SL (0V -> 0.65V), as shown in Fig.6-14. This implies that the charge $Q_0$ can be completely recycled as the charge $Q_1$ necessary to pull up the SL.

Here, how V<sub>VPL</sub> is generated is explained. As shown in Fig. 6-14, V<sub>VPL</sub>=0.65V is automatically fixed by the balance between Q<sub>0</sub> and Q<sub>1</sub>, even without an extra voltage generator, because V<sub>VPL</sub> depends on Q<sub>0</sub> and Q<sub>1</sub> and vice versa.

However, when considering noise issues due to ground bounce and V<sub>CC</sub> fluctuation, it is clear that a voltage regulator (shown in Fig.6-10) is required to solve such problems and to keep V<sub>VPL</sub> stable, although it no longer needs to supply a large current in every SL charging.

Since the total capacitance (C<sub>VPL</sub>) of the virtual SL is 4096 times larger than C<sub>SSL</sub> of an SL (e.g., SL<sub>1-3</sub> in Fig.6-10), the potential bounce $\Delta V_{VPL}$ is suppressed to only 0.2mV even just after Q<sub>0</sub> injection to the virtual SL, as shown in Fig.6-15. Thus, BOGS provides a stable SL control and a pseudo cross-point access without any power loss.
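The order of magnitude of this bounce can be estimated with a first-order charge model. The sketch assumes that the injected charge equals the charge needed to move one SL by 0.65V (the text states that Q<sub>0</sub> is almost equal to Q<sub>1</sub>) and that it is absorbed by the virtual SL capacitance alone, with no regulator action; the function name is hypothetical.

```python
# First-order estimate of the V_VPL bounce after Q0 injection.
# ASSUMPTIONS: Q0 ~= Q1 = C_SSL * V_VPL, and the charge is absorbed by
# C_VPL = 4096 * C_SSL alone (no regulator action, no other paths).

def vpl_bounce(v_vpl=0.65, cap_ratio=4096):
    """Return the estimated bounce on the virtual SL in volts."""
    q0_per_c_ssl = v_vpl                 # Q0 expressed per unit of C_SSL
    return q0_per_c_ssl / cap_ratio      # delta V = Q0 / C_VPL

bounce_mv = vpl_bounce() * 1e3
print(f"bounce = {bounce_mv:.2f} mV")
```

Under these assumptions the estimate is a little under 0.2mV, the same order as the value quoted from Fig.6-15.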

### 6-3 Power Comparisons and Discussions

### 6-3-1 Supply Voltage $V_{CC}=0.5V$ , $V_{CC} < 0.8V$

According to the simulated data of the 0.35µm 1M-bit CMOS SRAM, BOGS reduces the power to 1/30.4 that of NSD at V<sub>CC</sub>=0.5V, as shown in Fig.6-16(b) and Fig.6-17. This is mainly due to the 97% reduction in the source and BL driving current consumption, which is the dominant factor of the SRAM operating current. The reduction results from the following: 1) saving the source-resetting current by using the charge-recycling SL control, 2) avoiding the use of the negative power (-0.6V) for BL discharging by shifting the potential range of SL driving from (0V -> -0.75V) to (0.65V -> 0V), which avoids over 90% supply loss, and 3) suppressing the swing of unselected BLs to 1/256 by using the pseudo cross-point access, as shown in Fig.6-13.

Note that the power consumption of the charge pump accounts for about 90% and 70% of the total power dissipation for NSD at V<sub>CC</sub>=0.5V and =0.9V, respectively, as shown in Figs.6-16(b) and 6-17.

BOGS reduces the power to 1/3.9 that of NWD at V<sub>CC</sub>=0.5V, as shown in Fig. 6-16(a) and Fig. 6-17. This mainly results from the reduction of the leakage current flowing from the boosted storage node N<sub>H</sub> through BL, achieved by equalizing the boosted potential level between BL precharging and WL driving. This reduces V<sub>O</sub> down to -0.15V, unlike V<sub>O</sub>=0.2V for NWD when WL goes high. As a result, the leakage I<sub>O</sub> is reduced by 30mA at 100MHz operation.

Note that the power consumption of the charge pump accounts for about 90% of the total power dissipation for NWD at $V_{CC}=0.5V$, as shown in Fig.6-17.

A drawback of using boosted power for BL precharging can be seen in the BL power consumption for BOGS shown in Fig.6-17. However, this power overhead can be reduced to less than 30% compared to NWD by using the cross-point access scheme. This is because the scheme reduces the number of BLs that need to be precharged to 1/4 that of NWD, in spite of the increase in the power consumption needed for precharging per BL.

In addition, this implies that using the boosted voltage supplied from the charge pump circuit makes it possible to push the minimum operating voltage of the SRAM down to 0.5V or less, with 5mW power consumption for 100MHz operation, unlike using the deeply low $V_T$ (<0V), as shown in Fig.6-16(a).

This scheme contributes a reduction of BL charging current due to a pseudo cross-point cell access, while saving the source resetting current owing to charge-recycling between $Q_0$ and $Q_1$.

(a) Timing diagram of charge-recycle operation from $Q_0$ to $Q_1$, (b) Concept of charge-recycle operation from $Q_0$ to $Q_1$.

Fig.6-13 Suppression of unselected bit-line discharging vs. potential of virtual SL potential V<sub>VPL</sub>.

Fig.6-16 V<sub>CC</sub> dependence of power consumption comparisons: (a) between this work (BOGS) and negative word-line drive (NWD) scheme, (b) between this work (BOGS) and negative source drive (NSD).

| Architecture | Charge pump | Item | V<sub>CC</sub>=0.5V | V<sub>CC</sub>=0.6V | V<sub>CC</sub>=0.7V | V<sub>CC</sub>=0.8V | V<sub>CC</sub>=0.9V |
|--------------|-------------|----------------------|--------|--------|--------|--------------|--------|
| BOGS         | +Pump-A     | V<sub>OUT</sub> (V)  | 1.3V   |        | -      | -            | -      |
|              |             | Efficiency (%)       | 10%    | 15%    | 20%    | 25%          | 30%    |
|              | +Pump-B     | V<sub>OUT</sub> (V)  | 0.8V   |        | -      | Not required | -      |
|              |             | Efficiency (%)       | 15%    | 25%    | 30%    | 100%         |        |
| NSD          | -Pump-C     | V<sub>OUT</sub> (V)  | -0.75V | -0.65V | -0.55V | -0.45V       | -0.35V |
|              |             | Efficiency (%)       | 10%    | 15%    | 20%    | 23%          | 30%    |
| NWD          | -Pump-D     | V<sub>OUT</sub> (V)  | -0.9V  | -0.8V  | -0.7V  | -0.6V        | -0.5V  |
|              |             | Efficiency (%)       | 9%     | 13%    | 18%    | 21%          | 28%    |
|              | +Pump-E     | V<sub>OUT</sub> (V)  | 1.4V   |        | -      |              | -      |
|              |             | Efficiency (%)       | 9%     | 14%    | 18%    | 23%          | 28%    |

Table 6-I. V<sub>CC</sub> dependence of supply efficiency and of required output voltage from charge pumps A, B, C, D, and E.

### 6-3-2 Supply Voltage $V_{CC} \ge 0.8V$

Another interesting point in Figs. 6-16(a) and 6-16(b) is that the power dissipation in the SRAM memory core reaches its minimum at V<sub>CC</sub>=0.8V. This is because $V_{CC}=0.8V$ is the minimum $V_{CC}$ necessary to achieve $V_{O}=0.8V$ without any boosted power supply and without any leakage problem due to a low V<sub>T</sub>.

BOGS requires the boosted potential of 0.8V for WL driving and BL precharging when $V_{CC} < 0.8V$, as explained in Section II. On the other hand, such boosted power is no longer required when V<sub>CC</sub> is 0.8V or more. This is because V<sub>O</sub> reaches 0.8V without reducing V<sub>T</sub> below 0V.

### 6-4 Conclusion

When targeting a 0.5V single-battery-operated high-speed SRAM without an external power supply (e.g., -0.75V, 3V), the proposed BOGS is the most attractive candidate, compared with NSD<sup>[1]</sup> and NWD<sup>[2]</sup>, for suppressing the operating power to $\leq$ 5mW while achieving 64-bit, 100MHz operation at V<sub>CC</sub>=0.5V and maintaining less than 0.2µA subthreshold leakage current for a 1M-bit cell array (16K-word x 64-bit). This is because BOGS minimizes the charge amount supplied from the charge-pump circuits by applying the boosted power supply with higher efficiency to the transistor positions in the SRAM cell that consume less power.

### References

[1] H. Mizuno and T. Nagano, "Driving Source-Line (DSL) Cell Architecture for Sub-1-V High-Speed Low-Power Applications," Technical Digest of Symposium on VLSI Circuits, pp. 25-26, Jun. 1995.

[2] K. Itoh, A.R. Fridi, A. Bellaouar, and M.I. Elmasry, "A Deep Sub-V, Single Power-Supply SRAM Cell with Multi-Vt, Boosted Storage Node and Dynamic Load," Technical Digest of Symposium on VLSI Circuits, pp. 132-133, Jun. 1996.

[3] J. Dickson, "On-Chip High Voltage Generation in NMOS Integrated Circuits Using an Improved Voltage Multiplier Technique," IEEE Journal of Solid-State Circuits, Vol. SC-11, pp. 374-378, 1976.

[4] H. Yamauchi, T. Iwata, H. Akamatsu, and A. Matsuzawa, "A 0.8V/100MHz/sub-5mW-Operated Mega-bit SRAM Cell Architecture with Charge-Recycle Offset-Source Driving (OSD) Scheme," Technical Digest of Symposium on VLSI Circuits, pp. 126-127, Jun. 1996.

[5] H. Yamauchi, T. Iwata, H. Akamatsu, and A. Matsuzawa, "A 0.5V/100MHz Over-Vcc Grounded Data Storage (OVGS) SRAM Cell Architecture with Boosted Bit-line and Offset Source Over-Driving Scheme," Technical Digest of International Symposium on Low Power Electronics and Design, pp. 49-54, Aug. 1996.

[6] N. Shibata and M. Watanabe, "A Low-power Synchronous SRAM Macrocell with Column-address Controlled Virtual-GND lines," Technical Report of IEICE, ICD95-171, pp. 45-52, Nov. 1995.

# **CHAPTER-7** Conclusion

### 7-1 Conclusion of This Study

This thesis has presented answers and hints for realizing low-power circuit technologies for battery-operated devices, which make it possible to surmount the obstacles faced in meeting the following requirements: 1) the never-ending demand for reducing power consumption per bit transmission in memory systems (e.g., between DRAM and processor / graphics controller), e.g., more than 3GB/s at less than 300mW power consumption even for a bus capacitance of 14pF and Vcc = 3.6V, 2) the emerging demand for diminishing DRAM data retention current during the battery back-up period to a level as low as that of SRAM, which needs no refresh operation, e.g., less than 0.5 $\mu$A/MB, and 3) the ever-increasing demand for accommodating the operating voltage to the scaled voltage supplied from a single battery, e.g., ~0.9V of a Ni-Cd cell and ~0.5V of a solar cell, while keeping 100MHz operation and sub-$\mu$A standby current.

A summary of the new findings of this thesis is as follows: (1) To cope with the power-hungry requirement of realizing a data transfer rate of more than 3GB/s through a bus whose capacitance is as large as 14pF, the charge-recycling data transfer scheme has been developed. It saves power consumption (P) by the square of the suppressing ratio m of the data bus swing, as shown by $P = f \cdot (C/m) \cdot (V_{CC}/m) \cdot V_{CC}$. Assuming m is 8, the power saving factor increases up to 64 (8 squared), so that a conventionally consumed power of 5W is reduced to merely 80mW. This dramatic power reduction has been verified by simulated and measured data.
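The quadratic saving can be illustrated with a short calculation based on the expression above. The function name is illustrative; the 5W starting point and m = 8 are the values quoted in the text.

```python
# Bus power with the swing-suppressing ratio m, following
# P = f * (C/m) * (Vcc/m) * Vcc, i.e. P_conventional / m**2.
# The 5 W starting point and m = 8 are the values quoted in the text.

def recycled_bus_power(p_conventional_w, m):
    """Return the bus power in watts after suppressing the swing by a factor m."""
    return p_conventional_w / (m ** 2)

p = recycled_bus_power(5.0, 8)
print(f"{p * 1e3:.0f} mW")   # roughly 80 mW
```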

Furthermore, the time-multiplexed charge-recycling data transfer scheme has also been developed. According to the simulated and measured data, the proposed technique reduces the bus power consumption to 1/11 and 1/3 of that of the parallel architecture when the bus activities are 100% and 25%, respectively, while halving the number of signal wires.

(2) The obstacle to further reducing DRAM data retention current in order to replace SRAMs with DRAMs in battery-operated devices is the necessity of the power-hungry refresh operation. To overcome this issue, we have developed the following circuit techniques: 1) the relaxed junction biasing scheme, which extends the data retention time by a factor of 3 by relaxing the junction bias between the storage node and substrate and, in turn, reducing the junction leakage, and 2) the plate-floating leakage-monitoring timer, which extends the refresh interval by a factor of 30 by setting the optimum refresh interval based on the DRAM's temperature dependence. As a result, these have reduced the current consumption to below 0.4µA/MB, which is as low as that of SRAM. Furthermore, the following circuit techniques have been developed: 1) the gate-received level detector, which provides higher gain for the leakage current from or to the potential-monitored node such as the substrate, and 2) the dynamically controlled reference generator, which cuts off the static current by switching its power supply on and off.

The developed techniques have suppressed the DC current to less than 0.1µA/MB, which is negligibly small even when compared to SRAM. By utilizing these techniques, the world's smallest data retention current of 0.5µA/MB has been accomplished in an experimental 16Mbit DRAM.

(3) To achieve a fast access time of less than 40ns even when the supply voltage is reduced to 1.8V, corresponding to the voltage of two Ni-Cd cells connected in series, five circuit techniques have been developed, as follows: 1) a parallel column access redundancy scheme featuring a current-sensing address comparator, 2) a quasi-static cross-coupled data bus amplifier, 3) a gate-isolated sense amplifier with low threshold voltage, 4) a layout that minimizes the length of the signal path by taking advantage of the lead-on-chip assembly technique, and 5) suppression of the asymmetrical characteristics in the sense amplifier as V<sub>T</sub> and gate length are scaled. By utilizing these techniques, the world's fastest battery-operated 16Mbit DRAM, with a RAS access time of 20ns at 3.3V and 36ns even at 1.8V, has been developed, while keeping the standby current at only 5µA.

(4) To accommodate the operating voltage to the single-battery supply voltage, which should be scaled down to 0.9V of a Ni-Cd cell and beyond, such as 0.5V of a solar cell, V<sub>T</sub> scaling has been chosen to compensate for the degradation in SRAM access speed, i.e., to keep 100MHz operation, while developing circuit technology that avoids the exponentially increased subthreshold leakage as V<sub>T</sub> is scaled. The key circuit technique is the offset data storage scheme, which minimizes the charge amount supplied from the embedded charge pump circuits. This provides an effective gate-to-source voltage (V<sub>GS</sub>-V<sub>T</sub>) of up to 0.8V, necessary to achieve 100MHz operation even with a 0.5V single power supply. The possibility of realizing 0.5V/100MHz SRAM operation while suppressing the operating power to below 5mW has been verified by simulated data.

According to the results of (1) through (4), it is expected that the low power circuit technologies proposed in this thesis can meet the following requirements in battery-operated semiconductor random access memory systems: 1) saving power consumption per bit transmission between DRAM and processor / graphics controller, e.g., more than 3GB/s at less than 300mW power consumption even for a bus capacitance of 14pF and Vcc = 3.6V, 2) diminishing DRAM data retention current during the battery back-up period to a level as low as that of SRAM, which needs no refresh operation, e.g., less than 0.5µA/MB, and 3) accommodating the operating voltage to the scaled voltage supplied from a single battery, e.g., ~0.9V of a Ni-Cd cell and ~0.5V of a solar cell, while keeping 100MHz operation and sub-µA standby current.

Thus the proposed technologies can contribute to extending the battery lifetime and to accommodating the operating voltage to a single-battery supply voltage in battery-operated semiconductor random access memory systems. This increases portability by reducing battery size and weight, and frees users from the trouble of the frequent recharging needed to recover the battery supply voltage in portable battery-operated devices.

### 7-2 Technical Prospect

New findings in managing the following key factors for realizing battery-operated devices are presented in this thesis: 1) saving data transfer power consumption per bit by breaking the CV<sup>2</sup>f barrier, i.e., by charge-recycling, 2) reducing DRAM data retention current down to 0.5µA/MB, as low as that of SRAM, and 3) scaling the operating voltage down to the solar cell voltage of 0.5V.

However, a number of problems remain unsolved before commercial production. The remaining problems to be solved for practical use, and the future technologies necessary to solve them, are discussed in the following sections.

### 7-2-1 Remaining Problems to be Solved

### (a) Issues in utilizing charge-recycling data transfer bus (CRB) scheme

Significant problems to be solved in making the CRB scheme fit for practical use are as follows: 1) bus capacitance imbalance issues, 2) issues in setting an initial potential on each bus in power-on state, and 3) issues in suppressing data bus-swing.

Regarding 1), the sensitivity of bus swing deviation to capacitance imbalance is straightforward, i.e., a 10% capacitance imbalance results in about a 10% bus swing deviation. This becomes a problem when bus lines are routed in parallel by automatic design tools, because the imbalance in wiring length among a large number of buses running in parallel tends to be larger than with careful manual routing.

As for 2), although several dummy clock cycles stabilize each intermediate bus potential, this is an obstacle to realizing a quick wake-up and restart from power management modes such as the suspend and hibernation modes in PCs.

An extensive reduction of the data bus swing results in the noise margin problems of 3), due to non-scaling parts of the noise budget such as device asymmetry (V<sub>T</sub> imbalances in the pair transistors composing a sense amplifier), coupling noise between neighboring signal wires, and ground bounce due to inductive noise caused by the simultaneous switching of I/O circuits.

### (b) Issues in further reducing DRAM data retention current

The essential problem to be solved in managing data retention current is the temperature dependence of the data retention time. Although a data retention current of sub-0.5µA/MB, as low as that of SRAM, has been attained at Ta of 25°C, it increases up to 2µA/MB at Ta of 55°C, as in summer sunshine, reducing battery life by a factor of 4. This is an obstacle to wide use.
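The factor of 4 follows directly from battery life being inversely proportional to retention current. The sketch below illustrates this with a 190mAh battery and a 20MB memory, the example values used in the Chapter 4 abstract; the function name is hypothetical, 0.5µA/MB is taken as the upper bound, and battery self-discharge is ignored.

```python
# Battery life versus retention current: life = capacity / total current.
# ASSUMPTION: the 190 mAh battery and 20 MB memory size are example values
# (those used in the Chapter 4 abstract); self-discharge is ignored.

def battery_life_hours(capacity_mah, current_ua_per_mb, size_mb):
    """Return the retention time in hours for a constant retention current."""
    total_current_ma = current_ua_per_mb * size_mb / 1000.0
    return capacity_mah / total_current_ma

life_25c = battery_life_hours(190, 0.5, 20)   # Ta = 25 C
life_55c = battery_life_hours(190, 2.0, 20)   # Ta = 55 C
print(f"{life_25c:.0f} h vs {life_55c:.0f} h, ratio {life_25c / life_55c:.1f}")
```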

### (c) Issues in accommodating supply voltage to a solar-cell voltage of 0.5V

Boosted voltage is essential to compensate for degradation in speed, in addition to threshold voltage scaling. However, an existing booster such as a charge-pump circuit generates the boosted voltage with a low efficiency of less than 40%. Thus, high-efficiency voltage boosting circuits, such as DC-DC converter circuitry with an inductor and capacitors, are required for efficient low-voltage operation. Unfortunately, high-Q inductors cannot be integrated on a chip together with other circuits and thus complicate the board-level design.

### 7-2-2 Requirements of Future Technologies

To open the door to further advanced levels of power saving and to make them fit for practical use, the remaining problems mentioned above must be overcome by integrating all knowledge and technology with considerable research and development effort. Clearly, the following technologies are needed: 1) design tools that can route a large number of data lines in parallel while making them equal in length and width, so as to equalize their parasitic capacitances, are required to reduce the capacitance imbalances that result in signal-swing loss, 2) high-efficiency DC-DC conversion circuitry that provides power supply voltages at arbitrary levels (precharge levels for each of the stacked buses) is needed to reduce the settling time for restarting the charge-recycling operation just after power-on. High-Q inductors have not yet been integrated on a single chip together with other circuitry, and much effort seems to be needed in the area of semiconductor processes and materials. 3) the impact of the reduced noise margin due to the suppressed bus swing may be mitigated by the development of inherently fault-tolerant systems and by progress in packaging technologies that reduce the inductive noise caused by the simultaneous switching of I/O circuits, in addition to fully balanced pair-transistor layout design and signal detectors that are immune to V<sub>T</sub> imbalance between pair transistors and to capacitance and resistance imbalances between differential signal wires.

4) non-volatile RAM capable of fast read/write random access, such as ferroelectric RAM (FeRAM), is eventually needed to diminish the data retention current. Although a 1-Mbit FeRAM has already been reported, it will seemingly take a long time to catch up with DRAM in terms of density, for which a 4-Gbit DRAM has already been reported. 5) a high-efficiency boost converter using a high-Q inductor and powered by a single battery is needed to provide a sufficiently large gate overdrive voltage (V<sub>GS</sub>-V<sub>T</sub>) in MOS transistors, which is required to avoid an intolerable degradation in speed. This relaxes the demand for V<sub>T</sub> scaling to compensate for the loss in speed and, as a result, avoids the DC current crisis due to the exponentially increased subthreshold leakage as V<sub>T</sub> is scaled. This is because V<sub>GS</sub> can be boosted without lowering V<sub>T</sub> by using such a boost converter.

The author sincerely hopes that the present thesis will help provide answers and hints for overcoming the problems mentioned above.

# **BIBLIOGRAPHY** WRITTEN BY THE AUTHOR

### LIST OF PUBLISHED PAPERS after review process

### [Chapter-2]

(1) Hiroyuki Yamauchi, Hironori Akamatsu, Tsutomu Fujita, "An Asymptotically Zero Power Charge-Recycling Bus Architecture for Battery-Operated Ultrahigh Data Rate ULSI's", IEEE Journal of Solid-State Circuits, Vol.30, No. 4, April 1995.

(2) Hiroyuki Yamauchi, Hironori Akamatsu, Tsutomu Fujita, "A Low Power Bus Architecture with Local and Global Charge-Recycling Bus Techniques for Battery-Operated Ultra-High Data Rate ULSI's", IEICE Transaction on Electronics, Vol. E78-C, No.4, April 1995.

### [Chapter-3]

(3) Hiroyuki Yamauchi and Akira Matsuzawa, "A Signal-Swing Suppressing Strategy for Power and Layout Area Saving Using Time-Multiplexed Differential Data-Transfer Scheme", IEEE Journal of Solid-State Circuits, Vol.31, No. 9, September 1996.

### [Chapter-4]

(4) Hiroyuki Yamauchi, Toru Iwata, Akito Uno, Masanori Fukumoto, and Tsutomu Fujita, "A Circuit Technology for a Self-Refresh 16Mb DRAM with Less than 0.5µA/MB Data-Retention Current", IEEE Journal of Solid-State Circuits, Vol.30, No.11, November 1995.

(5) Toru Iwata, and Hiroyuki Yamauchi, "Plate Bumping Leakage Current Measurement Method and Its Application to Data Retention Characteristic Analysis for RJB DRAM Cells", IEICE Transaction on Electronics, Vol.E79-C, No.12, December 1996.

### [Chapter-5]

(6) Hiroyuki Yamauchi, Toshikazu Suzuki, Akihiro Sawada, Tohru Iwata, Toshikazu Tsuji, Masashi Agata, Takashi Taniguchi, Yoshinori Odake, Kazuyuki Sawada, Teruhito Ohnishi, Masanori Fukumoto, Tsutomu Fujita, Michihiro Inoue, "A Circuit Technology for High-Speed Battery-Operated 16Mb CMOS DRAM's", IEEE Journal of Solid-State Circuits, Vol.28, No.11, November 1993.

(7) Toshikazu Suzuki, Toru Iwata, Hironori Akamatsu, Akihiro Sawada, Toshikazu Tsuji, Hiroyuki Yamauchi, Takashi Taniguchi, Tsutomu Fujita, "High-Speed Circuit Techniques for Battery-Operated 16 Mbit CMOS DRAM", IEICE Transaction on Electronics, Vol. E77-C, No.8, August 1994.

(8) Hiroyuki Yamauchi, Toshiki Yabu, Toshio Yamada, Michihiro Inoue, "A Circuit Design to Suppress Asymmetrical Characteristics in High-Density DRAM Sense Amplifiers", IEEE Journal of Solid-State Circuits, Vol.25, No.1, February 1990.

(9) Toshiki Yabu, Kazumi Kurimoto, Hiroyuki Yamauchi, Masanori Fukumoto, Takashi Ohzone, "Improvement of Asymmetrical Characteristics in Submicron CMOS Devices", IEICE Transaction on Electronics, C-II, Vol. J72-C-II, No.5, pp.456-462, May 1989.

(10) Michihiro Inoue, Toshio Yamada, Hisakazu Kotani, Hiroyuki Yamauchi, Atsushi Fujiwara, Junko Matsushima, Hironori Akamatsu, Masanori Fukumoto, Masafumi Kubota, Ichiro Nakao, Nobuo Aoi, Genshu Fuse, Shin-ichi Ogawa, Shinji Odanaka, Atsushi Ueno, Hiroshi Yamamoto, "A 16-Mbit DRAM with a Relaxed Sense-Amplifier-Pitch Open-Bit-Line Architecture", IEEE Journal of Solid-State Circuits, Vol.23, No.5, pp. 1104-1112, October 1988.

### [Chapter-6]

(11) Hiroyuki Yamauchi, Toru Iwata, Hironori Akamatsu, Akira Matsuzawa, "A 0.5V Single Power-Supply Operated High-Speed Boosted & Offset Grounded Data Storage (BOGS) SRAM Cell Architecture", IEEE Transactions on VLSI Systems, to be published, September 1997.

### LIST OF PUBLISHED PAPERS without review process

### [Chapter-4]

(1) Hiroyuki Yamauchi, "What is needed for DRAM to replace SRAM in market of memory card", pp. 30-33, Special Issue of Low Power Device Targeting for Portable Information Terminals in ELECTRONICS by OHM, June 1995.

(2) Toru Iwata, Hiroyuki Yamauchi, Akito Uno, "Circuit Techniques for Super-Low Retention Current DRAMs", National Technical Report Vol.41 No.6 pp.3-12 December 1995.

### [Chapter-5]

(10) Hiroyuki Yamauchi, Mitsuru Sekiguchi, Takashi Taniguchi, "High-Speed Low-Voltage-Operated 16-Mbit CMOS DRAM", National Technical Report Vol.39 No.6 pp.3-12 December 1993

### LIST OF PRESENTED PAPERS

### AT INTERNATIONAL CONFERENCE:

### [Chapter-2]

(1) Hiroyuki Yamauchi, Hironori Akamatsu, Tsutomu Fujita, "A Low Power Complete Charge-Recycling Bus Architecture for Ultra-High Data Rate ULSI's", in Technical Digest of 1994 Symposium on VLSI Circuits, pp.21-22, June 1994.

### [Chapter-3]

(2) Hiroyuki Yamauchi, Akira Matsuzawa, "A Low Power Signal-Swing Suppressing Strategy Using Time-Multiplexed Differential Data-Transfer (TMD) Scheme", in Technical Digest of 1995 Symposium on Low Power Electronics, pp.48-49, October 1995.

### [Chapter-4]

(3) Hiroyuki Yamauchi, Toru Iwata, Tetsuyuki Fukushima, Akito Uno, Kazuyuki Sawada, Masanori Fukumoto, Tsutomu Fujita, "A Sub-0.5µA/MB Data-Retention DRAM", in Technical Digest of 1995 International Solid-State Circuits Conference, pp.244-245, February 1995.

### [Chapter-5]

(4) Hiroyuki Yamauchi, Toshikazu Suzuki, Akihiro Sawada, Tohru Iwata, Toshikazu Tsuji, Masashi Agata, Takashi Taniguchi, Yoshinori Odake, Kazuyuki Sawada, Teruhito Ohnishi, Masanori Fukumoto, Tsutomu Fujita, Michihiro Inoue, "A 20ns Battery-Operated 16Mb CMOS DRAM", in Technical Digest of 1993 International Solid-State Circuits Conference, pp.44-45, February 1993.

(5) Hiroyuki Yamauchi, Toshiki Yabu, Toshio Yamada, Michihiro Inoue, "A Circuit Design to Suppress Asymmetrical Characteristics in 16-Mbit DRAM Sense Amplifier", in Technical Digest of 1989 Symposium on VLSI Circuits, pp.109-110, May 1989.

(6) Michihiro Inoue, Toshio Yamada, Hisakazu Kotani, Hiroyuki Yamauchi, Atsushi Fujiwara, Junko Matsushima, Hironori Akamatsu, Masanori Fukumoto, Masafumi Kubota, Ichiro Nakao, Nobuo Aoi, Genshu Fuse, Shin-ichi Ogawa, Shinji Odanaka, Atsushi Ueno, Hiroshi Yamamoto, "A 16-Mbit DRAM with an Open-Bit-Line Architecture", in Technical Digest of 1988 International Solid-State Circuits Conference, pp.246-247, February 1988.

(7) T. Ochi, H. Funakoshi, S. Takehashi, K. Hatada, H. Tani, H. Yamauchi, I. Okumura, H. Homma, "Study of Tape Materials and Mounting Conditions For LOC Package", in Technical Digest of International Microelectronics Conference, pp.88-93, April 1994.

### [Chapter-6]

(8) Hiroyuki Yamauchi, Toru Iwata, Hironori Akamatsu, Akira Matsuzawa, "A 0.8V/100MHz/sub-5mW-Operated Mega-bit SRAM Cell Architecture with Charge-Recycle Offset-Source Driving (OSD) Scheme", in Technical Digest of 1996 Symposium on VLSI Circuits, pp.126-127, June 1996.

(9) Hiroyuki Yamauchi, Toru Iwata, Hironori Akamatsu, Akira Matsuzawa, "A 0.5V/100MHz Over Vcc Grounded Data Storage (OVGS) SRAM Cell Architecture with Boosted Bit-line and Offset Source Over-Driving Schemes", in Technical Digest of 1996 International Symposium on Low Power Electronics and Design session No.3.1, August 1996.

(10) Hironori Akamatsu, Toru Iwata, Hiro Yamamoto, Takashi Hirata, Hiroyuki Yamauchi, Hisakazu Kotani, Akira Matsuzawa, "A Low Power Data Holding Circuit with an Intermittent Power Supply Scheme for sub-1V MT-CMOS LSIs", in Technical Digest of 1996 Symposium on VLSI Circuits, pp.14-15, June 1996.

(11) Toru Iwata, Hiroyuki Yamauchi, Hironori Akamatsu, Yuji Terada, Akira Matsuzawa, "Gate-Over-Driving CMOS Architecture for 0.5V Single-Power-Supply-Operated Devices", in Technical Digest of 1997 International Solid-State Circuits Conference, pp.290-291, February 1997.

### AT DOMESTIC CONFERENCE:

### [Chapter-2]

(1) Hiroyuki Yamauchi, Hironori Akamatsu, Tsutomu Fujita, "A Proposal of Charge-Recycling Bus Architecture for Local Data-Bus", in Proceedings of the 1994 IEICE Fall Conference C-521, p-199, October 1994.

### [Chapter-4]

(2) Hiroyuki Yamauchi, Toru Iwata, "A Proposal of Negative-Biased Word-Line Driver Scheme for Ultra-Long Refresh DRAM", in Proceedings of the 1995 IEICE Spring Conference C-638, p-231, March 1995.

(3) Toru Iwata, Hiroyuki Yamauchi, Hisakazu Kotani, "An Evaluation of Memory-Cell Leakage Current at 16Mbit DRAM", in Proceedings of the 1995 IEICE Spring Conference C-637, p-230, March 1995.

### [Chapter-5]

(4) Hiroyuki Yamauchi, Akihiro Sawada, "An N&PMOS Cross-coupled Data-bus Amplifier for Realizing the High-speed and Low-power Operation", in Proceedings of the 1993 IEICE Fall Conference C-635, p-265, March 1993.

(5) Toru Iwata, Hiroyuki Yamauchi, Tsutomu Fujita, "A Design of Write Control Circuit for High-Speed DRAMs", in Proceedings of the 1995 IEICE Spring Conference C-636, p-266, March 1993.

(6) Toshikazu Suzuki, Hiroyuki Yamauchi, Akinori Shibayama, Tsutomu Fujita, "Level Shift Circuit Reducing Power Consumption", in Proceedings of the 1995 IEICE Spring Conference C-639, p-269, March 1993.

(7) Akihiro Sawada, Hirohito Kikukawa, Hiroyuki Yamauchi, Tsutomu Fujita, "Parallel Column Access Redundancy Circuit", in Proceedings of the 1995 IEICE Spring Conference C-644, p-274, March 1993.

(8) Toshiaki Tsuji, Hideo Asaka, Hiroyuki Yamauchi, Tsutomu Fujita, "A Technique for Reducing of Input Capacitance with Localized Channel Stopper", in Proceedings of the 1995 IEICE Spring Conference C-604, p-172, March 1994.

(9) Hiroyuki Yamauchi, Toshiki Yabu, Toshio Yamada, Michihiro Inoue, "A Method for Circuit Design to Suppress Asymmetrical Characteristics in High Density DRAM Sense Amplifier", in Proceedings of the 1993 IEICE Fall Conference C-366, p-317, March 1989.

### AT TECHNICAL SOCIETY OF IEICE:

### [Chapter-2]

(1) Hiroyuki Yamauchi, Hironori Akamatsu, Tsutomu Fujita, "An Ultra-Low Power Charge-Recycling Bus Architecture for Super High Data-Rate ULSI's", in Technical Report of IEICE ICD94, pp. 9-16 September 1994.

### [Chapter-4]

(2) Toru Iwata, Hiroyuki Yamauchi, "Circuit Techniques for Super Low Retention Current DRAM", in Technical Report of IEICE ICD95, pp. 1-7, June 1995.

### [Chapter-5]

(3) Hiroyuki Yamauchi, Toshikazu Suzuki, Akihiro Sawada, Toru Iwata, Toshiaki Tsuji, Takashi Taniguchi, Masanori Fukumoto, Tsutomu Fujita, "A High-Speed Low-Voltage-Operated 16Mb CMOS DRAM", in Technical Report of IEICE ICD93, pp. 31-38 May 1993.

(4) Hiroyuki Yamauchi, Toshiki Yabu, Michihiro Inoue, "DC Model and Parameter Extraction of Submicron MOSFET for Circuit Simulation", in Technical Report of IEICE CAS 86, pp. 17-24 September 1986.

### [Chapter-6]

(5) Hiroyuki Yamauchi, Toru Iwata, Hironori Akamatsu, Akira Matsuzawa, "A Proposal of Low Power SRAM Cell Architecture for Realizing 0.8V/100MHz Operation", in Technical Report of IEICE ICD96, pp. 9-16, August 1996.

(6) Hironori Akamatsu, Toru Iwata, Hiro Yamamoto, Takashi Hirata, Hiroyuki Yamauchi, Hisakazu Kotani, Akira Matsuzawa, "A Low Power Data Holding Circuit with an Intermittent Power Supply Scheme for sub-1V MT-CMOS LSIs", in Technical Report of IEICE ICD96, pp. 17-24, August 1996.

### LIST OF REGISTERED PATENTS

### UNITED STATES PATENTS:

### [Chapter-2]

USP-5581506:

"Level-Shifter, Semiconductor Integrated Circuit, and Control Methods Thereof"

Hiroyuki Yamauchi

### [Chapter-4]

USP-5594701: "Semiconductor Memory Device Having a Plurality of Blocks"

Hiroyuki Yamauchi, Hideo Asaka

### [Chapter-5]

USP-4920517: "Semiconductor Memory Device Having Sub-Bit Lines"

Hiroyuki Yamauchi, Toshio Yamada

USP-4904888: "Sense Amplifier Circuit"

Hiroyuki Yamauchi, Toshio Yamada

USP-5051957: "Sense Amplifier Circuit for Large-Capacity Semiconductor Memory"

Hiroyuki Yamauchi

USP-5010523: "Sensing Circuit for a Dynamic Random Access Memory"

Hiroyuki Yamauchi

USP-5229964: "Read Circuit for Large-scale Dynamic Random Access Memory"

Hiroyuki Yamauchi

USP-5251172: "Semiconductor Memory Apparatus Having Reduced Amount of Bit-Line Amplification Delay"

Hiroyuki Yamauchi

USP-5265058: "Dynamic Random Access Memory"

Hiroyuki Yamauchi

USP-5268874: "Reading Circuit for Semiconductor Memory"

Hiroyuki Yamauchi

USP-5291450: "Read Circuit of Dynamic Random Access Memory"

Atsushi Fujiwara, Hiroyuki Yamauchi

USP-5295103: "Read/Write Circuit Including Sense Amplifiers for Use in a Dynamic RAM"

Hiroyuki Yamauchi

USP-5389810: "Semiconductor Device Having at Least One Symmetrical Pair of MOSFETS"

Akihiro Sawada, Hiroyuki Yamauchi

USP-5389841: "Differential Transmission Circuit"

Masashi Agata, Hiroyuki Yamauchi, Toshio Yamada

USP-5396124: "Circuit Redundancy Having a Variable Impedance Circuit"

Masashi Agata, Hiroyuki Yamauchi, Atsushi Fujiwara



