## Kyushu University Institutional Repository

# A Design for a Low-Power Digital Matched Filter Applicable to W-CDMA

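Goto, Shoji (SANYO Electric Co., Ltd.)

Yamada, Takashi (SANYO Electric Co., Ltd.)

Takayama, Norihisa (SANYO Electric Co., Ltd.)

Matsushita, Yoshifumi (SANYO Electric Co., Ltd.)

et al.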

https://hdl.handle.net/2324/5844

Publication information: Proc. of Euromicro Symposium on Digital System Design Architectures, Methods and Tools, pp.210-217, 2002-09. IEEE Computer Society





Shoji Goto, Takashi Yamada, Norihisa Takayama, Yoshifumi Matsushita, Yasoo Harada<sup>†</sup> *SANYO Electric Co., Ltd. email: {gotoh, yamada}@ul.rd.sanyo.co.jp* 

Hiroto Yasuura

Kyushu University

#### **Abstract**

This paper presents a design for a low-power digital matched filter (DMF) applicable to Wideband-Code Division Multiple Access (W-CDMA), which is a Direct-Sequence Spread-Spectrum (DS-SS) communication system. The proposed architectural approach to reducing the power consumption focuses on the reception registers and the correlation-calculating unit (CCU), which dissipate the majority of the power in a DMF. The main features are asynchronous latch clock generation for the reception registers, parallelism of the correlation calculation operations and bit manipulation for chip-correlation operations. A DMF is designed in compliance with the W-CDMA specifications incorporating the proposed techniques, and its properties are evaluated by computer simulations at the gate level using 0.18-μm CMOS standard cell array technology. The results of the simulations show a power consumption of 9.3 mW (@15.6 MHz, 1.6 V), which is only about 30% of the power consumption of conventional DMFs.

#### 1: Introduction

The block diagram of the baseband LSI for the W-CDMA system is shown in Figure 1. It consists of a Tx block and an Rx block followed by a channel codec. W-CDMA is a DS-SS communication system that utilizes inter-cell asynchronous cell-site operation for flexible base-station allocation. Therefore, the process of acquiring the de-spreading timing is very complicated. In order to raise the acquisition accuracy, the acquisition block, and especially the DMF, operates several times faster than the chip frequency. In addition, a DMF correlates the received signal with the reference code sequence (which relates to the overall system) every clock cycle. Therefore, a large number of logic gates are switched at high frequency. Considering the intended application of the baseband LSI to battery-powered commercial mobile terminals, low power consumption is one of the foremost concerns. So far, several different approaches have been proposed for MF implementation, such as the SAW MF [1], digital MF (DMF) [2]-[7], CCD MF [8]-[11] and analog MF (AMF) [12]. An AMF is more power-efficient for shorter, faster MFs, and a DMF is more power-efficient when the filters are longer or slower [13]. Recent advances in CMOS technology are making it possible to design a DMF for practical use. For application to code acquisition or tracking in DS-SS systems such as W-CDMA or post-W-CDMA systems, in which relatively long reference codes are potentially present, a DMF is preferred for the design flexibility that it allows as well as for its power efficiency.

The use of a DMF for code acquisition in DS-SS systems may consume as much as half of the total power in the baseband processing circuit if no low-power measures are included. It is desirable to keep the power consumption of the DMF below 10 mW in order to maintain a practical active standby time. In [5] and [7], the main focus is on low-power techniques, including the architecture, circuitry and layout necessary to implement a low-power DMF. In this paper, we propose an architecture dedicated to the low-power design of a DMF applicable to W-CDMA. We consider reducing the switching probability in both the reception registers and the correlation-calculating block. The main points of the proposal are the use of asynchronous latch clock generation for the reception registers, parallelism of the correlation calculation operations and bit manipulation for chip-correlation operations in the CCU.



Figure 1. Block diagram of W-CDMA chip set

<sup>†</sup> Presently with the Semiconductor Technology Academic Research Center (STARC), Yokohama, 222-0033 Japan.

The remainder of this paper is organized as follows. In Sect. 2, the basic function and structure of conventional approaches to building a DMF are explained. A design for a low-power DMF is proposed and discussed in Sect. 3, and its output properties, gate count and power consumption are analyzed in Sect. 4. Lastly, the conclusion is presented in Sect. 5.

#### 2: Basic function and structure of a DMF

The most orthodox structure of a DMF is shown in Figure 2 (a). We call this type of DMF "a shift register First-In First-Out (FIFO) type" in this paper. The received signal is latched in the leftmost reception register and then shifted along the tapped delay line to the next register. Each chip of the reference code sequence, on the other hand, is stored in another register as a filter tap coefficient. All of the samples in the reception registers are correlated with the corresponding tap coefficients and are added up. A large amount of power is consumed in the reception registers, because every register is activated to transmit the received signal (multiple bits) to the next register by a shifting operation.

Another structure for a DMF is shown in Figure 2 (b). This type of DMF is called "a register file FIFO type" in this paper. In this DMF, a shift register is not used for the reception registers, but is used for the coefficient registers. The reference code is usually 1 bit and the number of coefficient registers is equal to the code length, irrespective of the over-sampling number of the received signal, so the power-dissipating shift register can be smaller in both bit-width and length. The trade-offs are as follows:

- Control of the reception-register update,
- Increase of load capacitance owing to the received signal bus being shared with the reception registers,
- Selectors (MUXs) required for each tap (boxed as a tapping block in Figure 2 (b)) in order to select the sample being correlated with the corresponding coefficient, in the case of over-sampling (usual situation, not shown in Figure 2),
- Shifting operations of the coefficient registers.

These components negate some of the power reduction obtained in the reception registers.

Other major factors of power dissipation that are common to both types of DMF lie in the correlation calculation operation, which is composed of the chip-by-chip correlation operation and the summation operation.

The power reduction obtained by a register file FIFO approach is estimated to be 43% [6] (without over-sampling). In the over-sampling case, on the other hand, the power reduction would be smaller.



Figure 2. Configurations of the DMFs

#### 3: Low-power consumption DMF

In a CMOS circuit, the power consumption is the sum of the dynamic power and the static power. The static power is dissipated by the leakage current in inactive gates. The dynamic power is consumed when the circuit is active: the parasitic capacitances of the circuit are charged and discharged, and a short-circuit current (a direct current from VDD to GND) flows while the input signals are in transition. The dynamic power is the dominant component of the total power consumption in existing CMOS technologies, and therefore it is important to minimize it.

The dynamic power is given by the product of the switching probability, the clock frequency, the load capacitance and the square of the supply voltage. In this paper, we aim to reduce the dynamic power at the architectural level by minimizing the switching probability.

In the register file FIFO type of DMF, the power consumption in the reception registers is reduced, but there are trade-offs in the power overhead. Apart from the reception registers, the correlation-calculating block consumes the largest amount of power. Our low-power design concepts for each block are as follows:

- i) Reception registers:
  - a) use an asynchronous latch clock instead of the conventional gated clock,
  - b) reduce the load which is charged or discharged by the switching of the received signal bus,
  - c) exclude the Least Significant Bit (LSB) of the reception registers (related to item ii)-b) below),

- ii) Correlation-calculating block:
  - a) use parallel CCUs and remove the control to recursively select the sample for the correlation calculation,
  - b) simplify the chip-correlation operation in each CCU using bit manipulation instead of the conventional sign inversion of a 2's complement signal.

Figure 3 shows the detailed configuration of the proposed DMF, where N, L,  $f_c$ , n,  $R_i$  and  $C_i$  denote the over-sampling number, the length of the reference code sequence, the chip frequency, the bit width of the received signal, and the i-th tapped received signal and the coefficient (i = 1, 2,..., L), respectively. LCK<sub>j</sub> and MSK<sub>j</sub> are the latch clock signal and the masking signal for the j-th reception register (j = 1, 2,..., N·L), respectively. The maximum operating frequency of each block is also shown by the marking on the hatched boxes. A detailed explanation for each block (the reception block and the correlation-calculating block) will be given in the following sub-sections.



Figure 3. Configuration of the proposed DMF

#### 3.1: The reception block

The reception block comprises a control unit and the reception registers. The control unit generates LCK\_i and MSK\_j for the i-th and j-th reception registers, respectively (i = j mod $(N\cdot L/2)$, $j = 1,2,...,N\cdot L$). Figure 4 is a block diagram of the control unit. The system clock operates only the clock pulse counter. Each bit of the counter and its delayed signal, except for the Most Significant Bit (MSB) and the LSB, are used as clock signals of the lowest possible frequency $(N \cdot f_c/4 \sim 2 \cdot f_c/L)$ in the delay line, which has the structure of a shift register (1 bit, N·L/2-1 steps). The LSB is used as a delaying clock in the delay unit. The MSB is output to the reception registers as LCK\_1 with a frequency of f<sub>c</sub>/L. It is also connected to the delay line and is delayed in succession by 1/(N·f<sub>c</sub>). The other latch clocks LCK\_i are obtained by tapping the delay line.

Figure 5 shows an input/output timing diagram for the control unit. The switching frequency of each signal is given in parentheses. LCK\_2 is generated by delaying LCK\_1 at the negative edge of the clock, which is the delayed signal of the (log\_2(N·L-1))-th bit of the counter. The other LCK\_i are obtained by delaying LCK\_(i-1) in the same manner. The received signal is denoted by $D_t$. $D_t$ (t mod N·L < N·L/2) is stored in the reception registers at the positive edge of the clock LCK\_i (i = t), and $D_t$ (N·L/2 $\leq$ t mod N·L $\leq$ N·L-1) is stored at the negative edge of LCK\_i (i = t mod N·L - N·L/2). The number of LCK\_i, that is, the number of delay-line steps, is halved by introducing negative-edge-triggered FFs for the latter half of the reception registers.
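The storage rule described above can be sketched in a few lines of Python. This is only an illustration: the function name `latch_target` is hypothetical, and the sketch assumes that the index i for the first half of the registers is t mod N·L (the text writes i = t), with zero-based indices as in the text.

```python
def latch_target(t, n, l):
    """Return (i, edge) for received sample D_t.

    Assumption: i = t mod N*L for the first half of the registers,
    and i = t mod N*L - N*L/2 for the latter half, as stated in the text.
    """
    total = n * l            # number of reception registers, N*L
    half = total // 2        # number of latch clocks, N*L/2
    pos = t % total
    if pos < half:
        return pos, "positive"       # stored at the positive edge of LCK_i
    return pos - half, "negative"    # stored at the negative edge of LCK_i


# Example with the evaluated parameters N = 4, L = 256:
print(latch_target(0, 4, 256))    # first half of the registers
print(latch_target(512, 4, 256))  # latter half shares the same latch clock
```

Each latch clock serves two registers, one on each edge, which is why only N·L/2 latch clocks are needed.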

Figure 6 shows a slice of the delay line: (a) is the proposed approach and (b) is a conventional clock-gating approach. As explained above, the operating frequency in the hatched part is lower (by a factor of $4 \sim \text{N}\cdot\text{L}/2$) in the proposed approach. The clock frequency of an FF that delays LCK\_i is N·f<sub>c</sub>/4 (i: 1, 2, 5, 6, ..., 4k+1, 4k+2), N·f<sub>c</sub>/8 (i: 3, 4, 11, 12, ..., 8k+3, 8k+4), N·f<sub>c</sub>/16 (i: 7, 8, 23, 24, ..., 16k+7, 16k+8), ..., and 2·f<sub>c</sub>/L (i: 0, N·L/4-1, N·L/4), where k is a non-negative integer. Because half of the FFs are clocked at N·f<sub>c</sub>/4, a quarter at N·f<sub>c</sub>/8, and so on, the average clock frequency is approximately N·f<sub>c</sub>/6.

On the other hand, the operating frequency of the clock-gating circuit is $N \cdot f_c$. Furthermore, the hatched part is necessary for each of the $N \cdot L$ reception registers in the clock-gating approach, whereas it is needed for only $N \cdot L/2$ reception registers in the proposed method, in which the number of latch clocks is halved. The switching activity in the latch clock generator is therefore estimated to be reduced to around one twelfth (= 1/6 · 1/2) of that of the clock-gating approach.
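The N·f<sub>c</sub>/6 average and the one-twelfth estimate can be checked numerically. The sketch below is an illustration under stated assumptions: the general rule in `clock_divisor` is extrapolated from the index pattern listed above (4k+1, 4k+2; 8k+3, 8k+4; 16k+7, 16k+8; ...), the lower limit of 2·f<sub>c</sub>/L is applied as a floor, and all function names are hypothetical.

```python
from fractions import Fraction


def clock_divisor(i, n_l):
    """Divisor d such that the FF delaying LCK_i is clocked at N*f_c/d.

    Assumption: latch clocks are grouped in pairs (1,2), (3,4), ... and the
    divisor doubles with each trailing zero bit of the pair number.
    """
    pair = (i + 1) // 2                           # i = 1,2 -> 1; 3,4 -> 2; ...
    trailing_zeros = (pair & -pair).bit_length() - 1
    return min(4 << trailing_zeros, n_l // 2)     # floor at 2*f_c/L (assumed)


def average_clock_fraction(n, l):
    """Average delay-FF clock frequency as a fraction of N*f_c."""
    n_l = n * l
    taps = range(1, n_l // 2 + 1)                 # N*L/2 latch clocks
    return sum(Fraction(1, clock_divisor(i, n_l)) for i in taps) / len(taps)


def activity_ratio(n, l):
    """Proposed / clock-gating switching activity of the clock generator.

    Clock gating: N*L circuits at N*f_c. Proposed: N*L/2 FFs at the average.
    """
    return average_clock_fraction(n, l) * Fraction(1, 2)


print(float(average_clock_fraction(4, 256)))  # close to 1/6
print(float(activity_ratio(4, 256)))          # close to 1/12
```

For N = 4 and L = 256 the computed values agree with 1/6 and 1/12 to within a few parts per million; the small excess comes from the lowest-frequency taps.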

Besides the clock control, data masking is proposed. The received signal bus is shared by the N·L reception registers, which increases the load presented by the FF inputs. The expected switching frequency of the bus is as high as N·$f_c$/2. If the D-inputs of the FFs are suitably inactivated by a masking operation, for example with AND gates whose input load is smaller, the power dissipation is reduced.

LCK\_i can be utilized to create the masking signals. The masking signals are denoted by MSK\_j (j = 1,2, ..., N·L) in Figure 4. The data inputs of the FFs are masked by MSK\_j according to the allocated register number. Since the reception registers include as many as N·L·n FFs, the total power is reduced when such a masking operation is applied to each FF. The power consumption is reduced by N·L·n· $\alpha$ /2, where  $\alpha$  is the potential maximum power reduction per FF. For the technology library used in this work, a value of 0.6 is obtained for  $\alpha$ . In the case where such a masking operation is included in an FF cell in the technology library, the outer masking operation may not be necessary.



Figure 4. Block diagram of the control unit



Figure 5. Timing diagram for the control unit



Figure 6. Structure of a slice of the delay line

#### 3.2: The correlation-calculating block

Figure 7 shows a part of the interface structure between the reception registers and the correlation-calculating block, where $R_j$ denotes the selected sample output from tap j (j is an integer between 1 and L). MUX (j) recursively selects the sample from the N samples $(r_j, r_{j+1}, ..., r_{j+N-1})$ stored in mutually neighboring reception registers. A $(\log_2 N)$-bit counter controls the MUX. For example, $r_j$ is selected when the counter value is "0", and $r_{j+N-1}$ is selected when the counter value is "N-1".



Figure 7. Structure of the interface between the reception block and the correlation-calculating block

In addition, the simultaneous transition of more than one bit in the counter, which occurs at every other sample, results in glitching of the MUX output (tapped sample). This glitching activity propagates into the correlation-calculating block. The glitching rate on a tapped sample is at least N·f<sub>c</sub>/2. The use of output latches can avoid this, but adds a power overhead in the additional FFs.

Parallelism of the arithmetic parts can be applied in order to reduce this power overhead. In the proposed DMF, multiple CCUs are deployed. The CCU numbered k, which takes a value between '0' and 'N-1', is given the sample $(r_{(j-1)\cdot N+k})$ from the j-th tap. As shown in Figure 3, only one MUX is needed to recursively select the correct output from the N CCUs for functional equivalence, and the L MUXs at the interface are removed. This approach reduces the glitching activity in the CCUs as well as the number of additional MUXs. Furthermore, the operating frequency of the CCUs is lowered by a factor of nearly N.
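The functional equivalence mentioned above can be illustrated with a behavioral sketch. It models the filter at the signal level only (not the register-level design of Figure 3), assumes ±1 coefficients for readability, and uses hypothetical function names. `dmf_direct` taps every N-th sample at each sample time; `dmf_parallel` gives each of the N CCUs its own sample phase and lets a single output selector pick CCU (t mod N).

```python
def dmf_direct(samples, code, n):
    """Reference model: one correlator that taps every n-th sample."""
    l = len(code)
    out = []
    for t in range((l - 1) * n, len(samples)):
        out.append(sum(code[j] * samples[t - (l - 1 - j) * n] for j in range(l)))
    return out


def dmf_parallel(samples, code, n):
    """N CCUs, each fed one sample phase; one output selector (assumed model)."""
    l = len(code)
    phases = [samples[k::n] for k in range(n)]   # samples seen by CCU k
    out = []
    for t in range((l - 1) * n, len(samples)):
        k, m = t % n, t // n                     # output MUX selects CCU k
        window = phases[k][m - l + 1 : m + 1]    # L samples held for CCU k
        out.append(sum(c * r for c, r in zip(code, window)))
    return out


# Hypothetical stimulus: a short +/-1 code and a deterministic sample stream.
code = [1, -1, 1, 1]
samples = [((7 * i) % 13) - 6 for i in range(40)]
print(dmf_direct(samples, code, 4) == dmf_parallel(samples, code, 4))
```

In this model the inputs of each CCU change only once every N samples, which corresponds to the lower CCU operating frequency noted in the text.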

Next, we consider the compensation for the gate-count increase, which accompanies the parallelism approach, and the enhancement of the power efficiency. The received signals that are stored in the reception registers are n-bit 2's complement. In an adder tree, the number of full adders can be reduced by using offset binary processing rather than 2's complement processing, since there is no concept of positive/negative sign bit in an offset binary signal and no precautions are needed for bit overflow in the addition result. We propose the following bit manipulation for a transformation from an n-bit 2's complement $r_i$ to an offset binary $d_i$, which also includes an operation for chip-correlation of $r_i$ with $C_i$:

$$d_i[n-1] = \begin{cases} r_i[n-1] & (C_i = 1) \\ \sim r_i[n-1] & (C_i = 0) \end{cases} \tag{1}$$

$$d_i[n-2:1] = \begin{cases} \sim r_i[n-2:1] & (C_i = 1) \\ r_i[n-2:1] & (C_i = 0) \end{cases} \tag{2}$$

$$d_i[0] = r_i[0] \tag{3}$$

where  $r_i$  is the received signal stored in a reception register,  $d_i$  is the offset binary signal to be added up in the tree-adders and the symbol '~' denotes a logic inversion. This transformation leads to a loss of equivalence for even  $r_i$ . The error of a correlation value is

$$2 \cdot N_{1\text{even}} \tag{4}$$

If  $d_i$  is calculated for  $(r_i+1)$  when  $r_i$  is even, the correlation error is given by

$$\mid N_{0\text{even}} - N_{1\text{even}} \mid \tag{5}$$

where  $N_{0\text{even}}$  is the number of even  $r_i$  signals correlated with  $C_i$  (= 0) and  $N_{1\text{even}}$  is the number of even  $r_i$  signals correlated with  $C_i$  (= 1), respectively. Assuming that the number of 0's in the coefficient sequence is equal to the number of 1's, the maximum error is L/2 in (5) and L in (4), respectively. Therefore, the accuracy of a correlation value can be improved. It is also possible to reduce the size of each reception register by a factor of n/(n-1) (for LSB), making it easier to transform the 2's complement to the offset binary and to simplify the tree adders as a result.
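The bit manipulation of (1)-(3) and the error terms (4) and (5) can be checked with a short sketch. The code assumes that $C_i$ = 1 corresponds to multiplication by -1 and $C_i$ = 0 to multiplication by +1, and that the offset of the offset binary code is $2^{n-1}$; these conventions are inferred from the equations, and the function names are hypothetical. For readability the offset is removed per chip here, whereas hardware would typically remove it once after the adder tree.

```python
def chip_correlate(r, c, n):
    """Apply (1)-(3) to an n-bit 2's complement sample r; return offset binary d."""
    bits = r & ((1 << n) - 1)                 # n-bit 2's complement pattern
    msb = (bits >> (n - 1)) & 1
    mid_mask = (1 << (n - 2)) - 1
    mid = (bits >> 1) & mid_mask              # bits n-2 .. 1
    lsb = bits & 1                            # (3): LSB passes through
    if c == 1:
        mid ^= mid_mask                       # (2): invert middle bits
    else:
        msb ^= 1                              # (1): invert MSB
    return (msb << (n - 1)) | (mid << 1) | lsb


def correlate(samples, coeffs, n, force_lsb=False):
    """Sum of chip correlations with the offset 2^(n-1) removed per chip."""
    offset = 1 << (n - 1)
    total = 0
    for r, c in zip(samples, coeffs):
        if force_lsb:
            r |= 1                            # treat even r as r + 1
        total += chip_correlate(r, c, n) - offset
    return total


samples, coeffs = [2, 4, -6, 3], [1, 1, 0, 0]
exact = sum(-r if c else r for r, c in zip(samples, coeffs))
print(exact, correlate(samples, coeffs, 6), correlate(samples, coeffs, 6, True))
```

In this example the exact correlation is -9. The plain transformation gives -13, an error of magnitude 2·N<sub>1even</sub> = 4 as in (4), and forcing the LSB gives -10, an error of magnitude |N<sub>0even</sub> - N<sub>1even</sub>| = 1 as in (5).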

#### 4: Evaluation

Figure 8 illustrates a block diagram of the simulation model for the correlation peak detection, which is composed of two DMFs, a square-law adder, an averaging unit (AVE) and a controller. A DMF incorporating the proposed technique (the configuration is shown in Figure 3) was designed. The basic-structured DMF shown in Figure 2 (a) was also designed for comparison. Stimuli for the simulations were generated in a base-station transmitter model, which is not shown in Figure 8. One DMF was used for correlating the in-phase component of the received signal (I-channel) with the primary synchronization code (PSC), and the other DMF was used for correlating the quadrature component (Q-channel) with the PSC [14]. The square-law adder squares the correlation values calculated in each DMF, adds them up, and outputs the result as the final correlation value. The AVE block averages the output signal from the square-law adder and reduces the effects of thermal noise and interference. The controller controls the operation of the AVE block.
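The data flow after the two DMFs can be sketched as follows. This is a minimal illustration, assuming that the AVE block averages values at the same timing offset across consecutive slots; the function names and the tiny input vectors are hypothetical and do not come from the simulation model.

```python
def square_law_add(i_corr, q_corr):
    """Square the I- and Q-channel correlation values and add them."""
    return [i * i + q * q for i, q in zip(i_corr, q_corr)]


def average_over_slots(values, slot_len, num_slots):
    """Average values at the same offset within each slot (assumed AVE behavior)."""
    return [
        sum(values[s * slot_len + k] for s in range(num_slots)) / num_slots
        for k in range(slot_len)
    ]


def peak_offset(averaged):
    """Timing offset within the slot at which the averaged value peaks."""
    return max(range(len(averaged)), key=averaged.__getitem__)


# Two hypothetical slots of four timing offsets each.
i_corr = [1, 0, 9, 1, 0, 2, 8, 0]
q_corr = [0, 1, -3, 0, 1, 0, 4, 1]
power = square_law_add(i_corr, q_corr)
print(peak_offset(average_over_slots(power, slot_len=4, num_slots=2)))
```

Averaging over slots lets the repeated correlation peak accumulate at a fixed offset while noise contributions at other offsets tend to even out.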

Table 1 describes the specifications of the evaluated DMF. The PSC is the generalized hierarchical Golay code, consisting of 256 chips with a chip rate of 3.84 Mcps. The received signal is a 2's complement signal of 6-bit resolution, with a frequency of 15.6 MHz. The over-sampling number N is 4 (samples/chip), and 256 taps are implemented out of 1024-step reception registers.



Figure 8. Block diagram of the simulation model for correlation peak detection

Table 1. Specifications of the DMF

| Reference code           | PSC (Hierarchical Golay) |
|--------------------------|--------------------------|
| Code length              | 256 chips                |
| Chip rate                | 3.84 Mcps                |
| Received signal          | 2's complement           |
| Bit width                | 6 bits                   |
| Over-sampling            | 4 samples/chip           |
| Hardware                 | (per unit DMF)           |
| Taps                     | 256 taps                 |
| Filter steps             | 1024 steps               |
| Max. operating frequency | 15.6 MHz                 |



Figure 9. Slot format for down-link P-SCH

Table 2. Simulation conditions

| Channels                             | AWGN<br>One-path Rayleigh fading ($f_D$ = 0 / 5 / 220 Hz) |
|--------------------------------------|-----------------------------------------------------------|
| Average transmit power per chip      | CPICH = -10.2 dB<br>PICH = -15.2 dB<br>PCCPCH = -12.2 dB<br>DPCH = -16.8 dB<br>SCH = -12.2 dB |
| Ratio of the received power to noise | -3.0 dB                                                   |
| Averaging period                     | 10 slots (6.7 ms)                                         |

Figure 9 illustrates the slot format for a down-link Primary-Synchronization Channel (P-SCH) [14]. The radio frame length is 10 ms, and one frame comprises 15 slots, each with a length of 0.667 ms. The PSC, that is, the P-SCH, is transmitted on the first symbol (spread into 256 chips) of every slot. The simulation conditions are shown in Table 2. A single path suffering from Additive White Gaussian Noise (AWGN) and Rayleigh fading is assumed, with maximum Doppler frequencies of $f_D = 0$ Hz (static propagation conditions) and 5 Hz (pedestrian propagation conditions).

As mentioned above, the output correlation value of the proposed DMF includes some error. The PSC comprises 120 1's and 136 0's. We examined the error distribution in the received signal patterns. Figure 10 shows the probability distribution (a) and the cumulative probability distribution (b) as a function of the magnitude of the errors. The probability peak lies between 21 and 25, and more than 90% of the errors are less than 50. The maximum error is 86.

Under ideal conditions (no noise or interference), the root mean squared value of the 6-bit received signal is 21 and the correlation peak value is 5376 (= $21 \cdot 256$). In the proposed DMF, there is a degradation of only 0.07 dB in the worst case ($10 \cdot \log_{10}((5376-86)/5376) \approx -0.07$ dB).
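The worst-case figure can be reproduced with a few lines of arithmetic. The sketch below uses only the values quoted above (RMS value 21, 256 taps, maximum error 86); the function name is hypothetical.

```python
import math


def peak_degradation_db(peak, error):
    """Change of the correlation peak in dB when `error` is subtracted from it."""
    return 10.0 * math.log10((peak - error) / peak)


rms_value, taps, max_error = 21, 256, 86
peak = rms_value * taps                       # ideal correlation peak
print(peak, round(peak_degradation_db(peak, max_error), 3))
```

The same function applied to an error of 50, below which more than 90% of the observed errors lie, gives a smaller loss still.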

Figure 11 shows the output waveform from the AVE block for the basic DMF (Figure 2 (a)) and for the proposed DMF (Figure 3). In both types of DMF, the main peak of the correlation value can be seen clearly. No degradation by the adoption of the proposed DMF is seen in the waveform, even before averaging.

The dynamic power consumption of a CMOS circuit is estimated by the following well-known equation:

$$P = p_s \cdot f_c \cdot C_l \cdot V_{dd}^2 \tag{6}$$

where $p_s$, $f_c$, $C_l$ and $V_{dd}$ represent the switching probability, the clock frequency, the load capacitance, and the supply voltage, respectively. In this paper, P at the gate level is broken down into the switching power and the internal power, and is calculated by the following equation based on (6):

$$P = \frac{V_{dd}^{2}}{2} \sum_{\forall net(i)} \left( C_{l_{i}} \cdot TR_{i} \right) + \sum_{\forall cell(j)} \left( E_{j} \cdot TR_{j} \right) \tag{7}$$

where $C_{l_i}$ and $TR_i$ are the total load capacitance and the switching probability for node i, and $E_j$ and $TR_j$ are the internal energy and the switching probability of the output node for logic cell j, respectively. The first term of (7) denotes the switching power and the second term denotes the internal power. $C_{l_i}$ and $E_j$ are characterized in the technology library. $TR_i$ and $TR_j$ are obtained from the gate-level logic simulation.
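Equation (7) translates directly into a small estimator. The sketch below is illustrative only: it assumes that $TR$ is expressed as a toggle rate in transitions per second (so that the result is in watts), and the net and cell values are hypothetical numbers, not data from the technology library used in this work.

```python
def gate_level_power(nets, cells, vdd):
    """Evaluate (7) and return (switching power, internal power) in watts.

    nets:  iterable of (load capacitance [F], toggle rate [1/s]) per net
    cells: iterable of (internal energy [J], toggle rate [1/s]) per cell
    Assumption: TR is a toggle rate in transitions per second.
    """
    switching = 0.5 * vdd ** 2 * sum(c * tr for c, tr in nets)
    internal = sum(e * tr for e, tr in cells)
    return switching, internal


# Hypothetical example: one 10 fF net and one cell with 5 fJ per transition,
# both toggling one million times per second at a 1.6 V supply.
switching, internal = gate_level_power([(10e-15, 1e6)], [(5e-15, 1e6)], 1.6)
print(switching + internal)
```

Because both terms scale with the toggle rates, lowering the switching activity reduces the switching power and the internal power together.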



Figure 10. Correlation error probability (a) and cumulative correlation error probability (b)



Figure 11. Output waveform from the AVE block

Figure 12 shows a comparison of the power consumption (A) and the gate count (B) between the basic DMF (Figure 2 (a)) and the proposed DMF (Figure 3). The total power consumption for the proposed DMF is estimated to be 9.3 mW, which is less than one third of the value for the basic DMF. The power consumption in the reception registers is drastically reduced. As a result, the CCUs dominate the power consumption, but their consumption is still lower than that of the equivalent part of the basic DMF. The gate count is a little over one and a half times the value for the basic DMF, owing to the non-sharing of the CCUs, but if the architecture for simplifying the correlation operation were not incorporated, more than double the number of gates would be required. In the reception registers, the removal of the LSBs compensates for the gate-count increase due to the masking circuits, and the total number of gates is maintained.

Table 3 shows the properties of several previously reported DMFs for comparison with our proposed DMF. The DMFs in the table all share basically the same function, but the individual parameters, such as the use of CMOS technology, the number of filter taps, etc. are different. The power consumption of a DMF is a function of the operating frequency, the input quantization levels, the number of taps (filter steps) and the degree of technology scaling, and therefore the power consumption and the gate count estimated for the proposed DMF cannot easily be compared with other reports. Judging from the well-known fact that the power consumption of a CMOS digital circuit is proportional to the operating frequency, the load capacitance and the square of the supply voltage, the estimated power in the proposed DMF is thought to be advantageous.

One promising use of the proposed DMF is that it can be reused for the neighboring cell search and for acquisition tracking, as well as for the initial cell search mentioned in Sect. 1. The impact of the low-power design may therefore become much larger across a wide range of DMF applications, especially for achieving a longer active standby time.





Figure 12. Comparison of DMF properties, power consumption (A) and gate count (B)

Table 3. Comparison of DMF specifications and properties

|                            | technology | supply<br>voltage (V) | code length<br>(filter steps) | frequency<br>(MHz) | input<br>(bit) | power consumption<br>(mW) | gate count<br>(kgates) | N.B.                              |
|----------------------------|------------|-----------------------|-------------------------------|--------------------|----------------|---------------------------|------------------------|-----------------------------------|
| Proposed DMF               | 0.18 μm    | 1.6                   | 256<br>(1024)                 | 15.6               | 6              | 9.3                       | 79.5                   |                                   |
| National Taiwan Univ. [7]  | 0.80 μm    | 5.0                   | 32<br>(128)                   | 50                 | 4              | 184                       | NA<br>(9.38 mm²)       | I/Q dual ch.                      |
| Kobe University [5]        | 0.18 μm    | 1.8                   | 128<br>(256)                  | 40                 | 8              | 15.26                     | 54.8                   | carried out clock<br>optimization |
| National Taiwan Univ. [4]  | 0.60 μm    | 2.0                   | 16<br>(32)                    | 2.5                | 4              | 1.6                       | NA<br>(2.25 mm²)       |                                   |
| University of Virginia [6] | 2.00 μm    | 5.0                   | 256<br>(256)                  | 25                 | 8              | 753                       | NA                     |                                   |

#### **5: Conclusion**

In this paper, we have proposed a low-power architectural approach to a DMF by focusing on the reception registers and the correlation-calculating unit. The key measures are a) asynchronous latch clock generation for the reception registers, b) parallelism of the correlation-calculating unit and c) bit manipulation in the chip-correlation operations. The power consumption of the proposed DMF has been estimated to be as low as 9.3 mW, which is about 30% of the estimated value for a basic DMF, when using 0.18-μm standard CMOS cell array technology with a supply voltage of 1.6 V and a system clock frequency of 15.6 MHz. The total number of gates is increased because of the deployment of multiple correlation-calculating units. Applying the proposed DMF to further correlation operations, such as neighboring cell search and acquisition tracking as well as initial code acquisition, can enhance the impact of its power efficiency.

#### Acknowledgments

The authors would like to thank T. Tanaka and K. Shukuguchi for their cooperation and support.

#### References

- [1] Y. Takeuchi, H. Taguma, M. Nara and A. Tago, "SS demodulator using SAW device for wireless LAN applications," IEEE Technical Report, vol. CS94-50, pp. 7-12, 1994.
- [2] A. Baier, "A low-cost digital matched filter for arbitrary constant-envelope spread-spectrum waveforms," IEEE Trans. Commun., vol. COM-32, pp. 354-361, April 1984.
- [3] N. Kataoka, T. Kojima, M. Miyake and T. Fujino, "Performance of soft decision digital matched filter in direct-sequence spread-spectrum communication systems," IEICE Trans., vol.E74, pp. 1115-1122, May 1991.

- [4] She-Hwa Yen and Chorng-Kuang Wang, "A 2V CMOS programmable pipelined digital differential matched filter for DS-CDMA system," Proceedings of the 1st IEEE Asia-Pacific Conf. on ASIC, August 1999.
- [5] K. Kitamura, K. Taki, T. Ogata and Y. Murata, "Low power consumption CMOS digital matched filter -An application example of the Plastic Hard Macro technology," IPSJ J., vol. 42, No. 4, pp. 1016-1022, Apr. 2001 (in Japanese).
- [6] D. Garrett and M. Stan, "Power reduction techniques for a spread spectrum based correlator," Proceedings of the International Symposium on Low Power Electronics and Design, pp. 225-230, 1997.
- [7] M. L. Liou and T. D. Chiueh, "A low-power digital matched filter for direct-sequence spread-spectrum signal acquisition," IEEE J. Solid-State Circuits, vol. 36, No. 6, pp. 933-943, June 2001.
- [8] D. D. Buss, D. R. Collins, W. H. Bailey and C. R. Reeves, "Transversal filtering using charge-transfer device," IEEE J. Solid-State Circuits, vol. SC-8, pp. 138-146, 1973.
- [9] E. P. Herrmann and D. A. Gandolfo, "Programmable CCD correlator," IEEE Trans. Electron Devices, vol. ED-26, pp. 117-122, Feb. 1979.
- [10] W. A. Hill, E. A. Higgins, E. H. Martin and S. Pittman, "1 GHz sample rate GaAs transversal filter," IEEE GaAs IC Symp. Tech. Dig., pp. 27-30, 1985.
- [11] E. Nishimori, C. Kimura, A. Nakagawa, and K. Tsubouchi, "CCD Matched Filter in Spread Spectrum Communication," The 9th IEEE International Symposium on Personal, Indoor and Mobile Radio Communications, Sep. 1998.
- [12] T. Shibano, K. Iizuka, M. Miyamoto, M. Osaka, R. Miyama and A. Kito, "Matched filter for DS-CDMA of up to 50Mchip/s based on sampled analog signal processing," IEEE International Solid-State Circuits Conf., Dig. Tech. Papers, pp. 100-101, Feb. 1997.
- [13] M. D. Hahm, E. G. Friedman, and E. L. Titlebaum, "A comparison of analog and digital circuit implementations of low-power matched filters for use in portable wireless communication terminals," IEEE Trans. Circuits and Systems II: Analog and Digital Signal Processing, vol. CAS II-44, No. 6, pp. 498-506, June 1997.
- [14] 3GPP, TS 25.213 v3.5.0, March 2001.