# An LSI chip set for DSP hardware implementation

| Metadata | Language: eng                    |
|----------|----------------------------------|
|          | Publisher:                       |
|          | Release date: 2017-10-03         |
|          | Keywords (Ja):                   |
|          | Keywords (En):                   |
|          | Author:                          |
|          | Email address:                   |
|          | Affiliation:                     |
| URL      | http://hdl.handle.net/2297/18979 |

## AN LSI CHIP SET FOR DSP HARDWARE IMPLEMENTATION

A. Kanemasa, R. Maruta, K. Nakayama, Y. Sakamura and S. Tanaka

Nippon Electric Co., Ltd. 1-1 Miyazaki, Yonchome, Takatsu-ku, Kawasaki 213, Japan

ABSTRACT: This paper describes a new LSI chip set developed to provide a simple and cost-effective means for DSP hardware implementation. This chip set, consisting of two NMOS LSIs, contains enough logic and memory to perform such high level DSP functions as biquad filters and FFT butterflies at a high throughput rate, without any other external logic devices. It employs serial arithmetic and operates at clock rates of up to more than 5 MHz. Throughput rate can be traded off against processing accuracy. The architecture is designed to pursue self-sufficient applicability to high level DSP functions, while retaining generality in application.

## INTRODUCTION

Thanks to the intensive studies made by many researchers during the past one and a half decades, digital signal processing (DSP) now has a mature theoretical base. The potential merits of using DSP are numerous and well recognized. Research and development on real time telecommunication equipment utilizing DSP techniques have been carried out extensively in recent years. Nevertheless, its employment in the actual telecommunication network has not gained much popularity so far.

One obvious reason why people have been reluctant to put DSP into actual use is that the implementation technology has not yet reached a satisfactory level, especially in view of cost and power consumption. In fact, despite algorithmic optimizations, the amount of real time arithmetic operations required for various telecom equipment implementations is still a burden on commercially available standard ICs.

To cope with this problem, various LSIs for DSP hardware implementation have been developed [1]-[5] by integrating more functions on a single chip and improving throughput rate with recent advances in semiconductor technology. One of those recently developed LSIs is the signal processor [3]-[5], which is now gaining a reputation among those who intend to utilize DSP technology for telecommunication equipment. Because of their software programmability, signal processors are inherently versatile. However, there are DSP applications wherein a large amount of data processing for one specific function at high speed is required. A TDM/FDM converter is a typical example. A hardware-oriented device is highly efficient in these applications, compared with a signal processor. Therefore, a new DSP chip set has been designed and developed. It is designed to retain generality in applications in spite of being a hardware-oriented device.

This paper describes the design concept and the architecture employed. Applications using the chip set are also presented.

## DESIGN CONSIDERATION

Throughout the LSI design, the main target has been both group and supergroup level TDM/FDM converters [6], for which high throughput rate modular components are urgently required to compete with conventional analog converters. In order to achieve a high throughput rate in hardware-oriented devices, there are two important factors to be considered: integration level on a chip and operating clock rate. Generally, a high throughput rate is obtained by integrating many functions on a chip and/or by increasing the operating clock rate. However, merely increasing the integration level on a chip would result in a lack of versatility under a constraint on the number of package pins and/or in a very large chip area. For the operating clock rate, compatibility with external logic devices should be taken into consideration. To reduce total equipment power consumption as well as cost, it is desirable that the chip be designed to be compatible with low power Schottky TTL ICs.

System considerations for TDM/FDM converters show that the new LSIs to be employed should be able to provide such high level functions as a biquad filter and an FFT butterfly module, which can process time-division multiplexed signals. Since the amount of memory required depends upon the number of processing channels as well as the data length in a functional module, a two-chip configuration, wherein one chip serves as an arithmetic processor and the other as a memory element, would be a good solution. Serial arithmetic is suitable for minimizing the number of package pins and can also be used with any data length according to system requirements.

## DEVELOPED LSIs

A new DSP chip set, developed as a general-purpose device, consists of an Arithmetic Processor Unit (APU) and a Variable Delay Unit (VDU). The architecture is designed to pursue self-sufficient applicability to high level DSP functions without losing generality in applications. The chip set is designed to fit in such applications as TDM/FDM conversion, where high processing speed and high data accuracy are required.

Fig.1 Arithmetic Processor Unit (APU).

The APU and VDU are single power supply, TTL compatible devices, and operate at a speed up to more than 5 MHz. They are realized with a proven 4.4 µm rule NMOS technology to guarantee high yield, high reliability and low cost.

APU: The APU is a function-selectable device which contains enough logic to perform basic arithmetic operations plus such auxiliary tasks as scaling, coefficient polarity inversion, overflow correction, timing adjustment, etc. A block diagram and a chip photograph of the APU are shown in Fig.1. The APU can realize one function out of eight functions, including:

- A pair of 1D or 2D biquad filters
- An FFT butterfly
- Quad (AX+BY)
- Single $\sum A_i X_i$ (i=0-7)
- Complex (AX + Y)

Fig.2 Variable Delay Unit (VDU).

To minimize the number of package pins, information bits for scaling and polarity inversion are combined with coefficient bits and then supplied serially to the chip through coefficient pins. Data length can be set to 16 bits or more, while coefficient length is fixed at 14 bits. The chip size is 20.39 mm<sup>2</sup>, and the chip is mounted in a 28 pin dual-in-line package.

VDU: The VDU is functionally a set of four variable length (up to 519 bits) shift registers, together with auxiliary switches. Figure 2 shows a block diagram and a chip photograph. The chip is realized on a 20.79 mm<sup>2</sup> area. It is mounted in a 24 pin dual-in-line package.

The switches, which interchange time slots between a pair of data streams, can provide a useful means for an inter-stage buffer for a radix 2 FFT algorithm. Length for the four shift registers is determined by a 9-bit external code through two delay control circuits in the chip.

## APPLICATIONS

### A. Digital Filters

Since digital filtering is a fundamental operation widely used in DSP applications, the chip set is designed to realize any form of digital filter structure, i.e. recursive/non-recursive and 1D/2D. Figure 3 shows a pair of 1D biquad filters. It can also be modified into a complex biquad filter with purely real and imaginary coefficients by changing its connections slightly.
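As a point of reference, a biquad is a second-order recursive section. The sketch below shows one such section in Python, in transposed direct form II with floating-point arithmetic. The paper does not state which realization the APU uses, so the form chosen here, the coefficient names `b0`..`a2`, and the helper names `biquad` and `cascade` are illustrative assumptions; the serial fixed-point arithmetic, scaling and overflow correction of the actual chip are not modeled.

```python
def biquad(samples, b0, b1, b2, a1, a2):
    """One second-order section (assumed transposed direct form II).

    Implements y[n] = b0*x[n] + b1*x[n-1] + b2*x[n-2] - a1*y[n-1] - a2*y[n-2].
    """
    s1 = s2 = 0.0  # the two delay (state) registers
    out = []
    for x in samples:
        y = b0 * x + s1
        s1 = b1 * x - a1 * y + s2
        s2 = b2 * x - a2 * y
        out.append(y)
    return out


def cascade(samples, sections):
    """Run the samples through several biquad sections in series."""
    for coeffs in sections:
        samples = biquad(samples, *coeffs)
    return samples
```

With this building block, a higher-order filter is obtained by cascading sections; an eighth-order filter, for instance, corresponds to four of them.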

Four sets of an eighth-order filter, including coefficient memories, are mounted on a printed circuit board, as shown in Fig.4. Each set can process 28 independent signals with a 20 bit data word length at a 4.48 MHz clock rate.
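The figures quoted above can be related to a per-channel sampling rate with a short calculation. The sketch below assumes that each channel contributes one data word per sampling period to a single bit-serial stream clocked at one bit per cycle; the paper does not state the sampling rate, so both this assumption and the function name `frame_rate_hz` are illustrative only.

```python
def frame_rate_hz(clock_hz, channels, word_bits):
    """Per-channel sampling rate of a bit-serial, time-division multiplexed stream.

    Assumes one word per channel per frame and one bit per clock cycle.
    """
    bits_per_frame = channels * word_bits
    return clock_hz / bits_per_frame


# 28 channels of 20-bit words at a 4.48 MHz clock
rate = frame_rate_hz(4.48e6, 28, 20)
print(f"{rate:.0f} Hz per channel")
```

Under that assumption the quoted parameters work out to 8 kHz per channel.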

### B. FFT Processors

The DSP chip set developed is also useful for FFT processor implementation. Figure 5 shows an FFT butterfly module implemented by the chip set. The module can be used in three FFT structures, shown in Fig. 6, which cover different speeds. System parameters characterizing an FFT processor are the number of data points (N), processing speed (T) and data word length (L).
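To make the butterfly operation concrete, the following Python sketch shows a radix-2 butterfly and a small recursive FFT built from it. The paper does not say whether the module implements the decimation-in-time or decimation-in-frequency form, so the decimation-in-time form used here, and the names `butterfly` and `fft`, are assumptions for illustration; the code uses floating-point complex numbers rather than the chip's serial arithmetic.

```python
import cmath


def butterfly(a, b, w):
    """Radix-2 butterfly (assumed decimation-in-time form): (a + w*b, a - w*b)."""
    t = w * b
    return a + t, a - t


def fft(x):
    """Recursive radix-2 FFT using only butterfly(); len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        w = cmath.exp(-2j * cmath.pi * k / n)  # twiddle factor
        out[k], out[k + n // 2] = butterfly(even[k], odd[k], w)
    return out
```

An N-point transform computed this way uses N/2 butterflies in each of its log<sub>2</sub>N stages, which is the count that the speed relationships later in this section depend on.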

The standard FFT processor, shown in Fig.6 (a), is based upon the well-known pipeline FFT structure [7], consisting of cascaded butterfly modules, each of which performs the butterfly arithmetic for one stage of the FFT algorithm.

The inter-stage buffer between the m-th and (m+1)-st stages can be realized by a VDU chip, in which the variable shift register length is set to M (= 2<sup>m</sup> x L) bits. The switches in a VDU chip are used so that a time slot interchange between a pair of complex data streams can be accomplished, to output a new pair of data streams prepared for the next stage.

Since serial arithmetic is employed in an APU chip, the number of data points and/or data word length can be easily extended at the sacrifice of processing speed. The relationship among these three parameters is given by

$$T = (N/2) \times L / C$$

where C denotes the operating clock rate.

To obtain a high throughput rate while maintaining the number of data points and the data word length, FFT arithmetic operations in each stage can be shared among 2<sup>p-1</sup> FFT butterfly modules placed in parallel, where p = 2, 3, ..., log<sub>2</sub>N. Figure 6 (b) shows a high speed FFT processor block diagram as an example for p=3. Inter-stage buffers in the last p stages are replaced by wired connections, which results in VDU chip count reduction. The throughput rate of this structure is 2<sup>p-1</sup> times that of the standard pipeline FFT processor shown in Fig.6 (a).

Fig.3 Biquad Digital Filter Module

Fig.4 Four Sets of an Eighth-order Filter

Fig. 5 FFT Butterfly Module

The FFT hardware for low processing speed applications can be minimized by eliminating the FFT butterflies of the second through last stages in Fig. 6 (a), and by feeding back a pair of output data streams from the last buffer to the first stage FFT butterfly module. Figure 6 (c) shows a compact FFT processor based upon this structure. One APU chip processes all the butterfly arithmetic operations in the FFT algorithm. Each stage buffer not only interchanges time slots between a pair of complex data streams but also operates as a delay element in time division.

Processing speed realized by the three FFT structures described above is summarized in TABLE 1.
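The entries of TABLE 1 can be reproduced from the relationships given in this section with a few lines of Python. The sketch below rests on two assumptions that the text does not state explicitly: a clock rate of exactly C = 5 MHz, and a compact processor that needs log<sub>2</sub>N passes through its single APU, one per FFT stage. The function names are hypothetical.

```python
from math import log2

CLOCK_HZ = 5e6  # assumed clock rate; the text only says "up to more than 5 MHz"


def standard_ms(n, word_bits, clock_hz=CLOCK_HZ):
    """Standard pipeline processor: T = (N/2) * L / C, in milliseconds."""
    return (n / 2) * word_bits / clock_hz * 1e3


def high_speed_ms(n, word_bits, p, clock_hz=CLOCK_HZ):
    """High speed processor: 2**(p-1) parallel butterfly modules per stage."""
    return standard_ms(n, word_bits, clock_hz) / 2 ** (p - 1)


def compact_ms(n, word_bits, clock_hz=CLOCK_HZ):
    """Compact processor: one APU reused for all log2(N) stages (assumption)."""
    return standard_ms(n, word_bits, clock_hz) * log2(n)


for n in (512, 1024, 2048):
    cells = []
    for bits in (16, 24):
        cells.append(f"compact/{bits}: {compact_ms(n, bits):.2f}")
        cells.append(f"standard/{bits}: {standard_ms(n, bits):.2f}")
        cells.append(f"high speed/{bits}: {high_speed_ms(n, bits, p=5):.2f}")
    print(n, " | ".join(cells))
```

Rounded to two decimals, the values computed under these assumptions agree with the entries of TABLE 1.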

## CONCLUSION

A new LSI chip set, consisting of an APU and a VDU chip, has been developed as a general-purpose device to provide high level DSP functions, such as a pair of biquad filters and an FFT butterfly, which are widely used in DSP applications. The chip set is suitable for time-division multiplexed applications wherein a single specific function is executed at high speed and with high accuracy for a number of different signals. Applications to digital filters and FFT processors show that the chip set is very useful for DSP hardware implementation. Using the chip set, a compact and cost-effective TDM/FDM converter has been successfully developed [8].

## ACKNOWLEDGMENT

The authors wish to acknowledge the project managerial efforts of Y. Kato, A. Sawai, A. Tomozawa, Y. Katagiri and Y. Ishizaki and the technical contribution, fruitful discussion and experimental assistance from M. Hibino, H. Sakaguchi, Y. Morimura and U. Ishikawa.

## REFERENCES

1. M. Bellanger et G. Bonnerot, "Circuits de Traitment Numerique du Signal pour Equipments de Telecommunications", Telecom 1979.
2. N. Ohwada, T. Kimura and M. Doken, "LSI's for Digital Signal Processing", IEEE Journ. Solid-State Circuits, Vol. SC-14, No.2, 1979.
3. R. W. Blasco, "V-MOS chip joins microprocessor to handle signals in real time", Electronics, Aug. 30, 1979.
4. J. S. Thompson and J. R. Boddie, "An LSI Digital Signal Processor", ICASSP 80.
5. T. Nishitani, Y. Kawakami, R. Maruta and A. Sawai, "LSI Signal processor Development", ICASSP 80.
6. R. Maruta and A. Tomozawa, "An Improved Method for Digital SSB-FDM Modulation and Demodulation", IEEE Trans. Communications, Vol. COM-26, No.5, May 1978.
7. H. L. Groginsky and G. H. Works, "A Pipeline Fast Fourier Transform", IEEE Trans. Computers, Vol. C-19, No.11, Nov. 1970.
8. R. Maruta, A. Kanemasa, H. Sakaguchi, M. Hibino and N. Kawayachi, "A 24-CHANNEL LSI TRANS-MULTIPLEXER", to be presented at ICC'81.



Fig. 6 FFT Processor Structure: (a) Standard FFT Processor, (b) High Speed FFT Processor, (c) Compact FFT Processor. A: APU, V: VDU.

TABLE 1. FFT Processor Speed (ms)

| Data points | Compact, 16 bit data | Compact, 24 bit data | Standard, 16 bit data | Standard, 24 bit data | High Speed (p=5), 16 bit data | High Speed (p=5), 24 bit data |
|-------------|----------------------|----------------------|-----------------------|-----------------------|-------------------------------|-------------------------------|
| 512         | 7.37                 | 11.06                | 0.82                  | 1.23                  | 0.05                          | 0.08                          |
| 1024        | 16.38                | 24.58                | 1.64                  | 2.46                  | 0.10                          | 0.15                          |
| 2048        | 36.04                | 54.07                | 3.28                  | 4.92                  | 0.20                          | 0.31                          |