# Designing A Re-Configurable Fractional Fourier Transform Architecture Using Systolic Array

Anal Acharya<sup>1</sup> and Soumen Mukherjee<sup>2</sup>

<sup>1</sup> Dept. of Computer Science, St. Xavier's College, Kolkata, West Bengal-700016, India.

<sup>2</sup> Dept. of Computer Application, RCC Institute of Information Technology, Kolkata, West Bengal-700015, India.

### Abstract

The Fractional Fourier Transform (FRFT) algorithm, which is derived from the DFT, computes the angular domains between the time and frequency domains. It is increasingly used in signal filtering, quantum mechanics and optical physics. In this paper we develop an efficient, systolic, reconfigurable architecture for a particular type of FRFT called the MA-CDFRFT (Multi-Angle Centered Discrete FRFT). The benefit of this type of FRFT is that it computes all the signal components at equally spaced angles. A systolic architecture is used for this computation because of its simplicity, regularity and concurrency, and because it suits computation-intensive workloads. The resulting product should also meet the demands of today's market: it must be marketable and cheap while satisfying customer requirements. This calls for the architecture to be re-configurable. A reconfigurable computer consists of a standard processor and an array of re-configurable hardware. The main processor controls the behavior of the re-configurable hardware, which is then tailored to perform a specific task, such as image processing or pattern matching, as if it had been built to perform that task exclusively.

Keywords: MA-CDFRFT, Systolic Array, Up/down array, Reconfigurable PE.

# **1. Introduction**

Signal processing researchers have largely replaced direct computation of the DFT with the FFT algorithm because of its lower computational complexity. DCT and DWT algorithms are also finding increasing importance in signal compression. The DFT has only one basic definition, and a variety of algorithms have been devised for its fast computation. When the FRFT is analyzed in the discrete domain, however, there are many definitions of the discrete fractional Fourier transform (DFRFT) [3]. We first define the Centered DFRFT (CDFRFT) and extend this definition to the Multi-Angle CDFRFT (MA-CDFRFT), using the definition given in [1] throughout. Our proposed architecture can handle real-time data and reduces computational complexity by using a systolic up/down array in the FFT computation. We then propose a re-configurable architecture for the FRFT that works in stages, the last of which corresponds to an FFT implementation. Finally we construct a re-configurable processing element (PE) that works for each stage. The stages of the PE are selected by a set of signals from the control unit.

## 2. Related works

An architecture for the FRFT has been proposed by Sinha et al. [16], but that method is not suitable for real-time data. Dick proposed a method of computing the DFT on FPGA-based systolic arrays [8], and also a method for computing multi-dimensional DFTs using Xilinx FPGAs [17]. Cho et al. discussed an implementation of the DCT algorithm for parallel architectures [18]. A re-configurable architecture has been discussed by Acharya et al. [19][20].

## **3.** Computation of MA-CDFRFT

The CDFRFT can be expressed [1] using Equation 1 as follows

$$\{A_{\alpha}\}_{kn} = \sum_{p=0}^{N-1} V_{kp} V_{np} e^{-jp\alpha}$$
(1)

where $V_{kp}$ is the k-th element of eigenvector p. Multiplying $A_{\alpha}$ by the signal x[n], that is, forming $X_{\alpha}[k] = \sum_{n=0}^{N-1} \{A_{\alpha}\}_{kn}\, x[n]$, and rearranging, we obtain

$$X_{\alpha}[k] = \sum_{p=0}^{N-1} V_{kp} \sum_{n=0}^{N-1} x[n] V_{np} e^{-jp\alpha}$$
(3)

For a set of equally spaced values of $\alpha$ given by

$$\alpha_r = 2\pi r/N, \quad r = 0, 1, \ldots, N-1$$
(4)

that correspond to the cases for which the trace of $A_{\alpha}$ becomes zero, we can rewrite the transform in terms of the index r as

$$X_{\alpha}[k] = \sum_{p=0}^{N-1} V_{kp} \sum_{n=0}^{N-1} x[n] V_{np} e^{-j(2\pi/N)pr}$$
(5)

For ease of computation we define $Z_k^+[p]$ as

$$Z_{k}^{+}[p] = \sum_{n=0}^{N-1} x[n] V_{np}$$
(6)

Again defining $Z_k[p]$ as

$$Z_{k}[p] = V_{kp} \sum_{n=0}^{N-1} x[n] V_{np}$$
(7)

we can see that the transform can be expressed as a DFT, that is

$$X_{\alpha}[k] = \sum_{p=0}^{N-1} Z_{k}[p] e^{-j(2\pi/N)pr}$$
(8)

where $r=0,1,\ldots,N-1$ and $k=0,1,\ldots,N-1$. Since $X_k[r]$ (that is, $X_{\alpha}[k]$ evaluated at $\alpha = \alpha_r$) contains all the CDFRFTs corresponding to the discrete set of angles $\alpha_r$, [1] suggested that this matrix be called the Multi-Angle CDFRFT or MA-CDFRFT.
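As a numerical check on Equations 5 to 8, the following is a minimal sketch in plain Python. It is not the authors' implementation. The function names are hypothetical, and `stand_in_basis` builds an orthonormal DCT-IV matrix purely as a placeholder for the eigenvector matrix $V$ of [1], which is not constructed here. Equation 8 is evaluated as a direct DFT for clarity rather than with an FFT.

```python
import cmath
import math


def stand_in_basis(N):
    """Orthonormal DCT-IV matrix. ASSUMPTION: only a placeholder for the
    CDFRFT eigenvector matrix V of [1], which is not constructed here."""
    scale = math.sqrt(2.0 / N)
    return [[scale * math.cos(math.pi * (n + 0.5) * (p + 0.5) / N)
             for p in range(N)] for n in range(N)]


def ma_cdfrft(x, V):
    """Return X[k][r] using the factorisation of Equations 6, 7 and 8."""
    N = len(x)
    # Equation 6: z_plus[p] = sum over n of x[n] * V[n][p]
    z_plus = [sum(x[n] * V[n][p] for n in range(N)) for p in range(N)]
    X = []
    for k in range(N):
        # Equation 7: Z_k[p] = V[k][p] * z_plus[p]
        Z_k = [V[k][p] * z_plus[p] for p in range(N)]
        # Equation 8: a length-N DFT of Z_k over p, one output per angle index r
        X.append([sum(Z_k[p] * cmath.exp(-2j * math.pi * p * r / N)
                      for p in range(N)) for r in range(N)])
    return X


def ma_cdfrft_direct(x, V):
    """Return X[k][r] from the double sum of Equation 5, for comparison."""
    N = len(x)
    return [[sum(V[k][p] * x[n] * V[n][p] * cmath.exp(-2j * math.pi * p * r / N)
                 for p in range(N) for n in range(N))
             for r in range(N)] for k in range(N)]
```

With any orthonormal $V$, the column $r = 0$ (angle $\alpha_0 = 0$) reproduces the input signal, which gives a quick sanity check of the factorisation.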

# 4. Proposed FRFT Architecture

In the first step (Figure 1) the eigenvector elements $V_{np}$ are loaded into the processing elements [20].



#### Fig 1. Loading of Constants Vnp

In the second step (Figure 2) each of the signal elements is multiplied by the eigenvector element $V_{np}$. In the first cycle $V_{00}, V_{10}, \ldots, V_{N-1,0}$ are multiplied by x[0], x[1], ..., x[N-1] respectively. In the next cycle these products move to the second row, while the first row multiplies the next set of signal elements by $V_{01}, V_{11}, \ldots, V_{N-1,1}$. Finally, in cycle N the values derived from each of the processing elements are added at $\Sigma$ and the value $Z_k^+[0]$ is obtained.



Fig 2. Multiplying each signal element by $V_{np}$ & taking the sum

We now discuss the third step (Figure 3) of the FRFT architecture [20]. The elements derived at $\Sigma$ (the adder) in step 2 are transferred systolically to the processing elements holding $V_{00}, V_{01}, \ldots, V_{0,N-1}$, where each sum is multiplied by the corresponding eigenvector element as in Equation 7. Thus the elements $Z_k[0], Z_k[1], \ldots, Z_k[N-1]$ are derived (Figure 3).



Fig 3. Calculation of $Z_k[p]$

Multiplying these elements by the twiddle factor $e^{-j(2\pi/N)pr}$ and summing over p from 0 to N-1 gives the corresponding FRFT component for the r-th plane (Equation 8). This computation can be done quickly using the FFT. We propose that it be done using an up/down systolic array [19], which essentially consists of an upward path for the bottom N/2 sets of input data and a downward path for the top N/2 sets of input data.



Fig 4: Detailed Working of the up/down array in FFT architecture
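The pairing of the top N/2 inputs with the bottom N/2 inputs resembles the butterfly of a radix-2 decimation-in-frequency FFT. The sketch below is a generic textbook FFT of that kind in plain Python, offered only to illustrate the data flow. It is not a model of the up/down array of [19], and it assumes N is a power of two. The function names are hypothetical.

```python
import cmath
import math


def fft_dif(a):
    """Radix-2 decimation-in-frequency FFT (len(a) must be a power of two)."""
    N = len(a)
    if N == 1:
        return [complex(a[0])]
    half = N // 2
    # Butterfly: combine element i of the top half with element i of the bottom half.
    top = [a[i] + a[i + half] for i in range(half)]
    bottom = [(a[i] - a[i + half]) * cmath.exp(-2j * math.pi * i / N)
              for i in range(half)]
    out = [0j] * N
    out[0::2] = fft_dif(top)      # even-indexed outputs
    out[1::2] = fft_dif(bottom)   # odd-indexed outputs
    return out


def dft(a):
    """Direct O(N^2) DFT, used here only as a reference."""
    N = len(a)
    return [sum(a[p] * cmath.exp(-2j * math.pi * p * r / N) for p in range(N))
            for r in range(N)]
```

Each recursion level performs N/2 butterflies and there are $\log_2 N$ levels, which matches the stage count used in the complexity analysis.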

# 5. Computational Complexity of the Architecture

First we define the utilization factor. Let there be N processing elements. If $N_i$ processing elements are used in a particular cycle i ($N_i \le N$), then the utilization factor at cycle i is defined as

$$U_i = N_i/N$$

We divide the computation into the following stages:

In the first stage $V_{np}$ is loaded into the systolic array. We assume there are N cells, which store $V_{0p}$ to $V_{N-1,p}$, and that $T_1$ is the transfer time of a signal element from the left cell to the right cell. The time needed to fill the two-dimensional systolic array is then $NT_1$, after which the utilization factor of the systolic array is 100 percent. In the second stage $\sum x[n] V_{np}$ is computed. We assume that the clock period of a real addition is $T_2$ and that the clock period for the selector switch [20] to switch between the real and imaginary signal components is $T_3$. The total time is then:

$$\frac{N(N-1)}{2} T_1 + T_2 \log_2 N + T_3$$

The PE utilization is 50% in this stage.

In the third stage $V_{kp} \sum x[n] V_{np}$ is computed (Equation 7). We assume the time for a real multiplication and real addition is $T_4$ and the time to configure the required connection is $T_5$, so the total time required in this stage is $T_4+T_5$. The PE utilization is 50% in this stage.

In the final step the FFT components are computed. As the longest up/down path has size N/2 and the number of stages is $\log_2 N$, the total computation time is:

$$[1 + N/2\ \text{up/down}] \log_2 N$$

One unit of time is assumed for each addition. The PE utilization is 100% after $[1 + N/2\ \text{up/down}] \log_2 N$ time units.

The above information may be summarized as follows

Table 1: Time complexity of the different stages of the FRFT calculation

| Step | Time Required                                 | Time Complexity |
|------|-----------------------------------------------|-----------------|
| 1    | $N T_1$                                       | $O(N)$          |
| 2    | $\frac{N(N-1)}{2} T_1 + T_2 \log_2 N + T_3$   | $O(N^2)$        |
| 3    | $T_4 + T_5$                                   | $O(1)$          |
| 4    | $[1 + N/2\ \text{up/down}] \log_2 N$          | $O(N \log_2 N)$ |
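To see how the four terms compare for a concrete size, the expressions in Table 1 can be evaluated directly. The sketch below is illustrative only. The function name `stage_times` and the default value of 1 for each of $T_1$ to $T_5$ are assumptions, and step 4 is read as $(1 + N/2)\log_2 N$ unit-time steps, which is one possible interpretation of the "up/down" term.

```python
import math


def stage_times(N, T1=1.0, T2=1.0, T3=1.0, T4=1.0, T5=1.0):
    """Evaluate the per-stage times of Table 1 for an N-point transform.

    ASSUMPTION: step 4 treats each up/down move as one unit of time.
    """
    log_n = math.log2(N)
    return {
        1: N * T1,                                   # loading V_np
        2: N * (N - 1) / 2 * T1 + T2 * log_n + T3,   # sum of x[n] * V_np
        3: T4 + T5,                                  # multiplication by V_kp
        4: (1 + N / 2) * log_n,                      # FFT on the up/down array
    }


times = stage_times(8)
total = sum(times.values())
```

For N = 8 with unit times, step 2 already accounts for more than half of the total, consistent with its $O(N^2)$ entry in the table.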

# 6. Re-configurable FRFT Architecture

In our proposed architecture we use a re-configurable processing element that can be dynamically reconfigured for the different stages of the FRFT. The processing element is controlled by a control unit, which generates the control signals for the necessary reconfiguration. As the architecture is systolic, each processing element has two inputs, H<sub>in</sub> and V<sub>in</sub>, used for sending the necessary reconfiguration signals, and two outputs, H<sub>out</sub> and V<sub>out</sub> [20]. The process of reconfiguration is discussed stepwise below.

Firstly, the eigenvector elements $V_{np}$ are transferred systolically through H<sub>in</sub> and stored in the registers R1 and R2. R3 initially stores zero, and R1 is bypassed to H<sub>out</sub>.

Secondly, the signal component x[n] transferred through H<sub>in</sub> is multiplied by the value in register R1. The product is added to the signal component X(n, p-1) transferred through V<sub>in</sub> and stored in register R3. The value is then passed on through V<sub>out</sub>.

In the third stage the signal on H<sub>in</sub> is stored in register R1 and transferred to H<sub>out</sub>. The values of registers R1 and R2 are then multiplied and stored in register R3. The output is the vector $Z_k[p]$.

Finally, the value of the signal element on H<sub>in</sub> is stored in register R2, and the values of registers R3 and R2 are multiplied and stored in register R4. This yields the transform vector $X_{\alpha}[k]$.



Fig 5: Reconfigurable Processing Element for computation of MA-CDFRFT
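The register transfers described in this section can be sketched as a small behavioural model. This is an illustration under stated assumptions, not the authors' hardware design. The class `ReconfigurablePE`, the integer `stage` argument standing in for the control signals, and the helper `column_sum` that chains PEs vertically are all hypothetical, and the stage 2 input on H<sub>in</sub> is taken to be the signal element x[n].

```python
class ReconfigurablePE:
    """Behavioural model of one processing element with registers R1 to R4."""

    def __init__(self):
        self.R1 = self.R2 = self.R3 = self.R4 = 0.0

    def step(self, stage, h_in=0.0, v_in=0.0):
        """Apply one stage selected by the control unit; return (h_out, v_out).

        An output that the stage does not drive is returned as None.
        """
        if stage == 1:            # load the eigenvector element into R1 and R2
            self.R1 = self.R2 = h_in
            self.R3 = 0.0
            return self.R1, None  # R1 is bypassed to H_out
        if stage == 2:            # multiply-accumulate with the incoming partial sum
            self.R3 = h_in * self.R1 + v_in
            return None, self.R3
        if stage == 3:            # multiply the incoming sum by the stored element
            self.R1 = h_in
            self.R3 = self.R1 * self.R2
            return self.R1, None
        if stage == 4:            # multiply by the incoming factor; result stays in R4
            self.R2 = h_in
            self.R4 = self.R3 * self.R2
            return None, None
        raise ValueError("stage must be 1, 2, 3 or 4")


def column_sum(x, v_col):
    """Chain one PE per signal element (ASSUMED wiring) to form sum x[n] * V[n][p]."""
    pes = [ReconfigurablePE() for _ in v_col]
    for pe, v in zip(pes, v_col):
        pe.step(1, h_in=v)
    partial = 0.0
    for pe, xn in zip(pes, x):
        _, partial = pe.step(2, h_in=xn, v_in=partial)
    return partial
```

For example, `column_sum([1.0, 2.0, 3.0], [0.5, -1.0, 2.0])` accumulates 0.5, then -1.5, then 4.5 as the partial sum passes from one PE to the next.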

# 7. Conclusions

In this paper we discussed a re-configurable architecture that computes the MA-CDFRFT in four stages. The algorithm developed has a complexity of order $O(N^2 \log N)$. We now note some other features that could make the architecture more versatile. Firstly, the architecture could be made more fault tolerant, so that a failed PE is disabled while the rest of the system continues to function as usual. Secondly, since there are multiple PEs, there can be multiple simultaneous users of the system, each executing a different task. Thirdly, systolic rings could be used instead of a systolic array to improve inter-PE communication. The architecture could also be extended to the computation of other transforms used in image processing, such as the DCT, DWT, FFT [15] and DFT. This could lead to the development of a generalized transform processor in which a single PE, under the effect of a control signal, could compute various transforms.

IJCSI International Journal of Computer Science Issues, Vol. 7, Issue 6, November 2010 ISSN (Online): 1694-0814 www.IJCSI.org

#### Acknowledgments

We are indebted to Prof. Amitabha Sinha, Director, School of Information Technology, West Bengal University of Technology, Kolkata for his help in this work.

# References

- [1] Juan Gaspar Vargas-Rubio, "The Central Discrete Fractional Fourier Transform, properties, Computation and application to linear chirp signals", Ph. D thesis, The Univ. of New Mexico, Albuquerque, New Mexico, Dec. 2004.
- [2] Machiraju Vijay and C. Siva Ram Murthy, "Real-Time Simulation of Dynamic Systems on Systolic Arrays", IEEE Transactions on Industrial Electronics, Vol. 45, No.2, April 1998, pp. 326-332.
- [3] Rajiv Saxena and Kulbir Singh, "Fractional Fourier transform: A novel tool for signal processing" J. Indian Inst. Sci., Jan.–Feb. 2005, 85, pp. 11–26.
- [4] T. Willey, T. S. Durrani and R. Chapman, "An FFT Systolic Processor And Its Applications".
- [5] Griselda Saldaña and Miguel Arias-Estrada, "FPGA-Based Customizable Systolic Architecture for Image Processing Applications", Proceedings of the 2005 International Conference on Reconfigurable Computing and FPGAs (ReConFig 2005), IEEE Computer Society.
- [6] S. Barua, J. E. Carletta, K. A. Kotteri and A. E. Bell, "An Efficient Architecture for Lifting-based Two-Dimensional Discrete Wavelet Transforms" GLSVLSI' 04, April 26–28, 2004, Boston, Massachusetts, USA.
- [7] Rafael C. Gonzalez, Richard E. Woods, Steven L. Eddins "Digital Image Processing using MATLAB" Pearson Education.
- [8] Chris Dick "Computing the Discrete Fourier Transform on FPGA Based Systolic Arrays".
- [9] Xilinx, "Introduction and overview", Virtex-II Pro Platform FPGAs, March 9th, 2004.
- [10] K.Sapiecha and R.Jarocki, "Modular Architecture For High Performance Implementation Of FFT Algorithm", 1986 IEEE.
- [11] Rami A. AL Na'mneh, W. David Pan and B. Earl Wells, "Two Parallel Implementations for One Dimension FFT On Symmetric Multiprocessors", ACMSE '04, April 2-3, 2004, Huntsville, Alabama, USA, pp-273-278.
- [12] S.Y. Kung, "VLSI Array Processor", Prentice Hall International Inc., ISBN:013942749X, 1988.
- [13] Pavel Sinha, Amitabha Sinha, Dhruba Basu, "A Novel Architecture of a Re-configurable Parallel DSP Processor", IEEE Int. Conf Proc. NEWCAS 05, June 19-22, 2005, pp. 71-74.
- [14] Kai Hwang and Faye A. Briggs, "Computer Architecture and Parallel Processing" McGraw-Hill, 1985.
- [15] Preston A. Jackson, Cy P. Chan, Jonathan E. Scalera, Charles M. Rader, and M. Michael Vai, MIT Lincoln Laboratory, "A Systolic FFT Architecture for Real Time FPGA Systems".

- [16] Amitava Sinha, Pavel Sinha, Santanu Chatterjee and Dhruba Basu "An Efficient Re-Configurable Architecture of Centered Discrete Fractional Fourier Transform Processor".
- [17] C. Dick, "Computing multidimensional DFTs using Xilinx FPGAs," ICSPAT 98, Toronto, Canada, Sept. 1998.
- [18] N.I. Cho and S.U. Lee, "DCT algorithms for VLSI parallel implementations", IEEE Trans. Acoust., Speech, Signal Processing, vol. 38, Jan. 1990, pp. 121-127.
- [19] Soumen Mukherjee, Anal Acharya, "An Efficient Reconfigurable Architecture for Fractional Fourier Transforms", International Conference in Signals, Systems and Automation (ICSSA-09) held on 28-29 December, 2009 in CGET, Anand, Universal Publisher, Page 85-88, ISBN – 10: 1599428695, ISBN -13: 9781599428697.
- [20] Anal Acharya, Soumen Mukherjee, "Designing Fractional Fourier Transforms using Systolic Arrays", National Conference on Emerging Trends in Computer Science and Information Technology (ETCSIC) -2010 Nashik held from 29-30th January 2010, in K K Wagh Institute of Engineering Education and Research, Page 163-166.

Anal Acharya is currently the Head of the Department of Computer Science at St. Xavier's College, Kolkata, India. His present research interests include Distributed Computing, Object Oriented Modeling & Signal Processing Architecture. He has several published papers in National & International conferences and Journals. He has over 10 years of experience in undergraduate and postgraduate teaching and has supervised several postgraduate dissertations.

Soumen Mukherjee is with RCC Institute of Information Technology, Kolkata, India. His present research interests include Object Oriented Modeling & Signal Processing Architecture and collaborative learning. He has eleven published papers in National & International conferences and Journals. He has supervised several postgraduate dissertations. He is a Life Member of CSI, IETE, ISTE, ISCA and FOSET. He has also served as a co-opted member of the Executive Committee of the IETE Kolkata Center, India.