# A CDMA Based Scalable Hierarchical Architecture for Network-On-Chip

Ahmed A. El Badry<sup>1</sup> and Mohamed A. Abd El Ghany<sup>2</sup>

<sup>1</sup> Communications Engineering Dept., German University in Cairo, Cairo, Egypt

<sup>2</sup> Electronics Engineering Dept., German University in Cairo, Cairo, Egypt; Darmstadt University, Darmstadt, Germany

### Abstract

A scalable hierarchical architecture based on Code-Division Multiple Access (CDMA) is proposed for high-performance Network-on-Chip (NoC). This hierarchical architecture allows a large number of IPs to be integrated in a single on-chip system. The network encoding and decoding schemes for CDMA transmission are provided. The proposed CDMA NoC architecture is compared to the conventional architecture in terms of latency, area and power dissipation. The overall area required to implement the proposed CDMA NoC design is reduced by 24.2%. The design decreases the latency of the network by 40%. The total power consumption of the proposed design is also decreased by 25%.

**Keywords:** Scalable, CDMA, NoC, Hierarchical Architecture, Power dissipation.

# 1. Introduction

Designing a bus-based architecture for a system that contains a large number of intellectual property (IP) blocks is time consuming. Research has therefore focused on NoC architectures because of their scalability and efficiency in the construction of complex systems [1]-[4]. NoC designs that follow a packet-switching paradigm for routing information through the network can achieve very high aggregate bandwidth within the chip [5], [6]. A packet-switched NoC that uses point-to-point connections, however, suffers from varying packet transfer latency. This is due to the use of different transmission routes when communicating with different destination points, or even with the same destination point [7], [8].

CDMA has been widely used in wireless communication because of its high bandwidth efficiency and multiple access capability [9]. The encoding and decoding of the CDMA transmission process is done by spreading the user's information with orthogonal codes. The multiple access properties of CDMA are used in the proposed design for the routing of the data between the connected blocks. The proposed NoC is shown in Fig. 1.



Fig. 1 Proposed local switch

Multiple studies have applied the CDMA technique to NoC. Paper [6] proposed a star topology NoC design that scales to handle large systems. An arbiter-based switch was proposed in [7] to perform the routing among the nodes; that work also compared the results of a PTP packet-switched NoC and a CDMA NoC. A scalable CDMA router design was proposed in [10], which allowed the construction of star+star and star+mesh network architectures that can be scaled to meet the requirements of high performance applications. In paper [11], the authors presented a hybrid mesh-star topology and a CDMA switch that is capable of solving the hot-spot problem of the 2D-mesh topology and enabling multicasting within the mesh topology.

The main focus of this paper is a scalable CDMA-based NoC that applies the encoding and decoding schemes proposed here. The aim of the proposed design is to improve the area cost, latency and power dissipation of the system.

The rest of this paper is arranged as follows. In Section 2, the aspects of applying the CDMA technique to an on-chip network are discussed, including the design and operation of the encoding and decoding scheme. In Section 3, the components, architecture and functionality of the local switches are discussed. The structure of the central switch and the behavior of the hierarchy are explained in Section 4. In Section 5, the simulation results are provided. Finally, the conclusion is presented in Section 6.

# 2. Applying CDMA in NoC

The CDMA technique operates on the principle of encoding the original data with a set of orthogonal codes. The encoded data from different users are then added together for transmission without interfering with each other, owing to the orthogonal property of the spreading codes. Orthogonal here means that the normalized auto-correlation of each spreading code is 1, while the cross-correlation between any two different spreading codes is 0. The receiving user can then extract, or decode, the original data by multiplying the received sum with the unique spreading code used for encoding. In the following subsections, the spreading code selection is explained and the proposed encoding and decoding circuits and algorithms are presented in detail.

## 2.1 Spreading Code selection

Many spreading code types have been proposed for CDMA communication, such as Kasami sequences, M-sequences, Gold sequences and Walsh codes [12]. 8-bit Walsh codes are used in the proposed design. For Walsh codes of X bits there are X-1 non-zero codes, so X-1 users are able to communicate with each other. Each IP block in the design is assigned a unique code with which to encode its data.
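The code set and its orthogonality can be illustrated with a short sketch. It assumes the Sylvester (Hadamard) construction for the Walsh codes and a 0 to +1, 1 to -1 mapping for the correlation; the paper does not state which construction or chip ordering it uses, and the names `walsh_codes` and `correlation` are illustrative only.

```python
def walsh_codes(length):
    """Return the rows of a Sylvester-type Walsh-Hadamard matrix as 0/1 lists."""
    if length < 1 or length & (length - 1):
        raise ValueError("length must be a power of two")
    rows = [[0]]
    while len(rows) < length:
        # Doubling step: each row r yields (r, r) and (r, inverted r).
        rows = [r + r for r in rows] + [r + [1 - b for b in r] for r in rows]
    return rows


def correlation(a, b):
    """Normalized correlation after mapping chip 0 to +1 and chip 1 to -1."""
    return sum((1 - 2 * x) * (1 - 2 * y) for x, y in zip(a, b)) / len(a)


codes = walsh_codes(8)
usable = [c for c in codes if any(c)]  # drop the all-zero row

print(len(usable))  # number of non-zero 8-bit codes
for code in usable:
    print("".join(str(chip) for chip in code))
```

Under this construction the two codes quoted later in the encoding example, `00110011` and `01011010`, both appear among the seven non-zero rows.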

## 2.2 Digital Encoding Scheme

Several on-chip digital encoding algorithms have been proposed in [6], [7]. One issue with the previous schemes is that they compute with decimal values and represent each value by its binary equivalent. For example, if the result of a summation is 7, it is transmitted as "111", which reduces the aggregate bandwidth of the system. Another issue is the need for extra hardware to perform the encoding and decoding operations, such as arithmetic adders, accumulators and multipliers. A different encoding and decoding scheme is proposed for this design. The proposed scheme is implemented with simple logic gates, reducing the logic delay of the system. The encoding operation is done as follows. Data from each sending node undergoes an XOR operation with its corresponding unique spreading code. After the data from all sending users are encoded, they are merged using an OR gate, resulting in a stream of single-valued binary bits that holds the data of each user. A simple example of the encoding operation for two nodes is shown in Fig. 2.



Fig. 2 Encoding Example

In Fig. 2, two users transmit data: the first transmits 1 and the second transmits 0. Each original data bit is encoded with the sender's unique spreading code, which is '00110011' for the first and '01011010' for the second. After the encoding operation is complete, the encoded data from the users are combined using an OR gate, resulting in an 8-bit output '01111011'. This output holds the data of all senders.
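The encoding rule can be sketched as follows. This is only a minimal reading of the prose (XOR each data bit with every chip of the sender's code, then OR the encoded streams); `bits`, `encode_bit` and `merge` are illustrative names, not part of the design. The sketch applies XOR literally, so a data bit of 1 inverts the code. The chip polarity drawn in Fig. 2 cannot be recovered from the text, so the word printed with the figure may follow a different convention from the one assumed here.

```python
def bits(text):
    """Turn a string such as '00110011' into a list of 0/1 chips."""
    return [int(ch) for ch in text]


def encode_bit(data_bit, code):
    """XOR one data bit with every chip of the sender's spreading code."""
    return [data_bit ^ chip for chip in code]


def merge(encoded_streams):
    """OR the encoded streams of all active senders, chip by chip."""
    return [int(any(chips)) for chips in zip(*encoded_streams)]


code_a = bits("00110011")  # spreading code of the first sender
code_b = bits("01011010")  # spreading code of the second sender

# First sender transmits 1, second sender transmits 0.
channel = merge([encode_bit(1, code_a), encode_bit(0, code_b)])
print("".join(str(chip) for chip in channel))
```

Only XOR and OR operations are involved, which is the property the scheme relies on to avoid adders and multipliers.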

## 2.3 Digital Decoding Scheme

The decoding scheme makes use of the orthogonal and balance properties of the Walsh codes. If the Walsh code of the sender is applied to the received encoded stream, the output should be the bit originally sent by the corresponding sender. The decoding operation is as follows. The transmitted stream is compared bit by bit with the sender's spreading code. If the spreading code bit is 0, the received bit at that position enters the 0-Accumulator; if the spreading code bit is 1, it enters the 1-Accumulator. Each accumulator ANDs together the bits that enter it. Because of the orthogonal and balance properties of Walsh codes, only one of the two accumulators can hold all ones. Thus, if the 0-Accumulator has a value of 1, the original bit sent was 1, while if the 1-Accumulator has a value of 1, the original bit sent was 0. An example of the decoding operation is shown in Fig. 3.



Fig. 3 Decoding Example

The input of the decoder in Fig. 3 is the output of the encoding example presented in Fig. 2 for illustration. In Fig. 3, each decoder circuit uses the spreading code of the corresponding sender. According to the bit comparison, the accumulators are filled. The outputs of the accumulators are then compared and the decoded bit is obtained.
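The accumulator-based decoding can be sketched in the same spirit. The function name `decode_bit` and the `None` result are assumptions of this sketch: the text does not say what happens when both accumulators, or neither, end up at 1, so that case is reported here rather than resolved. The input word is the OR of (1 XOR `00110011`) and (0 XOR `01011010`), that is, a literal XOR reading of the two-sender example.

```python
def bits(text):
    """Turn a string such as '00110011' into a list of 0/1 chips."""
    return [int(ch) for ch in text]


def decode_bit(received, code):
    """Recover one sender's bit from the merged chip stream.

    Returns 1 or 0, or None when the accumulators do not single out a value.
    """
    zero_acc = 1  # ANDs the received chips where the code chip is 0
    one_acc = 1   # ANDs the received chips where the code chip is 1
    for chip, code_chip in zip(received, code):
        if code_chip == 0:
            zero_acc &= chip
        else:
            one_acc &= chip
    if zero_acc == 1 and one_acc == 0:
        return 1
    if one_acc == 1 and zero_acc == 0:
        return 0
    return None


received = bits("11011110")  # OR of 11001100 (bit 1) and 01011010 (bit 0)
print(decode_bit(received, bits("00110011")))  # bit of the first sender
print(decode_bit(received, bits("01011010")))  # bit of the second sender
```

Each receiver needs only the spreading code of the sender it is listening to, which matches the way the Network Arbiter forwards the sender ID to the receiving node.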

# 3. Local Switch Architecture

The proposed CDMA NoC uses 8-bit Walsh codes as the spreading codes for the encoding and decoding processes. This allows the local network to have seven IP blocks communicating over a single switch. The local switch consists of four main components: the Network node, the Network Arbiter, the CDMA transmitter and the Central IN/OUT line. Each component is explained in the following subsections.

## 3.1 The Network Node

Each IP block is connected to the local switch network through the Network node. The Network node consists of three components: a FIFO buffer, the Sender block, and the Receiver and decoder. Each component is explained in the following paragraphs.

1. The FIFO buffer: The FIFO buffer stores information from the IP block and passes it to the Sender block when ready, to avoid head-of-line (HOL) blocking. The length of the buffer is 32 bits and four buffers are used; each buffer is referred to as a stage. The buffer design is based on shift registers with 'Full' and 'Empty' signals. When a stage is full, it sets the 'Full' signal to high and the write pointer shifts to the next stage. The 'Empty' signal is high when there is no data in the current stage, and the read pointer shifts to the next stage. The design is also based on circular buffering. For example, if the write pointer reaches the last stage in the buffer (the 4<sup>th</sup> stage in this design), it will point back to the first stage.

2. The Sender block: This block acts as a negotiator between the Network Arbiter and the "Network node" for proper transmission. The Sender block extracts the routing fields from the packet stored in the FIFO buffer and sends them to the Network Arbiter. When the receiving node is ready the Network Arbiter sends an ACK to the Sender block, and transmission begins.

3. The Receiver and decoder: The receiver acquires the sender node information from the Network Arbiter and uses this information to initialize the decoder.
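The staged circular buffering of the FIFO can be sketched as below. This is a behavioral simplification, not the shift-register hardware: each stage is modeled as one 32-bit word, and the class name `StagedFifo` and its method names are hypothetical.

```python
class StagedFifo:
    """Four 32-bit stages addressed by circular read and write pointers."""

    def __init__(self, stages=4, width=32):
        self.width = width
        self.stages = [None] * stages  # None marks an empty stage
        self.write_ptr = 0
        self.read_ptr = 0

    def full(self):
        """'Full' is high when the stage under the write pointer is occupied."""
        return self.stages[self.write_ptr] is not None

    def empty(self):
        """'Empty' is high when the stage under the read pointer holds no data."""
        return self.stages[self.read_ptr] is None

    def write(self, word):
        if self.full():
            return False
        self.stages[self.write_ptr] = word & ((1 << self.width) - 1)
        # Circular buffering: after the last stage, wrap to the first.
        self.write_ptr = (self.write_ptr + 1) % len(self.stages)
        return True

    def read(self):
        if self.empty():
            return None
        word = self.stages[self.read_ptr]
        self.stages[self.read_ptr] = None
        self.read_ptr = (self.read_ptr + 1) % len(self.stages)
        return word


fifo = StagedFifo()
for word in (0xA, 0xB, 0xC, 0xD):
    fifo.write(word)
print(fifo.full(), fifo.write_ptr)  # buffer occupied, write pointer wrapped
```

After four writes the write pointer is back at the first stage, which is the wrap-around behavior described for the 4th stage.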

## 3.2 The CDMA Transmitter

After the Sender block of the sending node receives the ACK signal from the Network Arbiter, it starts to extract the information payload from the FIFO buffer and sends it to the CDMA transmitter for the encoding operation.

## 3.3 The Network Arbiter

The Network Arbiter acts as a traffic cop for the network. This block is responsible for the routing of data throughout the network. The routing fields in the packet are used to ensure proper transmission between the sending node and the receiving node. Fig. 4 shows the structure of the packet.

| End | Num. of packets in MSG | Source ID | Destination ID | Switch ID |
|-----|------------------------|-----------|----------------|-----------|

Fig. 4 Packet Structure

Each message is divided into packets; the first packet is defined as the routing packet. The routing packet contains all the details required for proper transmission of the message.

The Network Arbiter checks the switch ID first to identify whether the transmission is local or central. The destination ID is then used to check whether the requested node is free or busy receiving from another node. If the requested node is found busy, the Arbiter sends a NACK back to the sender node, forcing the sender to wait and listen until the requested node is free for receiving. When the requested node is free, the Arbiter sends the sender ID to it. After the receiver has used the sender ID to fetch the corresponding unique spreading code of the sender and is ready for receiving, it sends an ACK back to the Arbiter. The Arbiter then sends an ACK back to the sender node, granting transmission. The Number of packets field indicates the number of packets contained in the message sent. The receiving node is dedicated to the sender until all the packets of the message are received. When the message is done, the End bit is set to 1.
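The arbitration steps above can be sketched as a small state holder. The field names mirror Fig. 4, but the class names, the string results and the collapsing of the receiver's ACK into a single call are assumptions made for illustration; field widths are not given in the text and are not modeled.

```python
from dataclasses import dataclass


@dataclass
class RoutingPacket:
    switch_id: int
    destination_id: int
    source_id: int
    num_packets: int
    end: int = 0


class LocalArbiter:
    """Hypothetical model of the local Network Arbiter handshake."""

    def __init__(self, switch_id):
        self.switch_id = switch_id
        self.busy = {}  # destination_id -> source_id it is dedicated to

    def request(self, packet):
        """Return 'CENTRAL', 'NACK' or 'ACK' for a routing packet."""
        if packet.switch_id != self.switch_id:
            return "CENTRAL"  # not local: hand over to the central switch
        if packet.destination_id in self.busy:
            return "NACK"  # receiver busy: sender must wait and retry
        # The receiver is given the sender ID so it can load that sender's
        # spreading code; its ACK is assumed to arrive immediately here.
        self.busy[packet.destination_id] = packet.source_id
        return "ACK"

    def end_of_message(self, destination_id):
        """Release the receiver once a packet with End = 1 has been seen."""
        self.busy.pop(destination_id, None)


arbiter = LocalArbiter(switch_id=3)
print(arbiter.request(RoutingPacket(3, 5, 1, 10)))  # node 1 asks for node 5
print(arbiter.request(RoutingPacket(3, 5, 2, 10)))  # node 2 asks for node 5
print(arbiter.request(RoutingPacket(4, 5, 2, 10)))  # destination on switch 4
```

Keeping the receiver in `busy` until the End bit arrives reflects the rule that a receiving node stays dedicated to one sender for the whole message.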

## 3.4 The Central IN/OUT Line

When the local switch is receiving a central transmission, the Central IN line forwards the data received from the central switch over the line of the requested node. The node receiver then decodes the data according to the unique spreading code of the local switch.

If the local switch wants to initiate a central transmission, the Arbiter negotiates with the central switch. When transmission is granted, the CDMA transmitter sends the encoded data through the Central OUT line. The data is then handled by the central switch, which is responsible for routing it to the requested switch.

# 4. Central Switch and Hierarchy

As more IP blocks are connected in a single on-chip system, an efficient scalable hierarchy is needed to form a large NoC. By scaling the local switch architecture, a larger network can be obtained. The hierarchy is shown in Fig. 5.

A single local network has seven IP blocks connected through it. By treating each local network as a peripheral in a larger network, seven local networks can be connected together through a central switch. The central switch behavior and the hierarchy are explained in the following subsections.



Fig. 5 CDMA Hierarchy

## 4.1 The Central Switch

Similar to the structure of the local switch, the central switch contains a Network Arbiter and a CDMA transmitter. An extra decoder is added to the central switch structure; its functionality is explained below.

When the local Arbiter identifies that the switch ID differs from the local one, it sends the routing data to the central switch to initialize a central transmission. The central switch then identifies the requested switch and sends the destination and source IDs to it. After the requested node is ready, an ACK is sent to the central switch through the requested local switch, and the central switch then grants transmission to the sender switch. Data is then sent to the Central OUT line in raw form, to be encoded within the central switch according to the spreading code of the local switch.

The encoded data is then combined with other central transmissions and sent to the local switches. The Central IN/OUT line in the local switch then routes the received data to the requested nodes for decoding.

## 4.2 Scalable Hierarchy

The proposed hierarchy can be expanded in two ways. One way is to use a larger Walsh codebook. In this design, 8-bit Walsh codes are used, resulting in a hierarchy connecting 49 IP blocks. With 16-bit Walsh codes the hierarchy can connect up to 15 IP blocks per local switch, resulting in a total of 225 connected IP blocks in the hierarchy. This method increases the latency of the encoding and decoding processes. It also increases the area cost of the encoder and decoder.

Another way is to scale the hierarchy to another level, treating a single central network as a peripheral connected to a larger global switch, resulting in a network connecting 343 IP blocks using 8-bit Walsh codes. The latter method needs extra routing fields in the packet, such as the central switch ID. However, this method is more efficient for large systems, because the scaling depends on the routing rather than on the delay introduced by the encoding and decoding operations. Since the routing delay in this design is minimal, this method shows higher capability.
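The block counts quoted for both scaling options follow from one expression: each switch serves X-1 ports for X-bit Walsh codes, and every extra level multiplies the count by that factor. The helper name `ip_capacity` is illustrative.

```python
def ip_capacity(code_length, levels):
    """IP blocks reachable when every switch serves code_length - 1 ports."""
    return (code_length - 1) ** levels


print(ip_capacity(8, 2))   # local + central switch, 8-bit codes
print(ip_capacity(16, 2))  # local + central switch, 16-bit codes
print(ip_capacity(8, 3))   # local + central + global switch, 8-bit codes
```

The three calls correspond to the 49, 225 and 343 IP block configurations discussed in this subsection.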

# 5. Simulation Results

The proposed architecture was described in VHDL. The design was synthesized for a 90 nm technology node using the Leonardo Spectrum tool by Mentor Graphics. The results provide values for the area, latency and power dissipation. Table 1 compares a hierarchy that applies the proposed CDMA encoding and decoding schemes with a hierarchy that applies a conventional encoding scheme, in terms of area cost and power. The power dissipation is obtained at an operating frequency of 460 MHz and a supply voltage of 1 V. The results show that the proposed design decreases the overall area cost of the hierarchy by 24.2%. Also, the power dissipation is decreased by 25%.

Table 1: Comparison in terms of area and power

| Design                 | Area (gate count) | Area reduction (%) | Power dissipation (W) | Power reduction (%) |
|------------------------|-------------------|--------------------|-----------------------|---------------------|
| Proposed hierarchy     | 723572            | 24.2               | 1.2                   | 25                  |
| Conventional hierarchy | 953942            | -                  | 1.6                   | -                   |

Both hierarchies are tested for two different transmission scenarios. The first scenario is a local transmission, in which one node sends a message to another node connected to the same local switch. The second scenario is a central transmission, in which a node transmits a message to a node connected to a different local switch. The message used for testing consists of 10 packets. The data packets are randomly generated. The routing packet latency is calculated separately, as it is treated in a different manner in which no encoding and decoding operations are involved. Given the operating frequency of 460 MHz, the latency of transmitting the generated message was calculated for both scenarios, and the results are shown in Table 2.

The proposed design shows an improvement of 40% in the local transmission latency. Also, the latency is reduced by 38.1% in the central transmission case.

Table 2: Latency comparison for local and central transmissions

| Design                 | Local transmission latency (μs) | Local reduction (%) | Central transmission latency (μs) | Central reduction (%) |
|------------------------|---------------------------------|---------------------|-----------------------------------|-----------------------|
| Proposed hierarchy     | 1.2                             | 40                  | 1.3                               | 38.1                  |
| Conventional hierarchy | 2                               | -                   | 2.1                               | -                     |
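The reduction percentages in Table 2 can be reproduced from the listed latencies, and the same latencies can be expressed as clock cycles at the stated 460 MHz. The helper names are illustrative, and the cycle counts are simply latency times frequency, not figures reported by the authors.

```python
CLOCK_HZ = 460e6  # operating frequency stated in the text


def percent_reduction(conventional, proposed):
    """Relative improvement of the proposed value, rounded to one decimal."""
    return round(100 * (conventional - proposed) / conventional, 1)


def clock_cycles(latency_us):
    """Number of clock periods covered by a latency given in microseconds."""
    return round(latency_us * 1e-6 * CLOCK_HZ)


print(percent_reduction(2.0, 1.2), clock_cycles(1.2))  # local transmission
print(percent_reduction(2.1, 1.3), clock_cycles(1.3))  # central transmission
```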

# 6. Conclusion

An on-chip scalable hierarchical network that applies the CDMA technique is presented. The proposed CDMA NoC applies an effective encoding and decoding scheme. The proposed architecture shows high operational speed with low area and power overhead compared with the conventional design. The proposed design shows an improvement of 24.2% in area cost and a reduction of 25% in power consumption. In a case study of the local and central transmission scenarios, the proposed design shows a reduction of 40% in the local transmission latency and an improvement of 38.1% in the central transmission latency. The CDMA approach provides an effective and reliable method for the implementation of high performance NoCs.

## References

- [1] F. Clermidy, C. Bernard, R. Lemaire, J. Martin, I. Miro-Panades, Y. Thonnart, P. Vivet, and N. Wehn, "A 477mW NoC-based digital baseband for MIMO 4G SDR," *in ISSCC Dig. Tech. Papers*, February 2010, pp. 278-279.
- [2] M. A. Abd El Ghany, M. A. El-Moursy, Darek Korzec and M. Ismail, "Asynchronous BFT for Low Power Networks on Chip," *Proceedings of the IEEE International Symposium on Circuits and Systems*, May 2010, pp. 3240-3243.
- [3] P. P. Pande, C. Grecu, M. Jones, A. Ivanov, and R. Saleh, "Performance Evaluation and Design Trade-Offs for Network-on-Chip Interconnect Architectures," *IEEE Transactions on Computers*, August 2005, vol. 54, no. 8, pp. 1028-1040.


- [4] M.A. Anders, H. Kaul, S.K. Hsu, A. Agarwal, S.K. Mathew, F. Sheikh, R.K. Krishnamurthy, and S. Borkar, "A 4.1Tb/s Bisection-Bandwidth 560Gb/s/W Streaming Circuit-Switched 8×8 Mesh Network-on-Chip in 45nm CMOS," *in Proc. IEEE Int. Solid-State Circuits Conf.*, February 2010, pp. 110–112.
- [5] D. Kim, M. Kim, G. E. Sobelman, "CDMA Based Network On Chip Architecture," *IEEE Asia-Pacific Conference on Circuits and Systems*, December 2004, pp. 137-140.
- [6] A. Ganguly, K. Chang, S. Deb, P.P. Pande, B. Belzer, C. Teuscher, "Scalable Hybrid Wireless Network-on-Chip Architectures for Multi-Core Systems," *IEEE Transactions on Computers*, October 2011, vol. 60, no. 10, pp. 1485-1502.
- [7] X. Wang, T. Ahonen, Jari Nurmi, "Applying CDMA technique to Network-on-Chip," *IEEE Trans. VLSI*, October 2007, pp. 1091-1100.
- [8] K. Goossens, J. Dielissen, and A. Radulescu, "AEthereal network on chip: Concepts, architectures, and implementations," *IEEE Des. Test Comput.*, October 2005, vol. 22, pp. 414–421.
- [9] A. A. El Badry, M. A. Abd El Ghany, "CDMA technique for Network-on-Chip," *IEEE DDECS*, April 2012, pp. 163-166.
- [10] M. Kim, D. Kim, and G. Sobelman, "Design of a high-performance scalable CDMA router for on-chip switched networks," *in Proc. Int. SoC Des. Conf.*, 2005, pp. 32-35.
- [11] W. Lee, G. E. Sobelman, "Mesh-Star Hybrid NoC Architecture with CDMA Switch," *IEEE International Symposium on Circuits and Systems (ISCAS)*, May 2009, pp. 1349-1352.
- [12] D. Kim, M. Kim, G. E. Sobelman, "FPGA based CDMA switch for Network-on-Chip," *IEEE 13th Annual Symposium on Field-Programmable Custom Computing Machines*, October 2005, pp. 283-284.

**Ahmed A. El-Badry** received the B.S. degree in communication engineering from the German University in Cairo (GUC), Cairo, Egypt, in 2012. His research interest is in Network on Chip design. He was the author of a paper based on his B.S. project in 2012.

**Mohamed A. Abd El Ghany** received the B.S. degree in electronics and communications engineering (with honors) and the Master's degree in electronics engineering from Cairo University, Cairo, Egypt, in 2000 and 2006, respectively, and the Ph.D. degree in the area of high-performance VLSI/IC design from the German University in Cairo, Egypt, in 2010. From 2003 to 2006, he was with the National Space Agency of Ukraine on the EGYPTSAT-1 project. From 2008 to 2009, he was an International Scholar at the Ohio State University, Electrical Engineering Dept., Columbus, USA. He is currently working as a Lecturer at the German University in Cairo, Egypt. His research interest is in Network on Chip design and related circuit level issues in high performance VLSI circuits, clock distribution network design, and low-power design. He is the author of about 20 papers and two book chapters in the fields of high throughput and low-power NoC design.

Copyright (c) 2012 International Journal of Computer Science Issues. All Rights Reserved.

