Implement Synthesizable Square Root Algorithm On Fpga Engineering Essay

Implement Synthesizable Square Root Algorithm On Fpga Engineering Essay

The chief aim of this paper is to implement synthesizable square root algorithm on FPGA. As square root map is non synthesizable on Silicon, this paper proposes optimized non reconstructing square root algorithm for unsigned 8 spot figure on ED2C20F484C7 device in Cyclone II household. This algorithm is implemented in gate degree abstraction of Verilog HDL. The basic edifice block of the design is CSM ( Controlled Subtract Multiplex ) block. It makes usage of lone subtract operation and add on 01 which is an betterment over reconstructing algorithm.

Keyword: FPGA, CSM, Verilog HDL, fixed point


The square root map is a basic operation in computing machine graphic and scientific computation application. Due to its algorithm complexness, the square root operation is difficult to be designed in digital system. As known, digital system has been used in day-to-day life or industrial intent that may hold been in demand of square root operation to to the full put to death its maps. Scientists have developed assorted algorithms for square root computation. But the execution of algorithms is hard because of their complexnesss and therefore consequences into long holds for its completion.

There are two chief households of algorithms that can be used to pull out square roots. The first household is that of digit return, which provides one figure ( frequently one spot ) of the consequence at each loop [ 6 ] . Each loop consists of add-ons and digit-by-number generations ( which have comparable cost )

Such algorithms have been widely used in microprocessors that did n’t include hardware multipliers. Most of the FPGA executions in seller tools or in the literature usage this attack. Second household of algorithms utilizations generations. It includes quadratic convergence returns derived from the Newton-Raphson loop [ 5 ] . The digit return attacks allow one to construct minimum hardware, while multiplicative attacks allow one to do the best usage of available resources when these include multipliers. Besides there are appraisal method and digit-by-digit method. Digit-by-digit method is classified into two distinguishable categories: restoring and non- restoring algorithm [ 1 ] .

In reconstructing algorithm, balance is restored in the regular flow. So its execution needs more hardware. Compared to the reconstructing algorithm, the non reconstructing algorithm does non reconstruct the balance, which can be implemented with fewest hardware resource and the consequence is hardware simple execution. It is most suited for FPGA execution.

Restoring and non reconstructing square root computation

Restoring Algorithm

Measure 1: If it is a 2n spot figure so split it in a group of 2 spots

Step2: Subtract 1 from the first 2 figures ( get downing from MSB )

Step3: Whenever the consequence of the minus is positive so the developed root is ‘1 ‘ otherwise ‘0 ‘

Step4: Whenever the consequence is negative, compose it as it is. We have to reconstruct the incorrect conjecture by add oning

’01 ‘ and guessed square root.

Step5: Now take the following two figures

Step6: Append ’01 ‘ ( to be subtracted from following two figures of dividend ) and guessed square root to

subtract from the balance.

Step7: If the consequence of minus is negative so restore old balance by adding incorrect conjecture by

add oning ’01 ‘ and guessed square root.

Step8: Every clip guessed square root has to be updated while add oning ’01 ‘ .

Step9: Continue the stairss until the group of two figures end

1 0 0 1.1 0 1 0

01 01 11 01.00 00 00 00

– 01

00 01 return following two figures from dividend

– 1 01 Append 01

Negative value 11 00

+ 1 01

0 0 01 11

-10 01

11 10 Negative value

+ 10 01

01 11 01

– 10 00 01

11 00 00

– 10 01 01

01 01 1 00

10011 01


+ 1001101

010110 00

– 010011001

000010111 00

– 100110101


Figure 1: The illustration of reconstructing algorithm to work out square root

B. Proposed Modified Non Restoring Algorithm

A small alteration in non reconstructing algorithm makes computation faster. It uses merely subtract operation and appends “ 01 ” . It uses n phase pipelining to happen square root of 2n spot figure. The undermentioned algorithm describes the modified non reconstructing square root algorithm.

Step1: Start

Step2: Initialize the radicand ( P ) which is 2n spot figure. Divide the radicand in two spots get downing at

binary point in both waies.

Step3: Beginning on the left ( most important ) , select the first group of one or two figures ( If n is uneven

so first group is one figure, else two spots )

Step4: Choose the first group of spots and subtract ‘ 01 ‘ from it. If borrow is zero, consequence is positive and

quotient is 1 else it is 0.

Step5: Append 01 ( to be subtracted following two figures of dividend ) and guessed square root to deduct

from balance of old phase

Step6: If consequence of minus is negative, write old balance as it is and quotient is considered as

0, else compose the difference as balance and quotient as 1.

Step7: Repeat step 5 and step 6 until end group of two figures.

Step8: End

1 0 0 1.1 0 1 0

01 01 11 01.00 00 00 00 00

– 01

00 01 return following two figures from dividend

– 1 01 Append 01


– 10 01

01 11 01

100 01

11 00 00

1001 01

001011 00


00101100 00

– 10011001

0000010111 00



Figure 2: The illustration of modified non reconstructing algorithm to work out square root

Basic Building Block for Non reconstructing algorithm

Input signals of the edifice block are x, Y, B and U while vitamin D and b0 ( borrow ) are end products.

If b0=0, so d & lt ; =x-y-b else vitamin D & lt ; =x ;

b0= ( ~ x.y ) + ( b.~x ) + ( by ) ;

d= ( ~x.y.~b.~u ) + ( ~x.~y.b.~u ) + ( x~y.~b ) + ( x.u ) + ( x.y.b ) ;


Figure 3: RTL schematic of CSM block

The generalisation of simple execution of non reconstructing figure by digit algorithm for unsigned 6 spot square root by array construction is shown in Fig.4. Each row of the circuit executes one loop of non reconstructing figure by digit square algorithm, where it merely uses subtract operation and appends ’01 ‘ .

Figure 4: Pipelined construction of 6 spot unsigned square root figure

The design can be optimized by minimising the logic looks and can be implemented by modifying CSM block. The specialised entities A, B, C, D, E, F, G and H are derived from CSM block and are defined as follows:

For csmA, ybu = 100

b0 = ~x

vitamin D = ~x

For csmB, yu = 00

b0 = ~x.b

vitamin D = ~x.b + ~b.x

For csmC, u = 0

b0 = ~x.y + ~x.b + y.b

vitamin D = ~x.y.~b + ~x.~y.b + x.~y.~b + x.y.b

For csmD, yb = 10

b0 = ~x

vitamin D = ~x.~u + x.u

For csmE, y = 0

b0 = ~x.b

vitamin D = ~x.b.~u + ~b.x + x.u

For csmF, xy = 00

b0 = B

vitamin D = b.~u

For csmG, xyb = 010

b0 = ~x

vitamin D = ~u

For csmH, xyu = 000

b0 = B

Figure 5: Optimized Pipelined construction of 8 spot unsigned square root figure

Consequences and analysis

The Non Restoring algorithm can be implemented with least hardware resources and the consequence will be the faster than reconstructing square rooting techniques. The beginning codification is implemented in such a manner that it can be extended harmonizing to user ‘s demand to cipher complicated square root in FPGA.

Figure 6: Simulation consequence of 8 spot square root utilizing non reconstructing algorithm

The DE1 kit has 4 seven section shows merely so the maximal figure which can be displayed is 9999d and besides it does n’t hold a denary point. Hence end product obtained is less precise if one of the shows is considered as a denary point.

Table 1 shows the list of Logic Elements use for 8 spot execution. This indicates the size of the

implemented circuit “ hardware resource ” .

Table 1: Comparison of LE ‘s use in 8 spot execution


Execution of non reconstructing algorithm for 8 spot

Lupus erythematosus


8 spot ( with seven section )



8 spot ( without seven section )



optimized 8 spot ( with seven section )



optimized 8 spot ( without seven section )


Table 2: PowerPlay Power Analyzer Status


PowerPlay Power Analyzer Status

8 spot with optimisation ( mW )

8 spot without optimisation ( mW )





Entire Thermal Power Dissipation





Core Dynamic Thermal Power Dissipation





Core Static Thermal Power Dissipation





I/O Thermal Power Dissipation





This execution and analysis shows that proposed method is most efficient of hardware resource. This is sensible, because it merely uses subtract operation and add on 01. The consequence shows that the proposed algorithm is easy to implement and besides uses less resources. The consequence is extended for square root execution of 8 spot floating point figure and besides it can be expanded to larger Numberss to work out complicated square root job in FPGA execution.