Hardware implementation of square root?

Asked 15/1, 2012 at 17:1 Answered 16/1, 2012 at 2:9

I'm trying to find a little bit more information for efficient square root algorithms which are most likely implemented on FPGA. A lot of algorithms are found already but which one are for example from Intel or AMD? By efficient I mean they are either really fast or they don't need much memory.

EDIT: I should probably mention that the question is generally a floating point number and since most of the hardware implements the IEEE 754 standard where the number is represented as: 1 sign bit, 8 bits biased exponent and 23 bits mantissa.

Thanks!

Sharpen answered 15/1, 2012 at 17:1 Comment(11)

#1529227 has detailed information. – Genip 15/1, 2012 at 17:7

Why not implement this? You only do shifts and adds and no extra memory is needed for things like look-up tables. Looks like a good candidate for an FPGA. – Femmine 15/1, 2012 at 18:36

Thanks for the comment @Alex. I'll try to find some more resources, because I still don't how can I implement that in VHDL. One more question, doesn't that find just the integer part of sqrt? – Sharpen 15/1, 2012 at 19:49

Do you want to solve one square root as fast as possible, or solve a continuous stream of square roots as fast as possible? – Jadeite 16/1, 2012 at 2:8

@DimitarPetrov: right, that particular piece of code calculates the integer square root of an integer. But you can reuse it for floating-point values too because sqrt(mantissa*2^exponent)=sqrt(mantissa)*2^(exponent/2) and you can always represent your number as an integer mantissa times some even power of 2. You should have included the details about floating-point square root and VHDL in the question. Actually, the VHDL probably deserves a separate question. – Femmine 16/1, 2012 at 5:15

@GregS: Doesn't it mean that solving one as fast as possible will solve a continious stream also fast as possible? I was asking for one square root. Alex: The VHDL part is not so important right now, so that's why I didn't add that to the question. I'll try to implement it later. – Sharpen 16/1, 2012 at 8:48

@DimitarPetrov: No. For example one may take 8 ticks per square root, while the other may take 16 ticks before the first answer comes out of the pipeline, but then each successive answer comes out one tick later. – Jadeite 16/1, 2012 at 13:16

Thanks for the naswer @GregS. So any recommendations for algorithm calculating fast single square root? – Sharpen 16/1, 2012 at 13:38

@DimitarPetrov: No, I'm useless in that regard, though I think Paul S has the correct answer. – Jadeite 16/1, 2012 at 13:39

@Alex: "sqrt(mantissa*2^exponent)=sqrt(mantissa)*2^(exponent/2)" is true, but sqrt(mantissa) might be outside of the range 0.25->0.5. If that's the case you'll need to re-normalise it, and have to adjust the exponent +/- 1. This is just a final stage on the output though. – Krakow 16/1, 2012 at 17:11

@PaulS: Of course one will need to make sure the result is properly normalized if possible. – Femmine 16/1, 2012 at 17:41

Not a full solution, but a couple of pointers.

I assume you're working in floating point, so point 1 is remember that floating point is stored as a mantissa and exponent. The exponent of the square root will be approximately half the exponent of the original number thanks to logarithms.

Then the mantissa can be approximated with a look-up table, and then you can use a couple of newton-raphson rounds to give some accuracy to the result from the LUT.

I haven't implemented anything like this for about 8 years, but I think this is how I did it and was able to get a result in 3 or 4 cycles.

Krakow answered 16/1, 2012 at 2:9 Comment(3)

Thanks Paul! Can you point me to particular algorithm? Yep, I'm working in floating point and just editted my question... or could you expand a little bit your explanation because I didn't quite understand everyhing :) – Sharpen 16/1, 2012 at 14:40

Unfortunately it's been so long that I can't remember the precise details, and it was for a previous employer so I can't look it up either. If you have specific questions I'll do my best to answer them. – Krakow 16/1, 2012 at 16:0

Thanks @Paul. I've marked your question as best answer, since that gave me some ideas and point me (hopefully) in the right direction. Thanks – Sharpen 16/1, 2012 at 22:1

This is a great one for fast inverse-quare root.
Have a look at it here. Notice it's pretty much about the initial guess, rather amazing document :)

Jotham answered 15/1, 2012 at 17:7 Comment(4)

Thanks a lot! I've seen that already and looks pretty impresive, however the "magic number" scared me a little bit in the beginning. I will take a look again :) – Sharpen 15/1, 2012 at 17:41

I don't feel this answer is really relevant to the question. This algorithm only computes a rough approximation of the inverse of the square root, and definitely is not was is implemented on FPGA's – Overwind 15/1, 2012 at 18:9

Thanks Sven. Does most implementations on FPGAs just use some variant on the Newton-Raphson method? Like some inversion to get rid from the division, which itself is a expensive operation? – Sharpen 15/1, 2012 at 18:13

@DimitarPetrov The magic number is only required for the inverse square root. The square root can be achieved using a right shift and a few rounds of Newton-Raphson. – Elmaelmajian 26/6, 2020 at 15:31

Recommended topics

Hot tags