Use Floats With Prudence

In mathematics, real numbers have infinite precision and are therefore continuous and lossless. Floating-point numbers, which are used in computing, have finite precision and are not evenly spaced throughout their range. If you don’t keep this in mind, you can make many numerical blunders.

To illustrate, assign the highest 32-bit signed integer value (2147483647) to a 32-bit float variable (let’s call it “x”), and print it. You’ll see 2147483648.

Now print x-64. Still 2147483648. Now print x-65, and you’ll get 2147483520! Why? Because the spacing between adjacent floats in that range is 128, and floating-point operations round to the nearest floating-point number.
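The experiment above can be reproduced with a minimal sketch. Python has no built-in 32-bit float, so this example assumes a small helper (the name to_float32 is made up for illustration) that rounds a 64-bit value to the nearest 32-bit float by packing it through the struct module.

```python
import struct

def to_float32(value: float) -> float:
    """Round a 64-bit Python float to the nearest 32-bit float (hypothetical helper)."""
    return struct.unpack("f", struct.pack("f", value))[0]

x = to_float32(2147483647.0)     # not representable, stored as 2147483648.0
minus_64 = to_float32(x - 64)    # exactly halfway between two floats, rounds to 2147483648.0
minus_65 = to_float32(x - 65)    # closer to the lower neighbour, rounds to 2147483520.0

print(int(x), int(minus_64), int(minus_65))
# prints: 2147483648 2147483648 2147483520
```

The two candidate results differ by 128, which is the spacing between adjacent 32-bit floats just below 2147483648.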

Unlike integers, which are stored in memory as single values and thus have a fixed step of 1 throughout their whole range, floating-point numbers are stored as three separate fields: a sign bit, an exponent, and a mantissa (fraction).

The sign bit tells whether the number is positive or negative, the exponent gives its order of magnitude, and the fraction (mantissa) specifies the actual digits of the number. This is expressed as follows:

mantissa × 2^exponent
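To make the three fields concrete, here is a rough sketch that pulls them out of a 32-bit float. It assumes the common IEEE 754 single-precision layout (1 sign bit, 8 exponent bits stored with a bias of 127, 23 fraction bits with an implicit leading 1), which the article does not spell out. The function name float32_fields is invented for this example.

```python
import struct

def float32_fields(value):
    """Return (sign, biased exponent, fraction bits) of value stored as a 32-bit float."""
    (bits,) = struct.unpack(">I", struct.pack(">f", value))
    sign = bits >> 31                  # 1 bit
    exponent = (bits >> 23) & 0xFF     # 8 bits, stored with a bias of 127
    fraction = bits & 0x7FFFFF         # 23 bits, the leading 1 is implicit
    return sign, exponent, fraction

sign, exponent, fraction = float32_fields(-6.5)

# For normal numbers: value = (-1)^sign * (1 + fraction / 2^23) * 2^(exponent - 127)
rebuilt = (-1) ** sign * (1 + fraction / 2**23) * 2.0 ** (exponent - 127)
print(sign, exponent - 127, fraction, rebuilt)
# prints: 1 2 5242880 -6.5
```

Here -6.5 is stored as a negative sign, a mantissa of 1.625, and an exponent of 2, since 1.625 × 2^2 = 6.5.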

In this way, floating-point numbers achieve a much greater range for the same storage space, but at the cost of precision. A 32-bit float offers only about 7 decimal digits of precision and ranges in magnitude from ~1.2×10^-38 to ~3.4×10^38. For a double, the figures are about 16 digits of precision and a range from ~2.2×10^-308 to ~1.8×10^308.
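These limits can be checked rather than memorized. The sketch below reads the double limits from Python’s sys.float_info and derives the 32-bit limits by arithmetic, assuming the IEEE 754 single-precision layout (23 fraction bits, normal exponents from -126 to 127). The variable names are chosen for illustration only.

```python
import sys

info = sys.float_info
print(info.dig)   # decimal digits guaranteed to survive a round trip: 15
print(info.min)   # smallest positive normal double, about 2.2e-308
print(info.max)   # largest finite double, about 1.8e+308

# 32-bit limits computed from the assumed layout.
float32_min_normal = 2.0 ** -126               # about 1.2e-38
float32_max = (2 - 2 ** -23) * 2.0 ** 127      # about 3.4e+38
print(float32_min_normal, float32_max)
```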

Additionally, we’re all used to working with numbers in base 10, but, as you have noticed, floating-point numbers use base 2 internally. Converting between the two cannot always be done without a small loss of precision, because many decimal fractions (0.1, for example) have no exact binary representation. This can lead to confusing results: for example, (int) ((0.1+0.7)*10) will usually return 7 instead of the expected 8, since the internal representation will be something like 7.9999999999999991118….
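The same surprise shows up in Python, where int() truncates toward zero much like the (int) cast in the expression above. This is a small sketch using 64-bit doubles. Decimal is used here only to display the exact stored value.

```python
from decimal import Decimal

result = (0.1 + 0.7) * 10

print(int(result))       # 7, because truncation discards the fractional part
print(round(result))     # 8, rounding to nearest gives the expected answer
print(Decimal(result))   # the exact stored value, starting 7.9999999999999991118...
```

When an integer is wanted from a floating-point calculation, rounding is usually the safer choice than truncating.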

So never trust floating-point results to the last digit, and never compare floating-point numbers for exact equality; compare them within a tolerance instead. Also, it should go without saying that you shouldn’t use floating-point numbers for financial applications: that’s what decimal classes in languages like Java and C# are for. Floating-point numbers are intended for efficient scientific computation. But efficiency is worthless without accuracy, so remember the source of rounding errors, and code accordingly!
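As a rough illustration of both pieces of advice, the sketch below compares doubles within a tolerance using math.isclose and does money arithmetic with Python’s decimal module, which stands in here for the Java and C# decimal classes mentioned above. The tolerance of 1e-9 is an arbitrary choice for this example; a suitable value depends on the application.

```python
import math
from decimal import Decimal

total = 0.1 + 0.2
print(total == 0.3)                             # False: the two doubles differ in the last bit
print(math.isclose(total, 0.3, rel_tol=1e-9))   # True: equal within the chosen tolerance

# Decimal values built from strings keep the decimal digits exactly.
price = Decimal("0.10") + Decimal("0.20")
print(price == Decimal("0.30"))                 # True
```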


Max Drobotov

Software Developer
