Reasons for Loss of Precision in Java Floating-Point Numbers

by henxue on 2010-07-16 13:57:25

Careless use of `float` or `double` can silently lose precision. The following code shows a typical case:

### Java Code:

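```java
public class FloatDoubleTest {

    public static void main(String[] args) {
        // int literal converted to float: rounded to the nearest float
        float f = 20014999;
        // widening float to double keeps the already-rounded value
        double d = f;
        // int literal converted to double: stored exactly
        double d2 = 20014999;

        System.out.println("f=" + f);
        System.out.println("d=" + d);
        System.out.println("d2=" + d2);
    }
}
```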

The results obtained are as follows:

```
f=2.0015E7
d=2.0015E7
d2=2.0014999E7
```

The output shows that `double` represents `20014999` exactly, while `float` cannot and stores an approximation instead. This is surprising: `20014999` is a relatively small number, yet `float` cannot hold it exactly. With this issue in mind, let's look briefly at how floating-point numbers work, which should help deepen your understanding of Java's floating-point types.

### 1. Representation of `float` and `double` in Java

Java supports two basic floating-point types: `float` and `double`. Both follow the IEEE 754 standard, which defines two binary floating-point formats used here: 32-bit single precision and 64-bit double precision.

IEEE 754 uses base-2 scientific notation to represent floating-point numbers.

For a 32-bit `float`, the first bit represents the sign, bits 2-9 represent the exponent, and the last 23 bits represent the mantissa (fractional part).

**Float (32-bit):** sign (1 bit) | exponent (8 bits) | mantissa (23 bits)

For a 64-bit `double`, the first bit represents the sign, the next 11 bits represent the exponent, and the last 52 bits represent the mantissa.

**Double (64-bit):** sign (1 bit) | exponent (11 bits) | mantissa (52 bits)

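Both consist of three parts:

1. A single sign bit `s` that directly encodes the sign (`0` for positive, `1` for negative).

2. An exponent `E` of `k` bits (`k` = 8 for `float`, 11 for `double`), stored with a bias: the stored value is the actual exponent plus 127 for `float` or plus 1023 for `double`.

3. An `n`-bit fraction (`n` = 23 for `float`, 52 for `double`), stored as a plain unsigned binary value. For normalized numbers the leading `1` of the significand is implicit and is not stored.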
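The three fields can be pulled out of a `float` with shifts and masks. The following is a minimal sketch; the class name `FloatFields` and its helper methods are illustrative and not part of the original article, while `Float.floatToIntBits` is the standard library call used later in this article.

```java
// Illustrative helper (hypothetical names): splits a float into the
// three IEEE 754 fields described above.
class FloatFields {

    // Sign: the highest of the 32 bits.
    static int sign(float value) {
        return Float.floatToIntBits(value) >>> 31;
    }

    // Exponent: the next 8 bits, exactly as stored (still biased by 127).
    static int biasedExponent(float value) {
        return (Float.floatToIntBits(value) >>> 23) & 0xFF;
    }

    // Fraction: the low 23 bits, read as a plain unsigned value.
    static int fraction(float value) {
        return Float.floatToIntBits(value) & 0x7FFFFF;
    }

    public static void main(String[] args) {
        float f = -6.5f; // -1.625 * 2^2
        System.out.println("sign=" + sign(f));
        System.out.println("exponent=" + (biasedExponent(f) - 127));
        System.out.println("fraction=" + Integer.toBinaryString(fraction(f)));
    }
}
```

For `-6.5f` the sign is `1`, the stored exponent is 129 (actual exponent 2), and the fraction starts with `101`, which encodes the `.625` of `1.625`.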

### 2. When does representation fail?

At the lowest level, every floating-point value in Java is stored in this binary scientific notation. Let's consider when a number cannot be represented exactly:

1. **Exponent overflow:** This occurs when the number is too large, exceeding the range that the exponent can handle. For example, if the maximum exponent is 10 but the number requires an exponent greater than 10, it cannot be represented.

2. **Mantissa overflow:** This happens when the number needs more significant digits than the mantissa can hold, such as `1.3434343233332`. Although it is a small number and the exponent can handle it, the mantissa cannot store that many digits, so the value is rounded.
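Both failure modes can be observed directly. This is a small sketch, assuming a hypothetical class `RepresentationLimits`; the value `1e39` is chosen here only because it lies above `Float.MAX_VALUE` (about 3.4e38).

```java
// Illustrative sketch (hypothetical names) of the two failure modes.
class RepresentationLimits {

    // Exponent overflow: a double beyond float's range narrows to Infinity.
    static float tooLarge() {
        return (float) 1e39;
    }

    // Mantissa overflow: true only if the value survives a round trip
    // through float without being rounded.
    static boolean survivesAsFloat(double value) {
        return (double) (float) value == value;
    }

    public static void main(String[] args) {
        System.out.println(Float.isInfinite(tooLarge()));
        System.out.println(survivesAsFloat(1.3434343233332));
        System.out.println(survivesAsFloat(1.5));
    }
}
```

The round-trip check fails for `1.3434343233332` because its digits do not fit into 23 fraction bits, while `1.5` (binary `1.1`) passes.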

### 3. Why can't `20014999` be accurately represented by `float`?

Based on the above analysis, it should be clear that `20014999` is not a large number, and its exponent in IEEE 754 scientific notation is within the acceptable range. However, the mantissa lacks sufficient precision to represent the exact value.

By analyzing the binary representation of `20014999` under both `float` and `double`, we can understand why. The following program provides the binary representations of `20014999` for both `double` and `float`.

### Java Code:

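```java
public class FloatDoubleTest3 {

    public static void main(String[] args) {
        // Raw 64-bit pattern of the double (leading zeros are not printed)
        double d = 20014999;
        long l = Double.doubleToLongBits(d);
        System.out.println("Double: " + Long.toBinaryString(l));

        // Raw 32-bit pattern of the float (leading zeros are not printed)
        float f = 20014999;
        int i = Float.floatToIntBits(f);
        System.out.println("Float: " + Integer.toBinaryString(i));
    }
}
```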

Output:

```
Double: 100000101110011000101100111100101110000000000000000000000000000
Float: 1001011100110001011001111001100
```

Analysis of the output:

`Long.toBinaryString` and `Integer.toBinaryString` omit leading zeros, so the sign bit `0` is missing from both strings. For `double`, adding the sign bit `0` back on the left gives exactly 64 bits. According to the `double` format, they split into three parts: sign, exponent, and mantissa:

```
0 10000010111 0011000101100111100101110000000000000000000000000000
```

For `float`, adding the sign bit `0` on the left gives exactly 32 bits. According to the `float` format, they also split into three parts: sign, exponent, and mantissa:

```
0 10010111 00110001011001111001100
```

Key observations, reading each line from left to right:

- The first field is the sign bit.

- The second field is the exponent.

- The third field is the mantissa.

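Comparison shows:

- The sign bits are both `0`.

- The exponent fields differ as bit patterns because the two formats use different biases, but they encode the same exponent: `10000010111` is 1047, and 1047 - 1023 = 24; `10010111` is 151, and 151 - 127 = 24.

- The only real difference lies in the mantissa.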
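As a cross-check, the fields can be reassembled into values. The sketch below assumes a hypothetical class `FieldCheck`; it uses Java binary literals for the bit fields shown above and the standard `Float.intBitsToFloat` and `Double.longBitsToDouble` methods.

```java
// Illustrative sketch (hypothetical names): rebuilds values from fields.
class FieldCheck {

    // Packs sign, biased exponent, and fraction into a 32-bit float.
    static float floatFromFields(int sign, int exponent, int fraction) {
        return Float.intBitsToFloat((sign << 31) | (exponent << 23) | fraction);
    }

    // Packs sign, biased exponent, and fraction into a 64-bit double.
    static double doubleFromFields(long sign, long exponent, long fraction) {
        return Double.longBitsToDouble((sign << 63) | (exponent << 52) | fraction);
    }

    public static void main(String[] args) {
        // Both exponent fields decode to the same actual exponent.
        System.out.println(0b10010111 - 127);
        System.out.println(0b10000010111 - 1023);

        float f = floatFromFields(0, 0b10010111, 0b00110001011001111001100);
        // The double fraction is the 24 significant bits followed by 28 zeros.
        double d = doubleFromFields(0, 0b10000010111,
                0b001100010110011110010111L << 28);
        System.out.println((long) f);
        System.out.println((long) d);
    }
}
```

Both exponents decode to 24, the `float` fields reassemble to `20015000`, and the `double` fields reassemble to `20014999`.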

For `double`, the mantissa is `001100010110011110010111` followed by 28 zeros, so the significant part needs 24 bits to be represented exactly.

For `float`, the mantissa is `00110001011001111001100`, which consists of only 23 bits.

Why is this? The `float` mantissa holds only 23 bits, so the 24-bit value `001100010110011110010111` must be rounded to 23 bits. The discarded last bit is `1`, which puts the value exactly halfway between two representable numbers; the IEEE 754 default rule (round to nearest, ties to even) rounds up in this case, giving `00110001011001111001100`. Therefore, `20014999` becomes `20015000` in `float`.

In other words, although `20014999` is within the range of `float`, the IEEE 754 `float` format does not have enough precision to represent it exactly and produces a rounded approximation instead.
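The same effect can be seen from the spacing between adjacent `float` values. This is a minimal sketch, assuming a hypothetical class `FloatSpacing`; `Math.ulp` is the standard method that returns the gap to the next representable value.

```java
// Illustrative sketch (hypothetical names): which integers fit in a float?
class FloatSpacing {

    // True when the int converts to float and back without changing.
    // The comparison is done in long so that large values cannot saturate.
    static boolean isExactInFloat(int value) {
        return (long) (float) value == value;
    }

    public static void main(String[] args) {
        float f = 20014999;
        System.out.println(Math.ulp(f)); // gap between adjacent floats near f
        System.out.println((int) f);     // the value actually stored

        System.out.println(isExactInFloat(20014998));
        System.out.println(isExactInFloat(20014999));
        System.out.println(isExactInFloat(1 << 24));       // 16777216
        System.out.println(isExactInFloat((1 << 24) + 1)); // 16777217
    }
}
```

Between 2^24 (16777216) and 2^25 the gap between neighboring `float` values is 2, so only even integers in that range are exact. Every integer up to 2^24 fits; 16777217 is the first one that does not.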

### Conclusion

Floating-point values and operations are often inexact. Errors occur whenever a value needs more significant bits than the type's mantissa can hold, so the cause is frequently the precision a number requires rather than its size. As a result, the stored or computed result is close to, but not exactly equal to, the desired value. Extra care must be taken when performing precise calculations using `float` and `double`.

Alternatives include `BigDecimal` constructed from a `String` (rather than from a `double`, which would carry the rounding error along), or holding values as scaled integers in a `long`.
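A brief sketch of both alternatives follows. The class `ExactAlternatives`, its method names, and the choice of cents as the integer unit are assumptions made for illustration; `BigDecimal` and `Math.addExact` are standard library APIs.

```java
import java.math.BigDecimal;

// Illustrative sketch (hypothetical names) of exact alternatives.
class ExactAlternatives {

    // BigDecimal built from a String keeps every decimal digit as written.
    static BigDecimal addDecimal(String a, String b) {
        return new BigDecimal(a).add(new BigDecimal(b));
    }

    // Scaled long: amounts are held as whole cents; addExact throws
    // ArithmeticException on overflow instead of wrapping silently.
    static long addCents(long aCents, long bCents) {
        return Math.addExact(aCents, bCents);
    }

    public static void main(String[] args) {
        System.out.println(0.1 + 0.2 == 0.3);          // binary rounding shows up
        System.out.println(addDecimal("0.1", "0.2"));  // exact decimal sum
        System.out.println(addCents(10, 20));          // 10 cents + 20 cents
        System.out.println(new BigDecimal("20014999")); // no rounding
    }
}
```

The scaled-`long` approach suits values with a fixed number of decimal places, while `BigDecimal` handles arbitrary precision at some cost in speed and convenience.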

Article Source: [Eden Network](http://www.edenw.com/tech/devdeloper/java/2010-07-16/4752.html)
