Explanation of the Reasons for Loss of Precision in Java Floating-Point Numbers

by henxue on 2010-07-16 13:56:54

Using `float` or `double` carelessly can cause a loss of precision. The following code shows the typical situation:

```java
public class FloatDoubleTest {

    public static void main(String[] args) {
        float f = 20014999;     // int converted to float
        double d = f;           // float widened to double: keeps the float's value
        double d2 = 20014999;   // int converted directly to double

        System.out.println("f=" + f);
        System.out.println("d=" + d);
        System.out.println("d2=" + d2);
    }
}
```

The results obtained are as follows:

```
f=2.0015E7
d=2.0015E7
d2=2.0014999E7
```

The output shows that `double` represents `20014999` exactly, while `float` cannot: it stores the nearby value `20015000` instead. This is surprising, since `20014999` is not a particularly large number. With this question in mind, let's look at how floating-point numbers work. I hope these simple notes help improve your understanding of Java floating-point numbers.
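A quick way to see what happened is to look at the `float` values on either side of `20014999`. The sketch below is a minimal illustration using the standard `Math.ulp`, `Math.nextDown` (Java 8+) and `Math.nextUp` methods; the class name `FloatNeighbours` and the helper `spacing` are hypothetical.

```java
public class FloatNeighbours {

    // Distance from x to the next float of larger magnitude
    static float spacing(float x) {
        return Math.ulp(x);
    }

    public static void main(String[] args) {
        float f = 20014999;               // int -> float, rounded to the nearest float
        float below = Math.nextDown(f);   // previous representable float
        float above = Math.nextUp(f);     // next representable float

        System.out.println("spacing = " + spacing(f));
        System.out.println("below   = " + (long) below);
        System.out.println("f       = " + (long) f);
        System.out.println("above   = " + (long) above);
    }
}
```

Run as-is, it should report a spacing of `2.0` and the neighbours `20014998`, `20015000` and `20015002`: near 2 × 10^7 adjacent `float`s are two apart, so an odd number such as `20014999` has no `float` of its own.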

### 1. Representation of `float` and `double` in Java

Java has two primitive floating-point types, `float` and `double`, and both follow the IEEE 754 standard, which defines 32-bit single-precision and 64-bit double-precision binary floating-point formats.

IEEE 754 uses scientific notation with base 2 to represent floating-point numbers.

For 32-bit floating-point numbers (`float`), the first bit represents the sign of the number, bits 2 to 9 represent the exponent, and the last 23 bits represent the mantissa (fractional part).

**Float (32-bit):** sign (1 bit) | exponent (8 bits) | mantissa (23 bits)

For 64-bit double-precision floating-point numbers (`double`), the first bit represents the sign of the number, the next 11 bits represent the exponent, and the last 52 bits represent the mantissa.

**Double (64-bit):** sign (1 bit) | exponent (11 bits) | mantissa (52 bits)

Both consist of three parts:

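1. A single sign bit `s`: `0` for positive numbers, `1` for negative ones.

2. `k` bits for the exponent `E`, stored in offset (biased) binary: the stored field is the actual exponent plus a bias of 127 for `float` (`k` = 8) or 1023 for `double` (`k` = 11).

3. `n` bits for the fraction, stored as raw binary (`n` = 23 for `float`, 52 for `double`). For normal numbers the leading `1` before the binary point is implied rather than stored, so the significand effectively has `n` + 1 bits.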
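As a sketch of the three parts in code, the class below pulls the fields out of a `float` with shifts and masks applied to `Float.floatToIntBits`, then rebuilds the value from them. It assumes a normal number (not zero, subnormal, infinite, or NaN); the class and method names are illustrative only.

```java
public class FloatFields {

    static int sign(float x) {
        return Float.floatToIntBits(x) >>> 31;            // bit 31
    }

    static int biasedExponent(float x) {
        return (Float.floatToIntBits(x) >>> 23) & 0xFF;   // bits 30..23
    }

    static int fraction(float x) {
        return Float.floatToIntBits(x) & 0x7FFFFF;        // bits 22..0
    }

    // (-1)^s * (1 + fraction / 2^23) * 2^(e - 127); valid for normal numbers only
    static double decode(float x) {
        double significand = 1.0 + fraction(x) / (double) (1 << 23);
        double magnitude = significand * Math.pow(2, biasedExponent(x) - 127);
        return sign(x) == 1 ? -magnitude : magnitude;
    }

    public static void main(String[] args) {
        float f = 20014999;
        // Pad the fraction to 23 digits so its leading zeros stay visible
        String frac = String.format("%23s", Integer.toBinaryString(fraction(f))).replace(' ', '0');
        System.out.println("sign     = " + sign(f));
        System.out.println("exponent = " + biasedExponent(f) + " (actual " + (biasedExponent(f) - 127) + ")");
        System.out.println("fraction = " + frac);
        System.out.println("decoded  = " + decode(f));
    }
}
```

For `20014999` it should print sign `0`, exponent `151` (actual `24`), fraction `00110001011001111001100`, and a decoded value of `2.0015E7`.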

### 2. When does a number become unrepresentable?

Every `float` or `double` value is stored in this binary scientific notation. So when does a number become unrepresentable? There are only two cases:

1. **Exponent overflow:** the number is too large for the range of the exponent. For example, if the maximum exponent were 10, a number that needs an exponent greater than 10 could not be represented. For `float` the largest exponent is 127, which is why `Float.MAX_VALUE` is about 3.4 × 10^38.

2. **Mantissa overflow:** the number needs more precision than the mantissa can hold, as with `1.3434343233332`. This number is small (less than 2), so its exponent is well within range, but the mantissa does not have enough bits for that many significant digits, and the value has to be rounded.
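Both cases can be reproduced in a few lines. This is a minimal sketch with hypothetical method names; it uses `Float.MAX_VALUE` for the exponent case, and for the mantissa case both `16777217` (2^24 + 1, which needs one more significant bit than a `float` has) and the `1.3434343233332` example from the text.

```java
public class UnrepresentableDemo {

    // Case 1: the result needs a larger exponent than float allows
    static float exponentOverflow() {
        return Float.MAX_VALUE * 2;     // float arithmetic; overflows to Infinity
    }

    // Case 2: the value is in range but needs more significant bits than float has
    static float mantissaOverflow(int n) {
        return n;                       // int -> float rounds to the nearest float
    }

    public static void main(String[] args) {
        System.out.println(exponentOverflow());
        System.out.println((int) mantissaOverflow(16777217));    // 2^24 + 1
        // The float nearest 1.3434343233332 differs from the double nearest it
        System.out.println(1.3434343233332f == 1.3434343233332);
    }
}
```

The expected output is `Infinity`, then `16777216`, then `false`.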

### 3. Why can't `20014999` be accurately represented by `float`?

From the analysis above, the problem is not size: `20014999` is far from too large. Written in IEEE 754 scientific notation, its exponent fits easily, but the mantissa does not have enough bits to hold it exactly.
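One way to check this claim is to count how many significant bits `20014999` needs, meaning the span from its highest set bit to its lowest set bit. A `float` has 24 significant bits (23 stored plus the implicit leading 1) and a `double` has 53. The sketch uses only `Integer.numberOfLeadingZeros` and `Integer.numberOfTrailingZeros` from the JDK; the helper name `significantBits` is hypothetical.

```java
public class SignificantBits {

    // Bits from the highest set bit down to the lowest set bit, inclusive
    static int significantBits(int n) {
        return 32 - Integer.numberOfLeadingZeros(n) - Integer.numberOfTrailingZeros(n);
    }

    public static void main(String[] args) {
        int n = 20014999;
        System.out.println(Integer.toBinaryString(n));   // 1001100010110011110010111
        System.out.println(significantBits(n));          // 25
    }
}
```

Twenty-five bits is one more than a `float` can hold, while `20015000` needs only 22, which is why it is representable.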

By analyzing the binary representation of `20014999` using both `float` and `double`, we can find the answer.

The following program prints the binary representation of `20014999` as a `double` and as a `float`.

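```java
public class FloatDoubleTest3 {

    public static void main(String[] args) {
        double d = 20014999;
        long l = Double.doubleToLongBits(d);   // raw 64-bit IEEE 754 pattern
        // toBinaryString omits leading zeros, so a 0 sign bit is not printed
        System.out.println("Double: " + Long.toBinaryString(l));

        float f = 20014999;
        int i = Float.floatToIntBits(f);       // raw 32-bit IEEE 754 pattern
        System.out.println("Float: " + Integer.toBinaryString(i));
    }
}
```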

The output is as follows:

```
Double: 100000101110011000101100111100101110000000000000000000000000000
Float: 1001011100110001011001111001100
```

Analysis of the output results:

For `double`, `Long.toBinaryString` omits leading zeros, so the sign bit `0` is missing from the output. Adding it back on the left gives the complete 64-bit pattern, which splits into the sign bit, the exponent, and the mantissa:

```
0 10000010111 0011000101100111100101110000000000000000000000000000
```

For `float`, `Integer.toBinaryString` likewise drops the leading sign bit `0`. Adding it back gives the complete 32-bit pattern, split the same way into the sign bit, the exponent, and the mantissa:

```
0 10010111 00110001011001111001100
```

In each line, the space-separated groups are the sign bit, the exponent, and the mantissa, in that order.
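Because `toBinaryString` drops leading zeros, splitting the output by hand is error-prone. A minimal sketch, assuming the helper names `splitDouble` and `splitFloat` (these are not JDK methods), pads each pattern to its full width with `String.format` and inserts spaces at the field boundaries: 1 + 11 + 52 bits for `double`, 1 + 8 + 23 for `float`.

```java
public class BitFieldSplitter {

    // Pad to 64 digits so the leading sign bit 0 is kept, then split 1 | 11 | 52
    static String splitDouble(double d) {
        String bits = String.format("%64s", Long.toBinaryString(Double.doubleToLongBits(d)))
                .replace(' ', '0');
        return bits.substring(0, 1) + " " + bits.substring(1, 12) + " " + bits.substring(12);
    }

    // Pad to 32 digits, then split 1 | 8 | 23
    static String splitFloat(float f) {
        String bits = String.format("%32s", Integer.toBinaryString(Float.floatToIntBits(f)))
                .replace(' ', '0');
        return bits.substring(0, 1) + " " + bits.substring(1, 9) + " " + bits.substring(9);
    }

    public static void main(String[] args) {
        System.out.println(splitDouble(20014999));
        System.out.println(splitFloat(20014999));
    }
}
```

The two printed lines should match the hand-split patterns above.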

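Comparing the two:

- The sign bits are both `0`.

- The exponents encode the same power of two. In offset binary, `10000010111` is 1047 = 24 + 1023 for `double`, and `10010111` is 151 = 24 + 127 for `float`, so both numbers are scaled by 2^24.

- The only difference lies in the mantissa.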
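The exponent comparison can be checked numerically: subtracting each format's bias from its exponent field should give the same power of two, and `Math.getExponent` (which returns the unbiased exponent) should agree. The helper `unbias` is hypothetical; binary literals need Java 7 or later.

```java
public class ExponentCheck {

    // Convert a biased exponent field to the actual power of two
    static int unbias(int field, int bias) {
        return field - bias;
    }

    public static void main(String[] args) {
        System.out.println(unbias(0b10000010111, 1023));   // double exponent field
        System.out.println(unbias(0b10010111, 127));       // float exponent field
        // Cross-check with the JDK
        System.out.println(Math.getExponent(20014999.0));
        System.out.println(Math.getExponent(20014999f));
    }
}
```

All four lines should print `24`.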

In `double`, the mantissa is `0011000101100111100101110000000000000000000000000000`. Ignoring the trailing zeros, it needs at least 24 bits, `001100010110011110010111`, to be represented exactly.

In `float`, the mantissa is: `00110001011001111001100`, consisting of 23 bits.

Why is this the case? The `float` mantissa holds at most 23 bits, so the 24-bit value `001100010110011110010111` has to be rounded to 23 bits. The bit that gets dropped is a `1` with nothing after it, so the value lies exactly halfway between two `float`s, and IEEE 754's default round-to-nearest-even mode rounds up to `00110001011001111001100`. As a result, `20014999` becomes `20015000` in `float`.

So although `20014999` is well within the range of `float`, the precision of the IEEE 754 `float` format is not enough to represent it exactly, and rounding produces an approximate value instead.
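The rounding step can be imitated with integer arithmetic. The helper below is a hypothetical sketch, not how the JVM implements the conversion: it keeps the top 24 significant bits of a positive `int`, applies round-half-to-even to the dropped bits, and compares the result with Java's own `int`-to-`float` conversion.

```java
public class RoundHalfEvenSketch {

    // Round a positive int to `bits` significant bits, ties to even
    static int roundToSignificantBits(int n, int bits) {
        int length = 32 - Integer.numberOfLeadingZeros(n);   // bits currently used
        int shift = length - bits;                           // low bits to drop
        if (shift <= 0) {
            return n;                                        // already fits
        }
        int kept = n >>> shift;
        int dropped = n & ((1 << shift) - 1);
        int half = 1 << (shift - 1);
        // Round up if past halfway, or exactly halfway with an odd kept value
        if (dropped > half || (dropped == half && (kept & 1) == 1)) {
            kept++;
        }
        return kept << shift;
    }

    public static void main(String[] args) {
        int n = 20014999;
        int manual = roundToSignificantBits(n, 24);
        System.out.println(manual);                       // 20015000
        System.out.println(manual == (int) (float) n);    // true
    }
}
```

For `20014999` exactly one bit is dropped, it is a tie, and the kept value `10007499` is odd, so it rounds up to `10007500`, i.e. `20015000`.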

### Summary

Floating-point arithmetic is often inexact. An error appears whenever a value needs more precision than the format provides, and such errors usually come from the precision of a number rather than its size. The result is then close to, but not exactly, the desired value, which matters most when `float` and `double` are used for calculations that must be exact.
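A small illustration of such errors building up in an ordinary calculation, assuming the common case of adding `0.1` (which has no exact binary representation) ten times as a `double`; the class and method names are illustrative.

```java
public class AccumulatedError {

    // Add 0.1 to a running double total `count` times
    static double sumTenths(int count) {
        double sum = 0.0;
        for (int k = 0; k < count; k++) {
            sum += 0.1;   // each addition rounds to the nearest double
        }
        return sum;
    }

    public static void main(String[] args) {
        double sum = sumTenths(10);
        System.out.println(sum);          // 0.9999999999999999
        System.out.println(sum == 1.0);   // false
    }
}
```

The total is very close to `1.0` but not equal to it, so an exact equality test fails.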

When exact results are required, consider alternatives such as `BigDecimal` constructed from a `String`, or converting values to integers and storing them in `long`.
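A minimal sketch of both alternatives follows. Constructing `BigDecimal` from a `String` is standard JDK usage. The `long` part is one possible reading of "using `long` for conversion", namely storing amounts as whole counts of a fixed unit (hundredths here); the helper `addHundredths` is hypothetical.

```java
import java.math.BigDecimal;

public class ExactAlternatives {

    // BigDecimal built from a String keeps exactly the digits written
    static BigDecimal exact(String digits) {
        return new BigDecimal(digits);
    }

    // Hypothetical helper: amounts kept as a long count of hundredths
    static long addHundredths(long a, long b) {
        return a + b;
    }

    public static void main(String[] args) {
        // The float is already rounded before BigDecimal sees it
        BigDecimal fromFloat = new BigDecimal(20014999f);
        System.out.println(fromFloat.compareTo(exact("20014999")) == 0);

        // Decimal fractions stay exact when built from Strings
        System.out.println(exact("0.1").add(exact("0.2")));

        // The long approach: 0.10 + 0.20 as 10 + 20 hundredths
        System.out.println(addHundredths(10, 20));
    }
}
```

Expected output: `false`, `0.3`, `30`. Passing a `float` or `double` to `BigDecimal` does not recover lost precision, because the value has already been rounded by then.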

Article URL (Eden Network): http://www.edenw.com/tech/devdeloper/java/2010-07-16/4752.html
