Java Floating-Point Precision (IEEE 754 / double / float) and the BigDecimal Solution

I. A Motivating Example

A simple example:

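double a = 0.0;
// A plain loop: a lambda cannot modify a local variable, so IntStream.forEach would not compile here
for (int i = 0; i < 3; i++) {
    a += 0.1;
}
System.out.println(a); // 0.30000000000000004
System.out.println(a == 0.3); // false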

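As this shows, the problems caused by binary floating point are never far away: a few simple additions on a double already produce a result that breaks a correctness check.
BigDecimal lets us look at what is happening in more detail: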

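// new BigDecimal(double) is used on purpose here: it exposes the exact binary value of the double 0.1
BigDecimal _0_1 = new BigDecimal(0.1);
BigDecimal x = _0_1;
for (int i = 1; i <= 10; i++) {
	System.out.println(x + ", as double " + x.doubleValue());
	x = x.add(_0_1);
}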

Output:

0.1000000000000000055511151231257827021181583404541015625, as double 0.1
0.2000000000000000111022302462515654042363166809082031250, as double 0.2
0.3000000000000000166533453693773481063544750213623046875, as double 0.30000000000000004
0.4000000000000000222044604925031308084726333618164062500, as double 0.4
0.5000000000000000277555756156289135105907917022705078125, as double 0.5
0.6000000000000000333066907387546962127089500427246093750, as double 0.6000000000000001
0.7000000000000000388578058618804789148271083831787109375, as double 0.7000000000000001
0.8000000000000000444089209850062616169452667236328125000, as double 0.8
0.9000000000000000499600361081320443190634250640869140625, as double 0.9
1.0000000000000000555111512312578270211815834045410156250, as double 1.0

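II. Why Floating Point Is Inexact

We often hear that double and float are imprecise, usually alongside a mention of the IEEE 754 standard. Is the standard itself to blame?
No. The imprecision comes from several factors combined, and the current IEEE 754 standard is a carefully weighed compromise that gets as close to the correct result as is practical.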

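1. The inherent limits of binary

Just as 1/3 = 0.333… cannot be written exactly in base 10, 1/10 is an infinitely repeating fraction in base 2.
Specifically, \(0.1_{(10)}=0.0001100110011001100110011..._{(2)}\)
So unless values are stored as fractions, some numbers that are exact in another base simply cannot be represented exactly with a finite number of binary digits.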
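The repeating pattern can be checked without any floating point at all. The following is a minimal sketch (the class and method names are made up for illustration) that produces the binary digits of a fraction by repeated doubling with integer arithmetic:

```java
// Hypothetical helper class for illustration; not part of the JDK or of the original article.
class BinaryExpansion {
    // Returns the first `digits` binary digits after the point of numerator/denominator.
    // Uses only integer arithmetic: double the remainder, emit 1 if it reaches the denominator.
    static String fractionBits(long numerator, long denominator, int digits) {
        StringBuilder sb = new StringBuilder("0.");
        long remainder = numerator % denominator;
        for (int i = 0; i < digits; i++) {
            remainder *= 2;
            if (remainder >= denominator) {
                sb.append('1');
                remainder -= denominator;
            } else {
                sb.append('0');
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(fractionBits(1, 10, 24)); // 0.000110011001100110011001
        System.out.println(fractionBits(1, 4, 8));   // 0.01000000
    }
}
```

For 1/10 the remainder cycles through 2, 4, 8, 6 forever, so the digits 0011 repeat without end, while 1/4 reaches remainder 0 after two steps and terminates.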

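2. How computers store numbers

CPUs do not store or operate on numbers as fractions; they work with a finite number of bits. (You may wonder why numbers are not stored exactly as fractions under some scheme with its own arithmetic rules; see the discussion linked in the references at the end.)

For an infinite expansion, a fixed number of storage bits therefore necessarily loses part of the value.
For example, with 8 binary fraction bits (plain truncation, no rounding), \(0.1_{(10)}*3\) evaluates to \(0.00011001_{(2)}* 3=0.01001011_{(2)}=0.29296875_{(10)}\), not 0.3.
This is just like \(0.1_{(3)}*3\) on a decimal computer (again with plain truncation), which gives 0.99999999 rather than 1.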
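The 8-bit truncation example can be replayed in a few lines. This is only a sketch of the toy format described above (8 fraction bits, truncation, no rounding), with hypothetical names, not how any real CPU stores numbers:

```java
// Hypothetical demo of a toy format: 8 binary fraction bits, truncation only.
class TruncationDemo {
    // Returns numerator/denominator truncated to `bits` binary fraction digits,
    // expressed as a whole number of 2^-bits units.
    static long truncatedUnits(long numerator, long denominator, int bits) {
        return (numerator << bits) / denominator; // integer division drops the remaining digits
    }

    public static void main(String[] args) {
        long tenth = truncatedUnits(1, 10, 8);             // 25 units of 1/256
        long tripled = tenth * 3;                          // 75 units of 1/256
        System.out.println(Long.toBinaryString(tenth));    // 11001   (0.00011001 in binary)
        System.out.println(Long.toBinaryString(tripled));  // 1001011 (0.01001011 in binary)
        System.out.println(tripled / 256.0);               // 0.29296875
    }
}
```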

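3. IEEE-754, the standard for representing numbers

The discussion above makes it clear that the rules for storing and computing with numbers could take countless forms.
To unify them (simplifying CPU instruction design, cross-platform compatibility and so on), the IEEE published the IEEE Standard for Floating-Point Arithmetic (IEEE-754), which adopts floating point as the format for storage and arithmetic.
The standard covers floating-point formats, special values, floating-point operations, rounding rules, exception handling and more.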

III. A Partial Overview of the IEEE-754 Standard

1. It defines five basic formats:

binary32, binary64, binary128, decimal64, decimal128
binary32 and binary64 are what we usually call float and double.

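2. Anatomy of float and double:

Taking binary64 (double) as the example,
it has the following layout:

sign (1 bit): the sign bit, 0 for positive, 1 for negative
exponent (11 bits): an unsigned integer, here in the range [0, 2047]. In use, a fixed bias is applied whose size depends on the exponent width; for double it is -1023, so the effective range is [-1022, 1023] (-1023 and +1024 are missing because the all-zeros and all-ones patterns are reserved for special values)
fraction (52 bits): the precision field, storing the significant digits (the implicit leading integer bit 1 is not stored)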

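The resulting value is \((-1)^{sign}*1.fraction_{(2)}*2^{e-1023}\), where e is the stored exponent.
This layout is also why precision drops as magnitude grows and improves as it shrinks: the larger the number, the more of the fraction bits are spent on the integer part and the fewer remain for the fractional part. (In the figure from the original post, the fractional part has been dropped entirely and even the integer part is being rounded.)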
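One way to see precision shrinking with magnitude is to ask for the gap between a double and the next representable one. The sketch below uses the JDK's Math.ulp; the class and method names are only illustrative:

```java
// Hypothetical demo class; Math.ulp is the only library call involved.
class UlpDemo {
    // Distance from d to the next larger representable double.
    static double gapAt(double d) {
        return Math.ulp(d);
    }

    public static void main(String[] args) {
        System.out.println(gapAt(1.0));    // 2.220446049250313E-16 (2^-52)
        System.out.println(gapAt(1.0e16)); // 2.0: odd integers are no longer representable here
        double big = 9007199254740992.0;   // 2^53
        System.out.println(big + 1 == big); // true: the +1 is lost to rounding
    }
}
```

Above 2^53 all 52 fraction bits are taken up by the integer part, which is the situation the paragraph above describes.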

binary32 (float) works the same way, with a bias of -127.
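The three fields can be read straight out of a double with Double.doubleToLongBits. The following sketch (hypothetical class name, normal numbers only) extracts them and rebuilds the value from the formula above:

```java
// Hypothetical helper for illustration; handles normal (non-zero, non-subnormal, finite) values only.
class DoubleFields {
    static int sign(double d) {
        return (int) (Double.doubleToLongBits(d) >>> 63);
    }

    static int storedExponent(double d) {
        return (int) ((Double.doubleToLongBits(d) >>> 52) & 0x7FF); // 11 bits
    }

    static long fraction(double d) {
        return Double.doubleToLongBits(d) & 0xFFFFFFFFFFFFFL;       // low 52 bits
    }

    // (-1)^sign * 1.fraction * 2^(e - 1023)
    static double rebuild(double d) {
        double significand = 1.0 + fraction(d) / (double) (1L << 52);
        double magnitude = Math.scalb(significand, storedExponent(d) - 1023);
        return sign(d) == 0 ? magnitude : -magnitude;
    }

    public static void main(String[] args) {
        double d = 0.1;
        System.out.println(sign(d));                       // 0
        System.out.println(storedExponent(d));             // 1019, i.e. 2^(1019 - 1023) = 2^-4
        System.out.println(Long.toHexString(fraction(d))); // 999999999999a
        System.out.println(rebuild(d) == d);               // true
    }
}
```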

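3. Rounding rules:

IEEE-754 defines several rounding rules rather than a single one; which rule is in effect depends on the implementation (round to nearest, ties to even is the usual default for binary formats).
The rules include:

Roundings to nearest
- Round to nearest, ties to even: round to the nearest value; on a tie, pick the one whose least significant digit is even
- Round to nearest, ties away from zero: round to the nearest value; on a tie, pick the one farther from 0, i.e. the familiar "round half up"
Directed roundings
- Round toward 0
- Round toward +∞
- Round toward −∞

In Java, float and double arithmetic rounds to nearest with ties to even, which corresponds to RoundingMode.HALF_EVEN in java.math.
This mode is also known as "Banker's rounding"; statistically, it minimizes the cumulative error when applied repeatedly over a sequence of calculations.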
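The differences between these rules only show up on ties and on negative values. As a rough illustration, the sketch below maps each rule to what appears to be its java.math.RoundingMode counterpart and rounds a few tie cases to an integer (the class name is hypothetical):

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

// Hypothetical demo class; only BigDecimal and RoundingMode come from the JDK.
class RoundingModes {
    // Rounds a decimal string to an integer using the given mode.
    static String round(String value, RoundingMode mode) {
        return new BigDecimal(value).setScale(0, mode).toString();
    }

    public static void main(String[] args) {
        RoundingMode[] modes = {
            RoundingMode.HALF_EVEN, // round to nearest, ties to even
            RoundingMode.HALF_UP,   // round to nearest, ties away from zero
            RoundingMode.DOWN,      // round toward 0
            RoundingMode.CEILING,   // round toward +infinity
            RoundingMode.FLOOR      // round toward -infinity
        };
        for (RoundingMode mode : modes) {
            System.out.println(mode + ": 2.5 -> " + round("2.5", mode)
                    + ", 3.5 -> " + round("3.5", mode)
                    + ", -2.5 -> " + round("-2.5", mode));
        }
    }
}
```

HALF_EVEN gives 2, 4 and -2 for the three inputs, while HALF_UP gives 3, 4 and -3; DOWN, CEILING and FLOOR give (2, 3, -2), (3, 4, -2) and (2, 3, -3) respectively.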

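4. Computing an IEEE-754 value by hand

Take the familiar 0.1 as a float:
\(0.1_{(10)}=0.0001100110011..._{(2)}\approx(-1)^0*1.100110011...01_{(2)}*2^{(123-127)}\)

So the value IEEE-754 actually stores is approximately 0.10000000149011611938.

The significand keeps as much precision as it possibly can, but the number of bits is finite, so the last bit ends up rounded.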
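The hand calculation can be cross-checked against what the JVM actually stores. This is a small sketch with hypothetical names; it relies on Float.floatToIntBits for the raw bits and on new BigDecimal(double) for the exact stored value (widening a float to a double is lossless):

```java
import java.math.BigDecimal;

// Hypothetical helper for illustration.
class FloatTenth {
    static int storedExponent(float f) {
        return (Float.floatToIntBits(f) >>> 23) & 0xFF; // 8 exponent bits
    }

    // The 23 stored fraction bits, left-padded with zeros.
    static String fractionBits(float f) {
        String bits = Integer.toBinaryString(Float.floatToIntBits(f) & 0x7FFFFF);
        return String.format("%23s", bits).replace(' ', '0');
    }

    // Exact decimal expansion of the value the float really holds.
    static BigDecimal exactValue(float f) {
        return new BigDecimal((double) f);
    }

    public static void main(String[] args) {
        System.out.println(storedExponent(0.1f)); // 123
        System.out.println(fractionBits(0.1f));   // 10011001100110011001101
        System.out.println(exactValue(0.1f));     // 0.100000001490116119384765625
    }
}
```

The fraction ends in ...1101 rather than continuing ...1100 1100, which is the final rounding step mentioned above.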

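5. Other approaches

IEEE-754 floating point is just one standard, a trade-off between performance, storage space, range and precision. As discussed above and in the Stack Exchange thread, when precision or some other property matters more, a different set of rules for storing and computing numbers can be used.

Decimal is one such scheme. Here is a description quoted from the web: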

Decimal types work much like floats or fixed-point numbers, but they assume a decimal system, that is, their exponent (implicit or explicit) encodes power-of-10, not power-of-2. A decimal number could, for example, encode a mantissa of 23456 and an exponent of -2, and this would expand to 234.56. Decimals, because the arithmetic isn't hard-wired into the CPU, are slower than floats, but they are ideal for anything that involves decimal numbers and needs those numbers to be exact, with rounding occurring in well-defined spots - financial calculations, scoreboards, etc. Some programming languages have decimal types built into them (e.g. C#), others require libraries to implement them. Note that while decimals can accurately represent non-repeating decimal fractions, their precision isn't any better than that of floating-point numbers; choosing decimals merely means you get exact representations of numbers that can be represented exactly in a decimal system (just like floats can exactly represent binary fractions).

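Decimal types work much like fixed-point numbers, except that they are based on powers of 10 rather than powers of 2. For example, 234.56 = 23456*10^(-2) can be broken down into 23456 and -2, and because both are integers they are stored exactly.
Decimal is not inherently more precise than floating point, though. As the name says, it can only represent exactly those numbers that are exact in base 10; a number that base 10 itself cannot represent exactly, such as \(0.1_{(3)}\), still cannot be stored exactly.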
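Java's BigDecimal, used here as one example of a decimal type, shows this limit directly: dividing 1 by 3 has no exact decimal result. The sketch below uses hypothetical names and MathContext.DECIMAL64 (16 significant digits) as an arbitrary precision choice:

```java
import java.math.BigDecimal;
import java.math.MathContext;

// Hypothetical demo class for illustration.
class DecimalLimits {
    // True if a / b has a finite decimal expansion; BigDecimal.divide throws otherwise.
    static boolean hasExactQuotient(BigDecimal a, BigDecimal b) {
        try {
            a.divide(b);
            return true;
        } catch (ArithmeticException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        BigDecimal one = BigDecimal.ONE;
        BigDecimal three = new BigDecimal(3);
        System.out.println(hasExactQuotient(one, new BigDecimal(8))); // true  (0.125)
        System.out.println(hasExactQuotient(one, three));             // false (0.333...)

        // With a finite precision the quotient is rounded, and the error becomes visible.
        BigDecimal third = one.divide(three, MathContext.DECIMAL64);
        System.out.println(third);                 // 0.3333333333333333
        System.out.println(third.multiply(three)); // 0.9999999999999999
    }
}
```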

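IV. An Overview of BigDecimal in Java

Immutable, arbitrary-precision signed decimal numbers.

Because converting a decimal fraction to binary is inexact, BigDecimal first scales the value to an integer, \(value*10^{(scale)}\), and stores that unscaled integer in long intCompact (in the OpenJDK implementation, a BigInteger is used instead when the value does not fit in a long).
When the real value is needed, it is reconstructed as \(intCompact * 10^{(-scale)}\)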
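The private intCompact field is not accessible from user code, but the same unscaled-integer-plus-scale pair is visible through the public unscaledValue() and scale() methods. A minimal sketch (hypothetical class name) that also redoes the opening 0.1 + 0.1 + 0.1 example with BigDecimal:

```java
import java.math.BigDecimal;
import java.math.BigInteger;

// Hypothetical demo class for illustration.
class UnscaledDemo {
    // Adds 0.1 to zero n times using exact decimal arithmetic.
    static BigDecimal sumTenths(int n) {
        BigDecimal tenth = new BigDecimal("0.1"); // String constructor: exactly 1 * 10^-1
        BigDecimal sum = BigDecimal.ZERO;
        for (int i = 0; i < n; i++) {
            sum = sum.add(tenth); // BigDecimal is immutable, so add returns a new object
        }
        return sum;
    }

    public static void main(String[] args) {
        BigDecimal price = new BigDecimal("234.56");
        System.out.println(price.unscaledValue()); // 23456
        System.out.println(price.scale());         // 2
        System.out.println(new BigDecimal(BigInteger.valueOf(23456), 2).equals(price)); // true

        System.out.println(sumTenths(3)); // 0.3
        System.out.println(sumTenths(3).compareTo(new BigDecimal("0.3")) == 0); // true
    }
}
```

Note the String constructor: new BigDecimal(0.1) would instead capture the inexact binary value shown in section I.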

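Common BigDecimal APIs and use cases:

setScale(int newScale, RoundingMode roundingMode)
Sets the number of digits after the decimal point. If the change requires rounding, a rounding mode must be specified, otherwise an ArithmeticException is thrown.
For example, to keep 2 decimal places by truncation: .setScale(2, RoundingMode.DOWN)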
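A short sketch of how setScale behaves with and without a rounding mode; the class name and the sample value 2.345 are arbitrary choices for illustration:

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

// Hypothetical demo class for illustration.
class SetScaleDemo {
    // True if reducing the value to newScale digits needs rounding, detected via the
    // exception thrown by the single-argument setScale.
    static boolean needsRounding(BigDecimal value, int newScale) {
        try {
            value.setScale(newScale);
            return false;
        } catch (ArithmeticException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        BigDecimal v = new BigDecimal("2.345");
        System.out.println(v.setScale(2, RoundingMode.DOWN));      // 2.34 (truncate)
        System.out.println(v.setScale(2, RoundingMode.HALF_UP));   // 2.35
        System.out.println(v.setScale(2, RoundingMode.HALF_EVEN)); // 2.34 (tie goes to the even digit)
        System.out.println(v.setScale(5, RoundingMode.DOWN));      // 2.34500 (widening never rounds)
        System.out.println(needsRounding(v, 2));                   // true
        System.out.println(needsRounding(v, 5));                   // false
    }
}
```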

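V. Further Notes

1. The fixed-point approach

Despite the name, a fixed-point implementation does not pin the point at some position and store the integer and fractional parts separately.
As with Decimal, the value is first scaled up to a large enough integer, and the scale is kept alongside it so the real value can be recovered later.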
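As a rough sketch of that idea, the class below keeps values as a long count of hundredths. The fixed scale of 2 and all names are assumptions made for the example, and BigDecimal is used only to parse and format:

```java
import java.math.BigDecimal;

// Hypothetical fixed-point sketch: every value is a long number of 10^-SCALE units.
class FixedPoint {
    static final int SCALE = 2; // assumed for this example: two decimal places

    // "0.10" -> 10. Throws ArithmeticException if the input has more than SCALE decimals.
    static long toFixed(String decimal) {
        return new BigDecimal(decimal).movePointRight(SCALE).longValueExact();
    }

    // 30 -> "0.30"
    static String toDecimalString(long fixed) {
        return BigDecimal.valueOf(fixed, SCALE).toPlainString();
    }

    public static void main(String[] args) {
        long tenth = toFixed("0.10");
        long sum = tenth + tenth + tenth;            // plain integer addition, no rounding
        System.out.println(sum == toFixed("0.30"));  // true
        System.out.println(toDecimalString(sum));    // 0.30
    }
}
```

Addition and subtraction are exact as long as the long does not overflow; multiplication and division would need an explicit rescaling step, which this sketch leaves out.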

2. How other languages behave and what they offer

https://0.30000000000000004.com

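3. Why does MySQL return true for SELECT (0.2+0.1)=0.3?

See: https://stackoverflow.com/a/55309851/9908241
Answer: when the operands are exact-value numeric literals, MySQL may evaluate the expression with Precision Math ( https://dev.mysql.com/doc/refman/8.0/en/precision-math-examples.html ).
That is, SELECT (0.1+0.2) = 0.3 is executed more or less like this: SELECT CAST((0.1 + 0.2) AS DECIMAL(1, 1)) = CAST((0.3) AS DECIMAL(1, 1));

The IEEE 754 precision problem has not gone away; it can be reproduced by explicitly declaring a floating-point column: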

create table test (f float);
insert into test values (0.1), (0.2);
select sum(f) from test; -- prints the classic 0.30000000447034836
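The same digits can be reproduced in Java by storing the values as float and accumulating them in a double. That the MySQL aggregate works this way is an assumption inferred from the digits, not something the sources above state; the class name is hypothetical:

```java
import java.math.BigDecimal;

// Hypothetical demo: single-precision inputs summed in double precision.
class FloatColumnSum {
    static double sumAsDouble(float... values) {
        double total = 0.0;
        for (float v : values) {
            total += v; // each float is widened to double exactly, rounding error included
        }
        return total;
    }

    public static void main(String[] args) {
        double sum = sumAsDouble(0.1f, 0.2f);
        System.out.println(sum == 0.3);                 // false
        System.out.println(sum == 0.30000000447034836); // true
        System.out.println(new BigDecimal(sum));        // 0.300000004470348358154296875
    }
}
```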

4. Why floating point is designed this way, and why the exponent needs a bias

See: "IEEE 754格式是什么?", answer by wuxinliulei on Zhihu

References:
- Demonstration of the anomaly when adding 0.1d repeatedly: https://stackoverflow.com/questions/26120311/why-does-adding-0-1-multiple-times-remain-lossless
- Discussion of alternative schemes for storing and computing numbers: https://softwareengineering.stackexchange.com/questions/167147/why-dont-computers-store-decimal-numbers-as-a-second-whole-number/167151#167151
- Decimal-to-binary conversion tutorial, How to Convert a Number from Decimal to IEEE 754 Floating Point: https://www.wikihow.com/Convert-a-Number-from-Decimal-to-IEEE-754-Floating-Point-Representation
- Step-by-step IEEE-754 calculation (for any number you enter): https://binary-system.base-conversion.ro/convert-real-numbers-from-decimal-system-to-32bit-single-precision-IEEE754-binary-floating-point.php
- CSDN: https://blog.csdn.net/weixin_44588495/article/details/97615664
- https://en.wikipedia.org/wiki/IEEE_754
- https://en.wikipedia.org/wiki/Double-precision_floating-point_format
- https://en.wikipedia.org/wiki/Single-precision_floating-point_format
- http://cr.openjdk.java.net/~darcy/Ieee754TerminologyUpdate/2020-04-21/specs/float-terminology-jls.html
- Online IEEE754 converter: https://www.binaryconvert.com/result_float.html
- Online decimal/binary converter (supports fractions): https://www.mathsisfun.com/binary-decimal-hexadecimal-converter.html
- https://0.30000000000000004.com

Please credit the source when reposting. Link to this article: https://www.uj5u.com/houduan/292845.html
