Consider the following code:

  int main (void)
  {
    volatile double x = 0.0;
    volatile _Decimal128 i = x;
    return i != 0;
  }

On x86_64:

  zira:~> gcc-snapshot -O2 tst.c -o tst
  zira:~> ll --human-readable tst
  -rwxr-xr-x 1 vinc17 vinc17 3.3M 2020-07-12 01:44:05 tst*

With _Decimal64 instead of _Decimal128, tst is a bit smaller: 2.9M.

Tested with gcc (Debian 20200616-1) 11.0.0 20200616 (experimental) [master revision beaf12b49ae:aed76232726:b70eeb248efe2b3e9bdb5e26b490e3d8aa07022d]
_Decimal128 and _Decimal64 are both implemented in software, so there are library functions which implement everything. What is actually taking up the space is the conversion functions, which are all stored in one .o file:

 .text  0x00000000004005d0  0x10567  /home/apinski/upstream-gcc/lib/gcc/x86_64-pc-linux-gnu/11.0.0/libgcc.a(bid_binarydecimal.o)
        0x00000000004005d0  __bid32_to_binary32
        0x0000000000400bf0  __bid64_to_binary32
        0x0000000000401460  __bid128_to_binary32
        0x0000000000401f10  __bid32_to_binary64
        0x0000000000402460  __bid64_to_binary64
        0x0000000000402b60  __bid128_to_binary64
        0x0000000000403680  __bid32_to_binary80
        0x0000000000403e30  __bid64_to_binary80
        0x0000000000404870  __bid128_to_binary80
        0x0000000000405640  __bid32_to_binary128
        0x0000000000405c10  __bid64_to_binary128
        0x0000000000406430  __bid128_to_binary128
        0x0000000000406fc0  __binary32_to_bid32
        0x0000000000407650  __binary64_to_bid32
        0x0000000000408100  __binary80_to_bid32
        0x0000000000408be0  __binary128_to_bid32
        0x0000000000409990  __binary32_to_bid64
        0x0000000000409fb0  __binary64_to_bid64
        0x000000000040a730  __binary80_to_bid64
        0x000000000040b280  __binary128_to_bid64
        0x000000000040c040  __binary32_to_bid128
        0x000000000040cfa0  __binary64_to_bid128
        0x000000000040e160  __binary80_to_bid128
        0x000000000040f330  __binary128_to_bid128

Since most code that uses _Decimal128/_Decimal64 ends up calling many of these functions anyway, splitting them into separate .o files would not change the overall size.
IMHO, the implementation is highly inefficient. Even with all these functions (which are similar, and thus should share most of their code), 3 MB seems a lot to me. In particular, a user complained that the size of the GNU MPFR library (which now uses such conversions) has been multiplied by 5. This is even worse with the GCC 11 snapshot, using ./configure CC=gcc-snapshot CFLAGS="-O2":

  663880   with --disable-decimal-float
  4836016  with --enable-decimal-float
  1914376  with --enable-decimal-float and hardcoded values instead of conversions
  668240   with --enable-decimal-float and even more hardcoded values

Note that MPFR does the binary-to-decimal conversion itself: it uses _Decimal128 operations only for the format conversion, to generate either NaN/±Inf/±0 from a double or some regular value from a decimal character sequence. If MPFR can do this conversion within its few hundred KB[*], I don't see why GCC can't.

[*] This does not include the small part of GMP on which MPFR is based, but it does include much unrelated code, for all the functions MPFR implements.
The code is all located in libgcc/config/libbid/bid_binarydecimal.c.

It looks to be precalculated tables which increase the size.
(In reply to Andrew Pinski from comment #3)
> The code is all located in libgcc/config/libbid/bid_binarydecimal.c
>
> It looks to be precalculated tables which increase the size.

That is, the code itself is small; it is the tables that are huge, and they go into the read-only data section (which is counted as part of the text section).
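This effect is easy to reproduce in isolation. A sketch (the file name tbl.c and the 64 KiB table size are arbitrary): compile a file containing little more than a large const table and look at the Berkeley-style `size` output, which folds .rodata into the "text" column.

```shell
cat > tbl.c <<'EOF'
/* A 64 KiB precalculated table, analogous (in kind, not content) to
   the ones in bid_binarydecimal.c; "const" places it in .rodata. */
static const unsigned char big_table[64 * 1024] = {1};

int lookup(unsigned i)
{
    return big_table[i % sizeof big_table];
}
EOF
gcc -O2 -c tbl.c

# Berkeley-style "size" reports read-only data under the "text"
# column, so the table shows up as ~64 KiB of "text" even though
# it contains no instructions.
size tbl.o
```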
I expect there's a speed/space trade-off here: you can use large tables for the conversions with less computation, or small tables with more computation (the BID implementation in libgcc uses large tables). The DPD implementation avoids the whole question of how to convert efficiently between decimal and binary FP by doing such conversions via strings. That may itself end up using large tables or less efficient algorithms in the libc code used for binary FP / string conversions; if you know the source and target formats in advance, there's more scope for statically determined bounds on how much internal precision is needed to get correctly rounded results for all inputs of the given floating-point format.