Created attachment 35298
preprocessed source code example.

Compiling a module with 4.8.1 and -Os produces a binary more than three times the size of the same module compiled with 4.3.2, because of bad decisions about inlining a function.

Dumping inline debug info (-fdump-ipa-inline) shows that the compiler is coming up with a negative "badness" for the especially bad function (move3Servos), and the number looks "suspicious" (0xBFFFFFF8), though I do not understand what the badness is supposed to represent.

Considering void move3Servos(Servo, float, Servo, float, Servo, float, float) with 87 size
 to be inlined into void loop() in oiOSoul.ino:325
 Estimated growth after inlined into all is +412 insns.
 Estimated badness is -1073741824, frequency 0.01.
 Inlined into void loop() which now has time 351 and size 628,net change of +0.

(Also note that this was about the 40th call of move3Servos() from loop() to be analyzed, each with "growth" of ~400 insns, and the compiler now thinks the total size is 628, which is ridiculous.)

Might be a duplicate of https://gcc.gnu.org/bugzilla/show_bug.cgi?id=57218, except that that report seems to describe a more trivial and less awful occurrence.

Originally noticed within Arduino (avr-g++), but NOT CPU specific, and not C++ specific. The attached example is x86 gcc 4.7.2:

/sw/lib/gcc4.7/bin/gcc-fsf-4.7 -c -Os -w foo.ii -v -save-temps -fdump-ipa-inline
Using built-in specs.
COLLECT_GCC=/sw/lib/gcc4.7/bin/gcc-fsf-4.7
Target: x86_64-apple-darwin11.4.2
Configured with: ../gcc-4.7.2/configure --prefix=/sw --prefix=/sw/lib/gcc4.7 --mandir=/sw/share/man --infodir=/sw/lib/gcc4.7/info --enable-languages=c,c++,fortran,lto,objc,obj-c++,java --with-gmp=/sw --with-libiconv-prefix=/sw --with-ppl=/sw --with-cloog=/sw --with-mpc=/sw --with-system-zlib --x-includes=/usr/X11R6/include --x-libraries=/usr/X11R6/lib --program-suffix=-fsf-4.7 --enable-cloog-backend=isl
Thread model: posix
gcc version 4.7.2 (GCC)
COLLECT_GCC_OPTIONS='-mmacosx-version-min=10.9.4' '-c' '-Os' '-w' '-v' '-save-temps' '-fdump-ipa-inline' '-mtune=core2'
 /sw/lib/gcc4.7/libexec/gcc/x86_64-apple-darwin11.4.2/4.7.2/cc1plus -fpreprocessed foo.ii -fPIC -quiet -dumpbase foo.ii -mmacosx-version-min=10.9.4 -mtune=core2 -auxbase foo -Os -w -version -fdump-ipa-inline -o foo.s
GNU C++ (GCC) version 4.7.2 (x86_64-apple-darwin11.4.2)
        compiled by GNU C version 4.7.2, GMP version 5.1.0, MPFR version 3.1.1, MPC version 1.0.1
warning: GMP header version 5.1.0 differs from library version 5.1.1.
warning: MPFR header version 3.1.1 differs from library version 3.1.2.
GGC heuristics: --param ggc-min-expand=100 --param ggc-min-heapsize=131072
GNU C++ (GCC) version 4.7.2 (x86_64-apple-darwin11.4.2)
        compiled by GNU C version 4.7.2, GMP version 5.1.0, MPFR version 3.1.1, MPC version 1.0.1
warning: GMP header version 5.1.0 differs from library version 5.1.1.
warning: MPFR header version 3.1.1 differs from library version 3.1.2.
GGC heuristics: --param ggc-min-expand=100 --param ggc-min-heapsize=131072
Compiler executable checksum: 712698abe851e71eccb1a409a1351965
COLLECT_GCC_OPTIONS='-mmacosx-version-min=10.9.4' '-c' '-Os' '-w' '-v' '-save-temps' '-fdump-ipa-inline' '-mtune=core2'
 as -arch x86_64 -force_cpusubtype_ALL -o foo.o foo.s
COMPILER_PATH=/sw/lib/gcc4.7/libexec/gcc/x86_64-apple-darwin11.4.2/4.7.2/:/sw/lib/gcc4.7/libexec/gcc/x86_64-apple-darwin11.4.2/4.7.2/:/sw/lib/gcc4.7/libexec/gcc/x86_64-apple-darwin11.4.2/:/sw/lib/gcc4.7/lib/gcc/x86_64-apple-darwin11.4.2/4.7.2/:/sw/lib/gcc4.7/lib/gcc/x86_64-apple-darwin11.4.2/
LIBRARY_PATH=/sw/lib/gcc4.7/lib/gcc/x86_64-apple-darwin11.4.2/4.7.2/:/sw/lib/gcc4.7/lib/gcc/x86_64-apple-darwin11.4.2/4.7.2/../../../:/usr/lib/
COLLECT_GCC_OPTIONS='-mmacosx-version-min=10.9.4' '-c' '-Os' '-w' '-v' '-save-temps' '-fdump-ipa-inline' '-mtune=core2'

BillW-MacOSX-2<10422> size foo.o
__TEXT  __DATA  __OBJC  others  dec     hex
29936   209     0       0       30145   75c1

#(without inlining)
BillW-MacOSX-2<10423> /sw/lib/gcc4.7/bin/gcc-fsf-4.7 -c -Os -w foo.ii -fno-inline-small-functions
BillW-MacOSX-2<10424> size foo.o
__TEXT  __DATA  __OBJC  others  dec     hex
8725    209     0       0       8934    22e6
Confirmed with 4.8.4; 4.9.2 and 5.0 are fine.
Negative badness values are expected (badness is really a negation of goodness). Independently of that, the inliner should skip inlining when it thinks code size will grow:

Considering void move3Servos(Servo, float, Servo, float, Servo, float, float) with 87 size
 to be inlined into void loop() in oiOSoul.ino:325
 Estimated growth after inlined into all is +412 insns.
 Estimated badness is -1073741824, frequency 0.01.
 Inlined into void loop() which now has time 351 and size 628,net change of +0.

Here it thinks the size increase is 0 (net change). Size estimates are context sensitive, and for this particular context the estimate apparently ended up being 0. GCC 4.8 (unlike mainline) seems to think that a lot of code is optimized out under various conditions on parameters:

Inline summary for void move3Servos(Servo, float, Servo, float, Servo, float, float)/90 inlinable
  self time:       4586
  global time:     0
  self size:       87
  global size:     0
  self stack:      0
  global stack:    0
    size:0.000000, time:0.000000, predicate:(true)
    size:2.000000, time:0.000000, predicate:(not inlined)
    size:2.000000, time:2.000000, predicate:(op1 changed)
    size:2.000000, time:0.990000, predicate:(op1 changed) && (op1 not constant)
    size:2.000000, time:2.000000, predicate:(op3 changed) && (op1 < 6.0e+2 || op1 > 2.4e+3 || op1 not constant) && (op1 < 6.0e+2 || op1 not constant)
    size:2.000000, time:0.990000, predicate:(op3 changed) && (op3 not constant) && (op1 < 6.0e+2 || op1 > 2.4e+3 || op1 not constant) && (op1 < 6.0e+2 || op1 not constant)
    size:2.000000, time:2.000000, predicate:(op5 changed) && (op3 < 6.0e+2 || op3 > 2.4e+3 || op3 not constant) && (op3 < 6.0e+2 || op3 not constant) && (op1 < 6.0e+2 || op1 > 2.4e+3 || op1 not constant) && (op1 < 6.0e+2 || op1 not constant)
    size:2.000000, time:0.990000, predicate:(op5 changed) && (op5 not constant) && (op3 < 6.0e+2 || op3 > 2.4e+3 || op3 not constant) && (op3 < 6.0e+2 || op3 not constant) && (op1 < 6.0e+2 || op1 > 2.4e+3 || op1 not constant) && (op1 < 6.0e+2 || op1 not constant)
    size:2.000000, time:2.000000, predicate:(op6 changed) && (op5 < 6.0e+2 || op5 > 2.4e+3 || op5 not constant) && (op5 < 6.0e+2 || op5 not constant) && (op3 < 6.0e+2 || op3 > 2.4e+3 || op3 not constant) && (op3 < 6.0e+2 || op3 not constant) && (op1 < 6.0e+2 || op1 > 2.4e+3 || op1 not constant) && (op1 < 6.0e+2 || op1 not constant)
    size:2.000000, time:1.450000, predicate:(op6 changed) && (op6 not constant) && (op5 < 6.0e+2 || op5 > 2.4e+3 || op5 not constant) && (op5 < 6.0e+2 || op5 not constant) && (op3 < 6.0e+2 || op3 > 2.4e+3 || op3 not constant) && (op3 < 6.0e+2 || op3 not constant) && (op1 < 6.0e+2 || op1 > 2.4e+3 || op1 not constant) && (op1 < 6.0e+2 || op1 not constant)
    size:48.000000, time:1980.374000, predicate:(op6 < 0.0 || op6 > 5.0e+0 || op6 not constant) && (op6 < 0.0 || op6 not constant) && (op5 < 6.0e+2 || op5 > 2.4e+3 || op5 not constant) && (op5 < 6.0e+2 || op5 not constant) && (op3 < 6.0e+2 || op3 > 2.4e+3 || op3 not constant) && (op3 < 6.0e+2 || op3 not constant) && (op1 < 6.0e+2 || op1 > 2.4e+3 || op1 not constant) && (op1 < 6.0e+2 || op1 not constant)
    size:1.000000, time:1.982000, predicate:(op6 < 0.0 || op6 > 5.0e+0 || op6 not constant) && (op6 < 0.0 || op6 not constant) && (op5 < 6.0e+2 || op5 > 2.4e+3 || op5 not constant) && (op5 < 6.0e+2 || op5 not constant) && (op3 < 6.0e+2 || op3 > 2.4e+3 || op3 not constant) && (op3 < 6.0e+2 || op3 not constant) && (op1 < 6.0e+2 || op1 > 2.4e+3 || op1 not constant) && (not inlined)

I will verify tomorrow whether that matches reality at all.
Bug 66122 contains more information, including a recipe for finding many examples using a Linux kernel build. For one thing, this is not limited to -Os, although it happens far more easily with -Os.