Bug 65740 - spectacularly bad inlining decisions with -Os
Summary: spectacularly bad inlining decisions with -Os
Status: NEW
Alias: None
Product: gcc
Classification: Unclassified
Component: ipa
Version: 4.8.4
Importance: P3 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords: missed-optimization
Depends on:
Blocks:
 
Reported: 2015-04-11 05:38 UTC by Bill Westfield
Modified: 2015-05-12 13:18 UTC
CC List: 3 users

See Also:
Host:
Target:
Build:
Known to work: 4.9.2, 5.0
Known to fail: 4.8.4
Last reconfirmed: 2015-04-11 00:00:00


Attachments
preprocessed source code example. (9.27 KB, text/plain)
2015-04-11 05:38 UTC, Bill Westfield

Description Bill Westfield 2015-04-11 05:38:05 UTC
Created attachment 35298
preprocessed source code example.

Compiling a module with 4.8.1 and -Os results in a binary more than triple the size of the one produced from the same module by 4.3.2, due to bad decisions about inlining a function.
Dumping the inliner's debug info (-fdump-ipa-inline) shows that the compiler computes a negative "badness" for the worst-affected function (move3Servos), and the value looks suspicious (0xBFFFFFF8), although I don't understand what the badness is supposed to be.

Considering void move3Servos(Servo, float, Servo, float, Servo, float, float) with 87 size
 to be inlined into void loop() in oiOSoul.ino:325
 Estimated growth after inlined into all is +412 insns.
 Estimated badness is -1073741824, frequency 0.01.
 Inlined into void loop() which now has time 351 and size 628,net change of +0.
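
To relate the decimal badness in the dump to a hex value, here is a minimal Python sketch. It assumes the badness is held in a 32-bit signed integer, which the dump itself does not state; the helper names to_u32 and to_s32 are made up for illustration.

```python
def to_u32(value):
    """Reinterpret a signed integer as an unsigned 32-bit word."""
    return value & 0xFFFFFFFF


def to_s32(word):
    """Reinterpret an unsigned 32-bit word as a signed integer."""
    return word - (1 << 32) if word & 0x80000000 else word


# Value copied from the "Estimated badness" line of the dump.
badness_from_dump = -1073741824

print(hex(to_u32(badness_from_dump)))    # 0xc0000000
print(badness_from_dump == -(1 << 30))   # True
print(to_s32(0xBFFFFFF8))                # -1073741832
```

Under that assumption, the value in this dump entry is exactly -2^30 (0xC0000000), and 0xBFFFFFF8 is the same value minus 8.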


(Also note that this was about the 40th call of move3Servos() from loop() to be analyzed, each with a "growth" of ~400 insns, yet it now thinks the total size is 628, which is ridiculous.)

Might be a duplicate of https://gcc.gnu.org/bugzilla/show_bug.cgi?id=57218
except that that bug seems to describe a more trivial and less awful occurrence.

Originally noticed within Arduino (avr-g++), but NOT CPU-specific, and not C++-specific.  The attached example is x86 gcc 4.7.2:

/sw/lib/gcc4.7/bin/gcc-fsf-4.7 -c -Os -w foo.ii -v -save-temps -fdump-ipa-inline
Using built-in specs.
COLLECT_GCC=/sw/lib/gcc4.7/bin/gcc-fsf-4.7
Target: x86_64-apple-darwin11.4.2
Configured with: ../gcc-4.7.2/configure --prefix=/sw --prefix=/sw/lib/gcc4.7 --mandir=/sw/share/man --infodir=/sw/lib/gcc4.7/info --enable-languages=c,c++,fortran,lto,objc,obj-c++,java --with-gmp=/sw --with-libiconv-prefix=/sw --with-ppl=/sw --with-cloog=/sw --with-mpc=/sw --with-system-zlib --x-includes=/usr/X11R6/include --x-libraries=/usr/X11R6/lib --program-suffix=-fsf-4.7 --enable-cloog-backend=isl
Thread model: posix
gcc version 4.7.2 (GCC) 
COLLECT_GCC_OPTIONS='-mmacosx-version-min=10.9.4' '-c' '-Os' '-w' '-v' '-save-temps' '-fdump-ipa-inline' '-mtune=core2'
 /sw/lib/gcc4.7/libexec/gcc/x86_64-apple-darwin11.4.2/4.7.2/cc1plus -fpreprocessed foo.ii -fPIC -quiet -dumpbase foo.ii -mmacosx-version-min=10.9.4 -mtune=core2 -auxbase foo -Os -w -version -fdump-ipa-inline -o foo.s
GNU C++ (GCC) version 4.7.2 (x86_64-apple-darwin11.4.2)
        compiled by GNU C version 4.7.2, GMP version 5.1.0, MPFR version 3.1.1, MPC version 1.0.1
warning: GMP header version 5.1.0 differs from library version 5.1.1.
warning: MPFR header version 3.1.1 differs from library version 3.1.2.
GGC heuristics: --param ggc-min-expand=100 --param ggc-min-heapsize=131072
GNU C++ (GCC) version 4.7.2 (x86_64-apple-darwin11.4.2)
        compiled by GNU C version 4.7.2, GMP version 5.1.0, MPFR version 3.1.1, MPC version 1.0.1
warning: GMP header version 5.1.0 differs from library version 5.1.1.
warning: MPFR header version 3.1.1 differs from library version 3.1.2.
GGC heuristics: --param ggc-min-expand=100 --param ggc-min-heapsize=131072
Compiler executable checksum: 712698abe851e71eccb1a409a1351965
COLLECT_GCC_OPTIONS='-mmacosx-version-min=10.9.4' '-c' '-Os' '-w' '-v' '-save-temps' '-fdump-ipa-inline' '-mtune=core2'
 as -arch x86_64 -force_cpusubtype_ALL -o foo.o foo.s
COMPILER_PATH=/sw/lib/gcc4.7/libexec/gcc/x86_64-apple-darwin11.4.2/4.7.2/:/sw/lib/gcc4.7/libexec/gcc/x86_64-apple-darwin11.4.2/4.7.2/:/sw/lib/gcc4.7/libexec/gcc/x86_64-apple-darwin11.4.2/:/sw/lib/gcc4.7/lib/gcc/x86_64-apple-darwin11.4.2/4.7.2/:/sw/lib/gcc4.7/lib/gcc/x86_64-apple-darwin11.4.2/
LIBRARY_PATH=/sw/lib/gcc4.7/lib/gcc/x86_64-apple-darwin11.4.2/4.7.2/:/sw/lib/gcc4.7/lib/gcc/x86_64-apple-darwin11.4.2/4.7.2/../../../:/usr/lib/
COLLECT_GCC_OPTIONS='-mmacosx-version-min=10.9.4' '-c' '-Os' '-w' '-v' '-save-temps' '-fdump-ipa-inline' '-mtune=core2'
BillW-MacOSX-2<10422> size foo.o
__TEXT  __DATA  __OBJC  others  dec     hex
29936   209     0       0       30145   75c1

#(without inlining)
BillW-MacOSX-2<10423> /sw/lib/gcc4.7/bin/gcc-fsf-4.7 -c -Os -w foo.ii -fno-inline-small-functions
BillW-MacOSX-2<10424> size foo.o
__TEXT  __DATA  __OBJC  others  dec     hex
8725    209     0       0       8934    22e6
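
The two `size` runs above can be compared with a short Python sketch. The data rows are copied from the output shown; the helper name parse_size_row is made up for illustration.

```python
def parse_size_row(row):
    """Parse one data row of `size` output (Mach-O column layout as above)."""
    text, data, _objc, _others, dec, _hex = row.split()
    return {"text": int(text), "data": int(data), "total": int(dec)}


inlined = parse_size_row("29936   209     0       0       30145   75c1")
not_inlined = parse_size_row("8725    209     0       0       8934    22e6")

text_ratio = inlined["text"] / not_inlined["text"]
total_ratio = inlined["total"] / not_inlined["total"]

print(f"__TEXT ratio: {text_ratio:.2f}")   # 3.43
print(f"total ratio:  {total_ratio:.2f}")  # 3.37
```

__DATA is identical in both builds, so the whole difference is in __TEXT.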
Comment 1 Markus Trippelsdorf 2015-04-11 06:54:15 UTC
Confirmed with 4.8.4. 
4.9.2, 5.0 are fine.
Comment 2 Jan Hubicka 2015-04-13 09:06:28 UTC
Negative badness values are expected (it is really a negation of goodness). Independently of that, the inliner should skip inlining when it thinks code size will grow:

Considering void move3Servos(Servo, float, Servo, float, Servo, float, float) with 87 size
 to be inlined into void loop() in oiOSoul.ino:325
 Estimated growth after inlined into all is +412 insns.
 Estimated badness is -1073741824, frequency 0.01.
 Inlined into void loop() which now has time 351 and size 628,net change of +0.

Here it thinks the size increase is 0 (net change). Size estimates are context sensitive, and for this particular context the estimate apparently ended up being 0.

GCC 4.8 (unlike mainline) seems to think that a lot of code is optimized out under various conditions on the parameters:

Inline summary for void move3Servos(Servo, float, Servo, float, Servo, float, float)/90 inlinable
  self time:       4586
  global time:     0
  self size:       87
  global size:     0
  self stack:      0
  global stack:    0
    size:0.000000, time:0.000000, predicate:(true)
    size:2.000000, time:0.000000, predicate:(not inlined)
    size:2.000000, time:2.000000, predicate:(op1 changed)
    size:2.000000, time:0.990000, predicate:(op1 changed) && (op1 not constant)
    size:2.000000, time:2.000000, predicate:(op3 changed) && (op1 < 6.0e+2 || op1 > 2.4e+3 || op1 not constant) && (op1 < 6.0e+2 || op1 not constant)
    size:2.000000, time:0.990000, predicate:(op3 changed) && (op3 not constant) && (op1 < 6.0e+2 || op1 > 2.4e+3 || op1 not constant) && (op1 < 6.0e+2 || op1 not constant)
    size:2.000000, time:2.000000, predicate:(op5 changed) && (op3 < 6.0e+2 || op3 > 2.4e+3 || op3 not constant) && (op3 < 6.0e+2 || op3 not constant) && (op1 < 6.0e+2 || op1 > 2.4e+3 || op1 not constant) && (op1 < 6.0e+2 || op1 not constant)
    size:2.000000, time:0.990000, predicate:(op5 changed) && (op5 not constant) && (op3 < 6.0e+2 || op3 > 2.4e+3 || op3 not constant) && (op3 < 6.0e+2 || op3 not constant) && (op1 < 6.0e+2 || op1 > 2.4e+3 || op1 not constant) && (op1 < 6.0e+2 || op1 not constant)
    size:2.000000, time:2.000000, predicate:(op6 changed) && (op5 < 6.0e+2 || op5 > 2.4e+3 || op5 not constant) && (op5 < 6.0e+2 || op5 not constant) && (op3 < 6.0e+2 || op3 > 2.4e+3 || op3 not constant) && (op3 < 6.0e+2 || op3 not constant) && (op1 < 6.0e+2 || op1 > 2.4e+3 || op1 not constant) && (op1 < 6.0e+2 || op1 not constant)
    size:2.000000, time:1.450000, predicate:(op6 changed) && (op6 not constant) && (op5 < 6.0e+2 || op5 > 2.4e+3 || op5 not constant) && (op5 < 6.0e+2 || op5 not constant) && (op3 < 6.0e+2 || op3 > 2.4e+3 || op3 not constant) && (op3 < 6.0e+2 || op3 not constant) && (op1 < 6.0e+2 || op1 > 2.4e+3 || op1 not constant) && (op1 < 6.0e+2 || op1 not constant)
    size:48.000000, time:1980.374000, predicate:(op6 < 0.0 || op6 > 5.0e+0 || op6 not constant) && (op6 < 0.0 || op6 not constant) && (op5 < 6.0e+2 || op5 > 2.4e+3 || op5 not constant) && (op5 < 6.0e+2 || op5 not constant) && (op3 < 6.0e+2 || op3 > 2.4e+3 || op3 not constant) && (op3 < 6.0e+2 || op3 not constant) && (op1 < 6.0e+2 || op1 > 2.4e+3 || op1 not constant) && (op1 < 6.0e+2 || op1 not constant)
    size:1.000000, time:1.982000, predicate:(op6 < 0.0 || op6 > 5.0e+0 || op6 not constant) && (op6 < 0.0 || op6 not constant) && (op5 < 6.0e+2 || op5 > 2.4e+3 || op5 not constant) && (op5 < 6.0e+2 || op5 not constant) && (op3 < 6.0e+2 || op3 > 2.4e+3 || op3 not constant) && (op3 < 6.0e+2 || op3 not constant) && (op1 < 6.0e+2 || op1 > 2.4e+3 || op1 not constant) && (not inlined

I will verify tomorrow whether that matches reality at all.
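
As a rough illustration of how a predicate-guarded summary like the one above can make a call site look nearly free, here is a simplified Python model. It is not GCC source code: the function names, the single stand-in entry for the small size:2 rows, and the call-site value op1 = 1500.0 are all assumptions made for the example. Only the size:48 entry and its 6.0e+2 / 2.4e+3 / 5.0e+0 bounds are taken from the dump.

```python
def out_of_range(op, lo, hi):
    """Models (op < lo || op > hi || op not constant)."""
    def pred(known):
        if op not in known:          # operand not constant: cannot rule it out
            return True
        return known[op] < lo or known[op] > hi
    return pred


def below(op, lo):
    """Models (op < lo || op not constant)."""
    def pred(known):
        return op not in known or known[op] < lo
    return pred


def all_of(*preds):
    """Conjunction of predicates, as joined by && in the dump."""
    return lambda known: all(p(known) for p in preds)


# Predicate of the size:48 entry from the dump.
BIG_BODY = all_of(
    out_of_range("op6", 0.0, 5.0), below("op6", 0.0),
    out_of_range("op5", 600.0, 2400.0), below("op5", 600.0),
    out_of_range("op3", 600.0, 2400.0), below("op3", 600.0),
    out_of_range("op1", 600.0, 2400.0), below("op1", 600.0),
)

SUMMARY = [
    (2, lambda known: True),   # hypothetical stand-in for the small entries
    (48, BIG_BODY),
]


def estimated_size(summary, known):
    """Sum the sizes whose predicate is not provably false in this context."""
    return sum(size for size, pred in summary if pred(known))


print(estimated_size(SUMMARY, {}))                # 50
print(estimated_size(SUMMARY, {"op1": 1500.0}))   # 2
```

In this model a single in-range constant argument is enough to make the estimator treat the 48-unit body as dead. Whether the real function body is actually removed in that context is the question the comment above leaves open.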
Comment 3 Denis Vlasenko 2015-05-12 13:18:01 UTC
Bug 66122 contains more information, and a recipe for finding many examples using a Linux kernel build.

For one, this is not limited to -Os (although it does happen far more easily with -Os).