Bug 47255 - Missed CSE optimization with inline functions, and __attribute__((const))
Summary: Missed CSE optimization with inline functions, and __attribute__((const))
Status: NEW
Alias: None
Product: gcc
Classification: Unclassified
Component: tree-optimization
Version: 4.5.2
Importance: P3 enhancement
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords: missed-optimization
Duplicates: 114305 116333
Depends on:
Blocks:
 
Reported: 2011-01-11 02:35 UTC by Seth Robertson
Modified: 2024-08-11 15:17 UTC

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed: 2021-11-27 00:00:00


Attachments
Test program exhibiting missed optimization (206 bytes, text/x-c)
2011-01-11 02:35 UTC, Seth Robertson
Revised test program exhibiting missed optimization (212 bytes, text/x-c)
2011-01-11 18:40 UTC, Seth Robertson
Correct version of revised test program exhibiting missed optimization (223 bytes, text/x-c)
2011-02-07 16:50 UTC, Seth Robertson

Description Seth Robertson 2011-01-11 02:35:43 UTC
Created attachment 22942
Test program exhibiting missed optimization

Using gcc 4.5.2 on Gentoo (or gcc 4.1.2 on RHEL 5.5), gcc fails to optimize a case where a const function is called in a const way inside a loop, if the const function is eligible for inlining.

The test case should print three lines if it succeeds.  In fact, the "World" line is printed once per loop iteration.

How-To-Repeat:

Compile the attached test case with -O3 or "-O -finline-small-functions -finline-functions": more than three lines are printed.  Compile with -O2 and the correct number of lines for full optimization (three) is printed.
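
The attachment is not reproduced in this report.  As a rough illustration only, the sketch below shows the kind of program being described.  The names `square` and `sum_squares` and the loop shape are assumptions, not the contents of the attached test case.  The function is declared const but prints a line, so every call that actually executes shows up in the output.

```c
#include <stdio.h>

/* Hypothetical stand-in for the attached test case (names are assumptions).
   Declared const even though it prints; the print makes each executed
   call observable. */
__attribute__((const)) static inline int square(int x)
{
    puts("World");
    return x * x;
}

/* The call is loop-invariant: x never changes inside the loop. */
int sum_squares(int n, int x)
{
    int total = 0;
    for (int i = 0; i < n; i++)
        total += square(x);
    return total;
}
```

Calling `sum_squares(4, 3)` from a `main` returns 36 in every case.  What may vary with the optimization flags is how many times "World" appears, depending on whether the compiler merges the const calls or inlines the body first.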

Release:
----------------------------------------------------------------------
Using built-in specs.
Target: x86_64-redhat-linux
Configured with: ../configure --prefix=/usr --mandir=/usr/share/man --infodir=/usr/share/info --enable-shared --enable-threads=posix --enable-checking=release --with-system-zlib --enable-__cxa_atexit --disable-libunwind-exceptions --enable-libgcj-multifile --enable-languages=c,c++,objc,obj-c++,java,fortran,ada --enable-java-awt=gtk --disable-dssi --enable-plugin --with-java-home=/usr/lib/jvm/java-1.4.2-gcj-1.4.2.0/jre --with-cpu=generic --host=x86_64-redhat-linux
Thread model: posix
gcc version 4.1.2 20080704 (Red Hat 4.1.2-48)
----------------------------------------------------------------------
Using built-in specs.
COLLECT_GCC=/usr/x86_64-pc-linux-gnu/gcc-bin/4.5.2/gcc
COLLECT_LTO_WRAPPER=/usr/libexec/gcc/x86_64-pc-linux-gnu/4.5.2/lto-wrapper
Target: x86_64-pc-linux-gnu
Configured with: /n/startide/proj/startide/var-tmp/portage/sys-devel/gcc-4.5.2/work/gcc-4.5.2/configure --prefix=/usr --bindir=/usr/x86_64-pc-linux-gnu/gcc-bin/4.5.2 --includedir=/usr/lib/gcc/x86_64-pc-linux-gnu/4.5.2/include --datadir=/usr/share/gcc-data/x86_64-pc-linux-gnu/4.5.2 --mandir=/usr/share/gcc-data/x86_64-pc-linux-gnu/4.5.2/man --infodir=/usr/share/gcc-data/x86_64-pc-linux-gnu/4.5.2/info --with-gxx-include-dir=/usr/lib/gcc/x86_64-pc-linux-gnu/4.5.2/include/g++-v4 --host=x86_64-pc-linux-gnu --build=x86_64-pc-linux-gnu --disable-altivec --disable-fixed-point --without-ppl --without-cloog --disable-lto --enable-nls --without-included-gettext --with-system-zlib --disable-werror --enable-secureplt --enable-multilib --enable-libmudflap --disable-libssp --enable-libgomp --enable-cld --with-python-dir=/share/gcc-data/x86_64-pc-linux-gnu/4.5.2/python --enable-checking=release --disable-libgcj --enable-languages=c,c++ --enable-shared --enable-threads=posix --enable-__cxa_atexit --enable-clocale=gnu --with-bugurl=http://bugs.gentoo.org/ --with-pkgversion='Gentoo 4.5.2 p1.0, pie-0.4.5'
Thread model: posix
gcc version 4.5.2 (Gentoo 4.5.2 p1.0, pie-0.4.5)
----------------------------------------------------------------------
Comment 1 Richard Biener 2011-01-11 12:38:09 UTC
There is a duplicate somewhere for this bug.  Inlining happens before any
CSE is done which is what you see.  If you remove the side-effect the
multiplications will be CSEd instead after inlining.
Comment 2 Seth Robertson 2011-01-11 18:40:36 UTC
Created attachment 22946
Revised test program exhibiting missed optimization

Well, my point is that even if I remove the obvious side-effect, gcc cannot take advantage of the knowledge I am providing it.  If, instead of a printf(), I make a system call that returns a value I know (and guarantee) to be const, even though it is not declared as such (like getgid(), getuid(), or geteuid()), and use it in the expression, it is likewise not optimized.  I'm attaching a second test program using these system calls.  Test with `strace ./x 2>&1 | grep get | wc -l`.  When you compile with `gcc -O2` you should get the number 12.  When you compile with `gcc -O3` or with "-finline-functions -finline-small-functions", you get 21 (i.e., it did more work than necessary).  Of course, without any optimization you get the maximum of 30.

I am perfectly willing to believe that inlining happens first and throws away the information such that it will miss the opportunity to CSE.  I also believe that if gcc could intuit that it was const (e.g. pure multiplication operations) that it wouldn't duplicate the work. I'm just saying this is a missed optimization.

There could be a dup bug somewhere, but none of the tickets with "pure" or "const" and "__attribute__" in the subject discuss it, nor do those with "inline" and "cse".
Comment 3 Richard Biener 2011-01-12 11:10:22 UTC
Sure, it is a missed optimization.  It's even "easy" to fix when you accept
a general compile-time slowdown (just schedule some CSE passes before inlining).
When you want to avoid the compile-time slowdown then it's not so easy
(it's a usual trade-off with the case where there is no CSE opportunity but
earlier inlining would result in better code).

You can force GCC not to inline the function by using
__attribute__((const,noinline)) (but it will then, of course, not be
inlined at all).
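
A minimal sketch of this workaround, using hypothetical names (`cube` and `sum_cubes` are not from the attachments): with noinline the call stays visible to the optimizer as a const call, so it is allowed, though not required, to evaluate it once and reuse the result.

```c
/* Hypothetical example; `cube` and `sum_cubes` are illustrative names. */
__attribute__((const, noinline)) static int cube(int x)
{
    return x * x * x;
}

int sum_cubes(int n, int x)
{
    int total = 0;
    for (int i = 0; i < n; i++)
        total += cube(x);   /* same argument on every iteration */
    return total;
}
```

The cost is the one noted above: the body of `cube` is never inlined, even at call sites where inlining would have produced better code.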
Comment 4 Paolo Bonzini 2011-02-07 10:57:54 UTC
I think this is invalid.  const attributes are a hint to GCC regarding parts of the program that it cannot see, but IMHO the const/pure/nothrow on a function that is static and a leaf should have no effect on code generation (since GCC can infer just as much).

So, in the first example GCC is "fixing" a wrong usage of const on part of the program.

In the second example attached, there is no use of syscalls and GCC properly optimizes out square2 and square3.  If syscalls were added, the bug would be about missed attributes on the syscalls.  BTW, getgid, getuid etc. are pure but not const.
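
The pure/const distinction can be sketched roughly as follows.  All names here are hypothetical and the example is only an illustration of the attribute semantics, not code from either attachment: a pure function may read memory that can change between calls, while a const function may depend only on its arguments.

```c
static int scale = 2;   /* global state read by the pure function */

/* pure: no side effects, but the result depends on memory (scale). */
__attribute__((pure, noinline)) static int scaled(int x)
{
    return x * scale;
}

/* const: the result depends on the argument only. */
__attribute__((const, noinline)) static int doubled(int x)
{
    return x * 2;
}

int demo(int x)
{
    scale = 2;
    int a = scaled(x);
    scale = 3;              /* store to memory that scaled() reads */
    int b = scaled(x);      /* cannot be merged with the first call */
    /* The two doubled() calls may legitimately be merged into one. */
    return a + b + doubled(x) + doubled(x);
}
```

On this reading, a call such as getuid() fits the pure description, because its result reflects process state rather than its (empty) argument list.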
Comment 5 Seth Robertson 2011-02-07 16:50:45 UTC
Created attachment 23265
Correct version of revised test program exhibiting missed optimization

Hmm.  I somehow managed not to attach the correct second example; I must have changed the program between the attach and submit steps or something.  I am obsoleting the incorrect test program.

But I, as a developer, have additional information not known to the compiler.  I know that getgid and getuid are const FOR ME because I'm not going to be running setuid and friends, and those values are not changeable by external force through standard APIs (unlike, say, current priority).  We have a way to provide this sort of information to the compiler.  Why shouldn't the compiler take advantage of the information I provide?
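
One way to express that promise is a small wrapper, sketched here under stated assumptions: `my_uid` and `uid_is_stable` are hypothetical names, and the const attribute is only sound if the process really never changes its user ID.  The noinline is there so the const annotation is not discarded by inlining, as discussed in the earlier comments.

```c
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical wrapper: the programmer asserts that the UID never changes
   during the life of the process, so repeated calls may be merged. */
__attribute__((const, noinline)) static uid_t my_uid(void)
{
    return getuid();
}

/* Returns 1 if every call reports the same UID, 0 otherwise. */
int uid_is_stable(int n)
{
    uid_t first = my_uid();
    for (int i = 0; i < n; i++) {
        if (my_uid() != first)
            return 0;
    }
    return 1;
}
```

Running such a program under strace, as suggested in comment 2, is one way to see how many getuid calls actually remain after optimization.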
Comment 6 Andrew Pinski 2024-03-11 12:04:29 UTC
*** Bug 114305 has been marked as a duplicate of this bug. ***
Comment 7 Andrew Pinski 2024-08-11 15:17:25 UTC
*** Bug 116333 has been marked as a duplicate of this bug. ***