Created attachment 22942
Test program exhibiting missed optimization

Using gcc 4.5.2 on Gentoo (or gcc 4.1.2 on RHEL 5.5), gcc fails to optimize a case where a const function is called in a const way (with loop-invariant arguments) inside a loop, if the const function is eligible for inlining. The test case should print three lines if it succeeds. In fact, the "World" line is printed once per loop iteration.

How-To-Repeat:
Compile the attached test case with -O3 or "-O -finline-small-functions -finline-functions": more than three lines are printed. Compile with -O2 and the correct number of lines for full optimization (three) is printed.

Release:
----------------------------------------------------------------------
Using built-in specs.
Target: x86_64-redhat-linux
Configured with: ../configure --prefix=/usr --mandir=/usr/share/man --infodir=/usr/share/info --enable-shared --enable-threads=posix --enable-checking=release --with-system-zlib --enable-__cxa_atexit --disable-libunwind-exceptions --enable-libgcj-multifile --enable-languages=c,c++,objc,obj-c++,java,fortran,ada --enable-java-awt=gtk --disable-dssi --enable-plugin --with-java-home=/usr/lib/jvm/java-1.4.2-gcj-1.4.2.0/jre --with-cpu=generic --host=x86_64-redhat-linux
Thread model: posix
gcc version 4.1.2 20080704 (Red Hat 4.1.2-48)
----------------------------------------------------------------------
Using built-in specs.
COLLECT_GCC=/usr/x86_64-pc-linux-gnu/gcc-bin/4.5.2/gcc
COLLECT_LTO_WRAPPER=/usr/libexec/gcc/x86_64-pc-linux-gnu/4.5.2/lto-wrapper
Target: x86_64-pc-linux-gnu
Configured with: /n/startide/proj/startide/var-tmp/portage/sys-devel/gcc-4.5.2/work/gcc-4.5.2/configure --prefix=/usr --bindir=/usr/x86_64-pc-linux-gnu/gcc-bin/4.5.2 --includedir=/usr/lib/gcc/x86_64-pc-linux-gnu/4.5.2/include --datadir=/usr/share/gcc-data/x86_64-pc-linux-gnu/4.5.2 --mandir=/usr/share/gcc-data/x86_64-pc-linux-gnu/4.5.2/man --infodir=/usr/share/gcc-data/x86_64-pc-linux-gnu/4.5.2/info --with-gxx-include-dir=/usr/lib/gcc/x86_64-pc-linux-gnu/4.5.2/include/g++-v4 --host=x86_64-pc-linux-gnu --build=x86_64-pc-linux-gnu --disable-altivec --disable-fixed-point --without-ppl --without-cloog --disable-lto --enable-nls --without-included-gettext --with-system-zlib --disable-werror --enable-secureplt --enable-multilib --enable-libmudflap --disable-libssp --enable-libgomp --enable-cld --with-python-dir=/share/gcc-data/x86_64-pc-linux-gnu/4.5.2/python --enable-checking=release --disable-libgcj --enable-languages=c,c++ --enable-shared --enable-threads=posix --enable-__cxa_atexit --enable-clocale=gnu --with-bugurl=http://bugs.gentoo.org/ --with-pkgversion='Gentoo 4.5.2 p1.0, pie-0.4.5'
Thread model: posix
gcc version 4.5.2 (Gentoo 4.5.2 p1.0, pie-0.4.5)
----------------------------------------------------------------------
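The attached program itself is not reproduced in this thread. The following is a minimal sketch of the pattern described, assuming a hypothetical `square()` helper that prints "World" and a hypothetical `sum_squares()` loop; neither name is taken from the attachment.

```c
#include <stdio.h>

/* Hypothetical stand-in for the attached test case. square() is declared
 * const, yet its body prints a line: exactly the kind of side effect the
 * attribute promises does not exist. */
static int square(int x) __attribute__((const));

static int square(int x)
{
    printf("World\n");   /* side effect hidden from the optimizer by "const" */
    return x * x;
}

/* Calls square() with a loop-invariant argument. If square() stays out of
 * line, GCC may treat the call as const and evaluate it only once; if it is
 * inlined first, the printf() in its body can run on every iteration. */
static int sum_squares(int n, int iterations)
{
    int sum = 0;
    for (int i = 0; i < iterations; i++)
        sum += square(n);
    return sum;
}
```

Building this once with -O2 and once with -O3 and counting the "World" lines is one way to observe the behavior the report describes; the exact count depends on the GCC version and flags, so none is claimed here.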
There is a duplicate of this bug somewhere. Inlining happens before any CSE is done, which is what you are seeing. If you remove the side effect, the multiplications will instead be CSEd after inlining.
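A sketch of the side-effect-free case this comment refers to, using hypothetical names (`square_pure`, `sum_squares_pure`): with no printf() in the body, no attribute is needed, because after inlining GCC sees only arithmetic it can analyze itself.

```c
/* Hypothetical side-effect-free helper: no attribute is needed, because once
 * the body is inlined GCC sees plain arithmetic. */
static inline int square_pure(int x)
{
    return x * x;
}

/* After inlining, n * n does not depend on i, so CSE / loop-invariant motion
 * is free to compute the product once instead of on every iteration. */
static int sum_squares_pure(int n, int iterations)
{
    int sum = 0;
    for (int i = 0; i < iterations; i++)
        sum += square_pure(n);   /* loop-invariant once inlined */
    return sum;
}
```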
Created attachment 22946
Revised test program exhibiting missed optimization

Well, my point is that even if I remove the obvious side effect, gcc cannot take advantage of the knowledge I am providing it. If, instead of calling printf(), I make a system call that returns a value I know (and guarantee) to be constant but that is not declared __attribute__((const)), such as getgid(), getuid(), or geteuid(), and use it in the expression, it is likewise not optimized. I'm attaching a second test program that uses these system calls.

Test with `strace ./x 2>&1 | grep get | wc -l`. When you compile with `gcc -O2` you should get the number 12. When you compile with `gcc -O3` or with "-finline-functions -finline-small-functions", you get 21 (i.e., it did more work than necessary). Of course, without any optimization you get the maximum of 30.

I am perfectly willing to believe that inlining happens first and throws away the information, so that it misses the opportunity to CSE. I also believe that if gcc could infer on its own that the function was const (e.g., pure multiplication operations), it would not duplicate the work. I'm just saying this is a missed optimization. There could be a duplicate bug somewhere, but none of the tickets with "pure" or "const" and "__attribute__" in their subjects discuss it. Nor do those with "inline" and "cse".
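A minimal sketch of the idea in this comment, assuming hypothetical wrappers `my_uid()`, `my_gid()` and a hypothetical loop `mix_ids()` (this is not the attached program). The wrappers are declared const on the strength of the developer's guarantee that the process never changes its credentials; GCC has no way to verify that promise.

```c
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical wrappers: declared const because the developer guarantees
 * the uid/gid never change during this run (no setuid() and friends).
 * getuid()/getgid() themselves are not declared const. */
static uid_t my_uid(void) __attribute__((const));
static uid_t my_uid(void) { return getuid(); }

static gid_t my_gid(void) __attribute__((const));
static gid_t my_gid(void) { return getgid(); }

/* Uses the wrappers on every iteration. While they stay out of line, GCC is
 * allowed to call each one once; once they are inlined, the underlying
 * getuid()/getgid() calls may be issued on every iteration instead. */
static unsigned long mix_ids(int iterations)
{
    unsigned long acc = 0;
    for (int i = 0; i < iterations; i++)
        acc += (unsigned long)my_uid() * my_gid();
    return acc;
}
```

Running the compiled program under `strace` and counting the get* calls, as described in this comment, is one way to see how many system calls actually remain at a given optimization level.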
Sure, it is a missed optimization. It's even "easy" to fix if you accept a general compile-time slowdown (just schedule some CSE passes before inlining). If you want to avoid the compile-time slowdown, it's not so easy (it's the usual trade-off against the case where there is no CSE opportunity but earlier inlining would result in better code). You can force GCC not to inline the function by using __attribute__((const,noinline)) (but then, of course, it will not be inlined at all).
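A sketch of the `__attribute__((const,noinline))` workaround mentioned here, applied to a hypothetical square-and-print helper like the one the first test case describes; `square_noinline()` and `sum_squares_noinline()` are illustrative names only.

```c
#include <stdio.h>

/* noinline keeps the call opaque, so the const promise is still visible at
 * the call site when CSE / loop-invariant motion run. The cost is that the
 * body is never inlined. */
static int square_noinline(int x) __attribute__((const, noinline));

static int square_noinline(int x)
{
    printf("World\n");   /* still a side effect that "const" hides */
    return x * x;
}

static int sum_squares_noinline(int n, int iterations)
{
    int sum = 0;
    for (int i = 0; i < iterations; i++)
        sum += square_noinline(n);   /* candidate for hoisting out of the loop */
    return sum;
}
```

Whether the call is in fact evaluated only once still depends on the optimization level; at -O0 it runs on every iteration.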
I think this is invalid. const attributes are a hint to GCC about parts of the program that it cannot see, but IMHO const/pure/nothrow on a function that is static and a leaf should have no effect on code generation (since GCC can infer just as much on its own). So, in the first example GCC is "fixing" a wrong usage of const in part of the program. In the second attached example there is no use of syscalls, and GCC properly optimizes out square2 and square3. If syscalls were added, the bug would be about missing attributes on the syscalls. BTW, getgid, getuid, etc. are pure but not const.
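To illustrate the const/pure distinction drawn at the end of this comment, here is a sketch with hypothetical functions. The global `scale` stands in for state that a function reads but does not own, in the same spirit as process credentials that setuid() can change.

```c
/* Hypothetical global state read by the pure function below. */
static int scale = 2;

/* const: the result depends only on the argument; no reads of global memory,
 * so two calls with the same argument may always be merged. */
static int times_three(int x) __attribute__((const));
static int times_three(int x) { return 3 * x; }

/* pure: no side effects, but it may read global memory (here `scale`).
 * GCC may merge two calls only if no intervening store could have changed
 * what the function reads. */
static int times_scale(int x) __attribute__((pure));
static int times_scale(int x) { return scale * x; }
```

Changing `scale` between two calls to `times_scale()` must be honored, whereas the result of `times_three()` can be reused freely.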
Created attachment 23265
Correct version of revised test program exhibiting missed optimization

Hmm. I somehow managed not to attach the correct second example; I must have changed the program between the attach and submit steps or something. Obsoleting the incorrect test program.

But I, as a developer, have additional information not known to the compiler. I know that getgid and getuid are const FOR ME, because I'm not going to be running setuid and friends, and those values cannot be changed by an external force through standard APIs (unlike, say, the current priority). We have a way to provide this sort of information to the compiler. Why shouldn't the compiler take advantage of the information I provide?
*** Bug 114305 has been marked as a duplicate of this bug. ***
*** Bug 116333 has been marked as a duplicate of this bug. ***