Created attachment 43392 [details]
Program showing that the single optimization flags don't work as well as O1

My program reduces its runtime from 20s to 5s when using -O1. So I wanted to know which optimization is responsible for that and used the optimization flags found here:

https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html

But even when copy-pasting ALL flags up to -O3 listed there, I cannot reproduce the speedup to 5s! See the attached file, which also contains code comments on how I compiled it.

This seems to be a very long-standing bug (5+ years):

https://stackoverflow.com/questions/12769173/selecting-gcc-optimisation-flags-equivalent-to-o1
https://stackoverflow.com/questions/20246357/gcc-using-o1-and-spelling-the-o1-options-out-leads-to-different-result-one-w

I also tried to find the difference by using -Q --help=optimizers, which showed this diff:

    17d16
    < -fdelayed-branch
    100a100
    > -ftree-builtin-call-dce

Even when adding -ftree-builtin-call-dce I still don't get the same speedup ?!?! In fact nothing changes...

    g++ "${O1Flags[@]}" -ftree-builtin-call-dce -std=c++11 optimizeFlags.cpp && ./a.out

Tested with:

    g++ (Debian 7.3.0-3) 7.3.0
    g++-8 (Debian 8-20180207-2) 8.0.1 20180207 (experimental) [trunk revision 257435]
This bug becomes more important for the actual real-life example, which becomes slower at -O2 compared to -O1! In the earlier attached file you only have to replace the `interleaveZeros` function with this one:

    unsigned int interleaveTwoZeros( unsigned int n )
    {
        n&= 0x000003ff;
        n = (n ^ (n << 16)) & 0xFF0000FF;
        n = (n ^ (n <<  8)) & 0x0300F00F;
        n = (n ^ (n <<  4)) & 0x030C30C3;
        n = (n ^ (n <<  2)) & 0x09249249;
        return n;
    }

I.e. the only differences are slightly different constants, nothing else! The timings:

        1234567890 iterations took 19.151s and resulted in 806157809
    -O0 1234567890 iterations took 19.1547s and resulted in 1772082360
    -O1 1234567890 iterations took 5.69619s and resulted in 2085417644
    -O2 1234567890 iterations took 6.21504s and resulted in 32256352
    -O3 1234567890 iterations took 6.14414s and resulted in 357018037

Not sure if this is worth another bug. I can reproduce this for the following compiler versions:

    for GPP in g++-4.9 g++-5 g++-6 g++-7 g++-8; do
        $GPP --version | head -1
        for flag in ' ' -O0 -O1 -O2 -O3; do
            echo -n "$flag "
            $GPP $flag -std=c++11 optimizeFlags.cpp && ./a.out
        done
    done

    g++-4.9 (Debian 4.9.4-2) 4.9.4
        1234567890 iterations took 19.1979s and resulted in 1918993912
    -O0 1234567890 iterations took 19.1785s and resulted in 710267642
    -O1 1234567890 iterations took 5.6609s and resulted in 1898524753
    -O2 1234567890 iterations took 5.71375s and resulted in 1117037030
    -O3 1234567890 iterations took 5.67933s and resulted in 1451088646
    g++-5 (Debian 5.5.0-8) 5.5.0 20171010
        1234567890 iterations took 19.2387s and resulted in 999898210
    -O0 1234567890 iterations took 19.1464s and resulted in 1358121256
    -O1 1234567890 iterations took 5.64181s and resulted in 642760018
    -O2 1234567890 iterations took 5.65094s and resulted in 191105767
    -O3 1234567890 iterations took 5.68849s and resulted in 1555980094
    g++-6 (Debian 6.4.0-12) 6.4.0 20180123
        1234567890 iterations took 19.1786s and resulted in 1613186065
    -O0 1234567890 iterations took 19.2001s and resulted in 424276129
    -O1 1234567890 iterations took 5.73263s and resulted in 1828427433
    -O2 1234567890 iterations took 6.16005s and resulted in 814826690
    -O3 1234567890 iterations took 6.1438s and resulted in 867162058
    g++-7 (Debian 7.3.0-3) 7.3.0
        1234567890 iterations took 19.1302s and resulted in 1147954921
    -O0 1234567890 iterations took 19.1694s and resulted in 734785107
    -O1 1234567890 iterations took 5.72652s and resulted in 1133709951
    -O2 1234567890 iterations took 6.15633s and resulted in 352136223
    -O3 1234567890 iterations took 6.14089s and resulted in 1468150013
    g++-8 (Debian 8-20180207-2) 8.0.1 20180207 (experimental) [trunk revision 257435]
        1234567890 iterations took 19.1278s and resulted in 694826541
    -O0 1234567890 iterations took 19.1454s and resulted in 249938642
    -O1 1234567890 iterations took 5.72959s and resulted in 365780913
    -O2 1234567890 iterations took 6.20064s and resulted in 2033700921
    -O3 1234567890 iterations took 6.12829s and resulted in 1244532281

=> seems like this is somehow a regression bug since g++ 6!

A mix of -O1 with the additional O2-flags does reproduce the weird slowdown:

    g++ -O1 "${O2Flags[@]}" -std=c++11 optimizeFlags.cpp && ./a.out
    => 6.16161s

By bisecting the additional O2-flags this can be traced down to -finline-small-functions ... I will open another bug for this.
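One way such a bisection over the additional O2-flags might be scripted is sketched below. It is not the reporter's actual procedure. The function names `is_slow` and `bisect_flags` are made up for illustration, and `is_slow` is only a stand-in: a real version would compile with `g++ -O1 "$@" -std=c++11 optimizeFlags.cpp`, time `./a.out`, and compare against a threshold. The sketch also assumes that exactly one flag in the list causes the slowdown, which need not hold when flags interact.

```shell
#!/bin/sh
# Stand-in predicate (hypothetical): succeeds if the given flag set is "slow".
# A real version would build with  g++ -O1 "$@" -std=c++11 optimizeFlags.cpp
# and time ./a.out; here the known culprit is hard-coded for demonstration.
is_slow() {
    case " $* " in
        *" -finline-small-functions "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Halve the flag list until one flag remains. Assumes a single culprit flag.
bisect_flags() {
    while [ "$#" -gt 1 ]; do
        half=$(( $# / 2 ))
        first=""
        second=""
        i=0
        for f in "$@"; do
            i=$(( i + 1 ))
            if [ "$i" -le "$half" ]; then
                first="$first $f"
            else
                second="$second $f"
            fi
        done
        # Word splitting is intentional: the flags contain no whitespace.
        if is_slow $first; then
            set -- $first
        else
            set -- $second
        fi
    done
    printf '%s\n' "$1"
}

# Illustrative subset of flags that -O2 adds on top of -O1.
bisect_flags -fgcse -finline-small-functions -fpeephole2 -fstrict-aliasing
```

With the stand-in predicate this prints -finline-small-functions. Each round tests only the first half of the remaining flags, so a list of N flags needs about log2(N) compile-and-run cycles.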
(In reply to xyzdr4gon333 from comment #0)
> This seems to be a very long-standing bug (5+ years):
> https://stackoverflow.com/questions/12769173/selecting-gcc-optimisation-flags-equivalent-to-o1
> https://stackoverflow.com/questions/20246357/gcc-using-o1-and-spelling-the-o1-options-out-leads-to-different-result-one-w

This is expected, not a bug: https://gcc.gnu.org/wiki/FAQ#optimization-options
(In reply to xyzdr4gon333 from comment #1)
> Actually by bisecting the additional O2-flags this can be traced down to
> -finline-small-functions ... I will open another bug for this.

I see you've opened Bug 84328 for this, so I'm closing this one because it's not a bug.
Too bad. Before I take a longer look at the assembler code: any quick thoughts on which optimization, not available as a single option, could lead to the 4x speedup?
I think you misunderstand. Listing all the individual -fxxx options without -O1 results in NO OPTIMIZATION. The difference you see is due to all the passes enabled by -O1, not by the ones without flags.

As I wrote on stackoverflow:

If you don't use one of the -O1, -O2, -O3, -Ofast, or -Og optimization options (and not -O0) then no optimization happens at all, so adjusting which optimization passes are active doesn't do anything.

To find which optimization pass makes the difference you can turn on -O1 and then disable individual optimization passes until you find the one that makes a difference.

i.e. instead of

    gcc -fxxx -fyyy -fzzz ...

use

    gcc -O1 -fno-xxx -fno-yyy -fno-zzz
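As a rough sketch of that approach, the script below prints one compile-and-run command per pass, each keeping -O1 and disabling a single flag. The helper names `negate_flag` and `print_bisect_commands` are made up for illustration. The three flags are only an illustrative subset of what -O1 enables, and the file name optimizeFlags.cpp is taken from the commands earlier in this report. The script only prints the commands, so nothing is compiled until the output is passed on to a shell.

```shell
#!/bin/sh
# Turn an enabling flag such as -ftree-ccp into its disabling form -fno-tree-ccp.
# Hypothetical helper; it assumes the input starts with "-f" and is not
# already a -fno- flag.
negate_flag() {
    printf '%s\n' "$1" | sed 's/^-f/-fno-/'
}

# Print (but do not run) one command line per flag: -O1 with that pass disabled.
print_bisect_commands() {
    for flag in "$@"; do
        printf 'g++ -O1 %s -std=c++11 optimizeFlags.cpp && ./a.out\n' \
            "$(negate_flag "$flag")"
    done
}

# Illustrative subset of the flags enabled by -O1.
print_bisect_commands -ftree-ccp -ftree-dce -ftree-sra
```

Piping the output to `sh` would run each build in turn. The run whose timing falls back toward the -O0 figure points at a pass that matters, though several passes may contribute, so more than one run can show a difference.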
Ah, thank you very much! And sorry for misusing the bug tracker out of lack of knowledge.