[Bug target/56200] queens benchmark is faster with -O0 than with any other optimization level
hjl.tools at gmail dot com
gcc-bugzilla@gcc.gnu.org
Tue Feb 5 23:51:00 GMT 2013
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56200
H.J. Lu <hjl.tools at gmail dot com> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |areg.melikadamyan at gmail
                   |                            |dot com
--- Comment #5 from H.J. Lu <hjl.tools at gmail dot com> 2013-02-05 23:50:35 UTC ---
Optimized alignments are enabled at -O2 and above. At -O2, the compiler emits:
        .p2align 4,,10
        .p2align 3
.L19:
        cmpl    file(,%rbx,4), %ebp
        jg      .L18
        cmpl    0(%r13,%rbx,4), %ebp
        jg      .L18
        cmpl    (%r12), %ebp
        jle     .L22
        .p2align 4,,10
        .p2align 3
.L18:
which assembles to:
  400ab6:  66 2e 0f 1f 84 00 00 00 00 00   nopw   %cs:0x0(%rax,%rax,1)
  400ac0:  3b 2c 9d a0 1a 60 00            cmp    0x601aa0(,%rbx,4),%ebp
  400ac7:  7f 17                           jg     400ae0 <find+0x70>
  400ac9:  41 3b 6c 9d 00                  cmp    0x0(%r13,%rbx,4),%ebp
  400ace:  7f 10                           jg     400ae0 <find+0x70>
  400ad0:  41 3b 2c 24                     cmp    (%r12),%ebp
  400ad4:  7e 32                           jle    400b08 <find+0x98>
  400ad6:  66 2e 0f 1f 84 00 00 00 00 00   nopw   %cs:0x0(%rax,%rax,1)
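
The two 10-byte nopw instructions are the padding produced by the
".p2align 4,,10" / ".p2align 3" pairs. As a rough sketch of the GNU as
semantics (align to 2^n bytes, but skip the directive entirely if it would
need more than the given maximum of padding bytes), the padding can be
computed as below. The function names are hypothetical and exist only for
this illustration.

```python
def p2align_padding(addr, power, max_skip=None):
    """Bytes of padding one .p2align directive inserts at address addr."""
    align = 1 << power
    pad = -addr % align  # distance to the next multiple of align
    # With a max-skip operand, the directive is ignored when it would need
    # more padding than allowed.
    if max_skip is not None and pad > max_skip:
        return 0
    return pad


def label_padding(addr):
    """Padding for '.p2align 4,,10' followed by '.p2align 3'."""
    pad = p2align_padding(addr, 4, 10)      # try 16-byte alignment first
    pad += p2align_padding(addr + pad, 3)   # then fall back to 8-byte
    return pad


# Addresses of the two nopw instructions in the disassembly above.
for addr in (0x400ab6, 0x400ad6):
    print(hex(addr), "->", hex(addr + label_padding(addr)))
```

Both addresses are 10 bytes short of a 16-byte boundary, so the first
directive applies and .L19 and .L18 land at 400ac0 and 400ae0.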
The branch prediction unit fetches 32 bytes at a time. There are three
back-to-back fused cmp/jcc pairs in a single 32-byte window, which causes
mispredictions. We can add a nop after the first cmp/jcc to avoid the
back-to-back cmp/jcc pairs.
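
To check the window claim against the disassembly, here is a minimal sketch
that counts cmp/jcc pairs per 32-byte window. The addresses and instruction
lengths are copied from the objdump output above; the 32-byte window size is
taken from this comment. The helper name is hypothetical, and for simplicity
a pair that straddles a window boundary is not counted.

```python
WINDOW = 32  # fetch window size in bytes, as stated in this comment

# (address, length in bytes, mnemonic), taken from the disassembly above
INSNS = [
    (0x400ac0, 7, "cmp"),
    (0x400ac7, 2, "jg"),
    (0x400ac9, 5, "cmp"),
    (0x400ace, 2, "jg"),
    (0x400ad0, 4, "cmp"),
    (0x400ad4, 2, "jle"),
]


def fused_pairs_per_window(insns, window=WINDOW):
    """Map window base address -> number of cmp/jcc pairs fully inside it."""
    counts = {}
    for (addr, _, op), (jaddr, jlen, jop) in zip(insns, insns[1:]):
        if op == "cmp" and jop.startswith("j"):
            last_byte = jaddr + jlen - 1
            # Count the pair only if its first and last bytes share a window.
            if addr // window == last_byte // window:
                base = addr // window * window
                counts[base] = counts.get(base, 0) + 1
    return counts


for base, n in fused_pairs_per_window(INSNS).items():
    print(hex(base), n)
```

All three pairs fall in the window starting at 400ac0, because the
alignment directives place .L19 on a 32-byte boundary in this build.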