This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.
[Bug target/56200] queens benchmark is faster with -O0 than with any other optimization level
- From: "rguenth at gcc dot gnu.org" <gcc-bugzilla at gcc dot gnu dot org>
- To: gcc-bugs at gcc dot gnu dot org
- Date: Wed, 06 Feb 2013 09:57:13 +0000
- Subject: [Bug target/56200] queens benchmark is faster with -O0 than with any other optimization level
- Auto-submitted: auto-generated
- References: <bug-56200-4@http.gcc.gnu.org/bugzilla/>
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56200
--- Comment #6 from Richard Biener <rguenth at gcc dot gnu.org> 2013-02-06 09:57:13 UTC ---
(In reply to comment #5)
> Optimized alignments are enabled for -O2 and above. For -O2, there are:
>
>         .p2align 4,,10
>         .p2align 3
> .L19:
>         cmpl    file(,%rbx,4), %ebp
>         jg      .L18
>         cmpl    0(%r13,%rbx,4), %ebp
>         jg      .L18
>         cmpl    (%r12), %ebp
>         jle     .L22
>         .p2align 4,,10
>         .p2align 3
> .L18:
>
> which assemble to
>
>   400ab6: 66 2e 0f 1f 84 00 00 00 00 00   nopw   %cs:0x0(%rax,%rax,1)
>   400ac0: 3b 2c 9d a0 1a 60 00            cmp    0x601aa0(,%rbx,4),%ebp
>   400ac7: 7f 17                           jg     400ae0 <find+0x70>
>   400ac9: 41 3b 6c 9d 00                  cmp    0x0(%r13,%rbx,4),%ebp
>   400ace: 7f 10                           jg     400ae0 <find+0x70>
>   400ad0: 41 3b 2c 24                     cmp    (%r12),%ebp
>   400ad4: 7e 32                           jle    400b08 <find+0x98>
>   400ad6: 66 2e 0f 1f 84 00 00 00 00 00   nopw   %cs:0x0(%rax,%rax,1)
>
> The branch prediction unit fetches 32 bytes at a time. There are 3 back-to-back
> fused cmp/jcc instructions in one 32-byte window, which causes mispredictions.
> We can add a nop after the first cmp/jcc to avoid back-to-back cmp/jccs.
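The window claim in comment #5 can be checked against the quoted disassembly with a few lines of arithmetic. The sketch below is only an illustration: it assumes the 32-byte fetch windows start on 32-byte-aligned addresses (the comment gives the window size, not its alignment), and the names JCC_ADDRS, window_base and branches_per_window are made up for this example.

```python
# Start addresses of the three conditional jumps, copied from the
# disassembly quoted in comment #5. Each one directly follows its cmp.
JCC_ADDRS = [
    0x400AC7,  # jg  400ae0
    0x400ACE,  # jg  400ae0
    0x400AD4,  # jle 400b08
]

WINDOW = 32  # window size in bytes, as stated in comment #5


def window_base(addr, size=WINDOW):
    """Start of the aligned window containing addr (alignment is an assumption)."""
    return addr & ~(size - 1)


def branches_per_window(addrs, size=WINDOW):
    """Map each window base to the number of jccs that start inside it."""
    counts = {}
    for addr in addrs:
        base = window_base(addr, size)
        counts[base] = counts.get(base, 0) + 1
    return counts


if __name__ == "__main__":
    for base, n in sorted(branches_per_window(JCC_ADDRS).items()):
        print(f"window {base:#x}-{base + WINDOW - 1:#x}: {n} cmp/jcc pairs")
```

Under that alignment assumption all three jumps fall into the single window 0x400ac0-0x400adf, which matches the count of 3 given in the comment.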
Yeah, I suppose if we bother with alignment we should do that. Can we
do it with a peephole to only do it between two consecutive cmp/jccs?
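To make the question concrete, here is a rough sketch of the transformation being asked about, written over plain assembly text rather than GCC's RTL. It is not GCC code and not how a real peephole (for example a define_peephole2 pattern in the machine description) would be written; the helper names are hypothetical, and the mnemonic tests are deliberately crude. It inserts a nop only when one cmp/jcc pair is immediately followed by another.

```python
def is_cmp(insn):
    """Crude check: the mnemonic starts with 'cmp' (cmp, cmpl, cmpq, ...)."""
    return insn.split()[0].startswith("cmp")


def is_jcc(insn):
    """Crude check: a jump mnemonic other than an unconditional jmp."""
    op = insn.split()[0]
    return op.startswith("j") and not op.startswith("jmp")


def pad_consecutive_cmp_jcc(insns):
    """Insert a nop between any two directly adjacent cmp/jcc pairs.

    insns is a list of non-empty assembly lines. Anything that is not
    part of a cmp/jcc pair (labels, directives, other instructions)
    breaks the run, so no nop is added next to it.
    """
    out = []
    prev_was_pair = False
    i = 0
    while i < len(insns):
        if i + 1 < len(insns) and is_cmp(insns[i]) and is_jcc(insns[i + 1]):
            if prev_was_pair:
                out.append("nop")  # separate this pair from the previous one
            out.extend(insns[i:i + 2])
            prev_was_pair = True
            i += 2
        else:
            out.append(insns[i])
            prev_was_pair = False
            i += 1
    return out


if __name__ == "__main__":
    loop = [
        "cmpl file(,%rbx,4), %ebp",
        "jg .L18",
        "cmpl 0(%r13,%rbx,4), %ebp",
        "jg .L18",
        "cmpl (%r12), %ebp",
        "jle .L22",
    ]
    print("\n".join(pad_consecutive_cmp_jcc(loop)))
```

On the .L19 sequence from comment #5 this puts a nop after the first and after the second cmp/jcc pair. Comment #5 only proposes one after the first pair, so whether the second is needed is left open here.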