This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.


[Bug target/56200] queens benchmark is faster with -O0 than with any other optimization level

From: "rguenth at gcc dot gnu.org" <gcc-bugzilla at gcc dot gnu dot org>
To: gcc-bugs at gcc dot gnu dot org
Date: Wed, 06 Feb 2013 09:57:13 +0000
Subject: [Bug target/56200] queens benchmark is faster with -O0 than with any other optimization level
Auto-submitted: auto-generated
References: <bug-56200-4@http.gcc.gnu.org/bugzilla/>

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56200

--- Comment #6 from Richard Biener <rguenth at gcc dot gnu.org> 2013-02-06 09:57:13 UTC ---
(In reply to comment #5)
> Optimized alignments are enabled for -O2 and above.  For -O2, there are:
> 
>         .p2align 4,,10
>         .p2align 3
> .L19:
>         cmpl    file(,%rbx,4), %ebp
>         jg      .L18
>         cmpl    0(%r13,%rbx,4), %ebp
>         jg      .L18
>         cmpl    (%r12), %ebp
>         jle     .L22
>         .p2align 4,,10
>         .p2align 3
> .L18:
> 
> which generate
> 
>   400ab6:       66 2e 0f 1f 84 00 00 00 00 00   nopw   %cs:0x0(%rax,%rax,1)
>   400ac0:       3b 2c 9d a0 1a 60 00    cmp    0x601aa0(,%rbx,4),%ebp
>   400ac7:       7f 17                   jg     400ae0 <find+0x70>
>   400ac9:       41 3b 6c 9d 00          cmp    0x0(%r13,%rbx,4),%ebp
>   400ace:       7f 10                   jg     400ae0 <find+0x70>
>   400ad0:       41 3b 2c 24             cmp    (%r12),%ebp
>   400ad4:       7e 32                   jle    400b08 <find+0x98>
>   400ad6:       66 2e 0f 1f 84 00 00 00 00 00   nopw   %cs:0x0(%rax,%rax,1)
> 
> The branch prediction unit fetches 32 bytes at a time.  There are 3
> back-to-back fused cmp/jcc pairs in one 32-byte window, which causes
> mispredictions.  We can add a nop after the first cmp/jcc to avoid
> back-to-back cmp/jccs.

Yeah, I suppose if we bother with alignment we should do that.  Can we
do it with a peephole to only do it between two consecutive cmp/jccs?
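
For illustration only, here is a small Python sketch of the arithmetic behind
the quoted analysis.  It is not GCC code and not the proposed peephole.  The
addresses and instruction lengths are copied from the disassembly in comment
#5, and the 32-byte window size is the figure stated there.  The helper names
(window_of, branches_per_window, back_to_back, insert_nop_after) and the
1-byte nop length are assumptions made up for this example.

```python
# Sketch: count conditional jumps per 32-byte window and find cmp/jcc pairs
# that directly follow one another. Data is taken from the quoted disassembly.

WINDOW = 32  # bytes per fetch window, as stated in comment #5

# (cmp address, cmp length, jcc address, jcc length)
PAIRS = [
    (0x400ac0, 7, 0x400ac7, 2),  # cmp 0x601aa0(,%rbx,4),%ebp ; jg
    (0x400ac9, 5, 0x400ace, 2),  # cmp 0x0(%r13,%rbx,4),%ebp  ; jg
    (0x400ad0, 4, 0x400ad4, 2),  # cmp (%r12),%ebp            ; jle
]


def window_of(addr, size=WINDOW):
    """Return the base address of the aligned window containing addr."""
    return addr // size * size


def branches_per_window(pairs, size=WINDOW):
    """Map each window base address to the number of jcc instructions in it."""
    counts = {}
    for _cmp_addr, _cmp_len, jcc_addr, _jcc_len in pairs:
        base = window_of(jcc_addr, size)
        counts[base] = counts.get(base, 0) + 1
    return counts


def back_to_back(pairs):
    """Return indices i where pair i+1 starts right after the jcc of pair i."""
    return [
        i
        for i in range(len(pairs) - 1)
        if pairs[i][2] + pairs[i][3] == pairs[i + 1][0]
    ]


def insert_nop_after(pairs, index, nop_len=1):
    """Hypothetical layout: shift every pair after `index` by nop_len bytes."""
    return [
        (c + nop_len, cl, j + nop_len, jl) if i > index else (c, cl, j, jl)
        for i, (c, cl, j, jl) in enumerate(pairs)
    ]


if __name__ == "__main__":
    for base, count in branches_per_window(PAIRS).items():
        print(hex(base), count)
    print(back_to_back(PAIRS))
    print(back_to_back(insert_nop_after(PAIRS, 0)))
```

With the quoted layout, all three jumps fall into the window starting at
0x400ac0, and both neighbouring pairs are adjacent.  With a hypothetical
1-byte nop after the first pair, the first two pairs are no longer adjacent,
although all three jumps still sit in the same window.  Whether that is
enough to avoid the misprediction depends on the hardware and is not
something this sketch can show.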

References:
- [Bug target/56200] New: queens benchmark is faster with -O0 than with any other optimization level
  - From: abel at gcc dot gnu.org
