This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.
[Bug target/56309] -O3 optimizer generates conditional moves instead of compare and branch resulting in almost 2x slower code
- From: "rguenth at gcc dot gnu.org" <gcc-bugzilla at gcc dot gnu dot org>
- To: gcc-bugs at gcc dot gnu dot org
- Date: Fri, 15 Feb 2013 09:33:22 +0000
- Subject: [Bug target/56309] -O3 optimizer generates conditional moves instead of compare and branch resulting in almost 2x slower code
- Auto-submitted: auto-generated
- References: <bug-56309-4@http.gcc.gnu.org/bugzilla/>
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56309
--- Comment #23 from Richard Biener <rguenth at gcc dot gnu.org> 2013-02-15 09:33:22 UTC ---
(In reply to comment #20)
> (In reply to comment #12)
> > --- by-val-O3.s.orig 2013-02-14 18:06:56.000000000 +0100
> > +++ by-val-O3.s 2013-02-14 18:07:23.000000000 +0100
> > @@ -357,9 +357,8 @@
> > shrq $32, %rdi
> > cmpq %r8, %rdx
> > cmovbe %r11, %rdi
> > - addq $1, %rax
> > - cmpq %r8, %rdx
> > cmovbe %rdx, %rcx
> > + addq $1, %rax
> > cmpq %rbp, %rax
> > movq %rcx, -8(%rsi,%rax,8)
> > jne .L50
> >
> > unmodified: Took 14.31 seconds total.
> > modified: Took 13.04 seconds total.
> >
> > So re. comment #9: it's not the problem but it'd be a small improvement.
>
> FWIW this comes from not eliminating the condition expression in
> the conditional moves that ifcvt creates:
>
> tmp_97 = tmp_93 > 4294967295 ? tmp_95 : tmp_93;
> carry_105 = tmp_93 > 4294967295 ? carry_94 : 0;
>
> I'm surprised this form is allowed at all, I'd expect we only allow
> is_gimple_reg() for a COND_EXPR_COND in a RHS context.
Yeah, it's on my list (even with partial patches available ...).
Note that with that change, vectorizing a COND_EXPR is no different
from vectorizing a comparison statement:
pred_2 = tmp_93 > 4294967295;
tmp_97 = pred_2 ? tmp_95 : tmp_93;
carry_105 = pred_2 ? carry_94 : 0;
This form, both vectorized and not vectorized, has issues when doing
initial instruction selection during expand (with multiple uses of
pred_2 we don't TER it). So it might be that we present combine and
other RTL optimizers with initial code that they will not be able to
handle as well as what we generate right now.
> Anyway -- separate problem.
Indeed.