Bug 88428 - Fails to consider lea -1(%rax), %rax compared to sub 1, %rax failing to CSE test
Status: NEW
Alias: None
Product: gcc
Classification: Unclassified
Component: rtl-optimization
Version: 9.0
Importance: P3 enhancement
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords: missed-optimization
Depends on:
Blocks:
 
Reported: 2018-12-10 12:09 UTC by Richard Biener
Modified: 2021-12-27 15:01 UTC

See Also:
Host:
Target: x86_64-*-*, i?86-*-*
Build:
Known to work:
Known to fail:
Last reconfirmed: 2021-12-27 00:00:00


Description Richard Biener 2018-12-10 12:09:35 UTC
The following GIMPLE testcase produces non-optimal assembly:

long mask;
void bar ();
__GIMPLE () void foo (int a, int b)
{
  long _3;
  _3 = a_1(D) < b_2(D) ? _Literal (long) -1l : 0l;
  mask = _3;
  if (a_1(D) < b_2(D))
    goto bb1;
  else
    goto bb2;

bb1:
    bar ();

bb2:
  return;
}

foo:
.LFB0:
        .cfi_startproc
        xorl    %eax, %eax
        cmpl    %esi, %edi
        setge   %al
        subq    $1, %rax
        movq    %rax, mask(%rip)
        cmpl    %esi, %edi
        jl      .L5
...

Here subq clobbers the flags, so the cmpl has to be repeated.  I believe
we could use lea, which has the same size,

        leaq    -0x1(%rax), %rax

here instead and elide the redundant cmpl.  For my purposes the store to
mask is unnecessary; it is only there to simplify the testcase.  A GIMPLE
testcase was necessary to get the COND_EXPR and non-jumpy code through
optimization.

I'm not sure at which point during RTL we commit to using a CC-clobbering
sub vs. a non-CC-clobbering lea, but maybe cmpelim could replace one
with the other here?
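
As a sanity check on the arithmetic only, here is a minimal C sketch of what the setge/subtract sequence computes. The function names are hypothetical and not part of the testcase. Plain C cannot express the flags behaviour, so this only illustrates that the value is the same whether the subtraction is emitted as subq or as leaq; the two instructions differ in their effect on the flags, not in the result.

```c
#include <assert.h>

/* Reference semantics of the COND_EXPR: -1 if a < b, else 0. */
long mask_ref(int a, int b)
{
    return a < b ? -1L : 0L;
}

/* Models xorl + cmpl + setge, then subtracting 1.
   (Hypothetical name, for illustration only.) */
long mask_setge_minus_one(int a, int b)
{
    long t = (a >= b);   /* setge %al; upper bits were zeroed by xorl */
    return t - 1;        /* subq $1, %rax  or  leaq -1(%rax), %rax */
}

/* Hypothetical helper: asserts that both forms agree for one input pair. */
void check_mask(int a, int b)
{
    assert(mask_setge_minus_one(a, b) == mask_ref(a, b));
}
```

Calling check_mask with a < b, a == b and a > b covers all three outcomes of the compare.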
Comment 1 Andrew Pinski 2021-12-27 05:05:50 UTC
Confirmed; we now get:

        xorl    %eax, %eax
        cmpl    %esi, %edi
        setl    %al
        negq    %rax
        movq    %rax, mask(%rip)
        cmpl    %esi, %edi
        jl      .L5

This is because we now produce something similar to the following C testcase:
long mask;

void bar ();
void f (int a, int b)
{
  long _3;
  _3 = a < b;
  _3 = -_3;
  mask = _3;
  if (a < b)
    bar ();
  return;
}
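
For reference, a small C sketch (hypothetical function names, not taken from GCC) showing that the setl/negate form above yields the same mask as the original conditional. It only checks values. Like subq, negq writes the flags, so as far as this sketch can tell the repeated cmpl has the same cause as before.

```c
#include <assert.h>

/* Original conditional form: -1 if a < b, else 0. */
long mask_cond(int a, int b)
{
    return a < b ? -1L : 0L;
}

/* Models xorl + cmpl + setl, then negq.
   (Hypothetical name, for illustration only.) */
long mask_setl_neg(int a, int b)
{
    long t = (a < b);   /* setl %al */
    return -t;          /* negq %rax */
}

/* Hypothetical helper: asserts that both forms agree for one input pair. */
void check_neg_mask(int a, int b)
{
    assert(mask_setl_neg(a, b) == mask_cond(a, b));
}
```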