88428 – Fails to consider lea -1(%rax), %rax compared to sub 1, %rax failing to CSE test

Status:	NEW

Alias:	None

Product:	gcc
Classification:	Unclassified
Component:	rtl-optimization
Version:	9.0

Importance:	P3 enhancement
Target Milestone:	---
Assignee:	Not yet assigned to anyone

URL:
Keywords:	missed-optimization

Depends on:
Blocks:

Reported:	2018-12-10 12:09 UTC by Richard Biener
Modified:	2021-12-27 15:01 UTC
CC List:	3 users

See Also:
Host:
Target:	x86_64--, i?86--
Build:
Known to work:
Known to fail:
Last reconfirmed:	2021-12-27 00:00:00


Description Richard Biener 2018-12-10 12:09:35 UTC

The following GIMPLE testcase produces suboptimal assembly:

long mask;
void bar ();
__GIMPLE () void foo (int a, int b)
{
  long _3;
  _3 = a_1(D) < b_2(D) ? _Literal (long) -1l : 0l;
  mask = _3;
  if (a_1(D) < b_2(D))
    goto bb1;
  else
    goto bb2;

bb1:
    bar ();

bb2:
  return;
}

foo:
.LFB0:
        .cfi_startproc
        xorl    %eax, %eax
        cmpl    %esi, %edi
        setge   %al
        subq    $1, %rax
        movq    %rax, mask(%rip)
        cmpl    %esi, %edi
        jl      .L5
...

Here subq clobbers the flags, so the cmpl has to be repeated.  I believe
we could use lea, which has the same size,

        leaq    -0x1(%rax), %rax

here instead and elide the redundant cmpl.  For my purposes the store to
mask is unnecessary; it was added only to simplify the testcase.  A GIMPLE
testcase was necessary to get the COND_EXPR and the branchless code through
optimization.

I'm not sure at which point during RTL we commit to using a CC-clobbering
sub rather than a non-CC-clobbering lea, but maybe cmpelim could replace
one with the other here?
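For illustration, here is a minimal C sketch (not GCC code) of the value the xorl/setge/subq $1 sequence computes, checked against the COND_EXPR semantics of the GIMPLE testcase. The helper names mask_cond, mask_setge_sub1 and check_sub1_form are hypothetical. The point is only that subtracting 1 from the zero-extended setge result yields the -1/0 mask, so leaq -1(%rax), %rax would produce the same value while leaving the flags from the first cmpl intact.

```c
#include <assert.h>

/* Hypothetical helper: reference semantics of the GIMPLE COND_EXPR,
   _3 = a < b ? -1 : 0.  */
long mask_cond (int a, int b)
{
  return a < b ? -1L : 0L;
}

/* Hypothetical helper: the value computed by the asm sequence
   xorl %eax,%eax; cmpl; setge %al; subq $1,%rax.  */
long mask_setge_sub1 (int a, int b)
{
  long t = (a >= b);  /* setge %al into a zeroed %rax: 0 or 1 */
  return t - 1;       /* subq $1 (or leaq -1(%rax)): 1 -> 0, 0 -> -1 */
}

/* Hypothetical helper: check that both agree for one input pair.  */
void check_sub1_form (int a, int b)
{
  assert (mask_setge_sub1 (a, b) == mask_cond (a, b));
}
```

Calling check_sub1_form for a few pairs (for example (1, 2), (2, 1) and (3, 3)) exercises the less-than, greater-than and equal cases.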

Comment 1 Andrew Pinski 2021-12-27 05:05:50 UTC

Confirmed.  We now get:

        xorl    %eax, %eax
        cmpl    %esi, %edi
        setl    %al
        negq    %rax
        movq    %rax, mask(%rip)
        cmpl    %esi, %edi
        jl      .L5

This is because we now produce code similar to the following C testcase:
long mask;

void bar ();
void f (int a, int b)
{
  long _3;
  _3 = a < b;
  _3 = -_3;
  mask = _3;
  if (a < b)
    bar ();
  return;
}
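The setl/negq form in the newer output computes the same -1/0 mask as the original setge/subq $1 form; only the instruction choice changed. Since negq also writes the flags, the second cmpl is still needed. A minimal C sketch of the negated form, checked against the COND_EXPR semantics, follows. The helper names mask_setl_neg and check_neg_form are hypothetical.

```c
#include <assert.h>

/* Hypothetical helper: the value computed by the asm sequence
   xorl %eax,%eax; cmpl; setl %al; negq %rax, mirroring the C testcase
   (_3 = a < b; _3 = -_3;).  */
long mask_setl_neg (int a, int b)
{
  long t = (a < b);   /* setl %al into a zeroed %rax: 0 or 1 */
  return -t;          /* negq %rax: 1 -> -1, 0 -> 0 */
}

/* Hypothetical helper: check the negated form against a < b ? -1 : 0.  */
void check_neg_form (int a, int b)
{
  assert (mask_setl_neg (a, b) == (a < b ? -1L : 0L));
}
```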