92342 – [10/11/12 Regression] a small missed transformation into x?b:0

Bug 92342 - [10/11/12 Regression] a small missed transformation into x?b:0

Summary: [10/11/12 Regression] a small missed transformation into x?b:0

Status:	RESOLVED FIXED

Alias:	None

Product:	gcc
Classification:	Unclassified
Component:	tree-optimization (show other bugs)
Version:	10.0

Importance:	P2 normal
Target Milestone:	13.0
Assignee:	Andrew Pinski

URL:	https://gcc.gnu.org/pipermail/gcc-pat...
Keywords:	missed-optimization, patch

Duplicates (1):	100072 (view as bug list)
Depends on:
Blocks:

Reported:	2019-11-04 01:22 UTC by Andrew Pinski
Modified:	2024-01-09 03:08 UTC (History)
CC List:	8 users (show)

See Also:	103354 111126
Host:
Target:	aarch64-linux-gnu
Build:
Known to work:	13.0
Known to fail:
Last reconfirmed:	2019-11-04 00:00:00

Attachments
Patch which I am testing (1.49 KB, patch) 2021-11-22 01:59 UTC, Andrew Pinski	Details \| Diff
View All Add an attachment (proposed patch, testcase, etc.)

Note You need to log in before you can comment on or make changes to this bug.

Description Andrew Pinski 2019-11-04 01:22:27 UTC

Take these two functions:
int f(int a,int b, int c)
{
  return a==c?b:0;
}

int g(int a, int b, int c)
{
  return b & -(a==c);
}
---- CUT ----
We used to produce the same code generation for both of them:
        cmp     w0, w2
        csel    w0, w1, wzr, eq
        ret


But in GCC 10 we produce for g:
        cmp     w0, w2
        csetm   w0, eq
        and     w0, w0, w1
        ret

I think this was introduced by:
2019-05-09  Segher Boessenkool  <segher@kernel.crashing.org>

       * combine.c (combine_simplify_rtx): Don't make IF_THEN_ELSE RTL.

Comment 1 prathamesh3492 2019-11-04 05:49:52 UTC

Hi,
I reverted Segher's commit in my local tree, but am still seeing the same code-gen for g().

Thanks,
Prathamesh

Comment 2 prathamesh3492 2019-11-04 05:58:54 UTC

(In reply to prathamesh3492 from comment #1)
> Hi,
> I reverted Segher's commit in my local tree, but am still seeing the same
> code-gen for g().
Oops I was modifying wrong branch :-/
I can confirm reverting the commit fixes this issue.
Sorry for the noise.

Regards,
Prathamesh
> 
> Thanks,
> Prathamesh

Comment 3 Andrew Pinski 2019-11-04 06:33:50 UTC

(In reply to prathamesh3492 from comment #2)
> Oops I was modifying wrong branch :-/
> I can confirm reverting the commit fixes this issue.
> Sorry for the noise.
Funny, I had did the same :).

Comment 4 Segher Boessenkool 2019-11-04 08:12:38 UTC

(That is r271047; confirmed).

As the commit message says, with that patch we generate better code on
average for all targets (except alpha).  Which isn't surprising at all,
given what it does.

Combine (or better, simplify-rtx) does not know how to handle

  (and:SI (neg:SI (eq:SI x y))
          (reg:SI z))


Btw, try

int h(int a, int b, int c, int d)
{
  return (c & -(a==b)) | (d & -(a!=b));
}

to see we have some way to go here.

Comment 5 Richard Earnshaw 2019-11-04 16:55:27 UTC

So if the AND-based idiom is now preferred, shouldn't the if-then-else variant be transformed into it?  Similarly for IOR, when we get

(IOR (NEG (<cond-op>)) (reg))

from

(IF_THEN_ELSE (<cond-op>)
  (reg)
  (const_int -1))

Comment 6 Richard Earnshaw 2019-11-04 17:01:46 UTC

(In reply to Richard Earnshaw from comment #5)
> So if the AND-based idiom is now preferred, shouldn't the if-then-else
> variant be transformed into it?  Similarly for IOR, when we get
> 
> (IOR (NEG (<cond-op>)) (reg))
> 
> from
> 
> (IF_THEN_ELSE (<cond-op>)
>   (reg)
>   (const_int -1))

except that should be 

(IF_THEN_ELSE (<cond-op>')
  (reg)
  (const_int -1))

Where <cond-op>' is the reversed condition.

Comment 7 Segher Boessenkool 2019-11-04 22:47:31 UTC

(In reply to Richard Earnshaw from comment #5)
> So if the AND-based idiom is now preferred, shouldn't the if-then-else
> variant be transformed into it?

Neither form is canonical currently.

The form you *really* want tests a CCmode var, but combine will not
give you that (unless it started with one), so you'll need md patterns
for it.

> Similarly for IOR, when we get
> 
> (IOR (NEG (<cond-op>)) (reg))
> 
> from
> 
> (IF_THEN_ELSE (<cond-op>)
>   (reg)
>   (const_int -1))

Yeah.  And some more variants.

I think the IF_THEN_ELSE version should be canonical, and it should be
formed in simplify_rtx, not at random spots in combine.

Comment 8 Segher Boessenkool 2019-11-04 22:50:36 UTC

(In reply to Richard Earnshaw from comment #6)
> (In reply to Richard Earnshaw from comment #5)
> > So if the AND-based idiom is now preferred, shouldn't the if-then-else
> > variant be transformed into it?  Similarly for IOR, when we get
> > 
> > (IOR (NEG (<cond-op>)) (reg))
> > 
> > from
> > 
> > (IF_THEN_ELSE (<cond-op>)
> >   (reg)
> >   (const_int -1))
> 
> except that should be 
> 
> (IF_THEN_ELSE (<cond-op>')
>   (reg)
>   (const_int -1))
> 
> Where <cond-op>' is the reversed condition.

Or just

(if_then_else (<cond-op>)
              (const_int -1)
              (reg))

(No order )like constant last) of those two arguments is prescribed, and
this makes sense).

Comment 9 Richard Earnshaw 2019-11-05 21:27:08 UTC

(In reply to Segher Boessenkool from comment #7)

> I think the IF_THEN_ELSE version should be canonical, and it should be
> formed in simplify_rtx, not at random spots in combine.

Why?  The and/ior variants are more likely to lead to a useful split if combining 3 insns and it doesn't match.

Comment 10 Segher Boessenkool 2019-11-06 01:00:32 UTC

There are a gazillion ways to write this without if_then_else, none
obviously better than any other, and it gets much worse if your b,c
have special values.


I don't think this optimisation should be done by combine at all.
combine has some serious problems doing this.

It is closely related to ifcvt, in some ways.

Comment 11 Jeffrey A. Law 2020-02-27 21:16:31 UTC

Couldn't we just fix this kind of stuff in gimple?

Comment 12 Segher Boessenkool 2020-02-27 23:05:45 UTC

Gimple can help writing silly expressions like this in a more canonical
form (whatever we decide to use for that) at least, yeah.  But you can
not do RTL's job in gimple ;-)

Comment 13 Andrew Pinski 2020-02-27 23:36:24 UTC

Note I don't remember where I found this originally.  I found this code somewhere back in May 2012 but I did not record where I found it though; It was the opposite way around really for Octeon2/3 (MIPS64).

Comment 14 Jeffrey A. Law 2020-02-27 23:53:40 UTC

That wouldn't be a big surprise Andrew.

My point is I think we could create some match.pd patterns to canonicalize the form in gimple and it wouldn't matter nearly as much what combine did or didn't do to this kind of code.

Comment 15 Jakub Jelinek 2020-02-28 09:19:40 UTC

For this, I agree we should canonicalize the second form to the first one
(well, any commutative x & -boolean_range into boolean_range ? x : 0 because at least in 95% of cases it will be written the former way, the latter form looks like a cute trick trying to workaround compiler deficiencies.
If the x & -boolean_range form results in better code than boolean_range ? x : 0 on some target, we should make sure to expand the ?: that way somewhere in RTL.

Comment 16 Jakub Jelinek 2020-02-28 09:42:47 UTC

Seems on x86_64-linux we also used to give the same code in 3.4.6 and earlier, but starting already with 4.0 (already r90000) we emit
	cmpl	%edx, %edi
	movl	$0, %eax
	cmove	%esi, %eax
for f and
	xorl	%eax, %eax
	cmpl	%edx, %edi
	sete	%al
	negl	%eax
	andl	%esi, %eax
for g.  We do it even for -Os, where the f version is clearly a win, for -O2 I'm afraid it depends on how soon the cmove result is used.  LLVM seems to use cmove in both cases, ICC does roughly what gcc does (honor what user wrote).

Not sure if we can do this transformation in match.pd though, because for a ? b : 0; the gimplifier doesn't emit a COND_EXPR, but GIMPLE_COND and PHI, I'm afraid we can't create new bbs from within match.pd.

Comment 17 Jakub Jelinek 2020-02-28 10:07:06 UTC

Tried timing:
__attribute__((noipa)) int f(int a,int b, int c)
{
  return a==c?b:0;
}

__attribute__((noipa)) int g(int a, int b, int c)
{
  return b & -(a==c);
}

int
main (int argc, const char **argv)
{
  unsigned int c = 0;
  if (argv[1][0] == 'f')
    {
      for (int i = 0; i < 1000000000; i++)
        c += f(1,1,i&1);
    }
  else
    {
      for (int i = 0; i < 1000000000; i++)
        c += g(1,1,i&1);
    }
  asm volatile ("" : : "r" (c));
  return 0;
}
and the same without the "i&", i.e. alternating results and always same, and both functions are the same speed in both cases, and cmov is shorter.
Without cmov, such as with -m32 -O2 -march=i586, g seems to be faster, so we probably should teach ifcvt.c to try it.

Comment 18 Jakub Jelinek 2020-05-07 11:56:31 UTC

GCC 10.1 has been released.

Comment 19 Richard Biener 2020-07-23 06:51:59 UTC

GCC 10.2 is released, adjusting target milestone.

Comment 20 Richard Biener 2021-04-08 12:02:24 UTC

GCC 10.3 is being released, retargeting bugs to GCC 10.4.

Comment 21 Andrew Pinski 2021-04-14 00:54:06 UTC

*** Bug 100072 has been marked as a duplicate of this bug. ***

Comment 22 Andrew Pinski 2021-11-22 00:57:11 UTC

Mine, we should do this transformation on the gimple level really. Let me do that.

Comment 23 Andrew Pinski 2021-11-22 01:59:14 UTC

Created attachment 51846 [details]
Patch which I am testing

Note it is really two patches, one which fixes up the multiply case and then one which does the same for &- too.

Comment 24 Andrew Pinski 2021-11-22 04:13:31 UTC

(In reply to Segher Boessenkool from comment #4)
> Btw, try
> 
> int h(int a, int b, int c, int d)
> {
>   return (c & -(a==b)) | (d & -(a!=b));
> }
> 
> to see we have some way to go here.

I filed that as PR 103354 which I will handle after I submit the patches for this one.

Comment 25 Andrew Pinski 2021-11-22 06:26:07 UTC

Submitted the sequence here:
https://gcc.gnu.org/pipermail/gcc-patches/2021-November/585111.html
https://gcc.gnu.org/pipermail/gcc-patches/2021-November/585112.html

Comment 26 Jakub Jelinek 2022-06-28 10:38:49 UTC

GCC 10.4 is being released, retargeting bugs to GCC 10.5.

Comment 27 Roger Sayle 2023-01-07 09:38:22 UTC

Revised versions of Andrew's patches from comment #25 were posted here:
https://gcc.gnu.org/pipermail/gcc-patches/2023-January/609349.html

Comment 28 GCC Commits 2023-01-12 21:49:16 UTC

The master branch has been updated by Roger Sayle <sayle@gcc.gnu.org>:

https://gcc.gnu.org/g:fd1f5373b8647a5da2f7f4b42282e676a4b04d98

commit r13-5128-gfd1f5373b8647a5da2f7f4b42282e676a4b04d98
Author: Roger Sayle <roger@nextmovesoftware.com>
Date:   Thu Jan 12 21:47:40 2023 +0000

    PR tree-optimization/92342: Optimize b & -(a==c) in match.pd
    
    This patch is an update/tweak of Andrew Pinski's two patches for
    PR tree-optimization/92342, that were originally posted by in November:
    https://gcc.gnu.org/pipermail/gcc-patches/2021-November/585111.html
    https://gcc.gnu.org/pipermail/gcc-patches/2021-November/585112.html
    
    Technically, the first of those was approved by Richard Biener, though
    never committed, and my first thought was to simply push it for Andrew,
    but the review of the second piece expressed concerns over comparisons
    in non-integral modes, where the result may not be zero-one valued.
    Indeed both transformations misbehave in the presence of vector mode
    comparisons (these transformations are already implemented for
    vec_cond elsewhere in match.pd), so my minor contribution is to limit
    these new transformations to scalars, by testing that both the operands
    and results are INTEGRAL_TYPE_P.
    
    2023-01-12  Andrew Pinski  <apinski@marvell.com>
                Roger Sayle  <roger@nextmovesoftware.com>
    
    gcc/ChangeLog:
            PR tree-optimization/92342
            * match.pd ((m1 CMP m2) * d -> (m1 CMP m2) ? d : 0):
            Use tcc_comparison and :c for the multiply.
            (b & -(a CMP c) -> (a CMP c)?b:0): New pattern.
    
    gcc/testsuite/ChangeLog:
            PR tree-optimization/92342
            * gcc.dg/tree-ssa/andnegcmp-1.c: New test.
            * gcc.dg/tree-ssa/andnegcmp-2.c: New test.
            * gcc.dg/tree-ssa/multcmp-1.c: New test.
            * gcc.dg/tree-ssa/multcmp-2.c: New test.

Comment 29 Gabriel Ravier 2023-01-14 18:55:20 UTC

Looks like the patch fixes this bug, unless I'm missing something.

Comment 30 Roger Sayle 2023-01-15 19:11:23 UTC

This has now been fixed on mainline (for GCC 13).

Comment 31 Andrew Pinski 2023-03-10 19:25:57 UTC

Fixed in GCC 13. I don't mind this not being fixed on other branches.