63568 – Missed optimization (a & ~mask) | (b & mask) = a ^ ((a ^ b) & mask)

Bug 63568 - Missed optimization (a & ~mask) | (b & mask) = a ^ ((a ^ b) & mask)

Summary: Missed optimization (a & ~mask) | (b & mask) = a ^ ((a ^ b) & mask)

Status:	RESOLVED FIXED

Alias:	None

Product:	gcc
Classification:	Unclassified
Component:	middle-end (show other bugs)
Version:	5.0

Importance:	P3 normal
Target Milestone:	5.0
Assignee:	Marek Polacek

URL:
Keywords:	missed-optimization

Depends on:
Blocks:

Reported:	2014-10-16 23:55 UTC by Oleg Endo
Modified:	2024-03-02 07:42 UTC (History)
CC List:	3 users (show)

See Also:	90323 94790 https://github.com/llvm/llvm-project/issues/83637
Host:
Target:	sh--*
Build:
Known to work:
Known to fail:
Last reconfirmed:	2014-10-17 00:00:00

Attachments
Add an attachment (proposed patch, testcase, etc.)

Note You need to log in before you can comment on or make changes to this bug.

Description Oleg Endo 2014-10-16 23:55:17 UTC

As of 216350, compiling the following example on SH with -O2

unsigned int test (unsigned int a, unsigned int b, unsigned int m)
{
  return (a & ~m) | (b & m);
}

results in:
        not     r6,r0
        and     r0,r4
        and     r6,r5
        mov     r4,r0
        rts
        or      r5,r0

A shorter way is to do the same is:
        xor     r4,r5
        and     r5,r6
        mov     r6,r0
        rts
        xor     r4,r0

If this kind of stuff is done as part of tree optimization, then this is probably not SH specific, although I haven't checked with other targets.

Comment 1 Richard Biener 2014-10-17 08:31:36 UTC

Looks like three vs. four ops and thus is simpler in general.  Easy to implement
as simplification on match-and-simplify branch.

Comment 2 Marek Polacek 2014-12-16 11:18:24 UTC

I'll try to come up with a match-and-simplify simplification.

Comment 3 Marek Polacek 2014-12-16 13:44:50 UTC

FWIW, I used this to check the whether the transformation is correct:

int
main ()
{
  for (int i = -1000; i < 1000; ++i)
    for (int a = -1000; a < 1000; ++a)
      for (int b = -1000; b < 1000; ++b)
	{
	  int x = (a & ~i) | (b & i);
	  int y = a ^ ((a ^ b) & i);
	  //__builtin_printf ("%d %d\n", x, y);
	  if (x != y)
	    __builtin_abort ();
	}
}

Comment 4 Jakub Jelinek 2014-12-16 13:50:43 UTC

Not necessarily 3 vs. 4 ops, many targets have andnot instruction and can do it also in 3 ops.

Comment 5 Marek Polacek 2014-12-16 14:16:58 UTC

True.  E.g. on my x86_64 i7 Nehalem I see (using ./cc1 -quiet -O2 qq.c -mbmi) 

        andn    %edi, %edx, %edi
        andl    %edx, %esi
        movl    %edi, %eax
        orl     %esi, %eax
        ret

for return (a & ~m) | (b & m); and

        xorl    %edi, %esi
        movl    %edi, %eax
        andl    %esi, %edx
        xorl    %edx, %eax
        ret

for return a ^ ((a ^ b) & m);

Comment 6 rguenther@suse.de 2014-12-16 14:21:44 UTC

On Tue, 16 Dec 2014, mpolacek at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63568
> 
> --- Comment #5 from Marek Polacek <mpolacek at gcc dot gnu.org> ---
> True.  E.g. on my x86_64 i7 Nehalem I see (using ./cc1 -quiet -O2 qq.c -mbmi) 
> 
>         andn    %edi, %edx, %edi
>         andl    %edx, %esi
>         movl    %edi, %eax
>         orl     %esi, %eax
>         ret
> 
> for return (a & ~m) | (b & m); and
> 
>         xorl    %edi, %esi
>         movl    %edi, %eax
>         andl    %esi, %edx
>         xorl    %edx, %eax
>         ret
> 
> for return a ^ ((a ^ b) & m);

The former is also better for instruction level parallelism - but
that just asks for a clever enough expander / combiner that can
generate that from the latter.

I think on GIMPLE we want to canoncalize to the variant with less
(gimple) operations.  single-use restrictions may apply (with
the lack of a global unified combine / CSE phase)

Comment 7 Oleg Endo 2014-12-16 20:38:39 UTC

If you decide not to do the transform at the tree level, please change this to a target PR and assign it to me.

Comment 8 Marek Polacek 2014-12-16 20:58:08 UTC

(In reply to Oleg Endo from comment #7)
> If you decide not to do the transform at the tree level, please change this
> to a target PR and assign it to me.

I have a patch that does the transformation on match-and-simplify.  Let's see if it can make it in...

Comment 9 Marek Polacek 2014-12-17 11:49:05 UTC

Author: mpolacek
Date: Wed Dec 17 11:48:33 2014
New Revision: 218816

URL: https://gcc.gnu.org/viewcvs?rev=218816&root=gcc&view=rev
Log:
	PR middle-end/63568
	* match.pd: Add (x & ~m) | (y & m) -> ((x ^ y) & m) ^ x pattern.

	* gcc.dg/pr63568.c: New test.

Added:
    trunk/gcc/testsuite/gcc.dg/pr63568.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/match.pd
    trunk/gcc/testsuite/ChangeLog

Comment 10 Marek Polacek 2014-12-17 11:55:04 UTC

Fixed.