Bug 63568 - Missed optimization (a & ~mask) | (b & mask) = a ^ ((a ^ b) & mask)
Summary: Missed optimization (a & ~mask) | (b & mask) = a ^ ((a ^ b) & mask)
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: middle-end (show other bugs)
Version: 5.0
: P3 normal
Target Milestone: 5.0
Assignee: Marek Polacek
URL:
Keywords: missed-optimization
Depends on:
Blocks:
 
Reported: 2014-10-16 23:55 UTC by Oleg Endo
Modified: 2024-03-02 07:42 UTC (History)
3 users (show)

See Also:
Host:
Target: sh*-*-*
Build:
Known to work:
Known to fail:
Last reconfirmed: 2014-10-17 00:00:00


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Oleg Endo 2014-10-16 23:55:17 UTC
As of 216350, compiling the following example on SH with -O2

unsigned int test (unsigned int a, unsigned int b, unsigned int m)
{
  return (a & ~m) | (b & m);
}

results in:
        not     r6,r0
        and     r0,r4
        and     r6,r5
        mov     r4,r0
        rts
        or      r5,r0

A shorter way is to do the same is:
        xor     r4,r5
        and     r5,r6
        mov     r6,r0
        rts
        xor     r4,r0

If this kind of stuff is done as part of tree optimization, then this is probably not SH specific, although I haven't checked with other targets.
Comment 1 Richard Biener 2014-10-17 08:31:36 UTC
Looks like three vs. four ops and thus is simpler in general.  Easy to implement
as simplification on match-and-simplify branch.
Comment 2 Marek Polacek 2014-12-16 11:18:24 UTC
I'll try to come up with a match-and-simplify simplification.
Comment 3 Marek Polacek 2014-12-16 13:44:50 UTC
FWIW, I used this to check the whether the transformation is correct:

int
main ()
{
  for (int i = -1000; i < 1000; ++i)
    for (int a = -1000; a < 1000; ++a)
      for (int b = -1000; b < 1000; ++b)
	{
	  int x = (a & ~i) | (b & i);
	  int y = a ^ ((a ^ b) & i);
	  //__builtin_printf ("%d %d\n", x, y);
	  if (x != y)
	    __builtin_abort ();
	}
}
Comment 4 Jakub Jelinek 2014-12-16 13:50:43 UTC
Not necessarily 3 vs. 4 ops, many targets have andnot instruction and can do it also in 3 ops.
Comment 5 Marek Polacek 2014-12-16 14:16:58 UTC
True.  E.g. on my x86_64 i7 Nehalem I see (using ./cc1 -quiet -O2 qq.c -mbmi) 

        andn    %edi, %edx, %edi
        andl    %edx, %esi
        movl    %edi, %eax
        orl     %esi, %eax
        ret

for return (a & ~m) | (b & m); and

        xorl    %edi, %esi
        movl    %edi, %eax
        andl    %esi, %edx
        xorl    %edx, %eax
        ret

for return a ^ ((a ^ b) & m);
Comment 6 rguenther@suse.de 2014-12-16 14:21:44 UTC
On Tue, 16 Dec 2014, mpolacek at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63568
> 
> --- Comment #5 from Marek Polacek <mpolacek at gcc dot gnu.org> ---
> True.  E.g. on my x86_64 i7 Nehalem I see (using ./cc1 -quiet -O2 qq.c -mbmi) 
> 
>         andn    %edi, %edx, %edi
>         andl    %edx, %esi
>         movl    %edi, %eax
>         orl     %esi, %eax
>         ret
> 
> for return (a & ~m) | (b & m); and
> 
>         xorl    %edi, %esi
>         movl    %edi, %eax
>         andl    %esi, %edx
>         xorl    %edx, %eax
>         ret
> 
> for return a ^ ((a ^ b) & m);

The former is also better for instruction level parallelism - but
that just asks for a clever enough expander / combiner that can
generate that from the latter.

I think on GIMPLE we want to canoncalize to the variant with less
(gimple) operations.  single-use restrictions may apply (with
the lack of a global unified combine / CSE phase)
Comment 7 Oleg Endo 2014-12-16 20:38:39 UTC
If you decide not to do the transform at the tree level, please change this to a target PR and assign it to me.
Comment 8 Marek Polacek 2014-12-16 20:58:08 UTC
(In reply to Oleg Endo from comment #7)
> If you decide not to do the transform at the tree level, please change this
> to a target PR and assign it to me.

I have a patch that does the transformation on match-and-simplify.  Let's see if it can make it in...
Comment 9 Marek Polacek 2014-12-17 11:49:05 UTC
Author: mpolacek
Date: Wed Dec 17 11:48:33 2014
New Revision: 218816

URL: https://gcc.gnu.org/viewcvs?rev=218816&root=gcc&view=rev
Log:
	PR middle-end/63568
	* match.pd: Add (x & ~m) | (y & m) -> ((x ^ y) & m) ^ x pattern.

	* gcc.dg/pr63568.c: New test.

Added:
    trunk/gcc/testsuite/gcc.dg/pr63568.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/match.pd
    trunk/gcc/testsuite/ChangeLog
Comment 10 Marek Polacek 2014-12-17 11:55:04 UTC
Fixed.