As of 216350, compiling the following example on SH with -O2

unsigned int test (unsigned int a, unsigned int b, unsigned int m)
{
  return (a & ~m) | (b & m);
}

results in:

        not     r6,r0
        and     r0,r4
        and     r6,r5
        mov     r4,r0
        rts
        or      r5,r0

A shorter way to do the same is:

        xor     r4,r5
        and     r5,r6
        mov     r6,r0
        rts
        xor     r4,r0

If this kind of stuff is done as part of tree optimization, then this is probably not SH specific, although I haven't checked with other targets.
Looks like three vs. four ops, so the xor form is simpler in general. Easy to implement as a simplification on the match-and-simplify branch.
I'll try to come up with a match-and-simplify simplification.
FWIW, I used this to check whether the transformation is correct:

int
main ()
{
  for (int i = -1000; i < 1000; ++i)
    for (int a = -1000; a < 1000; ++a)
      for (int b = -1000; b < 1000; ++b)
        {
          int x = (a & ~i) | (b & i);
          int y = a ^ ((a ^ b) & i);
          //__builtin_printf ("%d %d\n", x, y);
          if (x != y)
            __builtin_abort ();
        }
}
Not necessarily 3 vs. 4 ops: many targets have an andnot instruction and can also do the original form in 3 ops.
True.  E.g. on my x86_64 i7 Nehalem I see (using ./cc1 -quiet -O2 qq.c -mbmi)

        andn    %edi, %edx, %edi
        andl    %edx, %esi
        movl    %edi, %eax
        orl     %esi, %eax
        ret

for return (a & ~m) | (b & m); and

        xorl    %edi, %esi
        movl    %edi, %eax
        andl    %esi, %edx
        xorl    %edx, %eax
        ret

for return a ^ ((a ^ b) & m);
On Tue, 16 Dec 2014, mpolacek at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63568
>
> --- Comment #5 from Marek Polacek <mpolacek at gcc dot gnu.org> ---
> True. E.g. on my x86_64 i7 Nehalem I see (using ./cc1 -quiet -O2 qq.c -mbmi)
>
>         andn    %edi, %edx, %edi
>         andl    %edx, %esi
>         movl    %edi, %eax
>         orl     %esi, %eax
>         ret
>
> for return (a & ~m) | (b & m); and
>
>         xorl    %edi, %esi
>         movl    %edi, %eax
>         andl    %esi, %edx
>         xorl    %edx, %eax
>         ret
>
> for return a ^ ((a ^ b) & m);

The former is also better for instruction-level parallelism - but that just asks for a clever enough expander / combiner that can generate that from the latter.

I think on GIMPLE we want to canonicalize to the variant with fewer (GIMPLE) operations.  Single-use restrictions may apply (with the lack of a global unified combine / CSE phase).
If you decide not to do the transform at the tree level, please change this to a target PR and assign it to me.
(In reply to Oleg Endo from comment #7)
> If you decide not to do the transform at the tree level, please change this
> to a target PR and assign it to me.

I have a patch that does the transformation on match-and-simplify.  Let's see if it can make it in...
Author: mpolacek
Date: Wed Dec 17 11:48:33 2014
New Revision: 218816

URL: https://gcc.gnu.org/viewcvs?rev=218816&root=gcc&view=rev
Log:
        PR middle-end/63568
        * match.pd: Add (x & ~m) | (y & m) -> ((x ^ y) & m) ^ x pattern.

        * gcc.dg/pr63568.c: New test.

Added:
    trunk/gcc/testsuite/gcc.dg/pr63568.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/match.pd
    trunk/gcc/testsuite/ChangeLog
Fixed.