This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: [PATCH] Add (x & ~m) | (y & m) folding (PR middle-end/63568)
- From: Marek Polacek <polacek at redhat dot com>
- To: Richard Biener <rguenther at suse dot de>
- Cc: Jakub Jelinek <jakub at redhat dot com>, GCC Patches <gcc-patches at gcc dot gnu dot org>
- Date: Wed, 17 Dec 2014 12:29:44 +0100
- Subject: Re: [PATCH] Add (x & ~m) | (y & m) folding (PR middle-end/63568)
- Authentication-results: sourceware.org; auth=none
- References: <20141217084644 dot GJ16332 at redhat dot com> <20141217090453 dot GS1667 at tucnak dot redhat dot com> <alpine dot LSU dot 2 dot 11 dot 1412171006310 dot 25296 at zhemvz dot fhfr dot qr>
On Wed, Dec 17, 2014 at 10:12:06AM +0100, Richard Biener wrote:
> On Wed, 17 Dec 2014, Jakub Jelinek wrote:
>
> > On Wed, Dec 17, 2014 at 09:46:44AM +0100, Marek Polacek wrote:
> > > This adds a transformation of (x & ~m) | (y & m), which (on GIMPLE)
> > > has 4 ops to x ^ ((x ^ y) & m) that has 3 ops on GIMPLE. In fact,
> > > the latter is then transformed to (a ^ b) & m ^ a, which also has 3
> > > ops.
> >
> > So why don't you transform it to ((x ^ y) & m) ^ x directly (i.e. swap
> > @0 with the (bit_and ...) ?
>
> Yeah, I'd prefer that as well. Also you probably should make sure
> the (x & ~m) and (y & m) exprs are single-use (see other examples
> in match.pd using has_single_use).
Ok. We have only one use of has_single_use so far in match.pd, so I
hope what I have is correct... I suppose the point is not to hinder
CSEing those two subexpressions if they have multiple uses.
> > BTW, the advantage of (x & ~m) | (y & m) form is that there are fewer
> > dependencies, at least if the target has andn instruction (e.g. SPARC,
> > Alpha, PA, MMIX, IA64 have them), just the final or depends on the result of
> > both and and andnot instructions. While in the 2 xor forms, and
> > depends on the first xor result and the second xor depends on the and
> > result, so if there are multiple ALU units available, the and | andn
> > form can use both the units, while the second one is unnecessarily
> > serialized.
>
> Right - I've pointed this out as well. Of course this simply
> asks for more clever expansion of ((x ^ y) & m) ^ x rather
> than disabling this transform. Should be doable with TER
> in the BIT_XOR_EXPR expansion code. Or figure out if teaching
> combine/simplify RTX is better. Of course this form requires
> one more register ... which means with high register pressure
> or on register starved machines the GIMPLE canonical form might
> be better (in some cases). Which means that LRA should know
> how to transform it back? (interesting kind of "reload" ;))
I'd rather leave the expansion/combine stuff to someone else ;).
Bootstrapped/regtested on x86_64-linux and ppc64-linux, ok for trunk?
2014-12-17 Marek Polacek <polacek@redhat.com>
PR middle-end/63568
* match.pd: Add (x & ~m) | (y & m) -> ((x ^ y) & m) ^ x pattern.
* gcc.dg/pr63568.c: New test.
diff --git gcc/match.pd gcc/match.pd
index dbca99e..4d4bc9f 100644
--- gcc/match.pd
+++ gcc/match.pd
@@ -382,6 +382,13 @@ along with GCC; see the file COPYING3. If not see
(bit_not (bit_not @0))
@0)
+/* (x & ~m) | (y & m) -> ((x ^ y) & m) ^ x */
+(simplify
+ (bit_ior:c (bit_and:c@3 @0 (bit_not @2)) (bit_and:c@4 @1 @2))
+ (if ((TREE_CODE (@3) != SSA_NAME || has_single_use (@3))
+ && (TREE_CODE (@4) != SSA_NAME || has_single_use (@4)))
+ (bit_xor (bit_and (bit_xor @0 @1) @2) @0)))
+
/* Associate (p +p off1) +p off2 as (p +p (off1 + off2)). */
(simplify
diff --git gcc/testsuite/gcc.dg/pr63568.c gcc/testsuite/gcc.dg/pr63568.c
index e69de29..fb42bea 100644
--- gcc/testsuite/gcc.dg/pr63568.c
+++ gcc/testsuite/gcc.dg/pr63568.c
@@ -0,0 +1,54 @@
+/* PR middle-end/63568 */
+/* { dg-do compile } */
+/* { dg-options "-fdump-tree-original" } */
+
+int
+fn1 (int a, int b, int m)
+{
+ return (a & ~m) | (b & m);
+}
+
+int
+fn2 (int a, int b, int m)
+{
+ return (a & ~m) | (m & b);
+}
+
+int
+fn3 (int a, int b, int m)
+{
+ return (~m & a) | (m & b);
+}
+
+int
+fn4 (int a, int b, int m)
+{
+ return (~m & a) | (b & m);
+}
+
+int
+fn5 (int a, int b, int m)
+{
+ return (b & m) | (a & ~m);
+}
+
+int
+fn6 (int a, int b, int m)
+{
+ return (m & b) | (a & ~m);
+}
+
+int
+fn7 (int a, int b, int m)
+{
+ return (m & b) | (~m & a);
+}
+
+int
+fn8 (int a, int b, int m)
+{
+ return (b & m) | (~m & a);
+}
+
+/* { dg-final { scan-tree-dump-not " \\| " "original" } } */
+/* { dg-final { cleanup-tree-dump "original" } } */
Marek