[RFA] [PATCH][PR tree-optimization/64910] Fix reassociation of binary bitwise operations with 3 operands

Jeff Law <law@redhat.com>
Wed Sep 6 05:21:00 GMT 2017


On 09/05/2017 11:26 AM, Jeff Law wrote:
> On 09/05/2017 12:38 AM, Christophe Lyon wrote:
>> Hi Jeff,
>>
>>
>> On 3 September 2017 at 16:44, Jeff Law <law@redhat.com> wrote:
>>> On 01/13/2016 05:30 AM, Richard Biener wrote:
>>>> On Wed, Jan 13, 2016 at 7:39 AM, Jeff Law <law@redhat.com> wrote:
>>>>> On 01/12/2016 08:11 AM, Richard Biener wrote:
>>>>>>
>>>>>> On Tue, Jan 12, 2016 at 6:10 AM, Jeff Law <law@redhat.com> wrote:
>>>>>>>
>>>>>>> On 01/11/2016 03:32 AM, Richard Biener wrote:
>>>>>>>
>>>>>>>>
>>>>>>>> Yeah, reassoc is largely about canonicalization.
>>>>>>>>
>>>>>>>>> Plus doing it in TER is almost certainly more complex than getting
>>>>>>>>> it right in reassoc to begin with.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> I guess canonicalizing differently is ok but you'll still create
>>>>>>>> ((a & b) & 1) & c then if you only change the above place.
>>>>>>>
>>>>>>>
>>>>>>> What's best for that expression would depend on factors like whether
>>>>>>> or not the target can exploit ILP.  I.e., (a & b) & (1 & c) exposes
>>>>>>> more parallelism while ((a & b) & c) & 1 is not good for parallelism,
>>>>>>> but does expose the bit test.
>>>>>>>
>>>>>>> reassoc currently generates ((a & 1) & b) & c, which is dreadful as
>>>>>>> there's no ILP or chance of creating a bit test.  My patch shuffles
>>>>>>> things around, but still doesn't expose the ILP or bit test in the
>>>>>>> 4-operand case.  Based on the comments in reassoc, it didn't seem
>>>>>>> like the author thought anything beyond the 3-operand case was worth
>>>>>>> handling.  So my patch just handles the 3-operand case.
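
[For reference, the operand shapes under discussion, written out as
illustrative C -- none of this is code from the patch:]

  t = (a & b) & (1 & c);   /* exposes ILP: the two inner ANDs can issue in parallel */
  t = ((a & b) & c) & 1;   /* serial chain, but the final AND against 1 is a bit test */
  t = ((a & 1) & b) & c;   /* what reassoc generated: neither ILP nor a bit test */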
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>> So I'm not sure what pattern the backend is looking for?
>>>>>>>
>>>>>>>
>>>>>>> It just wants the constant last in the sequence.  That exposes bit clear,
>>>>>>> set, flip, test, etc idioms.
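
[The idioms referred to, as illustrative C on an unsigned int x and bit
number n -- not code from the patch:]

  x & ~(1u << n)   /* bit clear */
  x |  (1u << n)   /* bit set   */
  x ^  (1u << n)   /* bit flip  */
  x &  (1u << n)   /* bit test  */

[With the constant mask last in the chain, backends with single-bit
instructions (bclr/bset/bchg/btst on m68k, for example) can match these
directly.]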
>>>>>>
>>>>>>
>>>>>> But those don't feed another bit operation, right?  Thus we'd like to see
>>>>>> ((a & b) & c) & 1, not ((a & b) & 1) & c?  It sounds like the instructions
>>>>>> are designed to feed conditionals (aka CC consuming ops)?
>>>>>
>>>>> At the gimple level they could feed a conditional, or be part of a series of
>>>>> ops on an SSA_NAME that eventually gets stored to memory, etc.  At the RTL
>>>>> level they'll feed CC consumers and bit manipulations of pseudos or memory.
>>>>>
>>>>> For the 3-op case, we always want the constant last.  For the 4-op case it's
>>>>> less clear.  Though ((a & b) & c) & 1 is certainly better than ((a & b) & 1)
>>>>> & c.
>>>>
>>>> Ok, so handling it in swap_ops_for_binary_stmt is merely a convenient place
>>>> to special-case bitwise ops.  The "real" fix to the sorting heuristic would be
>>>> to sort constants at the opposite end.
>>>>
>>>> That might be too invasive right now but there is another "convenient" place:
>>>>
>>>>               /* If the operand vector is now empty, all operands were
>>>>                  consumed by the __builtin_powi optimization.  */
>>>> ...
>>>>               else
>>>>                 {
>>>>                   machine_mode mode = TYPE_MODE (TREE_TYPE (lhs));
>>>>                   int ops_num = ops.length ();
>>>>                   int width = get_reassociation_width (ops_num, rhs_code, mode);
>>>>                   tree new_lhs = lhs;
>>>>
>>>>                   if (dump_file && (dump_flags & TDF_DETAILS))
>>>>                     fprintf (dump_file,
>>>>                              "Width = %d was chosen for reassociation\n",
>>>>                              width);
>>>>
>>>> at this point you can check rhs_code and move the (single) constant
>>>> entry in ops (if there is any constant entry) from .last () to the beginning.
>>>>
>>>> That'll get the 4 operand case correct as well and properly models
>>>> "constant last" for the specified operation kind.
>>> Resurrecting an old thread...  Just now getting around to flushing this
>>> out of the queue.
>>>
>>> To recap: given something like (x & y) & C, reassociation will turn
>>> that into (x & C) & y.  It's functionally equivalent, but it will
>>> inhibit generation of bit test instructions.
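
[In dump terms, with made-up SSA names, the desired reassoc1 output for
the testcase below is

  _1 = u_3(D) & w_4(D);
  _2 = _1 & 32;

where the constant comes last, rather than

  _1 = u_3(D) & 32;
  _2 = _1 & w_4(D);

which buries the constant mid-chain and defeats the bit-test match.]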
>>>
>>> I originally hacked up swap_ops_for_binary_stmt.  You requested that
>>> change be made in reassociate_bb so that it would apply to cases where
>>> there are more than 3 args.
>>>
>>> So that's what this patch does.   OK for the trunk now?
>>>
>>> Bootstrapped and regression tested on x86_64.  Also tested the new
>>> testcase on m68k.
>>>
>>>
>>> commit c10ae0339674c27c89a1fa1904217a55bf530cb3
>>> Author: Jeff Law <law@torsion.usersys.redhat.com>
>>> Date:   Sun Sep 3 10:42:30 2017 -0400
>>>
>>>     2017-09-03  Jeff Law  <law@redhat.com>
>>>
>>>             PR tree-optimization/64910
>>>             * tree-ssa-reassoc.c (reassociate_bb): For bitwise binary ops,
>>>             swap the first and last operand if the last is a constant.
>>>
>>>             PR tree-optimization/64910
>>>             * gcc.dg/tree-ssa/pr64910-2.c: New test.
>>>
>>> diff --git a/gcc/ChangeLog b/gcc/ChangeLog
>>> index 3f632ca31c2..2c9a8c8265a 100644
>>> --- a/gcc/ChangeLog
>>> +++ b/gcc/ChangeLog
>>> @@ -1,3 +1,9 @@
>>> +2017-09-03  Jeff Law  <law@redhat.com>
>>> +
>>> +       PR tree-optimization/64910
>>> +       * tree-ssa-reassoc.c (reassociate_bb): For bitwise binary ops,
>>> +       swap the first and last operand if the last is a constant.
>>> +
>>>  2017-09-01  Segher Boessenkool  <segher@kernel.crashing.org>
>>>
>>>         PR rtl-optimization/82024
>>> diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog
>>> index 4ead57edfa2..766677c899b 100644
>>> --- a/gcc/testsuite/ChangeLog
>>> +++ b/gcc/testsuite/ChangeLog
>>> @@ -1,3 +1,8 @@
>>> +2017-09-03  Jeff Law  <law@redhat.com>
>>> +
>>> +       PR tree-optimization/64910
>>> +       * gcc.dg/tree-ssa/pr64910-2.c: New test.
>>> +
>>>  2017-09-01  Jakub Jelinek  <jakub@redhat.com>
>>>
>>>         PR target/81766
>>> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr64910-2.c b/gcc/testsuite/gcc.dg/tree-ssa/pr64910-2.c
>>> new file mode 100644
>>> index 00000000000..2e3d6790776
>>> --- /dev/null
>>> +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr64910-2.c
>>> @@ -0,0 +1,85 @@
>>> +/* { dg-do compile } */
>>> +/* { dg-options "-O2 -fdump-tree-reassoc1" } */
>>> +
>>> +/* We want to make sure that we reassociate in a way that has the
>>> +   constant last.  With the constant last, it's more likely to result
>>> +   in a bitfield test on targets with such capabilities.  */
>>> +
>>> +extern void boo ();
>>> +
>>> +int b2b_uc (unsigned char u, unsigned char w)
>>> +{
>>> +  if ((u & w) & 0x20)
>>> +    boo ();
>>> +}
>>> +
>>> +int b2b_us (unsigned short u, unsigned short w)
>>> +{
>>> +  if ((u & w) & 0x20)
>>> +    boo ();
>>> +}
>>> +
>>> +int b2b_ui (unsigned int u, unsigned int w)
>>> +{
>>> +  if ((u & w) & 0x20)
>>> +    boo ();
>>> +}
>>> +int b2b_ul (unsigned long u, unsigned long w)
>>> +{
>>> +  if ((u & w) & 0x20)
>>> +    boo ();
>>> +}
>>> +int b2b_ull (unsigned long long u, unsigned long long w)
>>> +{
>>> +  if ((u & w) & 0x20)
>>> +    boo ();
>>> +}
>>> +
>>> +int b2b_sc (signed char u, signed char w)
>>> +{
>>> +  if ((u & w) & 0x20)
>>> +    boo ();
>>> +}
>>> +
>>> +int b2b_ss (signed short u, signed short w)
>>> +{
>>> +  if ((u & w) & 0x20)
>>> +    boo ();
>>> +}
>>> +
>>> +int b2b_si (signed int u, signed int w)
>>> +{
>>> +  if ((u & w) & 0x20)
>>> +    boo ();
>>> +}
>>> +int b2b_sl (signed long u, signed long w)
>>> +{
>>> +  if ((u & w) & 0x20)
>>> +    boo ();
>>> +}
>>> +int b2b_sll (signed long long u, signed long long w)
>>> +{
>>> +  if ((u & w) & 0x20)
>>> +    boo ();
>>> +}
>>> +
>>> +/* The AND of U & W should go into a temporary, which is then ANDed
>>> +   with the constant.
>>> +
>>> +   First verify that we have the right number of ANDs between U and W.  */
>>> +/* { dg-final { scan-tree-dump-times "\[uw\]_\[0-9\]+.D. \& \[uw\]_\[0-9\]+.D.;" 10 "reassoc1"} } */
>>> +
>>> +/* Then verify that we have the right number of ANDs between a temporary
>>> +   and the constant.  */
>>> +/* { dg-final { scan-tree-dump-times "_\[0-9]+ \& 32;" 10 "reassoc1"} } */
>>
>> This part of the new testcase fails on aarch64 & arm:
>> FAIL: gcc.dg/tree-ssa/pr64910-2.c scan-tree-dump-times reassoc1
>> "_[0-9]+ & 32;" 10
> Thanks.  I'm investigating.
So when there are precisely 2 operands, the most recent change will
create non-canonical gimple on some targets (aarch64 and arm among
them, likely others).  Specifically we can end up with

cst OP ssa_name

rather than the canonical ssa_name OP cst.
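
As an illustration (the SSA names here are made up), with only two
entries in OPS the swap moves the constant into the first operand
position:

  _1 = 32 & u_2(D);    /* what the swap produced: constant first */

where the canonical form, and what the testcase's dump scan expects, is

  _1 = u_2(D) & 32;    /* constant last */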

Nothing actually cared -- the form got rewritten at some point and
aarch64 at least generates the desired assembly code.  But since the
test was looking at the .reassoc dump, it failed.

The fix is easy.  Just restrict the operand swapping to cases where
there are 3 or more ops.  Bootstrapped and regression tested on aarch64.

Committing.

Jeff

commit 077cf883c3e983369e68b27307ec4944c7097393
Author: law <law@138bc75d-0d04-0410-961f-82ee72b054a4>
Date:   Wed Sep 6 05:20:25 2017 +0000

    PR tree-optimization/64910
    * tree-ssa-reassoc.c (reassociate_bb): Restrict last change to
    cases where we have 3 or more operands.

    git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@251751 138bc75d-0d04-0410-961f-82ee72b054a4

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 53637905722..1b3bddbb806 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,9 @@
+2017-09-05  Jeff Law  <law@redhat.com>
+
+	PR tree-optimization/64910
+	* tree-ssa-reassoc.c (reassociate_bb): Restrict last change to
+	cases where we have 3 or more operands.
+
 2017-09-05  Jakub Jelinek  <jakub@redhat.com>
 
 	PR middle-end/81768
diff --git a/gcc/tree-ssa-reassoc.c b/gcc/tree-ssa-reassoc.c
index 76048196b27..2fb6aef51d7 100644
--- a/gcc/tree-ssa-reassoc.c
+++ b/gcc/tree-ssa-reassoc.c
@@ -5763,14 +5763,15 @@ reassociate_bb (basic_block bb)
 			     "Width = %d was chosen for reassociation\n", width);
 
 
-		  /* For binary bit operations, if the last operand in
-		     OPS is a constant, move it to the front.  This
-		     helps ensure that we generate (X & Y) & C rather
-		     than (X & C) & Y.  The former will often match
-		     a canonical bit test when we get to RTL.  */
-		  if ((rhs_code == BIT_AND_EXPR
-		       || rhs_code == BIT_IOR_EXPR
-		       || rhs_code == BIT_XOR_EXPR)
+		  /* For binary bit operations, if there are at least 3
+		     operands and the last operand in OPS is a constant,
+		     move it to the front.  This helps ensure that we generate
+		     (X & Y) & C rather than (X & C) & Y.  The former will
+		     often match a canonical bit test when we get to RTL.  */
+		  if (ops.length () != 2
+		      && (rhs_code == BIT_AND_EXPR
+		          || rhs_code == BIT_IOR_EXPR
+		          || rhs_code == BIT_XOR_EXPR)
 		      && TREE_CODE (ops.last ()->op) == INTEGER_CST)
 		    std::swap (*ops[0], *ops[ops_num - 1]);
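
To see the effect of the fix locally, the dump the testcase scans can
be inspected by hand (standard GCC options; the exact dump file name,
with its pass number, varies between releases):

  gcc -O2 -fdump-tree-reassoc1 -c pr64910-2.c
  grep ' & 32;' pr64910-2.c.*.reassoc1

With the fix applied, each of the ten functions should show a statement
of the form _N & 32 rather than 32 & _N.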
 

