This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [RFA] [PATCH][PR tree-optimization/64910] Fix reassociation of binary bitwise operations with 3 operands


On 09/05/2017 12:38 AM, Christophe Lyon wrote:
> Hi Jeff,
> 
> 
> On 3 September 2017 at 16:44, Jeff Law <law@redhat.com> wrote:
>> On 01/13/2016 05:30 AM, Richard Biener wrote:
>>> On Wed, Jan 13, 2016 at 7:39 AM, Jeff Law <law@redhat.com> wrote:
>>>> On 01/12/2016 08:11 AM, Richard Biener wrote:
>>>>>
>>>>> On Tue, Jan 12, 2016 at 6:10 AM, Jeff Law <law@redhat.com> wrote:
>>>>>>
>>>>>> On 01/11/2016 03:32 AM, Richard Biener wrote:
>>>>>>
>>>>>>>
>>>>>>> Yeah, reassoc is largely about canonicalization.
>>>>>>>
>>>>>>>> Plus doing it in TER is almost certainly more complex than getting it
>>>>>>>> right
>>>>>>>> in reassoc to begin with.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> I guess canonicalizing differently is ok but you'll still create
>>>>>>> ((a & b) & 1) & c then if you only change the above place.
>>>>>>
>>>>>>
>>>>>> What's best for that expression would depend on factors like whether or
>>>>>> not
>>>>>> the target can exploit ILP.  ie (a & b) & (1 & c) exposes more
>>>>>> parallelism
>>>>>> while (((a & b) & c) & 1) is not good for parallelism, but does expose
>>>>>> the
>>>>>> bit test.
>>>>>>
>>>>>> reassoc currently generates ((a & 1) & b) & c which is dreadful as
>>>>>> there's
>>>>>> no ILP or chance of creating a bit test.  My patch shuffles things
>>>>>> around,
>>>>>> but still doesn't expose the ILP or bit test in the 4 operand case.
>>>>>> Based
>>>>>> on the comments in reassoc, it didn't seem like the author thought
>>>>>> anything
>>>>>> beyond the 3-operand case was worth handling. So my patch just handles
>>>>>> the
>>>>>> 3-operand case.
>>>>>>
>>>>>>
>>>>>>
>>>>>>>
>>>>>>> So I'm not sure what pattern the backend is looking for?
>>>>>>
>>>>>>
>>>>>> It just wants the constant last in the sequence.  That exposes bit clear,
>>>>>> set, flip, test, etc idioms.
>>>>>
>>>>>
>>>>> But those don't feed another bit operation, right?  Thus we'd like to see
>>>>> ((a & b) & c) & 1, not ((a & b) & 1) & c?  It sounds like the instructions
>>>>> are designed to feed conditionals (aka CC consuming ops)?
>>>>
>>>> At the gimple level they could feed a conditional, or be part of a series of
>>>> ops on an SSA_NAME that eventually gets stored to memory, etc.  At the RTL
>>>> level they'll feed CC consumers and bit manipulations of pseudos or memory.
>>>>
>>>> For the 3-op case, we always want the constant last.  For the 4-op case it's
>>>> less clear.  Though ((a & b) & c) & 1 is certainly better than ((a & b) & 1)
>>>> & c.
>>>
>>> Ok, so handling it in swap_ops_for_binary_stmt is merely a convenient place
>>> to special-case bitwise ops.  The "real" fix to the sorting heuristic would be
>>> to sort constants at the opposite end.
>>>
>>> That might be too invasive right now but there is another "convenient" place:
>>>
>>>               /* If the operand vector is now empty, all operands were
>>>                  consumed by the __builtin_powi optimization.  */
>>> ...
>>>               else
>>>                 {
>>>                   machine_mode mode = TYPE_MODE (TREE_TYPE (lhs));
>>>                   int ops_num = ops.length ();
>>>                   int width = get_reassociation_width (ops_num, rhs_code, mode);
>>>                   tree new_lhs = lhs;
>>>
>>>                   if (dump_file && (dump_flags & TDF_DETAILS))
>>>                     fprintf (dump_file,
>>>                              "Width = %d was chosen for
>>> reassociation\n", width);
>>>
>>> at this point you can check rhs_code and move the (single) constant
>>> entry in ops (if there is any constant entry) from .last () to the beginning.
>>>
>>> That'll get the 4 operand case correct as well and properly models
>>> "constant last" for the specified operation kind.
>> Resurrecting an old thread...  Just now getting around to flushing this
>> out of the queue.
>>
>> To recap, given something like (x & y) & C reassociation will turn that
>> into (x & C) & y.  It's functionally equivalent, but it will inhibit
>> generation of bit test instructions.
>>
>> I originally hacked up swap_ops_for_binary_stmt.  You requested that
>> change be made in reassociate_bb so that it would apply to cases where
>> there are more than 3 args.
>>
>> So that's what this patch does.   OK for the trunk now?
>>
>> Bootstrapped and regression tested on x86_64.  Also tested the new
>> testcase on m68k.
>>
>>
>> commit c10ae0339674c27c89a1fa1904217a55bf530cb3
>> Author: Jeff Law <law@torsion.usersys.redhat.com>
>> Date:   Sun Sep 3 10:42:30 2017 -0400
>>
>>     2017-09-03  Jeff Law  <law@redhat.com>
>>
>>             PR tree-optimization/64910
>>             * tree-ssa-reassoc.c (reassociate_bb): For bitwise binary ops,
>>             swap the first and last operand if the last is a constant.
>>
>>             PR tree-optimization/64910
>>             * gcc.dg/tree-ssa/pr64910-2.c: New test.
>>
>> diff --git a/gcc/ChangeLog b/gcc/ChangeLog
>> index 3f632ca31c2..2c9a8c8265a 100644
>> --- a/gcc/ChangeLog
>> +++ b/gcc/ChangeLog
>> @@ -1,3 +1,9 @@
>> +2017-09-03  Jeff Law  <law@redhat.com>
>> +
>> +       PR tree-optimization/64910
>> +       * tree-ssa-reassoc.c (reassociate_bb): For bitwise binary ops,
>> +       swap the first and last operand if the last is a constant.
>> +
>>  2017-09-01  Segher Boessenkool  <segher@kernel.crashing.org>
>>
>>         PR rtl-optimization/82024
>> diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog
>> index 4ead57edfa2..766677c899b 100644
>> --- a/gcc/testsuite/ChangeLog
>> +++ b/gcc/testsuite/ChangeLog
>> @@ -1,3 +1,8 @@
>> +2017-09-03  Jeff Law  <law@redhat.com>
>> +
>> +       PR tree-optimization/64910
>> +       * gcc.dg/tree-ssa/pr64910-2.c: New test.
>> +
>>  2017-09-01  Jakub Jelinek  <jakub@redhat.com>
>>
>>         PR target/81766
>> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr64910-2.c b/gcc/testsuite/gcc.dg/tree-ssa/pr64910-2.c
>> new file mode 100644
>> index 00000000000..2e3d6790776
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr64910-2.c
>> @@ -0,0 +1,85 @@
>> +/* { dg-do compile } */
>> +/* { dg-options "-O2 -fdump-tree-reassoc1" } */
>> +
>> +/* We want to make sure that we reassociate in a way that has the
>> +   constant last.  With the constant last, it's more likely to result
>> +   in a bitfield test on targets with such capabilities.  */
>> +
>> +extern void boo ();
>> +
>> +int b2b_uc (unsigned char u, unsigned char w)
>> +{
>> +  if ((u & w) & 0x20)
>> +    boo ();
>> +}
>> +
>> +int b2b_us (unsigned short u, unsigned short w)
>> +{
>> +  if ((u & w) & 0x20)
>> +    boo ();
>> +}
>> +
>> +int b2b_ui (unsigned int u, unsigned int w)
>> +{
>> +  if ((u & w) & 0x20)
>> +    boo ();
>> +}
>> +int b2b_ul (unsigned long u, unsigned long w)
>> +{
>> +  if ((u & w) & 0x20)
>> +    boo ();
>> +}
>> +int b2b_ull (unsigned long long u, unsigned long long w)
>> +{
>> +  if ((u & w) & 0x20)
>> +    boo ();
>> +}
>> +
>> +int b2b_sc (signed char u, signed char w)
>> +{
>> +  if ((u & w) & 0x20)
>> +    boo ();
>> +}
>> +
>> +int b2b_ss (signed short u, signed short w)
>> +{
>> +  if ((u & w) & 0x20)
>> +    boo ();
>> +}
>> +
>> +int b2b_si (signed int u, signed int w)
>> +{
>> +  if ((u & w) & 0x20)
>> +    boo ();
>> +}
>> +int b2b_sl (signed long u, signed long w)
>> +{
>> +  if ((u & w) & 0x20)
>> +    boo ();
>> +}
>> +int b2b_sll (signed long long u, signed long long w)
>> +{
>> +  if ((u & w) & 0x20)
>> +    boo ();
>> +}
>> +
>> +/* The AND of U & W should go into a temporary, when is then ANDed
>> +   with the constant.
>> +
>> +   First verify that we have the right number of ANDs between U and W.  */
>> +/* { dg-final { scan-tree-dump-times "\[uw\]_\[0-9\]+.D. \& \[uw\]_\[0-9\]+.D.;" 10 "reassoc1"} } */
>> +
>> +/* Then verify that we have the right number of ANDS between a temporary
>> +   and the constant.  */
>> +/* { dg-final { scan-tree-dump-times "_\[0-9]+ \& 32;" 10 "reassoc1"} } */
> 
> This part of the new testcase fails on aarch64 & arm:
> FAIL: gcc.dg/tree-ssa/pr64910-2.c scan-tree-dump-times reassoc1
> "_[0-9]+ & 32;" 10
Thanks.  I'm investigating.

Jeff


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]