This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: [RFC][PR82479] missing popcount builtin detection
- From: Richard Biener <richard dot guenther at gmail dot com>
- To: Kugan Vivekanandarajah <kugan dot vivekanandarajah at linaro dot org>, "Amker.Cheng" <amker dot cheng at gmail dot com>
- Cc: GCC Patches <gcc-patches at gcc dot gnu dot org>
- Date: Mon, 5 Mar 2018 16:24:27 +0100
- Subject: Re: [RFC][PR82479] missing popcount builtin detection
- Authentication-results: sourceware.org; auth=none
- References: <CAELXzTNXvW1jo4hbn3LHgo5ne=u-0tFJB5Ri3-P=1z-_YHNnFA@mail.gmail.com> <CAFiYyc39pjEXHwzM-jJjgLZRSZcnaMDWBREKgEEws5W3p2SF9Q@mail.gmail.com> <CAELXzTOXPkz+Uy15PO=X8S8qvE2znQNXz8knqKHHUEU9wHRP2w@mail.gmail.com> <CAFiYyc15+U_WgDmNys=ep=+uKm9usLCz-2YHYYBGJGZBN8djKg@mail.gmail.com> <CAELXzTOztCE+KXr8wM49pqGKMsF6dTm-1gXixOikmNT5yEe62Q@mail.gmail.com> <CAFiYyc0fr=O5z2td-=X8LNTea13qR7AAyJ0=_VyMZ-4z3WkAOQ@mail.gmail.com> <CAELXzTPuVQWyLKQOie5vPtYsx=_06xCXeum4NG3mT2R+SMPWrw@mail.gmail.com>
On Thu, Feb 8, 2018 at 1:41 AM, Kugan Vivekanandarajah
<kugan.vivekanandarajah@linaro.org> wrote:
> Hi Richard,
>
> On 1 February 2018 at 23:21, Richard Biener <richard.guenther@gmail.com> wrote:
>> On Thu, Feb 1, 2018 at 5:07 AM, Kugan Vivekanandarajah
>> <kugan.vivekanandarajah@linaro.org> wrote:
>>> Hi Richard,
>>>
>>> On 31 January 2018 at 21:39, Richard Biener <richard.guenther@gmail.com> wrote:
>>>> On Wed, Jan 31, 2018 at 11:28 AM, Kugan Vivekanandarajah
>>>> <kugan.vivekanandarajah@linaro.org> wrote:
>>>>> Hi Richard,
>>>>>
>>>>> Thanks for the review.
>>>>> On 25 January 2018 at 20:04, Richard Biener <richard.guenther@gmail.com> wrote:
>>>>>> On Wed, Jan 24, 2018 at 10:56 PM, Kugan Vivekanandarajah
>>>>>> <kugan.vivekanandarajah@linaro.org> wrote:
>>>>>>> Hi All,
>>>>>>>
>>>>>>> Here is a patch for popcount builtin detection similar to LLVM. I
>>>>>>> would like to queue this for review for next stage 1.
>>>>>>>
>>>>>>> 1. This is done part of loop-distribution and effective for -O3 and above.
>>>>>>> 2. This does not distribute loop to detect popcount (like
>>>>>>> memcpy/memmove). I dont think that happens in practice. Please correct
>>>>>>> me if I am wrong.
>>>>>>
>>>>>> But then it has no business inside loop distribution but instead is
>>>>>> doing final value
>>>>>> replacement, right? You are pattern-matching the whole loop after all. I think
>>>>>> final value replacement would already do the correct thing if you
>>>>>> teached number of
>>>>>> iteration analysis that niter for
>>>>>>
>>>>>> <bb 3> [local count: 955630224]:
>>>>>> # b_11 = PHI <b_5(5), b_8(6)>
>>>>>> _1 = b_11 + -1;
>>>>>> b_8 = _1 & b_11;
>>>>>> if (b_8 != 0)
>>>>>> goto <bb 6>; [89.00%]
>>>>>> else
>>>>>> goto <bb 8>; [11.00%]
>>>>>>
>>>>>> <bb 6> [local count: 850510900]:
>>>>>> goto <bb 3>; [100.00%]
>>>>>
>>>>> I am looking into this approach. What should be the scalar evolution
>>>>> for b_8 (i.e. b & (b -1) in a loop) should be? This is not clear to me
>>>>> and can this be represented with the scev?
>>>>
>>>> No, it's not affine and thus cannot be represented. You only need the
>>>> scalar evolution of the counting IV which is already handled and
>>>> the number of iteration analysis needs to handle the above IV - this
>>>> is the missing part.
>>> Thanks for the clarification. I am now matching this loop pattern in
>>> number_of_iterations_exit when number_of_iterations_exit_assumptions
>>> fails. If the pattern matches, I am inserting the _builtin_popcount in
>>> the loop preheater and setting the loop niter with this. This will be
>>> used by the final value replacement. Is this what you wanted?
>>
>> No, you shouldn't insert a popcount stmt but instead the niter
>> GENERIC tree should be a CALL_EXPR to popcount with the
>> appropriate argument.
>
> Thats what I tried earlier but ran into some ICEs. I wasn't sure if
> niter in tree_niter_desc can take such.
>
> Attached patch now does this. Also had to add support for CALL_EXPR in
> few places to handle niter with CALL_EXPR. Does this look OK?
Overall this looks ok - the patch includes changes in places that I don't think
need changes such as chrec_convert_1 or extract_ops_from_tree.
The expression_expensive_p change should be more specific than making
all calls inexpensive as well.
The verify_ssa change looks bogus, you do
+ dest = gimple_phi_result (count_phi);
+ tree var = make_ssa_name (TREE_TYPE (dest), NULL);
+ tree fn = builtin_decl_implicit (BUILT_IN_POPCOUNT);
+
+ var = build_call_expr (fn, 1, src);
+ *niter = fold_build2 (MINUS_EXPR, TREE_TYPE (dest), var,
+ build_int_cst (TREE_TYPE (dest), 1));
why do you allocate a new SSA name here? It seems unused
as you overwrive 'var' with the CALL_EXPR immediately.
I didn't review the pattern matching thoroughly nor the exact place you
call it. But
+ if (check_popcount_pattern (loop, &count))
+ {
+ niter->assumptions = boolean_false_node;
+ niter->control.base = NULL_TREE;
+ niter->control.step = NULL_TREE;
+ niter->control.no_overflow = false;
+ niter->niter = count;
+ niter->assumptions = boolean_true_node;
+ niter->may_be_zero = boolean_false_node;
+ niter->max = -1;
+ niter->bound = NULL_TREE;
+ niter->cmp = ERROR_MARK;
+ return true;
+ }
simply setting may_be_zero to false looks fishy. Try
with -fno-tree-loop-ch. Also max should not be negative,
it should be the number of bits in the IV type?
A related testcase could be that we can completely peel
a loop like the following which iterates at most 8 times:
int a[8];
void foo (unsigned char ctrl)
{
int c = 0;
while (ctrl)
{
ctrl = ctrl & (ctrl - 1);
a[c++] = ctrl;
}
}
This is now stage1 material so please update and re-post. Maybe Bin has
further suggestions as well.
Thanks,
Richard.
> Thanks,
> Kugan
>
>
> gcc/ChangeLog:
>
> 2018-02-08 Kugan Vivekanandarajah <kuganv@linaro.org>
>
> * gimple-expr.c (extract_ops_from_tree): Handle CALL_EXPR.
> * tree-chrec.c (chrec_convert_1): Likewise.
> * tree-scalar-evolution.c (expression_expensive_p): Likewise.
> * tree-ssa-loop-ivopts.c (contains_abnormal_ssa_name_p): Likewise.
> * tree-ssa-loop-niter.c (check_popcount_pattern): New.
> (number_of_iterations_exit): Record niter for popcount patern.
> * tree-ssa.c (verify_ssa): Check stmt to be non NULL.
>
> gcc/testsuite/ChangeLog:
>
> 2018-02-08 Kugan Vivekanandarajah <kuganv@linaro.org>
>
> * gcc.dg/tree-ssa/popcount.c: New test.
>
>
>>
>>> In general, there is also a condition gating the loop like
>>>
>>> if (b_4 != 0)
>>> goto loop;
>>> else
>>> end:
>>>
>>> This of course will not be removed by final value replacement. Since
>>> popcount (0) is defined, this is redundant and could be removed but
>>> not removed.
>>
>> Yeah, that's probably sth for another pass though. I suppose the
>> end: case just uses zero in which case you'll have a PHI and you
>> can optimize
>>
>> if (b != 0)
>> return popcount (b);
>> return 0;
>>
>> in phiopt.
>>
>> Richard.
>>
>>> Thanks,
>>> Kugan
>>>
>>>>
>>>> Richard.
>>>>
>>>>> Thanks,
>>>>> Kugan
>>>>>>
>>>>>> is related to popcount (b_5).
>>>>>>
>>>>>> Richard.
>>>>>>
>>>>>>> Bootstrapped and regression tested on aarch64-linux-gnu with no new regressions.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Kugan
>>>>>>>
>>>>>>> gcc/ChangeLog:
>>>>>>>
>>>>>>> 2018-01-25 Kugan Vivekanandarajah <kuganv@linaro.org>
>>>>>>>
>>>>>>> PR middle-end/82479
>>>>>>> * tree-loop-distribution.c (handle_popcount): New.
>>>>>>> (pass_loop_distribution::execute): Use handle_popcount.
>>>>>>>
>>>>>>> gcc/testsuite/ChangeLog:
>>>>>>>
>>>>>>> 2018-01-25 Kugan Vivekanandarajah <kuganv@linaro.org>
>>>>>>>
>>>>>>> PR middle-end/82479
>>>>>>> * gcc.dg/tree-ssa/popcount.c: New test.