Transform (x >> cst) != 0 to x >= (1 << cst) and (x >> cst) == 0 to x < (1 << cst)

Wed Oct 4 18:35:00 GMT 2017

On 3 October 2017 at 14:10, Jeff Law <law@redhat.com> wrote:
> On 10/03/2017 03:00 PM, Marc Glisse wrote:
>> On Tue, 3 Oct 2017, Jakub Jelinek wrote:
>>
>>> On Tue, Oct 03, 2017 at 12:54:39PM -0700, Prathamesh Kulkarni wrote:
>>>> Hi,
>>>> This follow-up patch implements the patterns mentioned in $subject.
>>>> Bootstrap+test in progress on x86_64-unknown-linux-gnu and
>>>> aarch64-linux-gnu.
>>>> OK to commit if passes ?
>>>>
>>>> Thanks,
>>>> Prathamesh
>>>
>>>> 2017-10-03  Prathamesh Kulkarni  <prathamesh.kulkarni@linaro.org>
>>>>
>>>>     * match.pd ((X >> CST) == 0 -> X < (1 << CST)): New pattern.
>>>>     ((X >> CST) != 0 -> X >= (1 << CST)): Likewise.
>>
>> build_int_cstu doesn't work for vectors, you want build_one_cst. I never
>> know if we should check single_use or not :-(
Thanks for the pointers, I wasn't aware about build_one_cst.
>>
>>>> testsuite/
>>>>     * gcc.dg/tree-ssa/cmpdiv.c: Add test-cases f3 and f4.
>>>
>>> Why this way and not the other way around?
>>
>> For high level gimple optimizations, X < CST is more convenient (and
>> smaller, just one insn) than (X >> CST) == 0.
> Right.  One could easily argue that Prathamesh's form should be the
> preferred form because it is simpler at the gimple level -- and that
> x86-isms should be dealt with later in the pipeline.
>
>
>>
>>> E.g. i?86/x86_64 and various other targets have shift instructions which
>>> set condition codes, so (X >> 51) == 0 is certainly smaller
>>> smaller and I believe cheaper than the latter.
>>> Try:
>>> void foo (void);
>>>
>>> void
>>> bar (unsigned long long x)
>>> {
>>>  if ((x >> 51) == 0)
>>>    foo ();
>>> }
>>>
>>> void
>>> baz (unsigned long long x)
>>> {
>>>  if (x < (1LL << 51))
>>>    foo ();
>>> }
>>> with -Os on x86_64, the first function is 4 insns, 12 bytes,
>>> the second one 5 insns, 21 bytes.
Indeed, I can reproduce this behavior. Thanks for pointing it out.
>>>
>>> I wonder if this kind of instruction selection stuff shouldn't be
>>> done in target.pd instead, with input from the target.
> Right, but I think that argues that Prathamesh's patch is the right
> direction and that to move forward what needs to happen is something
> needs to be fixed at the gimple/rtl border to ensure we get good x86 code.
>
>>
>> At a late stage, maybe during an RTL pass or expansion (or just before
>> expansion) it would indeed be good to generate a shift for such
>> comparisons, on targets where that sets a cc. The lack of this
>> transformation could be considered a blocker for the other one, to avoid
>> regressing on bar.
> Right.  In fact, I think Jakub's test ought to be added to this work as
> part of its basic testing.
Um, how to write the above-test case for bar() in DejaGNU format ?
Is it possible to test for number of insns or to use nm to test for
size of a function with dejagnu directive ?
Or should I scan for "cmpq" since the pattern transforms a right shift
and cmp against 0 to cmp between operands ?

Thanks,
Prathamesh
>
> jeff
>