[match.pd] Fix for PR35691

Tue Nov 8 11:20:00 GMT 2016

On 08/11/16 11:16, Richard Biener wrote:
> On Tue, 8 Nov 2016, Prathamesh Kulkarni wrote:
>
>> On 8 November 2016 at 13:23, Richard Biener <rguenther@suse.de> wrote:
>>> On Mon, 7 Nov 2016, Prathamesh Kulkarni wrote:
>>>
>>>> On 7 November 2016 at 23:06, Prathamesh Kulkarni
>>>> <prathamesh.kulkarni@linaro.org> wrote:
>>>>> On 7 November 2016 at 15:43, Richard Biener <rguenther@suse.de> wrote:
>>>>>> On Fri, 4 Nov 2016, Prathamesh Kulkarni wrote:
>>>>>>
>>>>>>> On 4 November 2016 at 13:41, Richard Biener <rguenther@suse.de> wrote:
>>>>>>>> On Thu, 3 Nov 2016, Marc Glisse wrote:
>>>>>>>>
>>>>>>>>> On Thu, 3 Nov 2016, Richard Biener wrote:
>>>>>>>>>
>>>>>>>>>>>>> The transform would also work for vectors (element_precision for
>>>>>>>>>>>>> the test but also a value-matching zero which should ensure the
>>>>>>>>>>>>> same number of elements).
>>>>>>>>>>>> Um sorry, I didn't get how to check vectors to be of equal length by a
>>>>>>>>>>>> matching zero.
>>>>>>>>>>>> Could you please elaborate on that ?
>>>>>>>>>>> He may have meant something like:
>>>>>>>>>>>
>>>>>>>>>>>    (op (cmp @0 integer_zerop@2) (cmp @1 @2))
>>>>>>>>>> I meant with one being @@2 to allow signed vs. Unsigned @0/@1 which was the
>>>>>>>>>> point of the pattern.
>>>>>>>>> Oups, that's what I had written first, and then I somehow managed to confuse
>>>>>>>>> myself enough to remove it so as to remove the call to types_match :-(
>>>>>>>>>
>>>>>>>>>>> So the last operand is checked with operand_equal_p instead of
>>>>>>>>>>> integer_zerop. But the fact that we could compute bit_ior on the
>>>>>>>>>>> comparison results should already imply that the number of elements is the
>>>>>>>>>>> same.
>>>>>>>>>> Though for equality compares we also allow scalar results IIRC.
>>>>>>>>> Oh, right, I keep forgetting that :-( And I have no idea how to generate one
>>>>>>>>> for a testcase, at least until the GIMPLE FE lands...
>>>>>>>>>
>>>>>>>>>>> On platforms that have IOR on floats (at least x86 with SSE, maybe some
>>>>>>>>>>> vector mode on s390?), it would be cool to do the same for floats (most
>>>>>>>>>>> likely at the RTL level).
>>>>>>>>>> On GIMPLE view-converts could come to the rescue here as well.  Or we cab
>>>>>>>>>> just allow bit-and/or on floats as much as we allow them on pointers.
>>>>>>>>> Would that generate sensible code on targets that do not have logic insns for
>>>>>>>>> floats? Actually, even on x86_64 that generates inefficient code, so there
>>>>>>>>> would be some work (for instance grep finds no gen_iordf3, only gen_iorv2df3).
>>>>>>>>>
>>>>>>>>> I am also a bit wary of doing those obfuscating optimizations too early...
>>>>>>>>> a==0 is something that other optimizations might use. long
>>>>>>>>> c=(long&)a|(long&)b; (double&)c==0; less so...
>>>>>>>>>
>>>>>>>>> (and I am assuming that signaling NaNs don't make the whole transformation
>>>>>>>>> impossible, which might be wrong)
>>>>>>>> Yeah.  I also think it's not so much important - I just wanted to mention
>>>>>>>> vectors...
>>>>>>>>
>>>>>>>> Btw, I still think we need a more sensible infrastructure for passes
>>>>>>>> to gather, analyze and modify complex conditions.  (I'm always pointing
>>>>>>>> to tree-affine.c as an, albeit not very good, example for handling
>>>>>>>> a similar problem)
>>>>>>> Thanks for mentioning the value-matching capture @@, I wasn't aware of
>>>>>>> this match.pd feature.
>>>>>>> The current patch keeps it restricted to only bitwise operators on integers.
>>>>>>> Bootstrap+test running on x86_64-unknown-linux-gnu.
>>>>>>> OK to commit if passes ?
>>>>>> +/* PR35691: Transform
>>>>>> +   (x == 0 & y == 0) -> (x | typeof(x)(y)) == 0.
>>>>>> +   (x != 0 | y != 0) -> (x | typeof(x)(y)) != 0.  */
>>>>>> +
>>>>>>
>>>>>> Please omit the vertical space
>>>>>>
>>>>>> +(for bitop (bit_and bit_ior)
>>>>>> +     cmp (eq ne)
>>>>>> + (simplify
>>>>>> +  (bitop (cmp @0 integer_zerop) (cmp @1 integer_zerop))
>>>>>>
>>>>>> if you capture the first integer_zerop as @2 then you can re-use it...
>>>>>>
>>>>>> +   (if (INTEGRAL_TYPE_P (TREE_TYPE (@0))
>>>>>> +       && INTEGRAL_TYPE_P (TREE_TYPE (@1))
>>>>>> +       && TYPE_PRECISION (TREE_TYPE (@0)) == TYPE_PRECISION (TREE_TYPE
>>>>>> (@1)))
>>>>>> +    (cmp (bit_ior @0 (convert @1)) { build_zero_cst (TREE_TYPE (@0));
>>>>>>
>>>>>> ... here inplace of the { build_zero_cst ... }.
>>>>>>
>>>>>> Ok with that changes.
>>>>> Thanks, committed the attached version as r241915.
>>>> ugh, the svn commit message has:
>>>>
>>>> testsuite/
>>>> * gcc.dg/pr35691-1.c: New test-case.
>>>> * gcc.dg/pr35691-4.c: Likewise.
>>>>
>>>> pr35691-4.c was a typo, should be pr35691-2.c :/
>>>> However testsuite/ChangeLog correctly has entry for pr35691-2.c
>>>> Is it possible to edit the commit message for r241915 ?
>>>> Sorry about this.
>>> No, just leave it as-is.
>> Hi,
>> Chritstophe reported to me that the commit caused test-cases
>> pr35691-1.c and pr35691-2.c (which were added by the commit)
>> to FAIL for cortex-a5:
>> http://people.linaro.org/~christophe.lyon/cross-validation/gcc/trunk/241915/arm-none-linux-gnueabihf/diff-gcc-rh60-arm-none-linux-gnueabihf-arm-cortex-a5-vfpv3-d16-fp16.txt
>>
>> It seems truth_andif_expr is not simplified to bit_and_expr on
>> cortex-a5 as for x86_64 (and other arm variants).
>> The differences in dumps start from 004t.gimple for pr35691-1.c:
>>
>> x86_64 gimple dump:
>> foo (int z0, unsigned int z1)
>> {
>>    int D.1800;
>>    int t0;
>>    int t1;
>>    int t2;
>>
>>    _1 = z0 == 0;
>>    t0 = (int) _1;
>>    _2 = z1 == 0;
>>    t1 = (int) _2;
>>    _3 = t0 != 0;
>>    _4 = t1 != 0;
>>    _5 = _3 & _4;
>>    t2 = (int) _5;
>>    D.1800 = t2;
>>    return D.1800;
>> }
>>
>> cortex-a5 gimple dump:
>> foo (int z0, unsigned int z1)
>> {
>>    int iftmp.0;
>>    int D.4176;
>>    int t0;
>>    int t1;
>>    int t2;
>>
>>    _1 = z0 == 0;
>>    t0 = (int) _1;
>>    _2 = z1 == 0;
>>    t1 = (int) _2;
>>    if (t0 != 0) goto <D.4174>; else goto <D.4172>;
>>    <D.4174>:
>>    if (t1 != 0) goto <D.4175>; else goto <D.4172>;
>>    <D.4175>:
>>    iftmp.0 = 1;
>>    goto <D.4173>;
>>    <D.4172>:
>>    iftmp.0 = 0;
>>    <D.4173>:
>>    t2 = iftmp.0;
>>    D.4176 = t2;
>>    return D.4176;
>> }
>>
>> Since the pattern expects truth_andif_expr to be converted to bit_and_expr,
>> it fails to match for cortex-a5.
>> This seems to happen only for cortex-a5 (the other variants a9, a15,
>> a57 are OK).
>>
>> Is my assumption that truth_andif_expr would be always converted to bit_and_expr
>> for above case incorrect ?
> Yes, it depends on LOGICAL_OP_SHORT_CIRCUIT.

Right, and that varies by core in the arm backend.
looking at the tuning structures in arm.c I'd expect exynosm1 to also exhibit this

Kyrill

> Richard.
>
>> Thanks,
>> Prathamesh
>>> Richard.
>>>
>>>> Regards,
>>>> Prathamesh
>>>>>> Richard.
>>>>
>>> --
>>> Richard Biener <rguenther@suse.de>
>>> SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)
>>