This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: how to check if target supports andnot instruction ?


On 17 October 2016 at 13:52, Richard Biener <rguenther@suse.de> wrote:
> On Sat, 15 Oct 2016, Prathamesh Kulkarni wrote:
>
>> On 13 October 2016 at 13:22, Marc Glisse <marc.glisse@inria.fr> wrote:
>> > On Thu, 13 Oct 2016, Prathamesh Kulkarni wrote:
>> >
>> >> On 12 October 2016 at 14:43, Richard Biener <rguenther@suse.de> wrote:
>> >>>
>> >>> On Wed, 12 Oct 2016, Marc Glisse wrote:
>> >>>
>> >>>> On Wed, 12 Oct 2016, Prathamesh Kulkarni wrote:
>> >>>>
>> >>>>> I was having a look at PR71636 and added the following pattern to
>> >>>>> match.pd:
>> >>>>> x & ((1U << b) - 1) -> x & ~(~0U << b)
>> >>>>> However the transform is useful only if the target supports "andnot"
>> >>>>> instruction.
>> >>>>
>> >>>>
>> >>>> rth was selling the transformation as a canonicalization, which is
>> >>>> beneficial
>> >>>> when there is an andnot instruction, and neutral otherwise, so it could
>> >>>> be
>> >>>> done always.
>> >>>
>> >>>
>> >>> Well, its three instructions to three instructions and a more expensive
>> >>> constant(?).  ~0U might not be available as immediate for the shift
>> >>> instruction and 1U << b might be available as a bit-set instruction ...
>> >>> (vs. the andnot).
>> >
>> >
>> > True, I hadn't thought of bit-set.
>> >
>> >>> So yes, we might decide to canonicalize to andnot (and decide that
>> >>> three binary to two binary and one unary op is "better").
>> >>>
>> >>> So no excuse to explore the target specific .pd fragment idea ... :/
>> >>
>> >> Hi,
>> >> I have attached patch that adds the transform.
>> >> Does that look OK ?
>> >
>> >
>> > Why bit_not of build_zero_cst instead of build_all_ones_cst, as suggested in
>> > the PR? If we only do the transformation when (1<<b)-1 is fed to a bit_and,
>> > then we probably want to require that it has a single use (maybe even the
>> > shift).
>> >
>> >> I am not sure how to write test-cases for it though.
>> >> For the test-case:
>> >> unsigned f(unsigned x, unsigned b)
>> >> {
>> >>  unsigned t1 = 1U << b;
>> >>  unsigned t2 = t1 - 1;
>> >>  unsigned t3 = x & t2;
>> >>  return t3;
>> >> }
>> >>
>> >> forwprop dump shows:
>> >> Applying pattern match.pd:523, gimple-match.c:47419
>> >> gimple_simplified to _6 = 4294967295 << b_1(D);
>> >> _8 = ~_6;
>> >> t3_5 = x_4(D) & _8;
>> >>
>> >> I could scan for "_6 = 4294967295 << b_1(D);"  however I suppose
>> >> ~0 would depend on width of int and not always be 4294967295 ?
>> >> Or should I scan for "_6 = 4294967295 << b_1(D);"
>> >> and add /* { dg-require-effective int32 } */  to the test-case ?
>> >
>> >
>> > You could check that you have ~, or that you don't have " 1 << ".
>> Thanks for the suggestions.
>> Does the attached patch look OK ?
>>
>> For test-cases, scan-tree-dump-not "1 <<" works well for pr71636-1.c
>> which tests GENERIC folding,
>> however for GIMPLE folding, "1 << " still remains in the forwprop dump
>> because dce isn't
>> run to remove unused values.
>>
>> For the test-case:
>> unsigned f(unsigned x, unsigned b)
>> {
>>   unsigned t1 = 1U << b;
>>   unsigned t2 = t1 - 1;
>>   unsigned t3 = x & t2;
>>   return t3;
>> }
>>
>> forwprop dump shows:
>> Applying pattern match.pd:523, gimple-match.c:47418
>> gimple_simplified to _6 = 4294967295 << b_1(D);
>> _8 = ~_6;
>> t3_5 = x_4(D) & _8;
>> f (unsigned int x, unsigned int b)
>> {
>>   unsigned int t3;
>>   unsigned int t2;
>>   unsigned int t1;
>>   unsigned int _6;
>>   unsigned int _8;
>>
>>   <bb 2>:
>>   t1_2 = 1 << b_1(D);
>>   t2_3 = t1_2 + 4294967295;
>>   _6 = 4294967295 << b_1(D);
>>   _8 = ~_6;
>>   t3_5 = x_4(D) & _8;
>>   return t3_5;
>>
>> }
>>
>> Instead I scanned for _8 = ~_6 with:
>> /* { dg-final { scan-tree-dump "_\[0-9\] = ~_\[0-9\]" "forwprop1" } } */
>> because rhs has bit_not and lhs doesn't.
>> Is that OK ?
>
> That's ok -- note I usually scan cddce1 instead of forwprop to have
> DCE run on the IL.
>
> Ok (with or without changing to scan cddce1 instead for not-1<<).
Thanks, committed as r241229.

Regards,
Prathamesh
>
> Thanks,
> Richard.
>
>> Bootstrap+tested on x86_64-unknown-linux-gnu.
>> Cross-tested on arm*-*-*, aarch64*-*-*
>>
>> Thanks,
>> Prathamesh
>> >
>> > --
>> > Marc Glisse
>>
>
> --
> Richard Biener <rguenther@suse.de>
> SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]