This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.
Re: [PATCH] P0556R3 Integral power-of-2 operations, P0553R2 Bit operations
- From: Jakub Jelinek <jakub at redhat dot com>
- To: Jonathan Wakely <jwakely at redhat dot com>
- Cc: libstdc++ at gcc dot gnu dot org, gcc-patches at gcc dot gnu dot org
- Date: Wed, 4 Jul 2018 10:09:34 +0200
- Subject: Re: [PATCH] P0556R3 Integral power-of-2 operations, P0553R2 Bit operations
- References: <20180703210247.GA7287@redhat.com> <20180703214003.GE7166@tucnak> <20180703222359.GL2838@redhat.com>
- Reply-to: Jakub Jelinek <jakub at redhat dot com>
On Tue, Jul 03, 2018 at 11:24:00PM +0100, Jonathan Wakely wrote:
> > Wouldn't it be better to use some branchless pattern that
> > GCC can also optimize well, like:
> > return (__x << __sN) | (__x >> ((-__sN) & (_Nd - 1)));
> > (iff _Nd is always power of two),
>
> _Nd is 20 for one of the INT_N types on msp430, but we could have a
> special case for the rare integer types with unusual sizes.
>
> > or perhaps
> > return (__x << __sN) | (__x >> ((-_sN) % _Nd));
> > which is going to be folded into the above one for power of two constants?
>
> That looks good.
Unfortunately it is not correct if _Nd is not a power of two.
E.g. for __sN == 1, -1U % 20 is 15, not 19.
So it would need to be
return (__x << __sN) | (__x >> ((_Nd - __sN) % _Nd));
Unfortunately, our rotate pattern recognizer handles
return (__x << __sN) | (__x >> ((-__sN) % _Nd));
or
return (__x << __sN) | (__x >> ((-__sN) & (_Nd - 1)));
but doesn't handle the _Nd - __sN case.
Is this C++17+ only? Then perhaps
if constexpr ((_Nd & (_Nd - 1)) == 0)
return (__x << __sN) | (__x >> (-__sN & (_Nd - 1)));
return (__x << __sN) | (__x >> ((_Nd - __sN) % _Nd));
Please verify that on x86_64, for each of unsigned {char,short,int,long long}, you
actually get a mere rol? instruction (with perhaps some register movement),
but no masking or extra shifts.
> > E.g. ia32intrin.h also uses:
> > /* 64bit rol */
> > extern __inline unsigned long long
> > __attribute__((__gnu_inline__, __always_inline__, __artificial__))
> > __rolq (unsigned long long __X, int __C)
> > {
> > __C &= 63;
> > return (__X << __C) | (__X >> (-__C & 63));
> > }
> > etc.
>
> Should we delegate to those intrinsics for x86, so that
> __builtin_ia32_rolqi and __builtin_ia32_rolhi can be used when
> relevant?
No, the pattern recognizers should handle (for power-of-two bit counts)
even the char/short cases. Those intrinsics predate the improvements
in rotate pattern recognition.
Jakub