This is the mail archive of the mailing list for the libstdc++ project.


Re: [PATCH] P0556R3 Integral power-of-2 operations, P0553R2 Bit operations

On 04/07/18 10:09 +0200, Jakub Jelinek wrote:
On Tue, Jul 03, 2018 at 11:24:00PM +0100, Jonathan Wakely wrote:
> Wouldn't it be better to use some branchless pattern that
> GCC can also optimize well, like:
>      return (__x << __sN) | (__x >> ((-__sN) & (_Nd - 1)));
> (iff _Nd is always power of two),

_Nd is 20 for one of the INT_N types on msp430, but we could have a
special case for the rare integer types with unusual sizes.

> or perhaps
>      return (__x << __sN) | (__x >> ((-__sN) % _Nd));
> which is going to be folded into the above one for power of two constants?

That looks good.

Unfortunately it is not correct if _Nd is not power of two.
E.g. for __sN 1, -1U % 20 is 15, not 19.
So it would need to be
     return (__x << __sN) | (__x >> ((_Nd - __sN) % _Nd));
Unfortunately, our rotate pattern recognizer handles
     return (__x << __sN) | (__x >> ((-__sN) % _Nd));
     return (__x << __sN) | (__x >> ((-__sN) & (_Nd - 1)));
but doesn't handle the _Nd - __sN case.
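
To make the modulo arithmetic above concrete, here is a minimal sketch of
just the right-shift count computation. The helper names neg_mod and
sub_mod are hypothetical (they are not part of the patch), and the example
assumes a 32-bit unsigned int and a shift count already reduced below the
width.

```cpp
// Hypothetical helpers, not libstdc++ code: they compute only the
// right-shift count used in the rotate expressions discussed above.
// Assumption: s < nd, and unsigned int is 32 bits wide.
constexpr unsigned neg_mod(unsigned s, unsigned nd) { return (0u - s) % nd; }
constexpr unsigned sub_mod(unsigned s, unsigned nd) { return (nd - s) % nd; }

// Power-of-two width: both forms give the same count.
static_assert(neg_mod(1u, 32u) == 31u, "negate-then-mod works for 32");
static_assert(sub_mod(1u, 32u) == 31u, "subtract-then-mod works for 32");

// Width 20: -1U is 4294967295, and 4294967295 % 20 is 15, not 19.
static_assert(neg_mod(1u, 20u) == 15u, "negate-then-mod is wrong for 20");
static_assert(sub_mod(1u, 20u) == 19u, "subtract-then-mod is right for 20");
```

The two forms agree whenever the width divides 2^32, which is why the
negated form is only safe for power-of-two widths.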
Is this C++17+ only?  Then perhaps

The std::rotr and std::rotl functions are C++2a only, but I've added
the __rotr and __rotl versions for our own internal use in C++14 and

In practice I have no internal use for rotr and rotl, so I could
remove the __rot[rl] forms. However, won't (_Nd & (_Nd - 1)) optimize
to a constant even without if-constexpr? I'll check.

     if constexpr ((_Nd & (_Nd - 1)) == 0)
       return (__x << __sN) | (__x >> (-__sN & (_Nd - 1)));
     return (__x << __sN) | (__x >> ((_Nd - __sN) % _Nd));
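
For reference, the fragment above could be fleshed out roughly as follows.
This is only a sketch under stated assumptions, not the code from the
patch: the name rotl_sketch is hypothetical, the shift count is assumed to
be reduced with s % Nd first, and C++17 is required for if constexpr. The
non-power-of-two branch cannot be exercised with the standard unsigned
types on common targets, so only the first branch is checked here.

```cpp
#include <limits>
#include <type_traits>

// Hypothetical stand-in for the rotate-left helper under discussion.
template<typename T>
constexpr T rotl_sketch(T x, unsigned s) noexcept
{
    static_assert(std::is_unsigned<T>::value, "unsigned types only");
    constexpr unsigned Nd = std::numeric_limits<T>::digits;
    const unsigned sN = s % Nd;  // assumption: count is reduced up front
    if constexpr ((Nd & (Nd - 1)) == 0)
        // Power-of-two width: the form the rotate recognizer handles.
        return static_cast<T>((x << sN) | (x >> ((0u - sN) & (Nd - 1))));
    else
        // Unusual widths (e.g. 20 bits): correct, but not pattern-matched.
        return static_cast<T>((x << sN) | (x >> ((Nd - sN) % Nd)));
}

static_assert(rotl_sketch<unsigned char>(0x81, 1) == 0x03, "8-bit rotate");
static_assert(rotl_sketch<unsigned>(0x80000001u, 4) == 0x18u, "32-bit rotate");
static_assert(rotl_sketch<unsigned long long>(1ull, 64) == 1ull, "full turn");
```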

Verify that on x86_64 for all the unsigned {char,short,int,long long} you
actually get a mere rol? instruction with perhaps some register movement,
but no masking, nor shifts etc.
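
One possible way to set up that check is a small translation unit with one
non-inline function per type, which can then be compiled with something
like g++ -O2 -S and the assembly inspected by hand. The names below
(rotl_pow2, rotl_u8 and so on) are hypothetical, and the sketch makes no
claim about what any particular GCC version emits.

```cpp
#include <limits>

// Hypothetical helper using the masked form for power-of-two widths.
template<typename T>
constexpr T rotl_pow2(T x, unsigned s) noexcept
{
    constexpr unsigned Nd = std::numeric_limits<T>::digits;
    static_assert((Nd & (Nd - 1)) == 0, "width must be a power of two");
    return static_cast<T>((x << (s & (Nd - 1)))
                          | (x >> ((0u - s) & (Nd - 1))));
}

// extern "C" keeps the symbol names readable in the generated assembly.
extern "C" {
unsigned char rotl_u8(unsigned char x, unsigned s)
{ return rotl_pow2(x, s); }
unsigned short rotl_u16(unsigned short x, unsigned s)
{ return rotl_pow2(x, s); }
unsigned int rotl_u32(unsigned int x, unsigned s)
{ return rotl_pow2(x, s); }
unsigned long long rotl_u64(unsigned long long x, unsigned s)
{ return rotl_pow2(x, s); }
}
```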

Will do.

> E.g. ia32intrin.h also uses:
> /* 64bit rol */
> extern __inline unsigned long long
> __attribute__((__gnu_inline__, __always_inline__, __artificial__))
> __rolq (unsigned long long __X, int __C)
> {
>  __C &= 63;
>  return (__X << __C) | (__X >> (-__C & 63));
> }
> etc.

Should we delegate to those intrinsics for x86, so that
__builtin_ia32_rolqi and __builtin_ia32_rolhi can be used when

No, the pattern recognizers should handle (for power of two bitcounts)
even the char/short cases.  Those intrinsics predate the improvements
in rotate pattern recognition.

OK, good to know.
