Re: [PATCH] P0556R3 Integral power-of-2 operations, P0553R2 Bit operations
- From: Jonathan Wakely <jwakely at redhat dot com>
- To: Jakub Jelinek <jakub at redhat dot com>
- Cc: libstdc++ at gcc dot gnu dot org, gcc-patches at gcc dot gnu dot org
- Date: Wed, 4 Jul 2018 09:14:04 +0100
- Subject: Re: [PATCH] P0556R3 Integral power-of-2 operations, P0553R2 Bit operations
- References: <20180703210247.GA7287@redhat.com> <20180703214003.GE7166@tucnak> <20180703222359.GL2838@redhat.com> <20180704080934.GF7166@tucnak>
On 04/07/18 10:09 +0200, Jakub Jelinek wrote:
On Tue, Jul 03, 2018 at 11:24:00PM +0100, Jonathan Wakely wrote:
> Wouldn't it be better to use some branchless pattern that
> GCC can also optimize well, like:
>   return (__x << __sN) | (__x >> ((-__sN) & (_Nd - 1)));
> (iff _Nd is always a power of two),
_Nd is 20 for one of the INT_N types on msp430, but we could have a
special case for the rare integer types with unusual sizes.
> or perhaps
>   return (__x << __sN) | (__x >> ((-__sN) % _Nd));
> which is going to be folded into the above one for power of two constants?
That looks good.
> Unfortunately it is not correct if _Nd is not a power of two.
> E.g. for __sN == 1, -1U % 20 is 15, not 19.
> So it would need to be
>   return (__x << __sN) | (__x >> ((_Nd - __sN) % _Nd));
> Unfortunately, our rotate pattern recognizer handles
>   return (__x << __sN) | (__x >> ((-__sN) % _Nd));
> or
>   return (__x << __sN) | (__x >> ((-__sN) & (_Nd - 1)));
> but doesn't handle the _Nd - __sN case.
> Is this C++17+ only? Then perhaps
The std::rotr and std::rotl functions are C++2a only, but I've added
the __rotr and __rotl versions for our own internal use in C++14 and
later.
In practice I have no internal use for rotr and rotl, so I could
remove the __rot[rl] forms. However, won't (_Nd & (_Nd - 1)) optimize
to a constant even without if-constexpr? I'll check.
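
As a rough illustration of that question, a C++14-compatible variant using a
plain if might look like the sketch below, for comparison with the
if-constexpr form quoted just after it. This is only a sketch: the helper
name __rotl_c14 and the use of numeric_limits for _Nd are assumptions, not
code from the patch under review.

  // Sketch only: plain `if` instead of `if constexpr`.  The condition is
  // a constant expression, so the dead branch should fold away when
  // optimizing, which is exactly the property being checked here.
  #include <limits>

  template<typename _Tp>
    _Tp
    __rotl_c14(_Tp __x, unsigned int __sN) noexcept
    {
      const unsigned _Nd = std::numeric_limits<_Tp>::digits;
      if ((_Nd & (_Nd - 1)) == 0)   // power-of-two width, e.g. 8/16/32/64
        return (__x << __sN) | (__x >> (-__sN & (_Nd - 1)));
      return (__x << __sN) | (__x >> ((_Nd - __sN) % _Nd));
    }
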
>   if constexpr ((_Nd & (_Nd - 1)) == 0)
>     return (__x << __sN) | (__x >> (-__sN & (_Nd - 1)));
>   return (__x << __sN) | (__x >> ((_Nd - __sN) % _Nd));
>
> Verify that on x86_64, for all of unsigned {char,short,int,long long}, you
> actually get a mere rol? instruction with perhaps some register movement,
> but no masking or shifts etc.
Will do.
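
One way such a check might look (a sketch only; the file name, function name
and explicit instantiations below are illustrative, not the testcase that
went into the tree): instantiate the rotate pattern at each width and inspect
the generated assembly for bare rolb/rolw/roll/rolq instructions.

  // rot-check.cc -- illustrative only.  Build and inspect with:
  //   g++ -O2 -S rot-check.cc && grep -n 'rol' rot-check.s
  #include <limits>

  template<typename T>
    T
    rot(T x, unsigned int s)
    {
      const unsigned Nd = std::numeric_limits<T>::digits;
      // The power-of-two form from the discussion above.
      return (x << s) | (x >> (-s & (Nd - 1)));
    }

  // Force code generation at each of the widths Jakub mentions.
  template unsigned char      rot(unsigned char, unsigned int);
  template unsigned short     rot(unsigned short, unsigned int);
  template unsigned int       rot(unsigned int, unsigned int);
  template unsigned long long rot(unsigned long long, unsigned int);
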
> E.g. ia32intrin.h also uses:
> /* 64bit rol */
> extern __inline unsigned long long
> __attribute__((__gnu_inline__, __always_inline__, __artificial__))
> __rolq (unsigned long long __X, int __C)
> {
>   __C &= 63;
>   return (__X << __C) | (__X >> (-__C & 63));
> }
> etc.
Should we delegate to those intrinsics for x86, so that
__builtin_ia32_rolqi and __builtin_ia32_rolhi can be used when
relevant?
> No, the pattern recognizers should handle (for power of two bitcounts)
> even the char/short cases. Those intrinsics predate the improvements
> in rotate pattern recognition.
OK, good to know.
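
Putting the pieces of this exchange together, the internal helpers under
discussion would look roughly like the sketch below. This is not the
committed libstdc++ code: it uses Jakub's if-constexpr form for brevity (the
C++14 __rot[rl] helpers would use a plain if, as discussed above), it takes
_Nd to be numeric_limits<_Tp>::digits (which matches the 20-bit example
earlier), and it leaves out any handling of shift counts >= _Nd.

  // Sketch, not the committed code.
  #include <limits>

  template<typename _Tp>
    constexpr _Tp
    __rotl(_Tp __x, unsigned int __sN) noexcept
    {
      constexpr unsigned _Nd = std::numeric_limits<_Tp>::digits;
      if constexpr ((_Nd & (_Nd - 1)) == 0)
        // Power-of-two width: the form GCC's rotate recognizer handles.
        return (__x << __sN) | (__x >> (-__sN & (_Nd - 1)));
      else
        // Unusual widths such as 20: -__sN % _Nd would be wrong here,
        // so subtract from _Nd instead.
        return (__x << __sN) | (__x >> ((_Nd - __sN) % _Nd));
    }

  template<typename _Tp>
    constexpr _Tp
    __rotr(_Tp __x, unsigned int __sN) noexcept
    {
      constexpr unsigned _Nd = std::numeric_limits<_Tp>::digits;
      if constexpr ((_Nd & (_Nd - 1)) == 0)
        return (__x >> __sN) | (__x << (-__sN & (_Nd - 1)));
      else
        return (__x >> __sN) | (__x << ((_Nd - __sN) % _Nd));
    }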