This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



Re: [PATCH] P0556R3 Integral power-of-2 operations, P0553R2 Bit operations


On Tue, Jul 03, 2018 at 11:24:00PM +0100, Jonathan Wakely wrote:
> > Wouldn't it be better to use some branchless pattern that
> > GCC can also optimize well, like:
> >      return (__x << __sN) | (__x >> ((-__sN) & (_Nd - 1)));
> > (iff _Nd is always a power of two),
> 
> _Nd is 20 for one of the INT_N types on msp430, but we could have a
> special case for the rare integer types with unusual sizes.
> 
> > or perhaps
> >      return (__x << __sN) | (__x >> ((-__sN) % _Nd));
> > which is going to be folded into the above one for power of two constants?
> 
> That looks good.

Unfortunately it is not correct if _Nd is not a power of two.
E.g. for __sN == 1, -1U % 20 is 15 (with 32-bit unsigned, -1U is
4294967295), not the 19 we need.
So it would need to be 
      return (__x << __sN) | (__x >> ((_Nd - __sN) % _Nd));
Unfortunately, our rotate pattern recognizer handles
      return (__x << __sN) | (__x >> ((-__sN) % _Nd));
or
      return (__x << __sN) | (__x >> ((-__sN) & (_Nd - 1)));
but doesn't handle the _Nd - __sN case.
Is this C++17+ only?  Then perhaps
      if constexpr ((_Nd & (_Nd - 1)) == 0)
        return (__x << __sN) | (__x >> (-__sN & (_Nd - 1)));
      return (__x << __sN) | (__x >> ((_Nd - __sN) % _Nd));

Please verify that on x86_64, for each of unsigned {char,short,int,long long},
you actually get just a rol? instruction, perhaps with some register movement,
but no masking or extra shifts.
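
To make the arithmetic above concrete, here is a minimal standalone sketch.
It is not the libstdc++ code: rotl_sketch, right_count_neg and
right_count_sub are hypothetical names used only for illustration, and it
assumes C++17 and that the shift count has already been reduced modulo the
bit width. The static_asserts restate the 15 vs. 19 example for a 20-bit
width; they hold for a 32-bit unsigned int.

```cpp
#include <cstdint>
#include <limits>
#include <type_traits>

// Right-shift count computed by negation; only valid when Nd is a power of two.
constexpr unsigned right_count_neg(unsigned sN, unsigned Nd)
{ return (-sN) % Nd; }

// Right-shift count computed by subtraction; valid for any Nd when sN < Nd.
constexpr unsigned right_count_sub(unsigned sN, unsigned Nd)
{ return (Nd - sN) % Nd; }

static_assert(right_count_neg(1, 20) == 15, "negation breaks for Nd == 20");
static_assert(right_count_sub(1, 20) == 19, "subtraction gives the right count");
static_assert(right_count_neg(1, 32) == 31, "both agree for powers of two");
static_assert(right_count_sub(1, 32) == 31, "both agree for powers of two");

// Hypothetical rotate-left helper following the shape suggested above.
template<typename T>
constexpr T rotl_sketch(T x, unsigned s) noexcept
{
  static_assert(std::is_unsigned_v<T>, "unsigned types only");
  constexpr unsigned Nd = std::numeric_limits<T>::digits;
  const unsigned sN = s % Nd;  // reduce the count into [0, Nd)
  if constexpr ((Nd & (Nd - 1)) == 0)
    // Power-of-two width: the form the rotate recognizer is said to handle.
    return static_cast<T>((x << sN) | (x >> (-sN & (Nd - 1))));
  else
    // Unusual width: fall back to the always-correct subtraction form.
    return static_cast<T>((x << sN) | (x >> ((Nd - sN) % Nd)));
}
```

Compiling calls such as rotl_sketch<std::uint32_t>(x, n) at -O2 and reading
the assembly is one way to do the check requested above. Whether a single
rol? is emitted depends on the compiler version, so that part is not
asserted here.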

> > E.g. ia32intrin.h also uses:
> > /* 64bit rol */
> > extern __inline unsigned long long
> > __attribute__((__gnu_inline__, __always_inline__, __artificial__))
> > __rolq (unsigned long long __X, int __C)
> > {
> >  __C &= 63;
> >  return (__X << __C) | (__X >> (-__C & 63));
> > }
> > etc.
> 
> Should we delegate to those intrinsics for x86, so that
> __builtin_ia32_rolqi and __builtin_ia32_rolhi can be used when
> relevant?

No, the pattern recognizers should handle (for power of two bitcounts)
even the char/short cases.  Those intrinsics predate the improvements
in rotate pattern recognition.

	Jakub

