This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.
Re: [v3] Add tr1::poisson_distribution
- From: Andrew Pinski <pinskia at gmail dot com>
- To: Falk Hueffner <falk at debian dot org>
- Cc: Paolo Carlini <pcarlini at suse dot de>, "'gcc-patches at gcc dot gnu dot org'" <gcc-patches at gcc dot gnu dot org>
- Date: Mon, 14 Aug 2006 23:47:50 -0700
- Subject: Re: [v3] Add tr1::poisson_distribution
- References: <44E131E7.6020307@suse.de> <874pwems8p.fsf@debian.org>
On Tue, 2006-08-15 at 08:24 +0200, Falk Hueffner wrote:
> Paolo Carlini <pcarlini@suse.de> writes:
>
> > * include/tr1/random.tcc (mersenne_twister<>::operator()): Tweak
> > a bit for efficiency.
> > + const _UIntType __fx[2] = { 0, __a };
> >
> > for (int __k = 0; __k < (__n - __m); ++__k)
> > {
> > _UIntType __y = ((_M_x[__k] & __upper_mask)
> > | (_M_x[__k + 1] & __lower_mask));
> > - _M_x[__k] = (_M_x[__k + __m] ^ (__y >> 1)
> > - ^ ((__y & 0x01) ? __a : 0));
> > + _M_x[__k] = _M_x[__k + __m] ^ (__y >> 1) ^ __fx[__y & 0x01];
> > }
> >
> > for (int __k = (__n - __m); __k < (__n - 1); ++__k)
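To make the patched line concrete, here is a minimal sketch of the first twist loop written both ways: the original conditional and the patch's two-entry table. It assumes the standard 32-bit MT19937 parameters (n = 624, m = 397, a = 0x9908b0df) in place of the template parameters, and the function names are hypothetical. Both variants compute the same state, so the question in this thread is only which one compiles to faster code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Standard MT19937 values, standing in (as an assumption) for the
// template parameters __n, __m, __a, __upper_mask and __lower_mask.
constexpr int kN = 624;
constexpr int kM = 397;
constexpr std::uint32_t kA = 0x9908b0dfu;
constexpr std::uint32_t kUpperMask = 0x80000000u;
constexpr std::uint32_t kLowerMask = 0x7fffffffu;

// Original form: choose __a or 0 with a conditional on the low bit of y.
void twist_first_loop_ternary(std::vector<std::uint32_t>& x)
{
  assert(x.size() == static_cast<std::size_t>(kN));
  for (int k = 0; k < kN - kM; ++k)
    {
      std::uint32_t y = (x[k] & kUpperMask) | (x[k + 1] & kLowerMask);
      x[k] = x[k + kM] ^ (y >> 1) ^ ((y & 0x01) ? kA : 0);
    }
}

// Patched form: choose __a or 0 by indexing a two-entry table with the low bit.
void twist_first_loop_table(std::vector<std::uint32_t>& x)
{
  assert(x.size() == static_cast<std::size_t>(kN));
  const std::uint32_t fx[2] = { 0, kA };
  for (int k = 0; k < kN - kM; ++k)
    {
      std::uint32_t y = (x[k] & kUpperMask) | (x[k + 1] & kLowerMask);
      x[k] = x[k + kM] ^ (y >> 1) ^ fx[y & 0x01];
    }
}
```

Running both functions on copies of the same state vector leaves identical results.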
>
> I think this is actually going to be slower on many architectures,
> since modern architectures tend to have long latency load/store, but a
> conditional move to compensate (for example on alphaev6, "y & 1 ? a : 0"
> takes 1 insn/2 cycles, and "fx[y&1]" takes 2/4). Also, it really looks
> like something the compiler should do itself if it is a win.
I bet there is a better way to optimize this without a branch or a load.
Something like:
(-(__y & 0x01)) & __a
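The mask expression relies on unsigned negation: for an unsigned y, -(y & 1) is 0 when the low bit is clear and all ones when it is set, so ANDing it with a yields either 0 or a with no branch and no memory access. A minimal sketch checking that the three selections discussed in this thread agree, assuming a 32-bit unsigned type for _UIntType (the helper names are hypothetical):

```cpp
#include <cassert>
#include <cstdint>

// Conditional form from the original code.
std::uint32_t select_ternary(std::uint32_t y, std::uint32_t a)
{
  return (y & 0x01) ? a : 0;
}

// Table-lookup form from the patch (needs a load from memory).
std::uint32_t select_table(std::uint32_t y, std::uint32_t a)
{
  const std::uint32_t fx[2] = { 0, a };
  return fx[y & 0x01];
}

// Mask form: -(y & 1) wraps to 0xffffffff when the low bit is set, else 0.
std::uint32_t select_mask(std::uint32_t y, std::uint32_t a)
{
  return (-(y & 0x01)) & a;
}

// True when all three forms pick the same value for the given y and a.
bool selections_agree(std::uint32_t y, std::uint32_t a)
{
  const std::uint32_t t = select_ternary(y, a);
  return t == select_table(y, a) && t == select_mask(y, a);
}
```

For example, `select_mask(3u, 0x9908b0dfu)` returns `0x9908b0df`, and `select_mask(2u, 0x9908b0dfu)` returns `0`.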
Oh, I just looked at GCC's output for "(y & 1) ? a : 0", and it already
knows how to convert that into the above, so this patch would actually
cause a de-optimization on 95% of targets.
Thanks,
Andrew Pinski