This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: [PATCH, RFC, gfortran] Multi-threaded random number generator
- From: Steve Kargl <sgk at troutmask dot apl dot washington dot edu>
- To: Janne Blomqvist <blomqvist dot janne at gmail dot com>
- Cc: Fortran List <fortran at gcc dot gnu dot org>, GCC Patches <gcc-patches at gcc dot gnu dot org>
- Date: Thu, 24 Dec 2015 09:54:31 -0800
- Subject: Re: [PATCH, RFC, gfortran] Multi-threaded random number generator
- Authentication-results: sourceware.org; auth=none
- References: <CAO9iq9EjJZPzuQpCwuogc+Ui2FFjhKWuJ=wgT5p_4wu73DqO+w at mail dot gmail dot com> <20151223231837 dot GA87079 at troutmask dot apl dot washington dot edu> <CAO9iq9EHOxDDbaUNNqRLMp2vh6z8wp6siK+VKDGgKmPMKs+Gsw at mail dot gmail dot com>
On Thu, Dec 24, 2015 at 04:53:30PM +0200, Janne Blomqvist wrote:
> On Thu, Dec 24, 2015 at 1:18 AM, Steve Kargl
> <sgk@troutmask.apl.washington.edu> wrote:
> > On Thu, Dec 24, 2015 at 12:29:02AM +0200, Janne Blomqvist wrote:
> >>
> >> the GFortran random number generator (RANDOM_NUMBER and RANDOM_SEED
> >> intrinsics) has a number of issues that the attached preliminary patch
> >> tries to address.
> >>
> >
> > First, I think your effort to retool random numbers is a nice enhancement
> > for gfortran. But, ...
> >
> >> - When RANDOM_SEED is called without arguments, the master_seed and
> >> current thread seed is set to random data read from the OS
> >> /dev/urandom device. Otherwise like above.
> >
> > Doesn't this break backwards compatibility with the current
> > implementation. In the following program, the call to random_seed
> > resets the stream. If an OS has /dev/urandom, then the two streamsr
> > won't match?
>
> Yes, but backwards compatibility is already broken to some extent
> (different behavior for multi-threaded programs, different streams due
> to a different generator and initial seed), so I thought that this was
> a good time to do this as well, although per se the rest of the patch
> does not depend on this. I think this is better than the status quo,
> because if one wants a static seed it's easy enough to do, but getting
> random data from the OS is somewhat trickier, so I think it's better
> to do it in the library rather than trusting end users to get it
> right.
Ah, okay. I had not thought about the differences in
the multithreaded cases. I agree if we break backwards
compatibility a little bit, then breaking it a little
bit more can't hurt too much.
> > Two questions.
> >
> > 1) Does the patch deal with PR 52879?
>
> To be honest, I haven't tested. FWIW, the paper at
>
> http://vigna.di.unimi.it/ftp/papers/xorshift.pdf
>
> contains results of some experiments on how quickly the generator
> escapes from a low-entropy state.
>
> (I think one issue behind PR 52879 is that part of the seed is only
> used for generating real(8/16) and are thus unused for real(4))
Currently, we have 3 KISS generators with different seeds (of course).
FX added a pass over the seeds to mix them. See the scramble_seed()
and unscramble_seed() functions.
> > 2) Does it maintain the correspondence between a REAL(4) and
> > REAL(8) stream if the same seeds are used? For example,
> >
> > program foo
> > integer, parameter :: seed(12) = [1, 12, 123, 1234, 12345, 123456, &
> > & 654321, 54321, 4321, 321, 21, 1]
> > real(4) x
> > real(8) y
> > call random_seed(put=seed)
> > call random_number(x)
> > call random_seed(put=seed)
> > call random_number(y)
> > print *, x, y
> > end program foo
> >
> > % gfc -o z r.f90 && ./z
> > 0.181959510 0.18195952290401995
>
> No, it doesn't do this. Currently each thread only has a single set of
> state variables (vs. 3 for the current). Additionally, for real(4), a
> single 64-bit random value is used to create two real(4) variables,
> but the performance advantage of this isn't huge compared to the naive
> implementation of discarding half of the 64 random bits per call.
> OTOH, it's not particularly hard to do this in user code if one wants
> to, so is this feature really worth it?
IHMO, this a nice feature to have. One can easily test an algorithm
in REAL(4), REAL(8), REAL(10), and REAL(16) where the high order bits
all match to determine a suitable precision to use. An obvious
outcome is choosing a precision to balance performance vs accuracy.
So, your mapping is
PRNG stream: |------64-----|------64-----| 2 64-bit xorshift values.
REAL(4): |--24--|--24--|--24--|--24--| 4 values (24-bit significand).
REAL(8): |-----53----xx|-----53----xx| 2 values (53-bit significand).
REAL(10): |------64-----|------64-----| 2 values (64-bit significand).
REAL(16): |-----------113---------xxxx| 1 value (113-bit significand).
The current mapping is
PRNG stream: |--32--|--32--|--32--|--32--| 4 32-bit KISS values.
REAL(4): |--24--|xxxxxx|xxxxxx|xxxxxx| 4 values (24-bit significand).
REAL(8): |-----53----xx|xxxxxxxxxxxxx| 2 values (53-bit significand).
REAL(10): |------64-----|xxxxxxxxxxxxx| 2 values (64-bit significand).
REAL(16): |-----------113---------xxxx| 1 value (113-bit significand).
where x means unused bits.
--
Steve