This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH, RFC, gfortran] Multi-threaded random number generator


On Thu, Dec 24, 2015 at 7:54 PM, Steve Kargl
<sgk@troutmask.apl.washington.edu> wrote:
> On Thu, Dec 24, 2015 at 04:53:30PM +0200, Janne Blomqvist wrote:
>> On Thu, Dec 24, 2015 at 1:18 AM, Steve Kargl
>> <sgk@troutmask.apl.washington.edu> wrote:
>> > Two questions.
>> >
>> > 1) Does the patch deal with PR 52879?
>>
>> To be honest, I haven't tested.  FWIW, the paper at
>>
>> http://vigna.di.unimi.it/ftp/papers/xorshift.pdf
>>
>> contains results of some experiments on how quickly the generator
>> escapes from a low-entropy state.
>>
>> (I think one issue behind PR 52879 is that part of the seed is only
>> used for generating real(8/16) and are  thus unused for real(4))
>
> Currently, we have 3 KISS generators with different seeds (of course).
> FX added a pass over the seeds to mix them.  See the scramble_seed()
> and unscramble_seed() functions.

Not particularly related to that PR specifically, I've been thinking a
bit about how one could detect low entropy seeds. Apparently there
isn't any really straightforward test one can do on a bunch of data to
estimate the entropy. What apparently is used is to compress the data,
and then the compression factor can be seen as some kind of entropy
estimate. Although I suspect for as little data as the random seed
this won't work that well.

A really simple ad-hoc method would be to just run
__builtin_popcountll() over the seed (i.e. count the number of
non-zero bits). Then if the result is below or above some threshold
(that is, the seed is mostly zeros or mostly ones) do, well,
something. E.g. print a warning to stderr, discard the provided seed
and read a new seed from /dev/urandom, or something like that? This
won't of course catch a seed with a bit pattern like 010101010101.. ,
but maybe it's still better than nothing?

>> > 2) Does it maintain the correspondence between a REAL(4) and
>> >    REAL(8) stream if the same seeds are used?  For example,
>> >
>> > program foo
>> >    integer, parameter :: seed(12) = [1, 12, 123, 1234, 12345, 123456, &
>> >    &  654321, 54321, 4321, 321, 21, 1]
>> >    real(4) x
>> >    real(8) y
>> >    call random_seed(put=seed)
>> >    call random_number(x)
>> >    call random_seed(put=seed)
>> >    call random_number(y)
>> >    print *, x, y
>> > end program foo
>> >
>> > %  gfc -o z r.f90 && ./z
>> >   0.181959510      0.18195952290401995
>>
>> No, it doesn't do this. Currently each thread only has a single set of
>> state variables (vs. 3 for the current). Additionally, for real(4), a
>> single 64-bit random value is used to create two real(4) variables,
>> but the performance advantage of this isn't huge compared to the naive
>> implementation of discarding half of the 64 random bits per call.
>> OTOH, it's not particularly hard to do this in user code if one wants
>> to, so is this feature really worth it?
>
> IHMO, this a nice feature to have.  One can easily test an algorithm
> in REAL(4), REAL(8), REAL(10), and REAL(16) where the high order bits
> all match to determine a suitable precision to use.  An obvious
> outcome is choosing a precision to balance performance vs accuracy.

Yes, I understand this. I just wonder whether it's useful enough that
it's worth doing by default and have everyone pay the cost (lower
performance for real(4), larger state, slower escape from a low
entropy state for real(16))? Alternatively, it's simple to have a
wrapper routine that calls the highest precision random_number for
testing, right?

>
> So, your mapping is
>
> PRNG stream: |------64-----|------64-----|  2 64-bit xorshift values.
>     REAL(4): |--24--|--24--|--24--|--24--|  4 values (24-bit significand).
>     REAL(8): |-----53----xx|-----53----xx|  2 values (53-bit significand).
>    REAL(10): |------64-----|------64-----|  2 values (64-bit significand).
>    REAL(16): |-----------113---------xxxx|  1 value  (113-bit significand).
>
> The current mapping is
>
> PRNG stream: |--32--|--32--|--32--|--32--|  4 32-bit KISS values.
>     REAL(4): |--24--|xxxxxx|xxxxxx|xxxxxx|  4 values (24-bit significand).
>     REAL(8): |-----53----xx|xxxxxxxxxxxxx|  2 values (53-bit significand).
>    REAL(10): |------64-----|xxxxxxxxxxxxx|  2 values (64-bit significand).
>    REAL(16): |-----------113---------xxxx|  1 value  (113-bit significand).
>
> where x means unused bits.

Roughly, yes. Or well, there is some unused bits in the real(4)
generation. Note that none of the arandom_rX functions that convert
from uint{32,64}_t to realN are modified by the patch.

-- 
Janne Blomqvist


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]