Created attachment 23651 [details] Probability table for binomial_distribution. The attached program samples binomial_distribution(10, .75) and calculates the correct probabilities of each outcome. The output shows that the probabilities are reversed. With p > .5 the distribution should be biased toward higher values. sample correct p_0 0.056 0.000 p_1 0.185 0.000 p_2 0.283 0.000 p_3 0.252 0.003 p_4 0.147 0.016 p_5 0.058 0.058 p_6 0.016 0.146 p_7 0.003 0.250 p_8 0.000 0.282 p_9 0.000 0.188 p_10 0.000 0.056 The problem is here, __param.p() is a double. --- include/c++/bits/random.tcc +++ include/c++/bits/random.tcc @@ -1434,7 +1434,7 @@ { result_type __ret; const _IntType __t = __param.t(); - const _IntType __p = __param.p(); + const double __p = __param.p(); const double __p12 = __p <= 0.5 ? __p : 1.0 - __p; __detail::_Adaptor<_UniformRandomNumberGenerator, double> __aurng(__urng);
Oops, luckily the fix seems easy. Thanks. Let me have a look.
Ok, I'm going to apply the simple fix to both 4.6 and 4.5. Then I'll finally work on the testsuite for these distributions, long overdue.
I'll add this here since it's in the same place and the test will be similar. For geometric_distribution(.25) the first 10 probabilities are: sample correct p_0 0.000 0.250 p_1 0.750 0.188 p_2 0.187 0.141 p_3 0.046 0.105 p_4 0.012 0.079 p_5 0.003 0.059 p_6 0.001 0.044 p_7 0.000 0.033 p_8 0.000 0.025 p_9 0.000 0.019 The smallest value returned is 1, but should be 0 (and d.min() is correctly 0); and p is used as 1-p. Fix: --- include/c++/bits/random.h +++ include/c++/bits/random.h @@ -3628,8 +3628,8 @@ void _M_initialize() - { _M_log_p = std::log(_M_p); } + { _M_log_1_p = std::log(1 - _M_p); } double _M_p; - double _M_log_p; + double _M_log_1_p; }; --- include/c++/bits/random.tcc +++ include/c++/bits/random.tcc @@ -1027,3 +1027,3 @@ do - __cand = std::ceil(std::log(__aurng()) / __param._M_log_p); + __cand = std::floor(std::log(__aurng()) / __param._M_log_1_p); while (__cand >= __thr); Also, there's a line in random.h _GLIBCXX_DEBUG_ASSERT((_M_p >= 0.0) && (_M_p <= 1.0)); but the standard only requires 0 < p < 1.
Created attachment 23652 [details] Probability table for geometric_distribution.
Thanks, the second issue is ultimately due to the fact that we have been implementing, at variance with the Standard, a slightly different definition, see the formula before the class.
I meant geometric, of course.
... and the reason is, the code has been too blindly adapted from the TR1 version.
Ah, yes, I only looked at the C++0x formulas. By the way, testcases for these don't need to sample probabilities. They can just check, say, the first 100 values drawn using a deterministic generator.
Yes, but really I'd like to have something quite serious in terms of testing, for 4.7.0 at this point. I even have some half-baked code around. By the way, if you are interested in contributing more in this area or elsewhere, your help is welcome but in case I would ask you to assign the copyright, it takes a bit of time but then you are set forever. See: http://gcc.gnu.org/contribute.html
Author: paolo Date: Mon Mar 14 17:57:48 2011 New Revision: 170946 URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=170946 Log: 2011-03-14 Andrey Zholos <aaz@althenia.net> PR libstdc++/48114 * include/bits/random.h (geometric_distribution): Correct formula in comment, per C++0x. (geometric_distribution<>::param_type::param_type(double)): Fix check. (geometric_distribution<>::param_type::_M_initialize): Store log(1 - p). * include/bits/random.tcc (geometric_distribution<>::operator()): Fix computation. (binomial_distribution<>::operator()): Likewise. Modified: trunk/libstdc++-v3/ChangeLog trunk/libstdc++-v3/include/bits/random.h trunk/libstdc++-v3/include/bits/random.tcc
In 4_5-branch I'm going to fix only the first issue.
The double __p also fixes an infinite (maybe) loop in both binomial_distribution(100, .75) and binomial_distribution(100, .25) when _GLIBCXX_USE_C99_MATH_TR1 is defined. I guess the proper way to test the algorithms is to generate a lot of values and do a Pearson's chi-square test to check that the distribution is correct. Then that can be used to make a faster testcase for just a few values.
Good idea, yes, something like that.
Author: paolo Date: Mon Mar 14 18:10:36 2011 New Revision: 170950 URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=170950 Log: 2011-03-14 Andrey Zholos <aaz@althenia.net> PR libstdc++/48114 * include/bits/random.tcc (binomial_distribution<>::operator()): Fix thinko in computation, __param.p() is a double. Modified: branches/gcc-4_5-branch/libstdc++-v3/ChangeLog branches/gcc-4_5-branch/libstdc++-v3/include/bits/random.tcc
(In reply to comment #11) > In 4_5-branch I'm going to fix only the first issue. Please don't forget about 4_6-branch too. I guess #c14 change would be fine even before 4.6.0 release.
Author: paolo Date: Mon Mar 14 18:17:51 2011 New Revision: 170951 URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=170951 Log: 2011-03-14 Andrey Zholos <aaz@althenia.net> PR libstdc++/48114 * include/bits/random.h (geometric_distribution): Correct formula in comment, per C++0x. (geometric_distribution<>::param_type::param_type(double)): Fix check. (geometric_distribution<>::param_type::_M_initialize): Store log(1 - p). * include/bits/random.tcc (geometric_distribution<>::operator()): Fix computation. (binomial_distribution<>::operator()): Likewise. Modified: branches/gcc-4_6-branch/libstdc++-v3/ChangeLog branches/gcc-4_6-branch/libstdc++-v3/include/bits/random.h branches/gcc-4_6-branch/libstdc++-v3/include/bits/random.tcc
Jakub thanks for reminding me, luckily I noticed your message to the mailing list announcing the branching. Anyway, I went ahead and applied both fixes to 4_6 too, if you want me to go with #c14 only just let me know and I'll revert the other bits.
Done.
For the time being at least, for testing I think I'm going to adapt the code in the GNU GSL, it's pretty simple (see randist/test.c) but at least we are 100% safe from the licensing point of view.
Good idea. The testcases should be adapted to the code paths in the GCC generators though: for instance, binomial with p > .5 isn't covered there. And BINS should be increased: for instance in the binomial(.3, 5500) testcase the most frequent outcomes are around 1650, but BINS only covers outcomes up to 100.