Bug 48114 - [C++0x] binomial_distribution incorrect for p > .5 and geometric_distribution wrongly implements the TR1 definition
Summary: [C++0x] binomial_distribution incorrect for p > .5 and geometric_distribution...
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: libstdc++ (show other bugs)
Version: 4.6.0
: P3 normal
Target Milestone: 4.5.3
Assignee: Paolo Carlini
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2011-03-14 14:47 UTC by Andrey Zholos
Modified: 2011-03-16 22:43 UTC (History)
1 user (show)

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed: 2011-03-14 15:17:49


Attachments
Probability table for binomial_distribution. (331 bytes, application/octet-stream)
2011-03-14 14:47 UTC, Andrey Zholos
Details
Probability table for geometric_distribution. (310 bytes, application/octet-stream)
2011-03-14 15:25 UTC, Andrey Zholos
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Andrey Zholos 2011-03-14 14:47:30 UTC
Created attachment 23651 [details]
Probability table for binomial_distribution.

The attached program samples binomial_distribution(10, .75) and
calculates the correct probabilities of each outcome. The output shows
that the probabilities are reversed. With p > .5 the distribution should
be biased toward higher values.

        sample  correct
p_0     0.056   0.000
p_1     0.185   0.000
p_2     0.283   0.000
p_3     0.252   0.003
p_4     0.147   0.016
p_5     0.058   0.058
p_6     0.016   0.146
p_7     0.003   0.250
p_8     0.000   0.282
p_9     0.000   0.188
p_10    0.000   0.056

The problem is here, __param.p() is a double.

--- include/c++/bits/random.tcc
+++ include/c++/bits/random.tcc
@@ -1434,7 +1434,7 @@
       {
        result_type __ret;
        const _IntType __t = __param.t();
-       const _IntType __p = __param.p();
+       const double __p = __param.p();
        const double __p12 = __p <= 0.5 ? __p : 1.0 - __p;
        __detail::_Adaptor<_UniformRandomNumberGenerator, double>
          __aurng(__urng);
Comment 1 Paolo Carlini 2011-03-14 15:08:08 UTC
Oops, luckily the fix seems easy. Thanks. Let me have a look.
Comment 2 Paolo Carlini 2011-03-14 15:17:49 UTC
Ok, I'm going to apply the simple fix to both 4.6 and 4.5. Then I'll finally work on the testsuite for these distributions, long overdue.
Comment 3 Andrey Zholos 2011-03-14 15:23:25 UTC
I'll add this here since it's in the same place and the test will be
similar.

For geometric_distribution(.25) the first 10 probabilities are:

        sample  correct
p_0     0.000   0.250
p_1     0.750   0.188
p_2     0.187   0.141
p_3     0.046   0.105
p_4     0.012   0.079
p_5     0.003   0.059
p_6     0.001   0.044
p_7     0.000   0.033
p_8     0.000   0.025
p_9     0.000   0.019

The smallest value returned is 1, but should be 0 (and d.min() is
correctly 0); and p is used as 1-p.

Fix:

--- include/c++/bits/random.h
+++ include/c++/bits/random.h
@@ -3628,8 +3628,8 @@
        void
        _M_initialize()
-       { _M_log_p = std::log(_M_p); }
+       { _M_log_1_p = std::log(1 - _M_p); }
 
        double _M_p;
 
-       double _M_log_p;
+       double _M_log_1_p;
       };

--- include/c++/bits/random.tcc
+++ include/c++/bits/random.tcc
@@ -1027,3 +1027,3 @@
        do
-         __cand = std::ceil(std::log(__aurng()) / __param._M_log_p);
+         __cand = std::floor(std::log(__aurng()) / __param._M_log_1_p);
        while (__cand >= __thr);

Also, there's a line in random.h

          _GLIBCXX_DEBUG_ASSERT((_M_p >= 0.0)
                             && (_M_p <= 1.0));

but the standard only requires 0 < p < 1.
Comment 4 Andrey Zholos 2011-03-14 15:25:14 UTC
Created attachment 23652 [details]
Probability table for geometric_distribution.
Comment 5 Paolo Carlini 2011-03-14 16:30:34 UTC
Thanks, the second issue is ultimately due to the fact that we have been implementing, at variance with the Standard, a slightly different definition, see the formula before the class.
Comment 6 Paolo Carlini 2011-03-14 16:31:46 UTC
I meant geometric, of course.
Comment 7 Paolo Carlini 2011-03-14 16:36:06 UTC
... and the reason is, the code has been too blindly adapted from the TR1 version.
Comment 8 Andrey Zholos 2011-03-14 17:34:01 UTC
Ah, yes, I only looked at the C++0x formulas.

By the way, testcases for these don't need to sample probabilities.
They can just check, say, the first 100 values drawn using a
deterministic generator.
Comment 9 Paolo Carlini 2011-03-14 17:43:33 UTC
Yes, but really I'd like to have something quite serious in terms of testing, for 4.7.0 at this point. I even have some half-baked code around. By the way, if you are interested in contributing more in this area or elsewhere, your help is welcome but in case I would ask you to assign the copyright, it takes a bit of time but then you are set forever. See: http://gcc.gnu.org/contribute.html
Comment 10 paolo@gcc.gnu.org 2011-03-14 17:57:53 UTC
Author: paolo
Date: Mon Mar 14 17:57:48 2011
New Revision: 170946

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=170946
Log:
2011-03-14  Andrey Zholos  <aaz@althenia.net>

	PR libstdc++/48114
	* include/bits/random.h (geometric_distribution): Correct formula
	in comment, per C++0x.
	(geometric_distribution<>::param_type::param_type(double)): Fix check.
	(geometric_distribution<>::param_type::_M_initialize):
	Store log(1 - p).
	* include/bits/random.tcc (geometric_distribution<>::operator()):
	Fix computation.
	(binomial_distribution<>::operator()): Likewise.

Modified:
    trunk/libstdc++-v3/ChangeLog
    trunk/libstdc++-v3/include/bits/random.h
    trunk/libstdc++-v3/include/bits/random.tcc
Comment 11 Paolo Carlini 2011-03-14 18:02:41 UTC
In 4_5-branch I'm going to fix only the first issue.
Comment 12 Andrey Zholos 2011-03-14 18:07:22 UTC
The  double __p  also fixes an infinite (maybe) loop in both
binomial_distribution(100, .75) and binomial_distribution(100, .25)
when _GLIBCXX_USE_C99_MATH_TR1 is defined.

I guess the proper way to test the algorithms is to generate a lot of
values and do a Pearson's chi-square test to check that the distribution
is correct. Then that can be used to make a faster testcase for just a
few values.
Comment 13 Paolo Carlini 2011-03-14 18:08:35 UTC
Good idea, yes, something like that.
Comment 14 paolo@gcc.gnu.org 2011-03-14 18:10:47 UTC
Author: paolo
Date: Mon Mar 14 18:10:36 2011
New Revision: 170950

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=170950
Log:
2011-03-14  Andrey Zholos  <aaz@althenia.net>

	PR libstdc++/48114
	* include/bits/random.tcc (binomial_distribution<>::operator()):
	Fix thinko in computation, __param.p() is a double.


Modified:
    branches/gcc-4_5-branch/libstdc++-v3/ChangeLog
    branches/gcc-4_5-branch/libstdc++-v3/include/bits/random.tcc
Comment 15 Jakub Jelinek 2011-03-14 18:16:36 UTC
(In reply to comment #11)
> In 4_5-branch I'm going to fix only the first issue.

Please don't forget about 4_6-branch too.  I guess #c14 change would be fine even before 4.6.0 release.
Comment 16 paolo@gcc.gnu.org 2011-03-14 18:17:55 UTC
Author: paolo
Date: Mon Mar 14 18:17:51 2011
New Revision: 170951

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=170951
Log:
2011-03-14  Andrey Zholos  <aaz@althenia.net>

	PR libstdc++/48114
	* include/bits/random.h (geometric_distribution): Correct formula
	in comment, per C++0x.
	(geometric_distribution<>::param_type::param_type(double)): Fix check.
	(geometric_distribution<>::param_type::_M_initialize):
	Store log(1 - p).
	* include/bits/random.tcc (geometric_distribution<>::operator()):
	Fix computation.
	(binomial_distribution<>::operator()): Likewise.

Modified:
    branches/gcc-4_6-branch/libstdc++-v3/ChangeLog
    branches/gcc-4_6-branch/libstdc++-v3/include/bits/random.h
    branches/gcc-4_6-branch/libstdc++-v3/include/bits/random.tcc
Comment 17 Paolo Carlini 2011-03-14 18:20:33 UTC
Jakub thanks for reminding me, luckily I noticed your message to the mailing list announcing the branching.  Anyway, I went ahead and applied both fixes to 4_6 too, if you want me to go with #c14 only just let me know and I'll revert the other bits.
Comment 18 Paolo Carlini 2011-03-14 18:28:02 UTC
Done.
Comment 19 Paolo Carlini 2011-03-16 18:23:46 UTC
For the time being at least, for testing I think I'm going to adapt the code in the GNU GSL, it's pretty simple (see randist/test.c) but at least we are 100% safe from the licensing point of view.
Comment 20 Andrey Zholos 2011-03-16 22:43:45 UTC
Good idea. The testcases should be adapted to the code paths in the GCC
generators though: for instance, binomial with p > .5 isn't covered there.
And BINS should be increased: for instance in the binomial(.3, 5500)
testcase the most frequent outcomes are around 1650, but BINS only covers
outcomes up to 100.