48114 – [C++0x] binomial_distribution incorrect for p > .5 and geometric_distribution wrongly implements the TR1 definition

Bug 48114 - [C++0x] binomial_distribution incorrect for p > .5 and geometric_distribution wrongly implements the TR1 definition

Summary: [C++0x] binomial_distribution incorrect for p > .5 and geometric_distribution...

Status:	RESOLVED FIXED

Alias:	None

Product:	gcc
Classification:	Unclassified
Component:	libstdc++ (show other bugs)
Version:	4.6.0

Importance:	P3 normal
Target Milestone:	4.5.3
Assignee:	Paolo Carlini

URL:
Keywords:

Depends on:
Blocks:

Reported:	2011-03-14 14:47 UTC by Andrey Zholos
Modified:	2011-03-16 22:43 UTC (History)
CC List:	1 user (show)

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed:	2011-03-14 15:17:49

Attachments
Probability table for binomial_distribution. (331 bytes, application/octet-stream) 2011-03-14 14:47 UTC, Andrey Zholos	Details
Probability table for geometric_distribution. (310 bytes, application/octet-stream) 2011-03-14 15:25 UTC, Andrey Zholos	Details
View All Add an attachment (proposed patch, testcase, etc.)

Note You need to log in before you can comment on or make changes to this bug.

Description Andrey Zholos 2011-03-14 14:47:30 UTC

Created attachment 23651 [details]
Probability table for binomial_distribution.

The attached program samples binomial_distribution(10, .75) and
calculates the correct probabilities of each outcome. The output shows
that the probabilities are reversed. With p > .5 the distribution should
be biased toward higher values.

        sample  correct
p_0     0.056   0.000
p_1     0.185   0.000
p_2     0.283   0.000
p_3     0.252   0.003
p_4     0.147   0.016
p_5     0.058   0.058
p_6     0.016   0.146
p_7     0.003   0.250
p_8     0.000   0.282
p_9     0.000   0.188
p_10    0.000   0.056

The problem is here, __param.p() is a double.

--- include/c++/bits/random.tcc
+++ include/c++/bits/random.tcc
@@ -1434,7 +1434,7 @@
       {
        result_type __ret;
        const _IntType __t = __param.t();
-       const _IntType __p = __param.p();
+       const double __p = __param.p();
        const double __p12 = __p <= 0.5 ? __p : 1.0 - __p;
        __detail::_Adaptor<_UniformRandomNumberGenerator, double>
          __aurng(__urng);

Comment 1 Paolo Carlini 2011-03-14 15:08:08 UTC

Oops, luckily the fix seems easy. Thanks. Let me have a look.

Comment 2 Paolo Carlini 2011-03-14 15:17:49 UTC

Ok, I'm going to apply the simple fix to both 4.6 and 4.5. Then I'll finally work on the testsuite for these distributions, long overdue.

Comment 3 Andrey Zholos 2011-03-14 15:23:25 UTC

I'll add this here since it's in the same place and the test will be
similar.

For geometric_distribution(.25) the first 10 probabilities are:

        sample  correct
p_0     0.000   0.250
p_1     0.750   0.188
p_2     0.187   0.141
p_3     0.046   0.105
p_4     0.012   0.079
p_5     0.003   0.059
p_6     0.001   0.044
p_7     0.000   0.033
p_8     0.000   0.025
p_9     0.000   0.019

The smallest value returned is 1, but should be 0 (and d.min() is
correctly 0); and p is used as 1-p.

Fix:

--- include/c++/bits/random.h
+++ include/c++/bits/random.h
@@ -3628,8 +3628,8 @@
        void
        _M_initialize()
-       { _M_log_p = std::log(_M_p); }
+       { _M_log_1_p = std::log(1 - _M_p); }
 
        double _M_p;
 
-       double _M_log_p;
+       double _M_log_1_p;
       };

--- include/c++/bits/random.tcc
+++ include/c++/bits/random.tcc
@@ -1027,3 +1027,3 @@
        do
-         __cand = std::ceil(std::log(__aurng()) / __param._M_log_p);
+         __cand = std::floor(std::log(__aurng()) / __param._M_log_1_p);
        while (__cand >= __thr);

Also, there's a line in random.h

          _GLIBCXX_DEBUG_ASSERT((_M_p >= 0.0)
                             && (_M_p <= 1.0));

but the standard only requires 0 < p < 1.

Comment 4 Andrey Zholos 2011-03-14 15:25:14 UTC

Created attachment 23652 [details]
Probability table for geometric_distribution.

Comment 5 Paolo Carlini 2011-03-14 16:30:34 UTC

Thanks, the second issue is ultimately due to the fact that we have been implementing, at variance with the Standard, a slightly different definition, see the formula before the class.

Comment 6 Paolo Carlini 2011-03-14 16:31:46 UTC

I meant geometric, of course.

Comment 7 Paolo Carlini 2011-03-14 16:36:06 UTC

... and the reason is, the code has been too blindly adapted from the TR1 version.

Comment 8 Andrey Zholos 2011-03-14 17:34:01 UTC

Ah, yes, I only looked at the C++0x formulas.

By the way, testcases for these don't need to sample probabilities.
They can just check, say, the first 100 values drawn using a
deterministic generator.

Comment 9 Paolo Carlini 2011-03-14 17:43:33 UTC

Yes, but really I'd like to have something quite serious in terms of testing, for 4.7.0 at this point. I even have some half-baked code around. By the way, if you are interested in contributing more in this area or elsewhere, your help is welcome but in case I would ask you to assign the copyright, it takes a bit of time but then you are set forever. See: http://gcc.gnu.org/contribute.html

Comment 10 paolo@gcc.gnu.org 2011-03-14 17:57:53 UTC

Author: paolo
Date: Mon Mar 14 17:57:48 2011
New Revision: 170946

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=170946
Log:
2011-03-14  Andrey Zholos  <aaz@althenia.net>

	PR libstdc++/48114
	* include/bits/random.h (geometric_distribution): Correct formula
	in comment, per C++0x.
	(geometric_distribution<>::param_type::param_type(double)): Fix check.
	(geometric_distribution<>::param_type::_M_initialize):
	Store log(1 - p).
	* include/bits/random.tcc (geometric_distribution<>::operator()):
	Fix computation.
	(binomial_distribution<>::operator()): Likewise.

Modified:
    trunk/libstdc++-v3/ChangeLog
    trunk/libstdc++-v3/include/bits/random.h
    trunk/libstdc++-v3/include/bits/random.tcc

Comment 11 Paolo Carlini 2011-03-14 18:02:41 UTC

In 4_5-branch I'm going to fix only the first issue.

Comment 12 Andrey Zholos 2011-03-14 18:07:22 UTC

The  double __p  also fixes an infinite (maybe) loop in both
binomial_distribution(100, .75) and binomial_distribution(100, .25)
when _GLIBCXX_USE_C99_MATH_TR1 is defined.

I guess the proper way to test the algorithms is to generate a lot of
values and do a Pearson's chi-square test to check that the distribution
is correct. Then that can be used to make a faster testcase for just a
few values.

Comment 13 Paolo Carlini 2011-03-14 18:08:35 UTC

Good idea, yes, something like that.

Comment 14 paolo@gcc.gnu.org 2011-03-14 18:10:47 UTC

Author: paolo
Date: Mon Mar 14 18:10:36 2011
New Revision: 170950

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=170950
Log:
2011-03-14  Andrey Zholos  <aaz@althenia.net>

	PR libstdc++/48114
	* include/bits/random.tcc (binomial_distribution<>::operator()):
	Fix thinko in computation, __param.p() is a double.


Modified:
    branches/gcc-4_5-branch/libstdc++-v3/ChangeLog
    branches/gcc-4_5-branch/libstdc++-v3/include/bits/random.tcc

Comment 15 Jakub Jelinek 2011-03-14 18:16:36 UTC

(In reply to comment #11)
> In 4_5-branch I'm going to fix only the first issue.

Please don't forget about 4_6-branch too.  I guess #c14 change would be fine even before 4.6.0 release.

Comment 16 paolo@gcc.gnu.org 2011-03-14 18:17:55 UTC

Author: paolo
Date: Mon Mar 14 18:17:51 2011
New Revision: 170951

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=170951
Log:
2011-03-14  Andrey Zholos  <aaz@althenia.net>

	PR libstdc++/48114
	* include/bits/random.h (geometric_distribution): Correct formula
	in comment, per C++0x.
	(geometric_distribution<>::param_type::param_type(double)): Fix check.
	(geometric_distribution<>::param_type::_M_initialize):
	Store log(1 - p).
	* include/bits/random.tcc (geometric_distribution<>::operator()):
	Fix computation.
	(binomial_distribution<>::operator()): Likewise.

Modified:
    branches/gcc-4_6-branch/libstdc++-v3/ChangeLog
    branches/gcc-4_6-branch/libstdc++-v3/include/bits/random.h
    branches/gcc-4_6-branch/libstdc++-v3/include/bits/random.tcc

Comment 17 Paolo Carlini 2011-03-14 18:20:33 UTC

Jakub thanks for reminding me, luckily I noticed your message to the mailing list announcing the branching.  Anyway, I went ahead and applied both fixes to 4_6 too, if you want me to go with #c14 only just let me know and I'll revert the other bits.

Comment 18 Paolo Carlini 2011-03-14 18:28:02 UTC

Done.

Comment 19 Paolo Carlini 2011-03-16 18:23:46 UTC

For the time being at least, for testing I think I'm going to adapt the code in the GNU GSL, it's pretty simple (see randist/test.c) but at least we are 100% safe from the licensing point of view.

Comment 20 Andrey Zholos 2011-03-16 22:43:45 UTC

Good idea. The testcases should be adapted to the code paths in the GCC
generators though: for instance, binomial with p > .5 isn't covered there.
And BINS should be increased: for instance in the binomial(.3, 5500)
testcase the most frequent outcomes are around 1650, but BINS only covers
outcomes up to 100.