Bug 58800 - [4.7/4.8/4.9 Regression] std::nth_element segfaults on valid input
Summary: [4.7/4.8/4.9 Regression] std::nth_element segfaults on valid input
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: libstdc++ (show other bugs)
Version: 4.8.2
: P3 major
Target Milestone: 4.7.4
Assignee: Not yet assigned to anyone
URL:
Keywords:
: 59116 59557 (view as bug list)
Depends on:
Blocks:
 
Reported: 2013-10-18 22:21 UTC by Andy Lutomirski
Modified: 2013-12-19 16:12 UTC (History)
3 users (show)

See Also:
Host:
Target:
Build:
Known to work: 4.7.4, 4.8.3, 4.9.0
Known to fail:
Last reconfirmed: 2013-10-18 00:00:00


Attachments
preprocessed source (105.28 KB, application/x-xz)
2013-10-18 22:22 UTC, Andy Lutomirski
Details
random test (915 bytes, text/x-c++src)
2013-10-19 08:54 UTC, Chris Jefferson
Details
nth_element patch (2.21 KB, patch)
2013-10-19 12:11 UTC, Chris Jefferson
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Andy Lutomirski 2013-10-18 22:21:09 UTC
This program segfaults on gcc 4.8.1 from Ubuntu 13.10.  I'm building a copy of the 4.8 branch to see if I can reproduce it there.

This is on x86_64.  I used a gross hack to specify the input so I don't have to think about precision issues.  The numbers are all well-behaved values near 0.9.

I'll attach preprocessed source.  The preprocessed source crashes even if built with gcc 4.7.something.

#include <algorithm>
#include <stdint.h>
//#include <iostream>

double to_double(uint64_t x)
{
        union {double d; uint64_t x;} u;
        u.x = x;
        return u.d;
}

int main()
{
        std::vector<double> v = {
                to_double(4606672070890243784),
                to_double(4606672025854247510),
                to_double(4606671800674266141),
                to_double(4606671575494284772),
                to_double(4606672115926240057),
                to_double(4606672160962236330),
                to_double(4606672070890243784),
        };

//        for (auto i : v)
//                std::cout << i << std::endl;

        std::nth_element(v.begin(), v.begin() + 3, v.end());
        return 0;
}
Comment 1 Andy Lutomirski 2013-10-18 22:22:25 UTC
Created attachment 31047 [details]
preprocessed source

This was generated with g++ -std=gnu++11 -E sort.cc on Ubuntu 13.10.
Comment 2 Paolo Carlini 2013-10-18 22:48:02 UTC
Please figure out a testcase involving plain integers.
Comment 3 Paolo Carlini 2013-10-18 23:05:49 UTC
Unfortunately the issue seems real. I can reproduce it with:

        std::vector<uint64_t> v = {
                4606672070890243784,
                4606672025854247510,
                4606671800674266141,
                4606671575494284772,
                4606672115926240057,
                4606672160962236330,
                4606672070890243784,
        };

Chris?
Comment 4 Paolo Carlini 2013-10-18 23:17:17 UTC
I can also reproduce with something this simple:

        std::vector<int> v = {
                207089,
                202585,
                180067,
                157549,
                211592,
                216096,
                207089
        };

-D_GLIBCXX_DEBUG reveals that we are dereferencing a past-the-end iterator. We badly need a quick fix. Say 2-3 day max or we have to revert the fix for 58437.
Comment 5 Paolo Carlini 2013-10-18 23:53:36 UTC
Don't tell me is just this:

Index: include/bits/stl_algo.h
===================================================================
--- include/bits/stl_algo.h	(revision 203835)
+++ include/bits/stl_algo.h	(working copy)
@@ -1917,7 +1917,7 @@
 				_RandomAccessIterator __last, _Compare __comp)
     {
       _RandomAccessIterator __mid = __first + (__last - __first) / 2;
-      std::__move_median_to_first(__first, __first + 1, __mid, (__last - 2),
+      std::__move_median_to_first(__first, __first + 1, __mid, (__last - 1),
 				  __comp);
       return std::__unguarded_partition(__first + 1, __last, __first, __comp);
     }

??

Andy, while waiting for Chris' feedback, you may want to test the above on your real applications. In 4.8.x some nth_algorithm details are different (the above is versus mainline), but you can quickly find a couple of (__last - 2) which I'm asking you to change to (__last - 1)
Comment 6 Andy Lutomirski 2013-10-19 00:13:43 UTC
I changed two instances of __last - 2 to __last - 1.  I'm running against r203836 on gcc-4_8-branch.

The thing that caught it was an experiment, not part of my test suite, so this is a little tricky.  It already survived longer than the unpatched version did.  The crash wasn't the same segfault, though -- it was a much later crash due to memory corruption.  I'll let it run for a while under valgrind to see what, if anything, pops up.
Comment 7 Andy Lutomirski 2013-10-19 00:20:39 UTC
FWIW, replacing that list with a random permutation seems to crash about 5% of the time.
Comment 8 Paolo Carlini 2013-10-19 00:21:59 UTC
Thanks Andy. We really want to fix this as soon as possible. In the meanwhile in my tests the tweak passed the testsuite, without regressing on the performance of std::sort (libstdc++/58437). Thus apparently we are on the right track.
Comment 9 Andy Lutomirski 2013-10-19 00:35:20 UTC
This code has survived my smoke test so far, which is a good sign.  If it was causing some kind of bad problem, I'd expect something to have exploded by now.
Comment 10 Marc Glisse 2013-10-19 07:36:44 UTC
{0, 1, 2, 0} should be enough.

__unguarded_partition only stops increasing __first when *__first is not smaller than the pivot. If all elements are smaller than the pivot, boom. And when we have fewer than 5 elements (the check in __introselect is only for > 3), the pivot selection code doesn't guarantee that (some of the 3 pointers used to compute the median are the same, and with Paolo's change they become distinct).
Comment 11 Paolo Carlini 2013-10-19 07:48:34 UTC
Hi Marc. I'm coming to the conclusion that the '- 2' in the patch sent by Chris was in any case a typo. Now, sorry I didn't investigate the issue further, I'm not sure to understand if you think changing it to '- 1' is enough or not: it certainly passes or my tests so far ({0, 1, 2, 0} included)
Comment 12 Chris Jefferson 2013-10-19 07:55:12 UTC
Yes, I didn't trace all the members which call __unguarded_partition_pivot.

The better (in my opinion) fix is to change __introselect from:

__last - __first > 3

to:

__last - __first > int(_S_threshold)

Like the other call the __unguarded_partition_pivot.

I am currently testing that change.

The 'last - 2' was on purpose (although, it is causing the problem!), as 'last - 1' is the the last element of the array which we want to avoid (think of it as 'last - 1 - 1', to go with 'first + 1'. Obviously we couldn't de-ref 'last - 1'.
Comment 13 Paolo Carlini 2013-10-19 08:05:39 UTC
Ok, note that I was reviewing the description you originally sent to the mailing list and it never talked about the - 2...

Let's figure out something safe, please.
Comment 14 Marc Glisse 2013-10-19 08:15:14 UTC
(In reply to Chris Jefferson from comment #12)
> The better (in my opinion) fix is to change __introselect from:
> 
> __last - __first > 3
> 
> to:
> 
> __last - __first > int(_S_threshold)
> 
> Like the other call the __unguarded_partition_pivot.

Note that the optimal threshold is probably not the same for sort and for select (which drops to full sorting below that threshold!).
Comment 15 Chris Jefferson 2013-10-19 08:49:34 UTC
In my last comment, "Obviously we couldn't de-ref 'last - 1'" should have been "Obviously we couldn't de-ref 'last'"
Comment 16 Chris Jefferson 2013-10-19 08:54:47 UTC
Created attachment 31049 [details]
random test

In order to catch any other issues, and stop this kind of thing happening again, I want to quickly and massively expand the algorithms testsuite.

The attached test tests a whole set of random nth_element cases (it would catch the problem we are currently having).

The problem is on my computer, unoptimised, this test takes 30 seconds, and I want to add a similar test to all the sorting related algorithms. Is there a standard way of submitting tests that take a lot of time and memory?
Comment 17 Paolo Carlini 2013-10-19 10:02:37 UTC
At the moment there isn't, but some tests self adjust to run less on simulators. For the time being I think we could certainly add a few (not 10 or 20) of the tests tou are talking about, but remember all with the bits for the simulators. We could also easily add some tests to the performance testsuite (there are no limits there and isn't run by default) which, outside the timing loop also check that the computation is correct (well, we should probably allways do that)
Comment 18 Chris Jefferson 2013-10-19 11:06:42 UTC
(In reply to Paolo Carlini from comment #13)
> Ok, note that I was reviewing the description you originally sent to the
> mailing list and it never talked about the - 2...
> 
> Let's figure out something safe, please.

I miswrote in my final paragraph:

...first+1, mid, last-1 (there is no reason in this bug to consider
last-1 instead of last, but I thought it wouldn't do any harm, and
might avoid other issues of accidentally choosing a bad pivot.

Should have said:

...first+1, mid, last-2 (there is no reason in this bug to consider *last-2* instead of *last-1*, but I thought it wouldn't do any harm, and might avoid other issues of accidentally choosing a bad pivot.

So, the original code chose last-1, I moved that to last-2, not checking all the places that called the code enough (although, I could also have broken it with the first+1 if the ranges had been small enough).

Changing 'last-2' to 'last-1' might well be the easiest, smallest fix, if it does not effect the performance testsuite. We can then, once we have a much more comphrehensive testsuite, make another pass at performance improvements and tweaks.
Comment 19 Chris Jefferson 2013-10-19 12:11:37 UTC
Created attachment 31051 [details]
nth_element patch

Here is a patch. The actual patch is just changing '__last - 2' to '__last - 1' (***NOTE*** When back-applying this patch, before the recent merge of algorithms, there will be two copies of __last - 2 to change).

The larger part is a patch just for this bug, and a general big improvement in the test procedure, to sanity check no other related methods have any issues (they do not). I intend to continue to add new testsuite functions in the future, but this will do to start.
Comment 20 Paolo Carlini 2013-10-19 17:47:10 UTC
Chris, <random> isn't unconditionally available (requires _GLIBCXX_USE_C99_STDINT_TR1 defined). Please massage the new tests (and the new header) to just do nothing when <random> isn't available and send the result separately to the library mailing list. Let's first commit everywhere fix + minimal testcase.
Comment 21 Paolo Carlini 2013-10-19 18:07:01 UTC
I think you can probably simply add at the beginning of the new tests:

// { dg-require-cstdint "" }

and then the new include doesn't require any change. But please double check, we don't want new Bugzillas: say, damage check_v3_target_cstdint to artificially return false on your machine and double check that the new testcases are simply skipped.
Comment 22 paolo@gcc.gnu.org 2013-10-20 09:07:44 UTC
Author: paolo
Date: Sun Oct 20 09:07:36 2013
New Revision: 203872

URL: http://gcc.gnu.org/viewcvs?rev=203872&root=gcc&view=rev
Log:
2013-10-20  Chris Jefferson  <chris@bubblescope.net>
	    Paolo Carlini  <paolo.carlini@oracle.com>

	PR libstdc++/58800
	* include/bits/stl_algo.h (__unguarded_partition_pivot): Change
	__last - 2 to __last - 1.
	* testsuite/25_algorithms/nth_element/58800.cc: New

Added:
    trunk/libstdc++-v3/testsuite/25_algorithms/nth_element/58800.cc
Modified:
    trunk/libstdc++-v3/ChangeLog
    trunk/libstdc++-v3/include/bits/stl_algo.h
Comment 23 paolo@gcc.gnu.org 2013-10-20 09:08:14 UTC
Author: paolo
Date: Sun Oct 20 09:08:12 2013
New Revision: 203873

URL: http://gcc.gnu.org/viewcvs?rev=203873&root=gcc&view=rev
Log:
2013-10-20  Chris Jefferson  <chris@bubblescope.net>
	    Paolo Carlini  <paolo.carlini@oracle.com>

	PR libstdc++/58800
	* include/bits/stl_algo.h (__unguarded_partition_pivot): Change
	__last - 2 to __last - 1.
	* testsuite/25_algorithms/nth_element/58800.cc: New

Added:
    branches/gcc-4_8-branch/libstdc++-v3/testsuite/25_algorithms/nth_element/58800.cc
Modified:
    branches/gcc-4_8-branch/libstdc++-v3/ChangeLog
    branches/gcc-4_8-branch/libstdc++-v3/include/bits/stl_algo.h
Comment 24 paolo@gcc.gnu.org 2013-10-20 09:08:29 UTC
Author: paolo
Date: Sun Oct 20 09:08:26 2013
New Revision: 203874

URL: http://gcc.gnu.org/viewcvs?rev=203874&root=gcc&view=rev
Log:
2013-10-20  Chris Jefferson  <chris@bubblescope.net>
	    Paolo Carlini  <paolo.carlini@oracle.com>

	PR libstdc++/58800
	* include/bits/stl_algo.h (__unguarded_partition_pivot): Change
	__last - 2 to __last - 1.
	* testsuite/25_algorithms/nth_element/58800.cc: New

Added:
    branches/gcc-4_7-branch/libstdc++-v3/testsuite/25_algorithms/nth_element/58800.cc
Modified:
    branches/gcc-4_7-branch/libstdc++-v3/ChangeLog
    branches/gcc-4_7-branch/libstdc++-v3/include/bits/stl_algo.h
Comment 25 Paolo Carlini 2013-10-20 09:09:14 UTC
Fixed for 4.7.4/4.8.3/4.9.0.
Comment 26 Marc Glisse 2013-11-13 15:59:39 UTC
*** Bug 59116 has been marked as a duplicate of this bug. ***
Comment 27 Richard Biener 2013-12-19 16:12:35 UTC
*** Bug 59557 has been marked as a duplicate of this bug. ***