Summary: | [4.7/4.8/4.9 Regression] Sorting value in reverse order is much slower compare to gcc44 | ||
---|---|---|---|
Product: | gcc | Reporter: | Tammy Hsu <tammy> |
Component: | libstdc++ | Assignee: | Not yet assigned to anyone <unassigned> |
Status: | RESOLVED FIXED | ||
Severity: | normal | CC: | chris, daniel.kruegler, jmbnyc, paolo.carlini |
Priority: | P3 | ||
Version: | 4.8.1 | ||
Target Milestone: | 4.7.4 | ||
Host: | Target: | ||
Build: | Known to work: | ||
Known to fail: | Last reconfirmed: | 2013-09-17 00:00:00 | |
Attachments: |
Sort patch
Performance tests for sort |
Description
Tammy Hsu
2013-09-16 20:37:55 UTC
I can't reproduce this. I tried compiling with -O3 (with 4.4, 4.7 and 4.8) after commenting out either of the 2 sort lines, and I see roughly the same execution times (maybe a few % difference, but nowhere near the factor 6 you mention, which seems quite suspicious to me). You need to at least describe your platform more in details... Ah forget my last message, I understand now you are really interested in how long it takes to reverse-sort an already sorted vector. Indeed it does take much longer with 4.6+ than with 4.4. Less confusing testcase: #include <algorithm> #include <vector> using namespace std; int main() { const int num = 10000000; vector<int> v; v.reserve(num); for(int i=0;i!=num;++i) v.push_back(-i); sort(v.begin(), v.end()); } Eh, it's *a lot* of time! I guess we have ask again Chris' help, because I don't think much happened after the move semantics work. *** Bug 58440 has been marked as a duplicate of this bug. *** In particular I'm thinking this change: http://gcc.gnu.org/ml/libstdc++/2009-08/msg00073.html I will look at this next week. Quicksorts are always suboptimal for mostly sorted (forwards or backwards) data, and it would be nice to fix that. Thanks Chris. Note that, according at least to the Dup bug, some slow down is noticeable in other less special cases too. Marc/Paolo/Chris, Thanks a lot for your help. It's awesome to see so many activities within hours after submitting the bug. Thanks, Tammy Tammy, Something must have been in the air because your timestamp on the bug submission means that right at the time you were reporting the bug I was actively looking into why I was seeing horrible sort performance. I am working on an algo that requires a post sort of something 1B entries so I wanted to see worst case for std::sort on 500M and was stunned to see how poor it was. My CPU was Haswell so wondered if there was an issue so I tried another bug (that happened to have gcc 4.4.7) and realized it appeared to be a compiler or std lib regression. Anyhow, given the length of time that this has been out there it is weird that we reported the same bug within hours of each other. Best, /JMB Jeffrey, It's weird indeed. We are still using gcc445 for our production releases and are in the process of evaluating gcc481/gcc473. The performance tests show slowness in some tests.... I also don't have gcc45x available in house, so don't know if this bug exists in gcc45. BTW, it's really nice to know someone else encountered the same issue. Tammy Tammy, We tested gcc 4.7.2, 4.6.2 and 4.4.3/5 (the bug is not in either 4.4.3/5). I have gcc 4.8.1 on my laptop but have not tried it yet. I confirmed the issue by compiling my test (almost identical to the one you submitted but using 500M elements) on 4.4.5 and then moving the executable over to a box with 4.7.2 installed. the native compiled program performed poorly compared to the one compiled with 4.4.5 (this also ruled out chip issues, i.e. haswell vs sandybridge). I knew something was wrong when my own single threaded merge sort that produces a gradient instead of sorting the data in place was outperforming the std::sort using -D_GLIBCXX_PARALLEL, i.e. parallel sort of 500M entries. Best, /JMB Created attachment 30861 [details]
Sort patch
Wow, this an embarrassing bug to get through testing. Obviously not enough profiling done!
This patch (I hope) gets performance back to g++-4.4 levels, and in my experiments slightly (but not noticeably) better. I would be interested in anyone trying out this patch on their own data.
For the interested, here is a description of the bug. I will assume you understand quicksort!
Quicksort requires taking a pivot, then partitioning the array by that pivot. Getting O(n log n) behaviour requires that you choose a "good" pivot. Like most people, in g++ we choose the median of the first, middle, and last value in the array as the pivot.
In the C++03 world, after we chose the pivot we took a copy of it. In C++11, we cannot copy values in std::sort any more, only move and swap.
Trying to keep track of the pivot as it moves around during the partitioning is horrible and inefficient. So, I had the "bright idea" to do swap(*first, *pivot). Now the pivot is at the first location in the array, and we can just partition [first+1, end) and the partitioning is fine. See the mistake yet?
The bug comes when we recursively partition the first cell. We again choose the median from the first, mid and last of this cell, but *first is always the biggest thing in the cell (as we pivoted on it one level above), meaning our median calculation is unbalanced and often chooses terrible values, leading to us often hitting the O(n^2) behaviour of quicksort (actually we technically get o(n log n), as we switch to heapsort when things are going badly, but the time is still bad).
There are a selection of ways of fixing this. This patch switches the median calculation from considering then first, mid, last locations to on first+1, mid, last-1 (there is no reason in this bug to consider last-1 instead of last, but I thought it wouldn't do any harm, and might avoid other issues of accidentally choosing a bad pivot.
Hi Chris, (detail: could you pass -u10, or at least -p, to diff to make the patches easier to read? It isn't required so you don't have to) I don't really understand why the pivot is still considered in lower levels of the recursion at all. So, first we move the pivot to the first position with a swap: (pivot, rest...) Then we partition: (pivot, small..., big...) Then we know what the right position is for the pivot (this isn't a stable sort), so we could put it there with another swap: (small..., pivot, big...) And now we can recurse on small and big, and no one needs to look at the pivot ever again. From what I understand of the code, we are instead relying on recursive calls to eventually sort pivot to the right position, which seems strange to me. Thanks a lot guys, I appreciate all the help you are providing. While fixing this, let's remember that this regressed even in the old 4.7.x branch. Thus, if we are sure we are minimally restoring the performance of the C++03 code, we could also envisage a very minimal but *very* safe fix for 4.7.4 and 4.8.2 and something a tad more aggressive for 4.9.0. Indeed, if std::sort had never used lower-level partitioning to get the pivot in the correct location, we would never have had this problem in the first place! This is not too serious a problem performance-wise (as the pivot ends up in the right place fairly quickly), but it would certainly be better to do that in future. Unfortunately doing this requires changing quite a few functions, to make all code which uses partition skip over the pivot value. I would suggest my current patch (which is simpler than it looks from the diff) for previous versions, then investigate pivot. In general I would like to modernise the sort to match more modern thoughts on sorting such as making use of partly sorted data (which would include reverse-ordered). The only problem there is satisfying all the requirements of std::sort). (detail: -p doesn't help, you just get things like " -110,16 +112,18 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION". Neither -u 10, -x -u 10 or -x "-u 10" seem to work, I just get errors. If you can give me the exact svn diff command to run to get nicer output, I am happy to do so). Can I ask you also a rather simple test for testsuite/performance/25_algorithms? [Ugh, bugzilla seems half broken at the moment]
> I would suggest my current patch (which is simpler than it looks from the diff) for previous versions, then investigate pivot.
Well, you're the one doing all the work ;-)
(svn diff --diff-cmd diff -x -u10 (actually I have diff-cmd = diff-for-svn in ~/.subversion/config where diff-for-svn is a script where I put the options I like, so svn diff does it by default), too bad -p often doesn't work well with C++)
I ran some quick tests and indeed the performance seems equal of better of those of the old C++03 code. Note that the patch is very small and can be installed even on top of the installed headers, without rebuilding anything, thus I would really encourage the bug submitters to test it and report back. Created attachment 30867 [details]
Performance tests for sort
This is some performance tests for performance checking. Sorry for tar rather than patch, just pop them in performance/25_algorithms.
I decided to check stable_sort and heap_sort as well as just sort, just to check nothing else got broken at the same time (turns out, nothing else did).
Not sure exactly what the appropriate memory / time tradeoffs are for the performance test suite, so any comments welcome.
(In reply to Paolo Carlini from comment #19) > I ran some quick tests and indeed the performance seems equal of better of > those of the old C++03 code. Note that the patch is very small and can be > installed even on top of the installed headers, without rebuilding anything, > thus I would really encourage the bug submitters to test it and report back. Excellent. Will test and report back. Thanks a lot. We will test it out with our real application.
Just want to do things right, the patch is the one described in Attachment 30861 [details]. And I just need to patch the <gcc481_root>/include/c++/4.8.1/bits/stl_algo.h, don't need to rebuild gcc481.
If this is a question, the answer is YES. Yes, it is a question. And thank you for the answer....... :-) Note that naively doing what I am proposing in comment #14 (it's just an iter_swap and a +-1) also makes reverse-sorted arrays a bad case, because of the way we do partitioning, so it isn't an alternative to Chris's first+1 approach, more of an orthogonal optimization. Thanks a lot for the patch. It addresses the issue on the standalone test and shows also in our real tool test results. With this patch, the major degradations are gone. Compared to gcc445, there is a slight overall improvement for this tool now. Really appreciate all the comments and the creation of a patch in such a short of time!!!! Author: paolo Date: Mon Sep 30 17:42:31 2013 New Revision: 203035 URL: http://gcc.gnu.org/viewcvs?rev=203035&root=gcc&view=rev Log: 2013-09-30 Chris Jefferson <chris@bubblescope.net> PR libstdc++/58437 * include/bits/stl_algo.h (__move_median_first): Rename to __move_median_to_first, change to take an addition argument. (__unguarded_partition_pivot): Adjust. * testsuite/performance/25_algorithms/sort.cc: New. * testsuite/performance/25_algorithms/sort_heap.cc: Likewise. * testsuite/performance/25_algorithms/stable_sort.cc: Likewise. Added: trunk/libstdc++-v3/testsuite/performance/25_algorithms/sort.cc trunk/libstdc++-v3/testsuite/performance/25_algorithms/sort_heap.cc trunk/libstdc++-v3/testsuite/performance/25_algorithms/stable_sort.cc Modified: trunk/libstdc++-v3/ChangeLog trunk/libstdc++-v3/include/bits/stl_algo.h Author: paolo Date: Mon Sep 30 17:42:52 2013 New Revision: 203036 URL: http://gcc.gnu.org/viewcvs?rev=203036&root=gcc&view=rev Log: 2013-09-30 Chris Jefferson <chris@bubblescope.net> PR libstdc++/58437 * include/bits/stl_algo.h (__move_median_first): Rename to __move_median_to_first, change to take an addition argument. (__unguarded_partition_pivot): Adjust. * testsuite/performance/25_algorithms/sort.cc: New. * testsuite/performance/25_algorithms/sort_heap.cc: Likewise. * testsuite/performance/25_algorithms/stable_sort.cc: Likewise. Added: branches/gcc-4_8-branch/libstdc++-v3/testsuite/performance/25_algorithms/sort.cc branches/gcc-4_8-branch/libstdc++-v3/testsuite/performance/25_algorithms/sort_heap.cc branches/gcc-4_8-branch/libstdc++-v3/testsuite/performance/25_algorithms/stable_sort.cc Modified: branches/gcc-4_8-branch/libstdc++-v3/ChangeLog branches/gcc-4_8-branch/libstdc++-v3/include/bits/stl_algo.h Author: paolo Date: Mon Sep 30 17:43:05 2013 New Revision: 203037 URL: http://gcc.gnu.org/viewcvs?rev=203037&root=gcc&view=rev Log: 2013-09-30 Chris Jefferson <chris@bubblescope.net> PR libstdc++/58437 * include/bits/stl_algo.h (__move_median_first): Rename to __move_median_to_first, change to take an addition argument. (__unguarded_partition_pivot): Adjust. * testsuite/performance/25_algorithms/sort.cc: New. * testsuite/performance/25_algorithms/sort_heap.cc: Likewise. * testsuite/performance/25_algorithms/stable_sort.cc: Likewise. Added: branches/gcc-4_7-branch/libstdc++-v3/testsuite/performance/25_algorithms/sort.cc branches/gcc-4_7-branch/libstdc++-v3/testsuite/performance/25_algorithms/sort_heap.cc branches/gcc-4_7-branch/libstdc++-v3/testsuite/performance/25_algorithms/stable_sort.cc Modified: branches/gcc-4_7-branch/libstdc++-v3/ChangeLog branches/gcc-4_7-branch/libstdc++-v3/include/bits/stl_algo.h Fixed for 4.7.4/4.8.2/4.9.0. |