Bug 58358 - [4.7 Regression] search_n has a Complexity violation for random access iterator
Summary: [4.7 Regression] search_n has a Complexity violation for random access iterator
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: libstdc++ (show other bugs)
Version: 4.8.2
: P3 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2013-09-08 01:59 UTC by Mitsuru Kariya
Modified: 2013-09-19 10:21 UTC (History)
3 users (show)

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed: 2013-09-08 00:00:00


Attachments
Bug patch (1.37 KB, patch)
2013-09-08 20:36 UTC, Chris Jefferson
Details | Diff
Patch (675 bytes, patch)
2013-09-09 16:56 UTC, Mitsuru Kariya
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Mitsuru Kariya 2013-09-08 01:59:49 UTC
Following code should print less than or equal to 11, but it prints 20.

#include <algorithm>
#include <vector>
#include <iostream>

int main()
{
  std::vector<int> a{2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1};
  int count = 0;
  std::search_n(a.begin(), a.end(), 10, 1, [&count](int t, int u){ ++count; return t == u; });
  std::cout << count << std::endl;
}
Comment 1 Marc Glisse 2013-09-08 06:39:23 UTC
The indexes of the values that are tested:
9 8 7 6 5 4 3 2 1 0 10 9 8 7 6 5 4 3 2 1

It starts well, first checking 9 because if that one fails we can skip testing 0-8. The backtrack is normal. Once the backtracking fails, the code jumps to a sensible place (so that backtracking will go precisely to the last place before we failed) but it forgets that it has already tested many of those values.
Comment 2 Paolo Carlini 2013-09-08 09:12:54 UTC
This isn't the original HP/SGI algorithm, and was supposed to be an optimization, probably often is, but we can't violate the requirements so easily. Luckily Chris is around, let's ask him to look into this.
Comment 3 Paolo Carlini 2013-09-08 09:22:09 UTC
And this is a regression, old but still a regression. If we can't fix the new algorithm easily enough, we may have to return to the old one.
Comment 4 Chris Jefferson 2013-09-08 11:19:41 UTC
Just to say I see this, and fortunately it's not hard to keep the optimisation and meet the complexity requirements.

Expect patch later today.
Comment 5 Paolo Carlini 2013-09-08 11:29:19 UTC
Great, thanks Chris.
Comment 6 Chris Jefferson 2013-09-08 14:06:30 UTC
I have a patch I believe fixes this, but trunk doesn't currently build on my machine (Bug 58340). I'll wait for that to get fixed.

It is annoying there is still separate predicate and non-predicate copies of every function.
Comment 7 Paolo Carlini 2013-09-08 14:09:35 UTC
Yes, bootstrap is known to be broken, you can do:

  update -r202295 tree-ssa-threadedge.c
Comment 8 Paolo Carlini 2013-09-08 14:15:03 UTC
About the duplication, you may want to review what Francois posted to the mailing list a few days ago and send your comments. Personally, I agree it would be very nice to avoid the duplication, but at the same time when I discussed the topic with Howard a few years ago he explained that there are some tricky details, in particular vs proxy-iterators, we want to be really sure that nothing breaks vs those, the "natural" idea of using std::less & co doesn't work, a beefed up version is required (this is horrible IMHO, but we may have to live with it). Again, if you are interested, please have a look to that patch. Thanks!
Comment 9 Daniel Krügler 2013-09-08 14:22:47 UTC
(In reply to Paolo Carlini from comment #8)
> About the duplication, you may want to review what Francois posted to the
> mailing list a few days ago and send your comments. Personally, I agree it
> would be very nice to avoid the duplication, but at the same time when I
> discussed the topic with Howard a few years ago he explained that there are
> some tricky details, in particular vs proxy-iterators, we want to be really
> sure that nothing breaks vs those, the "natural" idea of using std::less &
> co doesn't work, a beefed up version is required (this is horrible IMHO, but
> we may have to live with it).

Shouldn't the library be able to use the new diamond operators (specializations in void) that use perfect forwarding for both arguments and result? User-code cannot specialize these operations.
Comment 10 Paolo Carlini 2013-09-08 14:27:26 UTC
Daniel, please send your comment to the mailing list, Francois wants to know. In any case, note that we want to have something for C++98 too, otherwise we can't really remove the duplicates. Or maybe those aren't *that* new, sorry I'm in the middle of too many other things ;)

Anyway, this discussion is off topic here.
Comment 11 Marc Glisse 2013-09-08 14:48:25 UTC
(In reply to Daniel Krügler from comment #9)
> Shouldn't the library be able to use the new diamond operators
> (specializations in void) that use perfect forwarding for both arguments and
> result? User-code cannot specialize these operations.

Perfect forwarding was never perfect, I wish that name hadn't stuck.

http://gcc.gnu.org/ml/libstdc++/2012-06/msg00066.html
Comment 12 Marc Glisse 2013-09-08 19:26:26 UTC
Chris, did you consider applying this optimized code to bidirectional iterators and not just random access iterators? We may end up doing a few more ++/-- than necessary, but not by more than a factor 2 I believe, and we would often save many calls to the predicate. Something may also be doable for forward iterators, but that's more complicated for less gain and couldn't share the same code.
Comment 13 Chris Jefferson 2013-09-08 20:36:44 UTC
Created attachment 30767 [details]
Bug patch

Patch attached.

This features:

1) As well as checking backwards when we find a match, check forwards too. Improve so we never do more than n checks in an array of size n.

2) Improve the existing tester (which tries all arrays up to length 15) to check the number of predicate calls.

3) Add the test included in this bug as a new test, just for completeness.
Comment 14 Chris Jefferson 2013-09-08 20:37:50 UTC
(In reply to Marc Glisse from comment #12)
> Chris, did you consider applying this optimized code to bidirectional
> iterators and not just random access iterators? We may end up doing a few
> more ++/-- than necessary, but not by more than a factor 2 I believe, and we
> would often save many calls to the predicate. Something may also be doable
> for forward iterators, but that's more complicated for less gain and
> couldn't share the same code.

I considered this, but as you say this would slow things down in some cases, and I've found bugs which cause slowdowns in any situation tend to have serious problems getting accepted.
Comment 15 Mitsuru Kariya 2013-09-09 16:56:47 UTC
Created attachment 30775 [details]
Patch

For your convenience, I attached a patch for this problem.

This algorithm is always scanned to reverse order.
If a scan fails in less than enough, a re-scan is performed from the point that advanced necessary elements from the original starting point.

For example, if __count is 20 and a scan fails after 18 elements succeeded, a re-scan is performed from the point that advanced 2 elements.
Comment 16 Paolo Carlini 2013-09-09 16:59:50 UTC
Thanks Chris, but I almost missed the patch. Please send it to the mailing list, thanks!
Comment 17 Paolo Carlini 2013-09-09 17:03:45 UTC
Two nits: in the new testcase, remember to declare the test variable. Also, we are standardizing on -std=gnu++11.
Comment 18 Paolo Carlini 2013-09-10 12:39:40 UTC
Chris, I'm going to slightly tweak / test again locally / post to the mailing list your patch in order to apply it. Please have a look to Comment #15, in the meanwhile, thanks!
Comment 19 Chris Jefferson 2013-09-10 13:19:42 UTC
(In reply to Mitsuru Kariya from comment #15)
> Created attachment 30775 [details]
> Patch
> 
> For your convenience, I attached a patch for this problem.
> 
> This algorithm is always scanned to reverse order.
> If a scan fails in less than enough, a re-scan is performed from the point
> that advanced necessary elements from the original starting point.
> 
> For example, if __count is 20 and a scan fails after 18 elements succeeded,
> a re-scan is performed from the point that advanced 2 elements.

This patch is good, but takes a different route., trying to always skip as far forwards as possible. On a small test (compiling the 'search_n/iterator.cc' test, disabling forward checking and bidirection tests, increasing TEST_DEPTH to 23, compiling -O3):

Both implementations pass correctness and number of comparison tests.

Mitsuru's code is about 1K smaller.

My code is slightly faster (3.74sec vs 4.12sec)

I think I might choose Mitsuru's code, as his code is smaller in terms of both source and binary, and the performance difference is not that big, but either would be fine.
Comment 20 Paolo Carlini 2013-09-10 13:28:41 UTC
I see, thanks Chris. Don't you have the performance numbers for the baseline, "naive" code, do you? To put in better perspective 3.74secs vs 4.12secs. Like, if the baseline is 1 or 3.5 isn't the same!

In fact, I also like Mitsuru' patch, it's simplicity in particular. But again, I would not choose it if the baseline is 3.5secs.
Comment 21 Paolo Carlini 2013-09-10 13:30:30 UTC
Sorry. Take my 1sec and 3.5secs, as, say, 4.5secs and 20secs. You see may point.
Comment 22 Chris Jefferson 2013-09-10 14:01:08 UTC
Her are some comparisons. Just to compare, I also checked doing away with skipping optimisations altogether. Binary sizes (-O3, stripped)


current head: 11928
my code:      12248
Mitsuru:      11384
No skip:      10904

Timing:

current head: 3.70
my code:      3.70
Mitsuru:      4.04
No skip:     15.37

So we clearly want to do skipping. The tradeoff is between same speed and bigger executables (my code) or ~10% slower but saving 1K or so binary and some source (Mitsuru's code). I don't know what gcc/libstdc++'s general direction in that area is.

I actually would expect Mitsuru's code to be faster (as it tries harder to skip forwards), but it is hard to predict how these things interact with optimisers/caches/branch predictors at a low level.
Comment 23 Paolo Carlini 2013-09-10 16:41:25 UTC
Thanks a lot Chris. Sorry if I bother you for a few minutes more: when you say "doing away with skipping optimizations altogether", you mean, essentially, using the algorithm we have got now for forward iterators? Which is also, if I remember correctly, what we had earlier for random access iterators too? 

In that case, I'm really unsure: I'm tempted by what Mitsuru is proposing, but I'm not 100% sure, not because of the 10% difference, but more importantly because we know well what we have got, nobody complained for so many years, and your change would be only a small tweak of it. Also, I know you are around to maintain it, which isn't a minor detail! In case, would you be willing to maintain Mitsuru's code too? Eventually, I think I will send both patches to the mailing list with my personal opinion, and I will ask.
Comment 24 paolo@gcc.gnu.org 2013-09-11 22:24:53 UTC
Author: paolo
Date: Wed Sep 11 22:24:50 2013
New Revision: 202510

URL: http://gcc.gnu.org/viewcvs?rev=202510&root=gcc&view=rev
Log:
2013-09-11  Mitsuru Kariya  <kariya_mitsuru@hotmail.com>
	    Chris Jefferson  <chris@bubblescope.net>

	PR libstdc++/58358
	* include/bits/stl_algo.h (search_n): Fix to guarantee a number
	of comparisons <= number of elements in the range.
	* testsuite/25_algorithms/search_n/58358.cc: New.
	* testsuite/25_algorithms/search_n/iterator.cc: Extend.

Added:
    trunk/libstdc++-v3/testsuite/25_algorithms/search_n/58358.cc
Modified:
    trunk/libstdc++-v3/ChangeLog
    trunk/libstdc++-v3/include/bits/stl_algo.h
    trunk/libstdc++-v3/testsuite/25_algorithms/search_n/iterator.cc
Comment 25 Paolo Carlini 2013-09-11 22:27:06 UTC
Fixed in mainline so far.
Comment 26 paolo@gcc.gnu.org 2013-09-19 10:20:01 UTC
Author: paolo
Date: Thu Sep 19 10:19:58 2013
New Revision: 202736

URL: http://gcc.gnu.org/viewcvs?rev=202736&root=gcc&view=rev
Log:
2013-09-19  Mitsuru Kariya  <kariya_mitsuru@hotmail.com>
	    Chris Jefferson  <chris@bubblescope.net>

	PR libstdc++/58358
	* include/bits/stl_algo.h (search_n): Fix to guarantee a number
	of comparisons <= number of elements in the range.
	* testsuite/25_algorithms/search_n/58358.cc: New.
	* testsuite/25_algorithms/search_n/iterator.cc: Extend.

Added:
    branches/gcc-4_8-branch/libstdc++-v3/testsuite/25_algorithms/search_n/58358.cc
Modified:
    branches/gcc-4_8-branch/libstdc++-v3/ChangeLog
    branches/gcc-4_8-branch/libstdc++-v3/include/bits/stl_algo.h
    branches/gcc-4_8-branch/libstdc++-v3/testsuite/25_algorithms/search_n/iterator.cc
Comment 27 Paolo Carlini 2013-09-19 10:21:07 UTC
Fixed for 4.8.2 too. WONTFIX in 4.7.x.