This is the mail archive of the libstdc++@gcc.gnu.org mailing list for the libstdc++ project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: Change to search_n


Hi Chris, hi list,

First of all I am sorry about the big delay in my
answer. Unfortunately the last couple of months I have
a lot to do and my free time is bellow zero! Moreover,
I needed some extra time in order to become familiar
with mingw, gcc, make etc. (great tools!) Until now I
had mingw in my computer but I wasn't really using it.

The first thing I did with your search_n
implementation was to compare its performance with my
own implementation. I made this comparison in my
Pentium III 800Hz PC, running WinXP-Pro and using both
the VC7.1 and mingw-gcc 3.2.3 compilers. Both
compilers were set to optimize intensively. (gcc
options:  -g1 -O2 -D NDEBUG -Wall)

The result of the above comparison proved to me that
your implementation has some performance problems.
Below, I will mention the two most notable performance
issues that I have found, along with two other issues
that I consider important.

--------------------------------------------------------

A) Slow skip process.

Most of the time, our algorithms are expected to just
test one element and then skip the next N-1 elements
of the search range [__first, __last) without
performing any backtracking at all. This is especially
true in text search, where the probability of meeting
a matching element is typically low. Nevertheless, I
have noticed that you are using too much (and probably
too slow) code in order to implement this skip
process:

Code that executes in the skip process of your
implementation:

if(*__first == __val) //common code
if(__last - __first > __count)
__first += __count; //common code
__precount = 0;

Code that executes in the skip process of my
implementation:

if (__tailSize >= __count)
__tailSize -= __count;
if (*__lookAhead == __val)  //common code
__lookAhead += __count;  //common code

By removing the common code we have:

Your implementation: 

one iterator subtraction
one integer comparison //common operation
one integer assignment

My implementation:

one integer comparison //common operation
one integer subtraction

That means you are using one iterator subtraction
instead of one integer subtraction (that can be much
slower, depending on the iterator) and you are also
using one extra integer assignment. It is true that in
many cases the modern CPUs (with many ALUs and
pipeline stages) will make this difference negligible
in practice, but I expect this performance issue to be
exposed in many combinations of the following cases:

1) When the element comparison itself is fast.
2) When the probability of meeting a matching element
is rather low.
3) When N is small, mostly when N is 2, because in
that case the algorithm has to execute the skip
process more times in the same search range [__first,
__last).
4) When iterator arithmetic is slow (for example when
using deques).
5) When the target CPU lacks of extra ALUs or many
pipeline stages.

--------------------------------------------------------

B) Slow performance when the probability of meeting a
matching element is high.

In this case the skip process is not our primary
concern anymore. The algorithm performance now depends
much more on the backtracking process. Typical example
of a situation like that is a DNA-pattern search.
Unfortunately, all the performance tests that I have
made, have proved that your implementation has
performance problems when the probability (P) of
meeting a matching element in the search range
[__first, __last) is high. for example, in a
particular test (test D of my article) when P was 50%
and N was equal to 20, your implementation had tested
16% of the overtaken elements, while mine had only
tested 14% of the overtaken elements.

--------------------------------------------------------

3) A serious bug.

The bug can be reproduced by running the following
snippet:

char seq[] = {0, 1, 1, 1};
char *seqStart = seq;
char *seqEnd = seq + 3;

assert( search_n(seqStart, seqEnd, 3, 1) == seqEnd);

The most important thing in this bug is not that
search_n returns an incorrect result, but the fact
that search_n is actually dereferencing and using
*__last, while operating on the range [__first,
__last). This can lead to access faults, exceptions
and even crashes! The problem starts in code-line
__first += __count - __precount; and the potentially
invalid access comes immediately after.

--------------------------------------------------------

4) Incompatible to unsigned integers.

In your implementation you are making the assumption
that __count is a signed integer. Consequently, when
__count happens to be unsigned, the compilers complain
by producing warning messages like: "comparison
between signed and unsigned". These assumptions are
both annoying and potentially dangerous. (The STL that
comes with VC also makes this kind of assumptions
frequently, but SGI-STL is much more careful.)

--------------------------------------------------------

Recommendations:

I would recommend to first concentrate on improving
the "generic" performance of your core algorithm and
test it intensively. Then test it again with a variety
of iterators, a variety data types and a variety of
data distributions. Compare it a lot with my
implementation. You can download my performance tests
from CodeProject and use them to compare the two
algorithms.

After ensuring that your core algorithm is at least as
fast as mine (there is no sense in re-implementing a
slower version of a piece of free code that already
exists!) then (and only then) is the time to worry
about backward prefetchers in "exotic" targets and do
specializations for N=2, N=3, N=4 etc. In my opinion,
it is essential that you first work a lot more with
your generic/core algorithm.


Best regards,
Jim Xochellis



> As Paolo has said, there was some worry this patch
> might slow things 
> down in some cases.. in particular, I found it did
> slow things down when 
> searching for strings of length 1 or 2 on some
> computers. Of course 
> search_n for length 1 is just find, so we can ask
> find to do it, and I 
> wrote a specialised version for length 2. I attach
> to this email my 
> latest version from my personal gcc tree, which I am
> more than happy for 
> you to comment on / suggest changes :)
> 
> As to submitting this into libstdc++, my personal
> current temptation was 
> to submit it to the "so_7" branch, so called as it
> will be the next 
> version of libstdc++ which is not binary compatable
> with the current 
> version (so_6), mainly because I felt such a time
> was a good time to 
> insert code which might change behaviour. Of course
> this code shouldn't 
> change the behaviour of any code which follows the
> standard, but we've 
> all seen enough badly written code to know it might
> upset someone..
> 
> Chris
> > 
>   template<typename _ForwardIterator, typename
> _Integer, typename _Tp>
>     _ForwardIterator
>     __search_n(_ForwardIterator __first,
> _ForwardIterator __last,
> 	       _Integer __count, const _Tp& __val,
> forward_iterator_tag)
>     {
>       __first = std::find(__first, __last, __val);
>       while (__first != __last)
> 	{
> 	  typename
> iterator_traits<_ForwardIterator>::difference_type
> 	    __n = __count;
> 	  _ForwardIterator __i = __first;
> 	  ++__i;
> 	  while (__i != __last && __n != 1 && *__i ==
> __val)
> 	    {
> 	      ++__i;
> 	      --__n;
> 	    }
> 	  if (__n == 1)
> 	    return __first;
> 	  else
> 	    if(__i != __last)
> 	      __first = std::find(++__i, __last, __val);
> 	    else
> 	      return __last;
> 	}
>       return __last;
>     }
> 
>  template<typename _RandomAccessIterator, typename
> _Integer, typename _Tp>
>    _RandomAccessIterator
>    __search_n(_RandomAccessIterator __first,
> _RandomAccessIterator __last,
> 	       _Integer __count, const _Tp& __val,
> random_access_iterator_tag)
>     {
>       if(__last - __first < __count)
> 	return __last;
>       if(__count == 1)
> 	return std::find(__first, __last, __val);
>       if(__count == 2)
> 	{
> 	  __first++;
> 	  while(1)
> 	    {
> 	      if(*__first == __val)
> 		{
> 		  if(*(--__first) == __val)
> 		    return __first;
> 		  if(__last - __first <= 2)
> 		    return __last;
> 		  __first += 2;
> 		  if(*__first == __val)
> 		    return --__first;
> 		}
> 	      if(__last - __first <= 2)
> 		return __last;
> 	      else __first += 2;
> 	    }
> 	}
>       else
> 	{
> 	  // Note: The following code is not correct for
> __count < 3
> 	  typename
>
iterator_traits<_RandomAccessIterator>::difference_type
> 	    __precount = (*__first == __val);
> 	  __first += __count - __precount;
> 	  while(1)
> 	    {
> 	      if(*__first == __val)
> 		{
> 		  _RandomAccessIterator __it1 = __first;
> 		  typename
>
iterator_traits<_RandomAccessIterator>::difference_type
> 		    __n = __count - 1;
> 		  --__it1;
> 		  while(*__it1 == __val && --__n > __precount)
> 		    --__it1;
> 		  if(__n == __precount)
> 		    return __it1 - __precount;
> 		  if(__last - __first <= __n)
> 		    return __last;
> 		  typename
>
iterator_traits<_RandomAccessIterator>::difference_type
> 		    __cache_n  = __n;
> 		  _RandomAccessIterator __it2 = __first + __n;
> 		  while(*__it2 == __val && --__n > 0)
> 		    --__it2;
> 		  if(__n == 0)
> 		    return ++__it1;
> 		  __precount = __cache_n - __n;
> 		  if(__last - __first > __count - __precount)
> 		    __first += __count - __precount;
> 		  else
> 		    return __last;
> 		}
> 	      else
> 		{
> 		  if(__last - __first > __count)
> 		    {
> 		      __first += __count;
> 		      __precount = 0;
> 		    }
> 		  else
> 		    return __last;
> 		}
> 	    }
> 	}
>     }
>   
>   /**
>    *  @brief Search a sequence for a number of
> consecutive values.
>    *  @param  first  A forward iterator.
>    *  @param  last   A forward iterator.
>    *  @param  count  The number of consecutive
> values.
>    *  @param  val    The value to find.
>    *  @return   The first iterator @c i in the range
> @p [first,last-count)
>    *  such that @c *(i+N) == @p val for each @c N in
> the range @p [0,count),
>    *  or @p last if no such iterator exists.
>    *
>    *  Searches the range @p [first,last) for @p
> count consecutive elements
>    *  equal to @p val.
>   */
>   template<typename _ForwardIterator, typename
> _Integer, typename _Tp>
>     _ForwardIterator
>     search_n(_ForwardIterator __first,
> _ForwardIterator __last,
> 	     _Integer __count, const _Tp& __val)
>     {
>       // concept requirements
>      
>
__glibcxx_function_requires(_ForwardIteratorConcept<_ForwardIterator>)
>      
>
__glibcxx_function_requires(_EqualityComparableConcept<
> 	    typename
> iterator_traits<_ForwardIterator>::value_type>)
>      
>
__glibcxx_function_requires(_EqualityComparableConcept<_Tp>)
>       __glibcxx_requires_valid_range(__first,
> __last);
> 
>       if (__count <= 0)
> 	return __first;
>       else
> 	return __search_n(__first, __last, __count, __val, 
> 			  std::__iterator_category(__first));
>     }
> 
>  

____________________________________________________________
Do You Yahoo!?
Αποκτήστε τη δωρεάν @yahoo.gr διεύθυνση σας στο http://www.otenet.gr


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]