This is the mail archive of the libstdc++@gcc.gnu.org mailing list for the libstdc++ project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: Performance of std::max_element and std::min_element


2008/4/28 David GONZALEZ MALINE <David.Gonzalez.Maline@cern.ch>:
> Dear Developers,
>
>  I have written a month ago about the performance of the
> std::next_permutation. As I said before, we are making some performance
> tests on some of the standard library algorithms against the ones we use in
> ROOT (see Thu, 13 Mar 2008).
>
>  This time I want to show the results obtained for the std::max_element and
> std::min_elements. We have tried the algorithm in three different machines
> and the results on performance are as follows (on microseconds per call):
>                                                           std algorithms
> ROOT algorithms
>         MacOSX
>  (with Intel 64 bits & gcc 4.0)                              317.703
> 84.94
>      Linux 64 bits
>    (scientific Linux 4 & gcc 3.4.6)                              398.7
> 129.113
>     Linux 32 bits
>    (scientific Linux 4 & gcc 3.4.6)                             412.82
> 623.95       // Only case where there is a worst time.

This kind of fine tuning can be very subtle, although that doesn't
mean you should give up on it.

On MacOSX, 32 bits, recent gcc svn, I tried the following:

Note: 'iterator looping' is "while(++__first != __last)", while 'int
looping' is "for(int i = 1; i < n; ++i)"

int looping, int looping + 'register', iterator looping, iterator
looping + 'register',

-O: 3.2s, 3.2s, 3.0s, 1.9s
-O2: 3.2s, 2.3s, 3.0s, 1.9s
-O3: 3.6s, 3.6s, 1.9s, 1.9s
-O3 -funroll-all-loops: 2.7s, 2.7s, 2.2s, 2.2s

So it seems iterator looping is always better, but it's not till -O3
that the compiler can get the same speed-up as introducing the extra
variable ourselves.

Note that adding this extra variable would have to be done carefully,
because in general you aren't allowed to perform any copies in
std::min / std::max. There are already algorithms which are
specialised for built-in types however, so this kind of specialisation
isn't new.



>  As the table shows, only in the version of Linux with 32 bits the
> performance is improved. Both algorithms are very similar, with the
> difference that in the std it seems like it dereferences one of the
> iterators in every epoch of the loop:
>
>      if (__first == __last)
>        return __first;
>      _ForwardIterator __result = __first;
>      while (++__first != __last)
>        if (*__result < *__first)      ///   Always dereference the *__result
> iterator!
>          __result = __first;
>      return __result;
>
>
>  In the ROOT algorithm, the same value is stored in a register variable:
>
>  Long64_t *TMath::LocMin*(Long64_t n, *const* Short_t *a)
>  {
>   /// Return index of array with the minimum element.
>  /   /// If more than one element is minimum returns first found.
>  /
>   *if*  (n <= 0 || !a) *return* -1;
>   Short_t xmin = a[0];      // This is the variable used for the comparison.
>   Long64_t loc = 0;
>   *for*  (Long64_t i = 1; i < n; i++) {
>      *if* (xmin > a[i])  {
>         xmin = a[i];
>         loc = i;
>      }
>   }
>   *return* loc;
>  }
>
>  It seems like in most of the systems, the compiler optimizes the second
> version better.
>
>  Hope it helps.
>  David
>
>


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]