This is the mail archive of the libstdc++@gcc.gnu.org mailing list for the libstdc++ project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

std::max/min optimization


I have been experimenting with code sequences for computing max() and 
min() on integer types.  What I've found is that an implementation like 

  int max(int a, int b) 
    { int i = -(a > b); return (a & i)|(b & ~i); }

is about three times faster, on P3 (and probably moreso on P4) than
the naive implementation as found in our library:

  int max(int a, int b) 
    { if (a < b) return b; else return a; }

This happens, apparently, because a conditional branch mis-predicted 
introduces a bubble into the pipeline where a series of arithmetic/logical 
operations does not.  Interestingly, the tricky but more or less equivalent

  int max(int a, int b) 
    { return a - ((a - b) & -(a < b)); }

is rewritten by the optimizer to exactly the same code as the *second*
version above.  In other words, the optimizer badly pessimizes the code.  
Probably that "optimization" should be removed, at least for targets 
that are heavily pipelined.

It is tricky to get correct timing results, because naive benchmarks 
get spoofed by branch prediction.  You have to feed it really-random 
input to get meaningful results.  When branch prediction comes out 
right every time, the current version appears faster.  (Of course, user 
code actually susceptible to branch prediction benefits similarly.  
I don't know how common that is.)  My benchmark is attached below;
I put the implementation of max() in a separate compilation unit.
I din't try it with inlining, which seemed likely to confuse
mattters.

We can apply these results directly by specializing std::min, max, 
(and maybe abs) for integer types, until the compiler can be persuaded 
to optimize such expressions in the right direction.

An extension to evaluate the branching version might be worthwhile 
for those applications where branch prediction would usually succeed.
It might be called "reducing_max", because it is in reductions where
one expects to see most evaluations favor one or other argument.

Nathan Myers
ncm-nospam@cantrip.org

---cut----

#include <unistd.h>

int buf[256];
extern int max(int, int);
int main()
{
  int i = 0;
  int sum = 0;

  int fd = open("/dev/urandom", 0);
  if (read(fd, (char*)buf, sizeof(buf)) != sizeof(buf))
    return 0;
  for (; i < 100000000; ++i) {
      sum += max(buf[i&255], buf[~i&255]);
  }
  return sum == 0;
}

------------

gcc -O3 -funroll-all-loops -c maxt.c
gcc -O3 -c max.c
gcc -o t max.c maxt.c
time ./t


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]