This is the mail archive of the
libstdc++@gcc.gnu.org
mailing list for the libstdc++ project.
std::max/min optimization
- From: Nathan Myers <ncm-nospam at cantrip dot org>
- To: libstdc++ at gcc dot gnu dot org
- Date: Tue, 25 Nov 2003 15:18:50 -0800
- Subject: std::max/min optimization
I have been experimenting with code sequences for computing max() and
min() on integer types. What I've found is that an implementation like
int max(int a, int b)
{ int i = -(a > b); return (a & i)|(b & ~i); }
is about three times faster, on P3 (and probably moreso on P4) than
the naive implementation as found in our library:
int max(int a, int b)
{ if (a < b) return b; else return a; }
This happens, apparently, because a conditional branch mis-predicted
introduces a bubble into the pipeline where a series of arithmetic/logical
operations does not. Interestingly, the tricky but more or less equivalent
int max(int a, int b)
{ return a - ((a - b) & -(a < b)); }
is rewritten by the optimizer to exactly the same code as the *second*
version above. In other words, the optimizer badly pessimizes the code.
Probably that "optimization" should be removed, at least for targets
that are heavily pipelined.
It is tricky to get correct timing results, because naive benchmarks
get spoofed by branch prediction. You have to feed it really-random
input to get meaningful results. When branch prediction comes out
right every time, the current version appears faster. (Of course, user
code actually susceptible to branch prediction benefits similarly.
I don't know how common that is.) My benchmark is attached below;
I put the implementation of max() in a separate compilation unit.
I din't try it with inlining, which seemed likely to confuse
mattters.
We can apply these results directly by specializing std::min, max,
(and maybe abs) for integer types, until the compiler can be persuaded
to optimize such expressions in the right direction.
An extension to evaluate the branching version might be worthwhile
for those applications where branch prediction would usually succeed.
It might be called "reducing_max", because it is in reductions where
one expects to see most evaluations favor one or other argument.
Nathan Myers
ncm-nospam@cantrip.org
---cut----
#include <unistd.h>
int buf[256];
extern int max(int, int);
int main()
{
int i = 0;
int sum = 0;
int fd = open("/dev/urandom", 0);
if (read(fd, (char*)buf, sizeof(buf)) != sizeof(buf))
return 0;
for (; i < 100000000; ++i) {
sum += max(buf[i&255], buf[~i&255]);
}
return sum == 0;
}
------------
gcc -O3 -funroll-all-loops -c maxt.c
gcc -O3 -c max.c
gcc -o t max.c maxt.c
time ./t