This is the mail archive of the
libstdc++@gcc.gnu.org
mailing list for the libstdc++ project.
Re: std::max/min optimization
- From: Ulrich Drepper <drepper at redhat dot com>
- To: Nathan Myers <ncm-nospam at cantrip dot org>
- Cc: libstdc++ at gcc dot gnu dot org
- Date: Tue, 25 Nov 2003 16:11:06 -0800
- Subject: Re: std::max/min optimization
- Organization: Red Hat, Inc.
- References: <20031125231850.GA13072@tofu.dreamhost.com>
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
Nathan Myers wrote:
> I have been experimenting with code sequences for computing max() and
> min() on integer types. What I've found is that an implementation like
>
> int max(int a, int b)
> { int i = -(a > b); return (a & i)|(b & ~i); }
>
> is about three times faster on a P3 (and probably more so on a P4) than
> the naive implementation as found in our library:
Your argument is flawed. You are comparing apples and oranges.
You argue that, because of the architecture of the i686+ processors,
plain old i386 code is slow and your convoluted code is better. But this
is not the comparison you must make. You must compare it with the code
generated for these processors. Precisely because mispredicted branches
(and max has a 50% misprediction rate in general) are bad, the designers
added support to avoid them: conditional instructions. More concretely:
conditional moves. These instructions are used automatically if you
tell gcc to use them, which is only the case if you do *not* generate
plain old i386 code. Add the -march=pentium4 option or whatever is
adequate.
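To make the comparison concrete, here is a minimal sketch (not part of
the original message) that puts the two variants side by side. The names
max_branchless, max_plain and variants_agree are made up for
illustration. Whether the plain version really compiles to a conditional
move depends on the compiler version and the options used; one way to
check is to compile with something like -O2 -march=pentium4 and inspect
the assembly produced by -S.

```cpp
#include <cassert>
#include <climits>

// Hypothetical name: the mask-based variant quoted above.
// (a > b) converts to 0 or 1, so mask is 0 or -1 (all bits set on
// two's-complement targets).
int max_branchless(int a, int b)
{
    int mask = -(a > b);
    return (a & mask) | (b & ~mask);
}

// Hypothetical name: the plain variant. A compiler that is allowed to
// use conditional moves may emit one here instead of a branch.
int max_plain(int a, int b)
{
    return a > b ? a : b;
}

// Checks that both variants return the same value on a few edge cases.
bool variants_agree()
{
    const int samples[] = { INT_MIN, -7, -1, 0, 1, 42, INT_MAX };
    for (int a : samples)
        for (int b : samples)
            if (max_branchless(a, b) != max_plain(a, b))
                return false;
    return true;
}
```

Both functions compute the same result; the only question in this
thread is which one the compiler turns into faster machine code for a
given target.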
What you could have said is that your code is a compromise. It
generates sufficiently blended code to perform acceptably regardless of
the compiler options. But it's certainly not optimal. The plain version
int max(int a, int b)
{ return a > b ? a : b; }
is about 15%+ faster on a P4 than your code (my entire test program,
including all the overhead, runs 10%+ faster with the simple code).
For this reason I would strongly recommend not making this change. If
somebody wants optimally performing code, s/he should be able to get it.
This is not possible with the blended code. And if the appropriate
compiler options are not used, there is obviously no interest in the
best performance.
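The timing claim above can be checked with a small harness along the
following lines. This is only a sketch under stated assumptions: it is
not the test program referred to in the message, and the names max_mask,
max_simple, make_input, time_max and Timing are invented for the
example. Results will vary with the CPU, the compiler version and the
-march setting, so no numbers are given here.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical helper names; the two variants discussed in the thread.
inline int max_mask(int a, int b) { int m = -(a > b); return (a & m) | (b & ~m); }
inline int max_simple(int a, int b) { return a > b ? a : b; }

struct Timing {
    long long checksum;  // consumed so the loop is not optimized away
    double seconds;      // wall-clock time for the whole loop
};

// Pseudo-random input, so that a branch (if one is emitted) is hard
// to predict.
std::vector<int> make_input(std::size_t n, unsigned seed)
{
    std::mt19937 gen(seed);
    std::uniform_int_distribution<int> dist(-1000000, 1000000);
    std::vector<int> v(n);
    for (int& x : v)
        x = dist(gen);
    return v;
}

// Applies f to each pair of neighbouring elements and times the loop.
template <typename F>
Timing time_max(F f, const std::vector<int>& v)
{
    auto start = std::chrono::steady_clock::now();
    long long sum = 0;
    for (std::size_t i = 0; i + 1 < v.size(); ++i)
        sum += f(v[i], v[i + 1]);
    auto stop = std::chrono::steady_clock::now();
    return { sum, std::chrono::duration<double>(stop - start).count() };
}
```

A caller would build one input with make_input, pass it to time_max
once per variant, confirm that the two checksums match, and compare the
seconds fields. Passing the variants as lambdas rather than function
pointers may make it easier for the compiler to inline them.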
- --
Ulrich Drepper, Red Hat, Inc., 444 Castro St, Mountain View, CA
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.3 (GNU/Linux)
iD8DBQE/w++b2ijCOnn/RHQRAhprAKC8x+U6fAQ7Pu3EBImgDbgJHUJWDwCfX1HB
zjJpuL6ZZ0ohXafBVyGFP14=
=VTWH
-----END PGP SIGNATURE-----