This is the mail archive of the libstdc++@gcc.gnu.org mailing list for the libstdc++ project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: std::max/min optimization


On Tue, Nov 25, 2003 at 04:11:06PM -0800, Ulrich Drepper wrote:
> Nathan Myers wrote:
> > I have been experimenting with code sequences for computing max() and 
> > min() on integer types.  What I've found is that an implementation like 
> > 
> >   int max(int a, int b) 
> >     { int i = -(a > b); return (a & i)|(b & ~i); }
> > 
> > is about three times faster, on P3 (and probably moreso on P4) than
> > the naive implementation as found in our library:
> 
> ... Exactly because mispredicted branches
> (and max has a 50% misprediction rate in general) is bad the designers
> added support to avoid them: conditional instructions.  More concrete:
> conditional moves.  These instructions are used automatically if you
> tell gcc to use them which is only  the case if you do *not* generate
> plain old i386 code.  Add the -march=pentium4 option of whatever is
> adequate.

My mistake was in testing with -mtune=pentium4 in place of -march=pentium4.

> If somebody wants optimally performing code s/he should be able to get
> it.  This is not possible with the blended code.  And if the appropriate
> compiler options are not used, there is obviously no interest in the
> best performance.

The policy we have generally tried to observe is to write code that 
works correctly regardless of compiler options, but which gives best 
performance on the fastest machines available, on the assumption that 
users who care most about performance are using fast machines.

However, the overwhelming majority of programs are compiled *without*
-march options, by people other than the end users.  Ulrich reports
that the version using bitwise logical operations is only 15% slower 
than the clearly optimal version using conditional move, while the 
current version is over 200% slower if compiled with the generic 
compiler.   Summing over all the programs that are actually run, 
which produces a better overall result?

Probably the compiler should optimize "(a < b) ? b : a" using
conditional moves when it is allowed to, and bitwise operators 
otherwise, unless it has specifically been told to tune for a 
non-pipelined target (e.g. "-mtune=i386").

Nathan Myers
ncm-nospam@cantrip.org


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]