This is the mail archive of the gcc-help@gcc.gnu.org mailing list for the GCC project.

Re: gcc-performance::number of local variables

From: "John (Eljay) Love-Jensen" <eljay at adobe dot com>
To: Martin Ettl <ettl dot martin at gmx dot de>, GCC-help <gcc-help at gcc dot gnu dot org>
Date: Mon, 1 Jun 2009 05:03:41 -0700
Subject: Re: gcc-performance::number of local variables

Hi Martin,

Since you did not specify, I presume your platform is a Motorola 68060
running AmigaOS 3.9 on an Amiga 3000 with a Phase5 CyberStorm, and that you
are using GCC 4.4.0.

Ultimately, you should profile your routines, and you should look at the
assembly for your platform, and you should have test code which exercises
your routines for a variety of inputs and expected outputs (both the return
value and the out parameter).

As for the two const& parameters: since they are scalar types, you should
probably pass them by value (or by const value).

I would write your routine this way, and then profile, run the test cases,
and (if performance critical, as your inquiry suggests) check the assembly.

double vFoo(
  double const a,
  double const b,
  double& e)
{
  double const s = a + b;
  double const h = s - a;
  e = (s - (a - h)) + (h - b);
  return s;
}
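
As a rough sketch of the kind of test code meant above (checking both the
return value and the out parameter), something along these lines could
serve as a starting point.  The names VFooCase and test_vFoo are made up for
illustration, and the expected values were worked out by hand from the
routine exactly as written here, not from any specification of what it is
supposed to compute.  The inputs are chosen so that every intermediate
result is exactly representable.

```cpp
#include <cassert>
#include <cstddef>

// The routine from this message, repeated so the sketch compiles on its own.
double vFoo(
  double const a,
  double const b,
  double& e)
{
  double const s = a + b;
  double const h = s - a;
  e = (s - (a - h)) + (h - b);
  return s;
}

// Hypothetical test record: inputs, expected return value, expected out
// parameter.
struct VFooCase
{
  double a;
  double b;
  double expected_return;
  double expected_e;
};

// Expected values are hand-computed from the routine as written; they pin
// down its current behavior so a later tweak cannot change it silently.
void test_vFoo()
{
  static VFooCase const cases[] = {
    { 1.0,  2.0,  3.0,   4.0 },
    { 0.0,  0.0,  0.0,   0.0 },
    { 2.0, -2.0,  0.0,  -4.0 },
    { 0.5,  0.25, 0.75,  0.5 },
  };
  for (std::size_t i = 0; i < sizeof cases / sizeof cases[0]; ++i)
  {
    double e = -1.0;  // sentinel, so an unwritten out parameter is noticed
    double const r = vFoo(cases[i].a, cases[i].b, e);
    assert(r == cases[i].expected_return);
    assert(e == cases[i].expected_e);
  }
}
```

Calling test_vFoo() from main (without NDEBUG defined) runs the cases.  Real
test data should also include inputs where rounding actually occurs.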

Note that since GCC optimizes on the really, really cool SSA (static single
assignment) representation, for many kinds of optimizations you really do
not gain much by making the code obscure in order to coax the compiler
along.

That caveat may not apply to manually hoisting loop-invariant code, where
there are non-inline functions involved.  Or even some manual loop
unrolling, in some situations.  BUT if you do those to help improve
performance, PLEASE leave a big flashing comment to the maintenance
programmer to indicate why the code was manually optimized, and PLEASE leave
a non-micro-optimized reference implementation as well.  (Preferably with a
test case which pits the legible reference implementation against the
lovingly hand-tweaked micro-optimized illegible routine.)
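
To illustrate the hoisting case with a reference implementation kept
alongside, here is a minimal sketch.  All of the names (scale_factor,
scale_reference, scale_hoisted, test_scale_hoisted) are hypothetical, and
scale_factor merely stands in for a function the compiler could not inline
in real code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper.  Imagine it defined in another translation unit, so
// the compiler cannot see that the call is loop-invariant.
double scale_factor(double gain)
{
  return gain * 0.5;
}

// Reference implementation: legible, calls the helper on every iteration.
void scale_reference(std::vector<double>& v, double gain)
{
  for (std::size_t i = 0; i < v.size(); ++i)
    v[i] *= scale_factor(gain);
}

// *** MANUALLY OPTIMIZED ***
// scale_factor(gain) does not change inside the loop but is not inlinable,
// so the call is hoisted by hand.  Keep in sync with scale_reference().
void scale_hoisted(std::vector<double>& v, double gain)
{
  double const k = scale_factor(gain);
  for (std::size_t i = 0; i < v.size(); ++i)
    v[i] *= k;
}

// Test case pitting the hand-tweaked routine against the reference.
void test_scale_hoisted()
{
  std::vector<double> const input(4, 1.5);
  std::vector<double> expected(input);
  std::vector<double> actual(input);
  scale_reference(expected, 3.0);
  scale_hoisted(actual, 3.0);
  assert(expected == actual);
}
```

Note that the hoisted version calls the helper once even for an empty
vector, which only matters if the helper has side effects.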

Also, do not hand-tweak or micro-optimize a routine until you've profiled
the reference implementation and determined that there is room for
improvement.
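
A real profiler is the right tool for that.  Where none is at hand, a crude
timing loop can give a first impression.  The sketch below is exactly that,
a stand-in and not a profiler: it assumes a C++11 (or later) compiler for
<chrono>, and time_vFoo is a made-up name.

```cpp
#include <cassert>
#include <chrono>

// The routine from this message, repeated so the sketch compiles on its own.
double vFoo(double const a, double const b, double& e)
{
  double const s = a + b;
  double const h = s - a;
  e = (s - (a - h)) + (h - b);
  return s;
}

// Returns the elapsed wall-clock seconds for `iterations` calls of vFoo.
double time_vFoo(long iterations)
{
  assert(iterations > 0);
  volatile double sink = 0.0;  // keeps the optimizer from dropping the calls
  double e = 0.0;
  std::chrono::steady_clock::time_point const start =
    std::chrono::steady_clock::now();
  for (long i = 0; i < iterations; ++i)
  {
    double const x = static_cast<double>(i);
    sink = vFoo(x, 1.0, e) + e;
  }
  std::chrono::duration<double> const elapsed =
    std::chrono::steady_clock::now() - start;
  return elapsed.count();
}
```

Timings from a loop like this vary from run to run and with optimization
level, so compare several runs built with the same flags.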

"We should forget about small efficiencies, say about 97% of the time:
premature optimization is the root of all evil."
~ Donald Knuth and/or Tony Hoare

ALSO, beware and be aware that with floating-point calculations, a seemingly
innocuous reordering of operations can change the computed result (sometimes
seriously).

q.v. ...

What Every Computer Scientist Should Know About Floating-Point Arithmetic
by David Goldberg
http://www.physics.ohio-state.edu/~dws/grouplinks/floating_point_math.pdf
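
The reordering hazard is easy to demonstrate.  In the minimal sketch below
(function names made up for illustration), the same three operands are
summed with two different groupings.  It assumes IEEE 754 doubles and
default floating-point settings, i.e. not -ffast-math, which permits the
compiler to reassociate.

```cpp
#include <cassert>

// Same three operands, two groupings.
double sum_left(double a, double b, double c)  { return (a + b) + c; }
double sum_right(double a, double b, double c) { return a + (b + c); }

void check_grouping_matters()
{
  double const big = 1e20;
  // (big + -big) is exactly 0, so adding 1 afterwards gives 1.
  assert(sum_left(big, -big, 1.0) == 1.0);
  // (-big + 1) rounds back to -big, because 1 is far below the spacing of
  // adjacent doubles near 1e20, so the final sum is 0.
  assert(sum_right(big, -big, 1.0) == 0.0);
}
```

Mathematically both groupings are equal.  In floating point they differ by
the entire value of the small operand.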

ONE MORE THING, depending on your platform, you may want to pay close
attention to the variety of relevant floating-point compiler flags.  Some
(-ffast-math, for example) can improve performance dramatically, at the
expense of strict compliance.  Others (-ffloat-store, for example) can
improve IEC 60559 (aka IEEE 754) compliance greatly, but sometimes at the
cost of performance.

Sincerely,
--Eljay

PS: I only mention the 68060 above because for assembly, 680x0 assembly is
my one true love.  I don't enjoy slogging through x86, x86_64, SPARC, Alpha,
or PowerPC assembly.  (And I haven't slogged through 6502 for >3 decades.)
Your mileage may vary.

References:
- gcc-performance::number of local variables
  - From: Martin Ettl
