This is the mail archive of the
gcc@gcc.gnu.org
mailing list for the GCC project.
Re: poor optimisation case
On Sun, 2007-08-05 at 16:58 -0400, Tim Prince wrote:
> tristan@wibberley.org wrote:
[snip]
> > The loop unfortunately can't always be written as in -DOLD as the
> > implementation of an iterator adapter might use ?: to special case the
> > first element of a sequence and when used in a generic algorithm which
> > just has the simple loop of -DNEW it ought to be optimised like -DOLD if
> > inlining occurs.
> >
> I don't see why you special case the first iteration of a loop with ?
> inside the loop. Simply write the first iteration separately, and begin
> the loop with the next iteration. It should be a lot clearer both to us
> and to the compiler what is your intention.
> Doesn't this belong on gcc-help? Better peeling optimization needs more
> justification than this.
As above. When using C++ as it's supposed to be used this isn't
possible. If I've got a sequence of values and I want to sum them there
is a generic sum algorithm that I'm supposed to use (and which I
*should* use to avoid unmaintainable spaghetti code which soon turns up
when every thing has to be hand coded to be fast - I've read "numerical
recipes in C" and nearly killed myself by the end of it).
If I've got a 1000000 element sequence and want (on one thread) to
compare what would happen in my generic algorithm (such as std::sum) if
the first element is doubled but (on a second thread) with the sequence
as it is, I'm not supposed to be required to write two versions of a
large piece of code - I'm supposed to be able to just write an iterator
that returns a different value for the first element.
In the case of returning a different value for the first element, I'd
use something in operator* that after inlining would end up as
equivalent to d?d:1 and the compiler should optimise that if it can and
if it would reduce runtime by a massive 33% and text size by quite a
bit. Both of those are true in this case.
Basically, writing two versions of the loop is not an option in real
life because developing on spaghetti C code is costly while elegant C++
code is cheap.
For example, my real (big) case is that I've got an iterator that moves
around an image or video thusly (or 4,5,6 dimensional equivalent):
it = it[1] + 6; // move 6 rows along the 2nd dimension
When I move in all dimensions at once I quite sensibly have a loop from
dimension zero to the top dimension adding
(amount-to-move*stride-of-dimension) to the pointer that the iterator is
implemented with for each one. The strides of the dimensions are
previously recorded in an array that the iterator holds a reference to -
except the first (because when you have a one dimensional image - a
signal - nothing needs to be stored) which is always 1. So everything
gets reduced down correctly by g++, except that this special case is not
moved out of the loop.
So this is not a programming problem that I need help with (I know how
to micro-optimise with C-style ugly-stuff). I'm just reporting a
significant missed optimisation opportunity that will help C++
developers even if not C or fortran developers. The only reason I didn't
reflect the C++ simple-vs-spaghetti code issue in my example is because
I wanted to keep it simple to target one of the problems very
specifically.
--
Tristan Wibberley
Any opinion expressed is mine (or else I'm playing devils advocate for
the sake of a good argument). My employer had nothing to do with this
communication.