This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.

optimizations in g77 compiler

From: Andrew Young <aty at sciences dot sdsu dot edu>
To: bug-gcc at gnu dot org
Date: Fri, 8 Feb 2002 13:46:49 -0800 (PST)
Subject: optimizations in g77 compiler

I've recently done some experiments with different optimization options
to the 2.95.2 version of g77 running under Debian Linux (the stable,
"potato" version, with kernel version 2.2.19).  As there was a request
in the documentation distributed with the compiler to report the
effects of various flags, who's the person to report the results to?

I've run these experiments on a 1400 MHz Athlon system, and an older
version on an 800 MHz Pentium III system running Red Hat Linux.  The
results are quite different for the two systems.

On the Athlon box, I find that there are very considerable differences
in roundoff errors depending on the flags used.  I *assume* that the
options that tend to favor keeping the intermediate calculations in
registers would be more accurate, so I have chosen to use

	-O -ffast-math -fexpensive-optimizations -fregmove -fforce-mem

even though the last two options make the program run slower, not
faster.  (Of course, I also use -malign-double for everything.) I also
find that -funroll-loops affects precision; but I am not sure whether
it makes things better or worse (it does speed things up a bit).  I
have also added -fomit-frame-pointer, as it is supposed to free up
register space, and clearly affects precision.
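
The sensitivity of roundoff to where intermediates are held can be
illustrated with a small sketch.  This is an analogy only, not the
program discussed above: it uses Python doubles as a stand-in for a
wide register format and single precision as a stand-in for a narrower
memory format.  The helper names (to_single, sum_kept_wide,
sum_rounded_each_step) are hypothetical.

```python
import struct

def to_single(x):
    """Round a Python float (IEEE double) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]

def sum_kept_wide(values):
    # Stand-in for "intermediates stay in a wide register":
    # accumulate in double and round to single only once, at the end.
    acc = 0.0
    for v in values:
        acc += v
    return to_single(acc)

def sum_rounded_each_step(values):
    # Stand-in for "intermediates are stored to narrower memory":
    # round the running total to single after every addition.
    acc = 0.0
    for v in values:
        acc = to_single(acc + v)
    return acc

values = [to_single(0.1)] * 1000
wide = sum_kept_wide(values)
narrow = sum_rounded_each_step(values)
print("kept wide:         ", wide)
print("rounded each step: ", narrow)
print("difference:        ", wide - narrow)
```

The two sums perform the same additions in the same order and differ
only in when rounding happens, which is the kind of change a flag that
alters register allocation could plausibly introduce.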

I was surprised to find that the P III box seems almost unaffected by
optimization options.

I should point out that by habit I do a lot of common-subexpression
elimination (CSE) myself as I write code, so the compiler options don't
buy big advantages on my programs.  Still, the best speed achieved is
about 10% faster than un-optimized code on the Athlon.  Also, my work
is all done in double-precision floating-point, so the features of
various modern processors that speed up single-precision f.p.
calculations are of no help to me.

By the way, it would be useful if the documentation actually enumerated
the optimizations that are invoked by -O1 and -O2.  There are a few
remarks to the effect that some particular flags are automatically part
of one or the other of these, but no complete list that I could find.

After reading the documentation in the Debian "g77-doc" package, I
found that a lot of things left unexplained there are in fact covered
in the "gcc-doc" package.  As these are *separate* packages under
Debian, it would help if there were explicit cross-references in the
g77 info files to the appropriate parts of the gcc info files.

Also, the documentation is very vague about which architectures do and
which do not allow certain flags.  If one is not a hardware expert, one
has to try some of them and then discover that the compiler barfs.  The
documentation should be friendlier to the average user who just wants
to compile programs.

I'd also like to find out how to determine the default compiler flags
used.  I tried g77 -v but did not find the output very informative.
Are the defaults compiled in, or is there a configuration file
somewhere on the system?  The "file" command identifies the executables
as "386" even when they are compiled with the -march=i686 (or some
other architecture) flag set.  And why are there separate -march and
-mcpu flags if the former implies the latter?  In what situation would
one want to set them differently?


As the possible number of combinations of options is very large, and
may in some cases depend on the order in which they are stated (such as
the various -O levels), choosing an optimal set by brute force is out
of the question; it bears considerable resemblance to the infamous
"traveling salesman problem", for which no efficient exact solution is
known.  However, there are heuristic methods of obtaining an
approximate optimization, and I have applied something of the sort
here.  If you are interested in playing around with this sort of thing,
I can supply my shell script that looks for nearly optimal solutions.
I can also supply my source code and some sample data if you'd like to
fiddle with them on other machines.
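
One common heuristic of this kind is greedy forward selection: start
with no flags, repeatedly add whichever single flag improves the
measurement most, and stop when no flag helps.  The sketch below is an
assumption about what such a search might look like, not the shell
script mentioned above.  The function names are hypothetical, and
toy_measure is a made-up cost table standing in for a real
compile-and-time step.

```python
def greedy_flag_search(candidates, measure):
    """Greedy forward selection over compiler flags.

    measure(flags) must return a cost (lower is better), for example a
    run time.  Returns the chosen flag list and its cost.
    """
    chosen = []
    best_cost = measure(chosen)
    while True:
        best_flag = None
        for flag in candidates:
            if flag in chosen:
                continue
            cost = measure(chosen + [flag])
            if cost < best_cost:
                best_cost = cost
                best_flag = flag
        if best_flag is None:
            # No remaining flag improves on the current set.
            return chosen, best_cost
        chosen.append(best_flag)

def toy_measure(flags):
    # Invented costs in arbitrary time units, for illustration only.
    # A real version would compile with these flags and time the program.
    cost = 1000
    effects = {"-O": -80, "-funroll-loops": -30,
               "-fforce-mem": 20, "-fregmove": 10}
    for flag in flags:
        cost += effects.get(flag, 0)
    return cost

candidates = ["-fforce-mem", "-fregmove", "-funroll-loops", "-O"]
flags, cost = greedy_flag_search(candidates, toy_measure)
print(flags, cost)
```

A greedy search needs on the order of n*n measurements for n flags
rather than 2**n, at the price of possibly missing combinations whose
benefit only appears when several flags are enabled together.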

An interesting sidelight is that the execution times are remarkably
reproducible on the P III system, but consistently vary by about 1%
from one run to the next on the Athlon.  I assume this is a side effect
of the Athlon's out-of-order execution of instructions, so that
(depending on what else is going on in the background) slightly
different execution sequences occur from run to run.  I have not
checked to see whether
this also changes the roundoff errors from one run to the next.
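
Run-to-run reproducibility of this sort can be quantified by repeating
a measurement and comparing the spread to the mean.  The following is a
minimal sketch under assumed names (time_runs, relative_spread,
workload); the workload is a placeholder loop, not the program
described in this message.

```python
import statistics
import time

def time_runs(func, repeats=5):
    """Call func repeatedly and return the wall-clock time of each call."""
    times = []
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        times.append(time.perf_counter() - start)
    return times

def relative_spread(times):
    """(max - min) / mean, so 0.01 corresponds to a 1% spread."""
    return (max(times) - min(times)) / statistics.mean(times)

def workload():
    # Placeholder double-precision loop; substitute the real program.
    total = 0.0
    for i in range(1, 200000):
        total += 1.0 / i
    return total

times = time_runs(workload)
print("relative spread: %.2f%%" % (100.0 * relative_spread(times)))
```

Comparing the program's output between runs in the same loop would
also show whether the roundoff errors themselves vary.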

		-- A. T. Young
