This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.



Re: IRA for GCC 4.4


>  With the compiler from the ira branch on x86_64-linux, here are the
>  timings reported by "gfortran -c -time -save-temps" with and without
>  IRA (two timings provided for each set of option, to check
>  reproducibility)

OK, I'm back with fresh numbers from the current IRA branch, rev.
135035, which I believe includes the fix for -O0 compilation time
(thanks, by the way!). I'm still compiling the same huge testcase
(from CP2K), which is a good example of relatively heavy use of
Fortran 95 features. Memory used during compilation was up to 3 GB
when optimization is turned on (this is an 8 GB system, and I checked
that disk swap didn't come into play). This is on x86_64-linux.


At -O0: 3% decrease wrt current, no further effect for -fira-algorithm=CB
At -O0 -g: 3% decrease wrt current, slightly smaller (-1.5%) with
    -fira-algorithm=CB
At -O1: 7% increase wrt current; -fira-algorithm=CB turns this into
    only a 2% increase
At -O2: 5% increase for -fira; only 1.5% increase when
    -fira-algorithm=CB is used
At -O2 -ffast-math, -O3 and -O3 -ffast-math: roughly same as -O2, 3%
    to 5% increase for -fira, down to a 1%-2% increase when
    -fira-algorithm=CB is used.
With -funroll-loops, -ftree-vectorize or both: again, roughly the same.

I've also tried gfortran's -fbounds-check option, which greatly
increases the amount of code emitted by the front-end for a given
source, and haven't seen any significant difference from the results
reported above (in particular, no performance degradation).

I've also played with -m32 at various optimization levels, and the
results are again in the same range as above for -m64.


*Conclusions*

All in all, the -O0 compile time is now on par with the old allocator,
and at higher optimization levels, we see a 3% to 5% compile-time
regression. The CB algorithm is faster, with a regression of only 1.5%
to 2%.

I'll now turn to benchmarking of generated code (I'll run the
Polyhedron benchmark, which is widely known and referred to in the
Fortran community). I don't have the guts to do a systematic check of
memory consumption of the compiler, but I think it'd be nice if
someone could do that.

FX


PS: I attach the file containing all timings. For each set of options,
I ran the compiler twice; when timings differ significantly, that's
because of other users using the machine (which is a rather underused
dual-processor, dual-core machine, with an average load during my tests
of 1.09), and I thus take the smallest number for calculations.
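
For anyone who wants to redo the arithmetic from the attached file, here
is a minimal sketch of the calculation described above: take the smallest
of the repeated timings for each option set, then express the change
relative to the baseline as a percentage. The function names and the
sample numbers are made up for illustration; they are not taken from
timing.txt.

```python
def best_time(runs):
    """Return the smallest of the repeated timings.

    The smallest value is assumed to be the one least disturbed by
    other users' load on the machine.
    """
    return min(runs)


def percent_change(baseline_runs, candidate_runs):
    """Percentage change of the candidate's best time vs. the baseline's.

    Positive means the candidate compiles more slowly (a regression);
    negative means it compiles faster.
    """
    base = best_time(baseline_runs)
    cand = best_time(candidate_runs)
    return 100.0 * (cand - base) / base


if __name__ == "__main__":
    # Hypothetical timings in seconds, two runs per option set.
    without_ira = [100.0, 104.0]
    with_ira = [105.0, 105.5]
    print(f"{percent_change(without_ira, with_ira):+.1f}%")  # prints "+5.0%"
```

Using the minimum rather than the mean is a deliberate choice here: noise
from other users can only make a run slower, never faster.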

-- 
FX Coudert
http://www.homepages.ucl.ac.uk/~uccafco/

Attachment: timing.txt
Description: Text document

