This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.



Re: food for optimizer developers


On 08/10/2010 09:51 PM, Ralf W. Grosse-Kunstleve wrote:
> I wrote a Fortran to C++ conversion program that I used to convert selected
> LAPACK sources. Comparing runtimes with different compilers I get:
>
>                           absolute  relative
> ifort 11.1.072            1.790s    1.00
> gfortran 4.4.4            2.470s    1.38
> g++ 4.4.4                 2.922s    1.63

To get a full picture, it would be nice to see icc times too.

> This is under Fedora 13, 64-bit, 12-core Opteron 2.2GHz
>
> All files needed to easily reproduce the results are here:
>
> http://cci.lbl.gov/lapack_fem/
>
> See the README file or the example commands below.
>
> Questions:
>
> - Is there a way to make the g++ version as fast as ifort?


I think it is more important (and harder) to make gfortran closer to ifort.

I cannot speak to your fragment of LAPACK, but about 15 years ago I worked on manual LAPACK optimization for an Alpha processor. As I remember, LAPACK is quite a memory-bound benchmark. The hottest spot was matrix multiplication, which is used in many places in LAPACK. The matrix multiplication in LAPACK is already moderately optimized by using a temporary variable, which makes it about 1.5 times faster than the plain algorithm when the cache is not big enough to hold the matrices. But proper loop optimizations (mostly tiling) could improve it by more than a factor of 4.

So I guess, and hope, that the Graphite project will finally improve LAPACK performance by implementing tiling.
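To illustrate what tiling means here, the following is a minimal C sketch, not LAPACK's or Graphite's actual code. The function names, the row-major layout, and the tile size of 32 are assumptions chosen for illustration. Both functions compute C += A * B; the tiled one works on small blocks so that the data touched by the inner loops has a chance to stay in cache.

```c
#include <stddef.h>

/* Hypothetical tile size. In practice it would be tuned so that the
   blocks touched by the inner loops fit in the target cache. */
#define TILE 32

static size_t min_size(size_t a, size_t b) { return a < b ? a : b; }

/* Plain version: C += A * B for n x n row-major matrices.
   The running sum lives in a temporary, so the inner loop does not
   store to C on every iteration. */
void matmul_naive(size_t n, const double *a, const double *b, double *c)
{
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++) {
            double sum = c[i * n + j];
            for (size_t k = 0; k < n; k++)
                sum += a[i * n + k] * b[k * n + j];
            c[i * n + j] = sum;
        }
}

/* Tiled version: the same arithmetic, but the iteration space is cut
   into TILE x TILE blocks. The three outer loops walk over blocks,
   the three inner loops stay inside one block. */
void matmul_tiled(size_t n, const double *a, const double *b, double *c)
{
    for (size_t ii = 0; ii < n; ii += TILE)
        for (size_t kk = 0; kk < n; kk += TILE)
            for (size_t jj = 0; jj < n; jj += TILE) {
                size_t i_end = min_size(ii + TILE, n);
                size_t k_end = min_size(kk + TILE, n);
                size_t j_end = min_size(jj + TILE, n);
                for (size_t i = ii; i < i_end; i++)
                    for (size_t k = kk; k < k_end; k++) {
                        double aik = a[i * n + k]; /* temporary, reused over j */
                        for (size_t j = jj; j < j_end; j++)
                            c[i * n + j] += aik * b[k * n + j];
                    }
            }
}
```

Whether and by how much the tiled version is faster depends on the matrix size, the cache hierarchy, and the compiler, so it has to be measured on the target machine.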

Once the memory-bound problem is solved, loop vectorization is another important optimization that could improve LAPACK. Unfortunately, GCC vectorizes fewer loops than ifort (about half as many the last time I checked). I did not analyze the reason for this.
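As a rough illustration of what separates a loop the vectorizer can handle from one it cannot handle directly, here is a small C sketch. The function names are made up for this example, and whether a given GCC version actually vectorizes the first loop depends on the version, target, and flags (for instance `-O3`, which enables `-ftree-vectorize`).

```c
#include <stddef.h>

/* y[i] += alpha * x[i]. Every iteration is independent of the others,
   and restrict promises the compiler that x and y do not overlap, so
   several iterations can be executed together in SIMD registers. */
void daxpy(size_t n, double alpha,
           const double *restrict x, double *restrict y)
{
    for (size_t i = 0; i < n; i++)
        y[i] += alpha * x[i];
}

/* Running sum. Iteration i needs the result of iteration i - 1
   (a loop-carried dependence), so the iterations cannot simply be
   executed side by side. */
void prefix_sum(size_t n, double *y)
{
    for (size_t i = 1; i < n; i++)
        y[i] += y[i - 1];
}
```

Recent GCC releases can report which loops were vectorized with `-fopt-info-vec`; comparing that report with the one from another compiler is one way to find the loops where they differ.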

Once the vectorization problem is solved, another important lower-level loop optimization is modulo scheduling. It matters even though modern x86/x86_64 processors are out-of-order, because OOO processors can look ahead only through a few branches. As I remember, the Intel compiler applies modulo scheduling frequently. GCC's modulo scheduling is quite constrained.
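Modulo scheduling is a form of software pipelining: the compiler overlaps stages of consecutive loop iterations. The hand-written C sketch below only illustrates the idea at source level and is not what GCC or ifort emit; the function names and the two-stage split (load, then compute and store) are assumptions for this example.

```c
#include <stddef.h>

/* Straightforward loop: each iteration loads, computes, and stores. */
void scale_plain(size_t n, double a, double b, const double *x, double *y)
{
    for (size_t i = 0; i < n; i++)
        y[i] = a * x[i] + b;
}

/* Software-pipelined by hand: the load for iteration i + 1 is issued
   in the same loop body as the compute and store for iteration i. */
void scale_pipelined(size_t n, double a, double b, const double *x, double *y)
{
    if (n == 0)
        return;
    double loaded = x[0];            /* prologue: load for iteration 0 */
    for (size_t i = 0; i + 1 < n; i++) {
        double next = x[i + 1];      /* stage 1 of iteration i + 1 */
        y[i] = a * loaded + b;       /* stage 2 of iteration i */
        loaded = next;
    }
    y[n - 1] = a * loaded + b;       /* epilogue: finish the last iteration */
}
```

In GCC the corresponding pass is enabled with `-fmodulo-sched`; a compiler does this on machine instructions with knowledge of latencies, which is why doing it by hand in C is only a teaching device.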

These are my thoughts, but I might be wrong because I have no time to confirm my speculations. If you really want to help GCC developers, you could do a comparative analysis of the code generated by ifort and gfortran and find which optimizations GCC misses. GCC has few resources, and the developers who could solve these problems are very busy. Intel's optimizing compiler team (not counting researchers) is much bigger than the whole GCC community. Taking this into account, and that they have much more information about their processors, I don't think gfortran will generate better or equal code for floating-point benchmarks in the near future.

