This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.
We are working on tuning some applications and libraries such as ATLAS, METIS, BLAS and FFTW (numerical libraries, mostly linear algebra) for the new IBM PowerPC 970 processor included in JS20 BladeCenter machines.
As you may know, the PPC970 is a 32/64-bit processor based on the POWER4 but with VMX support (a.k.a. AltiVec).
We see a large performance degradation when we enable the '-freduce-all-givs' flag in GCC to compile the ATLAS libraries. For example, the performance of an SGEMM operation (single-precision matrix multiplication) of size 800 x 800 drops from 6.5 GFlops to 4.5 GFlops when this flag is enabled. Maybe this degradation occurs because the SGEMM kernel is written in PPC assembler with AltiVec calls rather than in C.
I have read (in man gcc) that you are very interested in hearing about applications that run slower when this flag is enabled. So here I am ;).
I attach a tar.gz file containing all the source code and shell scripts needed to reproduce the degradation of the SGEMM routine, with this flag both enabled and disabled.
This package contains only the isolated source code of the SGEMM routine (single precision), extracted from the ATLAS library v3.6.0 (http://math-atlas.sourceforge.net/). It also contains the Makefiles required to build the library and the tester with '-freduce-all-givs' enabled and disabled. Finally, there is a shell script that runs a benchmark of the SGEMM routine (test-ATLAS) for both versions of the binaries and with different matrix sizes.
The elapsed time of the SGEMM operation is measured in test-ATLAS using the Time Base Register of PPC processors. To find out how many nanoseconds each TB tick lasts, use test-TB (this value differs from one PPC processor to another). So, to obtain the GFlops, we take the TB ticks reported by test-ATLAS and convert them to elapsed time (in seconds) using the value from test-TB. Then we divide 2N^3 (the FLOP count of a matrix-by-matrix multiplication of size N) by the total seconds required to perform the SGEMM operation.
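The conversion described above can be sketched in C as follows. This is only an illustration: the function names tb_ticks_to_seconds and sgemm_gflops are hypothetical and are not taken from test-ATLAS.c, and the sketch assumes that the tick count covers exactly one SGEMM call (if it covers several repetitions, divide it by the repetition count first).

```c
#include <assert.h>
#include <stdint.h>

/* Convert Time Base ticks to seconds.
 * ns_per_tick is the tick length in nanoseconds, as reported by test-TB.
 * (Hypothetical helper, not part of the attached package.) */
static double tb_ticks_to_seconds(uint64_t ticks, double ns_per_tick)
{
    assert(ns_per_tick > 0.0);
    return (double)ticks * ns_per_tick * 1e-9;
}

/* GFlops of one N x N matrix multiplication:
 * 2*N^3 floating-point operations divided by the elapsed seconds,
 * scaled to units of 1e9 operations per second. */
static double sgemm_gflops(unsigned n, uint64_t ticks, double ns_per_tick)
{
    double seconds = tb_ticks_to_seconds(ticks, ns_per_tick);
    assert(seconds > 0.0);
    double flop = 2.0 * (double)n * (double)n * (double)n;
    return flop / seconds / 1e9;
}
```

For example, with N = 800 the operation count is 2 * 800^3 = 1.024e9 FLOP, so an elapsed time of 0.1 s corresponds to 10.24 GFlops.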
Also keep in mind these macros in the test-ATLAS.c code:

#define MAXFDIM 9000 /* Max memory 940 Mbytes = 3 * MAXFDIM*MAXFDIM*4 */
--> Maximum matrix size for the available memory of the system

#define NREP 10 /*100*/
--> Number of repetitions, so that the time spent preloading the CPU cache does not distort the measurement.
Finally, this test must of course be run on a PPC970-based machine such as a JS20 BladeCenter or a Power Mac G5, with GCC v3.3 or higher with AltiVec support.
Sorry if you already know about this behavior of the ATLAS libraries when this flag is enabled and this information is no longer useful to the GCC team ;).
Regards,
Bye!
--
My current project: Tuning for IBM PowerPC 970-based Blades
(http://www.ciri.upc.es/cela_pblade)
----------------------------------------------------------------------
Copyright protects software. Patents protect software monopolies.
-- NO to software patents (http://swpat.ffii.org)
----------------------------------------------------------------------
o//o Raúl de la Cruz Martínez  E-mail: delacruz at ac upc es
o//o CEPBA-IBM Research Institute (CIRI)  www.cepba.upc.es/ciri
o//o http://people.ac.upc.es/delacruz  Tel: +34 93-401 16 49
o//o CIRI C/Jordi Girona 1-3, Despatx C6-S103. BCN, Catalonia, SPAIN
----------------------------------------------------------------------
Attachment:
atlas-degradation.tar.gz
Description: GNU Zip compressed data