[Bug tree-optimization/60997] New: -fopenmp conflicts with -floop-interchange
dominiq at lps dot ens.fr
gcc-bugzilla@gcc.gnu.org
Tue Apr 29 09:54:00 GMT 2014
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60997
Bug ID: 60997
Summary: -fopenmp conflicts with -floop-interchange
Product: gcc
Version: 4.10.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: dominiq at lps dot ens.fr
CC: grosser at gcc dot gnu.org, jakub at gcc dot gnu.org,
mircea.namolaru at inria dot fr
Created attachment 32703
--> http://gcc.gnu.org/bugzilla/attachment.cgi?id=32703&action=edit
Test for three variants of the matrix product
Compiling the attached code with -Ofast gives the following timings at run time:
[Book15] Fortran/omp_tst% gfc -Ofast omp_tst_4_db_2.f90
[Book15] Fortran/omp_tst% time a.out
94378416668672.000
Elapsed time = 3.7326660000000000 seconds
94378416668672.000
Elapsed time = 0.57225000000000004 seconds
94378416668672.000
Elapsed time = 6.9233669999999998 seconds
94378416668672.000
Elapsed time = 0.47757300000000003 seconds
11.704u 0.030s 0:11.73 100.0% 0+0k 0+0io 2pf+0w
Adding -floop-interchange at compile time gives:
[Book15] Fortran/omp_tst% gfc -Ofast omp_tst_4_db_2.f90 -floop-interchange
[Book15] Fortran/omp_tst% time a.out
94378416668672.000
Elapsed time = 0.57357899999999995 seconds
94378416668672.000
Elapsed time = 0.56863100000000000 seconds
94378416668672.000
Elapsed time = 0.56851499999999999 seconds
94378416668672.000
Elapsed time = 0.47033199999999997 seconds
2.195u 0.015s 0:02.21 99.5% 0+0k 0+0io 0pf+0w
i.e., all three variants of the loop are transformed into the fastest one. Adding
-fopenmp (and -fexternal-blas -framework vecLib) gives:
[Book15] Fortran/omp_tst% gfc -Ofast omp_tst_4_db_2.f90 -floop-interchange -fopenmp -fexternal-blas -framework vecLib
[Book15] Fortran/omp_tst% time a.out
94378416668672.000
Elapsed time = 1.8143670000000001 seconds
94378416668672.000
Elapsed time = 0.12886900000000001 seconds
94378416668672.000
Elapsed time = 2.0025420000000000 seconds
94378416668672.000
Elapsed time = 2.9204999999999998E-002 seconds
31.030u 0.064s 0:04.00 777.2% 0+0k 4+4io 2pf+0w
i.e., the loop interchange is prevented by the -fopenmp option. This is
probably because the -fopenmp option is processed before the graphite
optimizations.
The last timing in each run is for the MATMUL intrinsic, given as a reference
(using the system BLAS gives a 15-fold speed-up).
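
For readers without the attachment, here is a rough sketch of what three loop
orderings of a matrix product can look like. It is a hypothetical illustration,
not the attached omp_tst_4_db_2.f90: the program name, the matrix size n, the
particular loop orders and the placement of the OpenMP directives are all
assumptions. Fortran arrays are stored column by column, so only the order with
i innermost accesses memory with unit stride in the inner loop, which is the
kind of order a loop interchange would be expected to aim for.

```fortran
! Hypothetical illustration only; this is NOT the attached omp_tst_4_db_2.f90.
program loop_order_sketch
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: n = 256          ! assumed size, chosen arbitrarily
  real(dp), allocatable :: a(:,:), b(:,:), c(:,:), ref(:,:)
  integer :: i, j, k

  allocate(a(n,n), b(n,n), c(n,n), ref(n,n))
  call random_number(a)
  call random_number(b)
  ref = matmul(a, b)                     ! reference result from the intrinsic

  ! Order j-i-k: the innermost loop walks a(i,k) with stride n.
  c = 0.0_dp
  !$omp parallel do private(i, k)
  do j = 1, n
     do i = 1, n
        do k = 1, n
           c(i,j) = c(i,j) + a(i,k)*b(k,j)
        end do
     end do
  end do
  !$omp end parallel do
  print *, 'j-i-k max deviation from matmul:', maxval(abs(c - ref))

  ! Order j-k-i: the innermost loop walks c(i,j) and a(i,k) with unit stride.
  c = 0.0_dp
  !$omp parallel do private(i, k)
  do j = 1, n
     do k = 1, n
        do i = 1, n
           c(i,j) = c(i,j) + a(i,k)*b(k,j)
        end do
     end do
  end do
  !$omp end parallel do
  print *, 'j-k-i max deviation from matmul:', maxval(abs(c - ref))

  ! Order i-k-j: the innermost loop walks c(i,j) and b(k,j) with stride n.
  ! Each thread owns distinct rows i of c, so the outer loop is safe to share.
  c = 0.0_dp
  !$omp parallel do private(j, k)
  do i = 1, n
     do k = 1, n
        do j = 1, n
           c(i,j) = c(i,j) + a(i,k)*b(k,j)
        end do
     end do
  end do
  !$omp end parallel do
  print *, 'i-k-j max deviation from matmul:', maxval(abs(c - ref))
end program loop_order_sketch
```

Without -fopenmp the !$omp lines are plain comments, so the same source can be
built with and without that option (alongside -Ofast and -floop-interchange as
in the commands above) to compare the two cases. The sketch prints only
correctness checks; timing code would have to be added to reproduce
measurements like those in this report.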