[Bug tree-optimization/60997] New: -fopenmp conflicts with -floop-interchange

dominiq at lps dot ens.fr gcc-bugzilla@gcc.gnu.org
Tue Apr 29 09:54:00 GMT 2014


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60997

            Bug ID: 60997
           Summary: -fopenmp conflicts with -floop-interchange
           Product: gcc
           Version: 4.10.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: dominiq at lps dot ens.fr
                CC: grosser at gcc dot gnu.org, jakub at gcc dot gnu.org,
                    mircea.namolaru at inria dot fr

Created attachment 32703
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=32703&action=edit
Test for three variants of the matrix product

Compiling the attached code with -Ofast gives the following timings at run time:

[Book15] Fortran/omp_tst% gfc -Ofast omp_tst_4_db_2.f90
[Book15] Fortran/omp_tst% time a.out
   94378416668672.000     
 Elapsed time =   3.7326660000000000      seconds
   94378416668672.000     
 Elapsed time =  0.57225000000000004      seconds
   94378416668672.000     
 Elapsed time =   6.9233669999999998      seconds
   94378416668672.000     
 Elapsed time =  0.47757300000000003      seconds
11.704u 0.030s 0:11.73 100.0%    0+0k 0+0io 2pf+0w

Adding -floop-interchange at compile time gives:

[Book15] Fortran/omp_tst% gfc -Ofast omp_tst_4_db_2.f90 -floop-interchange
[Book15] Fortran/omp_tst% time a.out
   94378416668672.000     
 Elapsed time =  0.57357899999999995      seconds
   94378416668672.000     
 Elapsed time =  0.56863100000000000      seconds
   94378416668672.000     
 Elapsed time =  0.56851499999999999      seconds
   94378416668672.000     
 Elapsed time =  0.47033199999999997      seconds
2.195u 0.015s 0:02.21 99.5%    0+0k 0+0io 0pf+0w

i.e., the three variants of the loop are all transformed into the fastest one.
Adding -fopenmp (and -fexternal-blas -framework vecLib) gives:

[Book15] Fortran/omp_tst% gfc -Ofast omp_tst_4_db_2.f90 -floop-interchange -fopenmp -fexternal-blas -framework vecLib
[Book15] Fortran/omp_tst% time a.out
   94378416668672.000     
 Elapsed time =   1.8143670000000001      seconds
   94378416668672.000     
 Elapsed time =  0.12886900000000001      seconds
   94378416668672.000     
 Elapsed time =   2.0025420000000000      seconds
   94378416668672.000     
 Elapsed time =   2.9204999999999998E-002 seconds
31.030u 0.064s 0:04.00 777.2%    0+0k 4+4io 2pf+0w

i.e., the -fopenmp option prevents the loop interchange: the first and third
variants (1.81 s and 2.00 s) remain far slower than the second one (0.13 s).
This is probably because the -fopenmp option is processed before the graphite
optimizations.

The last timing in each run is for the MATMUL intrinsic, given as a reference
(using the system BLAS gives a speed-up of about 15 times).
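
For readers without the attachment, here is a minimal sketch of what three
loop orderings of a matrix product can look like. It is not the attached
omp_tst_4_db_2.f90: the program name, the size n, the array names and the
order of the variants are all assumptions made for illustration, and it
contains no OpenMP directives.

```fortran
! Hypothetical sketch, NOT the attached omp_tst_4_db_2.f90.
! It only illustrates three loop orders of c = a*b plus a MATMUL reference.
program loop_orders
  implicit none
  integer, parameter :: n = 200   ! assumed size, chosen to run quickly
  double precision, allocatable :: a(:,:), b(:,:), c(:,:)
  integer :: i, j, k

  allocate(a(n,n), b(n,n), c(n,n))
  call random_number(a)
  call random_number(b)

  ! Variant with k innermost (dot-product form):
  ! a(i,k) is read with stride n in the inner loop.
  c = 0.0d0
  do j = 1, n
     do i = 1, n
        do k = 1, n
           c(i,j) = c(i,j) + a(i,k)*b(k,j)
        end do
     end do
  end do
  print *, 'k innermost:', sum(c)

  ! Variant with i innermost:
  ! c(i,j) and a(i,k) are both accessed with unit stride.
  c = 0.0d0
  do j = 1, n
     do k = 1, n
        do i = 1, n
           c(i,j) = c(i,j) + a(i,k)*b(k,j)
        end do
     end do
  end do
  print *, 'i innermost:', sum(c)

  ! Variant with j innermost:
  ! c(i,j) and b(k,j) are both accessed with stride n.
  c = 0.0d0
  do i = 1, n
     do k = 1, n
        do j = 1, n
           c(i,j) = c(i,j) + a(i,k)*b(k,j)
        end do
     end do
  end do
  print *, 'j innermost:', sum(c)

  ! Reference result from the MATMUL intrinsic.
  c = matmul(a, b)
  print *, 'matmul:     ', sum(c)
end program loop_orders
```

Since Fortran arrays are stored in column-major order, the variant with i
innermost is the one with unit-stride accesses in the inner loop, which is
the ordering a loop interchange would be expected to favour. The printed sums
may differ in the last digits because the additions are done in a different
order.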


