[Bug tree-optimization/53346] New: [4.6/4.7/4.8 Regression] Bad vectorization in the proc cptrf2 of rnflow.f90

dominiq at lps dot ens.fr gcc-bugzilla@gcc.gnu.org
Mon May 14 15:44:00 GMT 2012


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53346

             Bug #: 53346
           Summary: [4.6/4.7/4.8 Regression] Bad vectorization in the proc
                    cptrf2 of rnflow.f90
    Classification: Unclassified
           Product: gcc
           Version: 4.8.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
        AssignedTo: unassigned@gcc.gnu.org
        ReportedBy: dominiq@lps.ens.fr
                CC: rguenth@gcc.gnu.org, ubizjak@gmail.com


At revision 187457 (i.e., with pr53340 fixed) on x86_64-apple-darwin10, after

[macbook] test/dbg_rnflow% gfc -c -O3 -ffast-math -funroll-loops timctr.f90
cmpcpt.f90 cptrf2.f90 dger.f90 dgetri.f90 dswap.f90 dtrsm.f90 evlrnf.f90
idamax.f90 main.f90 mattrs.f90 cmpmat.f90 dgemm.f90 dgetf2.f90 dlaswp.f90
dtrmm.f90 dtrti2.f90 extpic.f90 ilaenv.f90 matcnt.f90 reaseq.f90 xerbla.f90
cptrf1.f90 dgemv.f90 dgetrf.f90 dscal.f90 dtrmv.f90 dtrtri.f90 gentrs.f90
lsame.f90 matsim.f90
[macbook] test/dbg_rnflow% makeo ; time a.out > /dev/null
23.872u 0.349s 0:24.22 99.9%    0+0k 0+0io 0pf+0w
[macbook] test/dbg_rnflow% /opt/gcc/gcc4.8p-187339/bin/gfortran -c -O3
-ffast-math -funroll-loops evlrnf.f90
[macbook] test/dbg_rnflow% makeo ; time a.out > /dev/null
22.259u 0.346s 0:22.61 99.9%    0+0k 0+0io 0pf+0w
[macbook] test/dbg_rnflow% /opt/gcc/gcc4.8p-187291/bin/gfortran -c -O3
-ffast-math -funroll-loops idamax.f90
[macbook] test/dbg_rnflow% makeo ; time a.out > /dev/null
22.252u 0.345s 0:22.60 99.9%    0+0k 0+0io 0pf+0w
[macbook] test/dbg_rnflow% /opt/gcc/gcc4.8p-187102/bin/gfortran -c -O3
-ffast-math -funroll-loops idamax.f90
[macbook] test/dbg_rnflow% makeo ; time a.out > /dev/null
22.121u 0.346s 0:22.47 99.9%    0+0k 0+0io 0pf+0w

(i.e., working around pr53342 and a regression for idamax.f90, see
below), the compilation of cptrf2.f90 (source attached to pr53340) with the
following flags yields

optimization level      4.4.6   4.5.3   4.6.3   4.7.0   r187457

-O2                      27.8    28.2    28.2    21.8    21.8
-O2 -ftree-vectorize     27.8    28.2    28.2    27.9    27.9
-O3                      22.0    21.3    25.1    25.3    25.3
-O3 -fno-tree-vectorize  22.1    21.3    21.4    21.4    21.4

Note that 4.5/4.6/4.7 vectorize two loops (lines 21 and 29), while 4.8
vectorizes only the loop at line 21 (29: not vectorized: iteration count too
small.).

Looking at my archives I have found that a first regression appeared 
between revisions 162456 and 164728

optimization level      4.6-162456 4.6p-164728

-O2                             28.2    28.3
-O2 -ftree-vectorize            28.1    28.3
-O3                             21.4    29.4
-O3 -fno-tree-vectorize         21.3    21.4
-O3 -ffast-math                 21.4    22.3
-O3 -ffast-math -funroll-loops  21.9    22.4

For the record, as said above, the compilation of idamax.f90 regressed between
revisions 187102 and 187291

[macbook] test/dbg_rnflow% /opt/gcc/gcc4.8p-187291/bin/gfortran -c -O3
-ffast-math -funroll-loops idamax.f90
[macbook] test/dbg_rnflow% makeo ; time a.out > /dev/null
22.252u 0.345s 0:22.60 99.9%    0+0k 0+0io 0pf+0w
[macbook] test/dbg_rnflow% /opt/gcc/gcc4.8p-187102/bin/gfortran -c -O3
-ffast-math -funroll-loops idamax.f90
[macbook] test/dbg_rnflow% makeo ; time a.out > /dev/null
22.121u 0.346s 0:22.47 99.9%    0+0k 0+0io 0pf+0w

Although the regression is only slightly above the noise margin at the level
of rnflow.f90, it could be worth investigating because:
(1) it is a LAPACK routine (maybe slightly modified),
(2) there are equivalent intrinsics in F90,
(3) the slowdown may be quite significant at the level of the proc itself.
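
To illustrate point (2), here is a minimal sketch of what idamax computes,
written once as an explicit loop and once with the F90 intrinsics maxloc and
abs. It is not the code from rnflow.f90: the names idamax_sketch, idamax_loop
and idamax_intrinsic are hypothetical, and unit stride (incx = 1) is assumed
for brevity.

```fortran
! Sketch only: index of the first element of largest magnitude, two ways.
! All names here are hypothetical; unit stride (incx = 1) is assumed.
module idamax_sketch
  implicit none
  integer, parameter :: dp = kind(1.0d0)
contains
  ! Explicit loop, in the style of the usual unit-stride idamax code.
  integer function idamax_loop(n, dx) result(imax)
    integer, intent(in) :: n
    real(dp), intent(in) :: dx(n)
    real(dp) :: dmax
    integer :: i
    imax = 0
    if (n < 1) return
    imax = 1
    dmax = abs(dx(1))
    do i = 2, n
      ! Strict comparison keeps the first index in case of ties.
      if (abs(dx(i)) > dmax) then
        imax = i
        dmax = abs(dx(i))
      end if
    end do
  end function idamax_loop

  ! Same result using intrinsics; maxloc returns the first maximum.
  integer function idamax_intrinsic(n, dx) result(imax)
    integer, intent(in) :: n
    real(dp), intent(in) :: dx(n)
    integer :: loc(1)
    imax = 0
    if (n < 1) return
    loc = maxloc(abs(dx))
    imax = loc(1)
  end function idamax_intrinsic
end module idamax_sketch

program demo
  use idamax_sketch
  implicit none
  real(dp) :: x(5)
  x = (/ 1.0_dp, -7.5_dp, 3.0_dp, 7.5_dp, -2.0_dp /)
  ! Elements 2 and 4 tie in magnitude; both versions should pick index 2.
  print *, idamax_loop(5, x), idamax_intrinsic(5, x)
end program demo
```

Comparing the code generated for the two versions at -O3 -ffast-math
-funroll-loops might help to tell whether the slowdown is specific to the
loop form.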


