[Bug tree-optimization/25621] Missed optimization when unrolling the loop (splitting up the sum) (only with -ffast-math)

Fri Mar 29 10:07:00 GMT 2013

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=25621

Joost VandeVondele <Joost.VandeVondele at mat dot ethz.ch> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |Joost.VandeVondele at mat
                   |                            |dot ethz.ch
         Depends on|                            |53947

--- Comment #12 from Joost VandeVondele <Joost.VandeVondele at mat dot ethz.ch> 2013-03-29 10:07:06 UTC ---
This has largely become a vectorizer problem. For routine S31 of the initial
comment, ifort generates code that is about 1.5 times as fast as gfortran's
(0.378 vs. 0.565 in the timings below). Given that this is a common dot
product, it would be good to understand why that happens. Both compilers also
fail to notice that S32 is essentially the same code, hand-unrolled.
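
For readers without the test case at hand, here is a rough sketch of the two
loop shapes being compared. It is written in C purely as an illustration; the
real S31/S32 are the Fortran routines from the earlier comments and may differ
in detail. The names dot_simple and dot_unrolled4, the unroll factor of 4, and
the multiple-of-4 length requirement are all assumptions made for this sketch.
Splitting the sum reassociates the floating-point additions, which is why a
compiler may only do this transformation itself under -ffast-math.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for S31: a plain dot-product reduction. */
double dot_simple(const double *a, const double *b, size_t n)
{
    double s = 0.0;
    for (size_t i = 0; i < n; ++i)
        s += a[i] * b[i];
    return s;
}

/* Hypothetical stand-in for S32: the same sum split by hand into four
 * independent partial sums. Assumes n is a multiple of 4 to keep the
 * sketch short (no remainder loop). */
double dot_unrolled4(const double *a, const double *b, size_t n)
{
    assert(n % 4 == 0);
    double s0 = 0.0, s1 = 0.0, s2 = 0.0, s3 = 0.0;
    for (size_t i = 0; i < n; i += 4) {
        s0 += a[i]     * b[i];
        s1 += a[i + 1] * b[i + 1];
        s2 += a[i + 2] * b[i + 2];
        s3 += a[i + 3] * b[i + 3];
    }
    /* Combine the partial sums at the end; this ordering differs from
     * the strictly sequential sum in dot_simple. */
    return (s0 + s1) + (s2 + s3);
}
```

Ideally the vectorizer would turn the first form into something at least as
good as the second, and recognize the second as the same reduction.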

Tested with the code in comment #6 (without inlining):

> gfortran -march=native -ffast-math -O3 -fno-inline PR25621.f90
> ./a.out
 default loop  0.56491500000000006     
 hand optimized loop  0.74488600000000016     
> ifort -xHost -O3 -fno-inline PR25621.f90
> ./a.out
 default loop  0.377943000000000     
 hand optimized loop  0.579911000000000