[patch] enabling vectorization by default at -O3

Jack Howarth howarth@bromo.msbb.uc.edu
Thu Sep 6 13:16:00 GMT 2007


Tobias,
   I suspect the other gfortran developers will violently oppose
turning -ffast-math on by default because of the deviations from
accuracy it causes. Also, I believe there are plans to replace
-ffast-math with separate options for each of its component
optimizations so that they can be selected individually. I can't
find the message where this was mentioned at the moment, but
someone claimed to be working on it.
               Jack
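
A minimal sketch of the kind of accuracy deviation at issue, under stated assumptions: -ffast-math permits the compiler to reorder floating-point operations, and floating-point addition is not associative. The Python snippet below does not involve gfortran at all; it regroups a sum by hand, with values chosen purely for illustration, to show that the grouping alone can change the result.

```python
# Illustration only: regroup a floating-point sum by hand to show that
# the order of operations can change the result. The values are
# hypothetical and were picked to make the effect visible.
a, b, c = 1.0e16, -1.0e16, 1.0

left_to_right = (a + b) + c   # a and b cancel first, then c is added
regrouped = a + (b + c)       # c is rounded away inside (b + c) first

print(left_to_right, regrouped)
print(left_to_right == regrouped)
```

Whether a given program is sensitive to such regrouping depends on its numerics, which is why making this the default is contentious.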

On Thu, Sep 06, 2007 at 11:16:30AM +0200, Tobias Burnus wrote:
> Dorit Nuzman wrote:
> > I'd at least try -ffast-math. -funroll-loops is usually important on
> > powerpc (along with -fvariable-expansion-in-unroller that helps if there
> > are reductions). There are 2 benchmarks that degrade by 2-3% - maybe we
> > could try those with -fvect-cost-model (it's not on by default yet).
> >   
> Result - now on Intel Core2 Duo CPU T7300 @ 2.00GHz.
> 
> (I will also repost the results for the AMD, but currently it is busy
> with "nightly" builds/runs.)
> 
> In general, -ftree-vectorize makes the program faster, but with
> -ffast-math for induct, the program becomes much slower: 66.47s ->
> 88.56s; here -fvect-cost-model helps to re-gain a lot: 63.92s.
> 
> Is there any plan to enable -fvect-cost-model by default? I saw no
> program which was slower with this option (ignoring <1% changes) and
> some (see above) become much faster with that option.
> 
> 
> Compiler options:
> 
> (NV)     gfortran -march=core2 -O3
> (V)      gfortran -march=core2 -O3 -ftree-vectorize
> (NV.F)   gfortran -march=core2 -O3 -ffast-math
> (V.F)    gfortran -march=core2 -O3 -ffast-math -ftree-vectorize
> (V.CM)   gfortran -march=core2 -O3 -ftree-vectorize -fvect-cost-model
> (V.CM.F) gfortran -march=core2 -O3 -ftree-vectorize -fvect-cost-model -ffast-math
> 
> 
> Result (single run; in principle the computer was otherwise idle)
> 
> (The +/- in the last two columns are relative to the vectorized version without the cost model.)
> 
> 
>            NV     V       NV.F    V.F      V.CM   V.CM.F
> --------------------------------------------------------
> ac        15.19  15.09+   12.72  12.74-   15.02+  12.71+
> aermod    36.50  36.72-   35.62  35.97-   36.25+  35.34+
> air       12.78  12.56+    9.65   9.48+   12.34+   9.49=
> capacita  50.43  49.48+   48.88  50.59-   48.43+  49.59+
> channel    3.20   2.79++   3.13   2.80++   2.72+   2.71+
> doduc     56.23  55.87+   48.84  49.34-   55.48+  48.85+
> fatigue   13.94  14.10-   12.37  12.77-   13.62+  12.69+
> gas_dyn   24.98  10.45++  19.64   7.96++  10.40+   7.93+
> induct    64.52  63.77+   66.47* 88.56--  63.50+  63.92++
> linpk     22.39  22.79-   22.10  22.31-   22.12+  22.10+
> mdbx      16.36  16.22+   18.86* 19.10-   16.16+  18.95+
> nf        27.71  27.67+   27.55  27.27+   27.12+  26.84+
> protein   61.86  61.99-   60.57  59.96+   62.40-  59.76+
> rnflow    35.88  35.92-   35.49  35.58+   35.82+  35.46+
> test_fpu  15.37  14.10+   15.80  13.40+   14.18-  13.23+
> tfft       2.89   2.86     2.98   3.00-    2.82+   2.90+
> --------------------------------------------------------
> Geo.Mean  21.02  19.57+   19.92  18.93    19.34+  18.34+
> 
> * Hmm, why is -ffast-math slower? And with vectorization that much slower? I rechecked induct (V.F, NV.F) and could reproduce the timings.
> 
> 
> Tobias
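
The summary figures in the quoted table can be recomputed with a short sketch like the one below. The helper names (geometric_mean, percent_change) are made up for this illustration; the timings are copied from the NV column and from the induct row of the table above.

```python
import math

def geometric_mean(values):
    """Geometric mean, computed as exp of the mean of the logarithms."""
    return math.exp(sum(math.log(v) for v in values) / len(values))

def percent_change(before, after):
    """Relative change in run time; positive means slower."""
    return (after - before) / before * 100.0

# NV column (gfortran -march=core2 -O3), in seconds, in table order.
nv = [15.19, 36.50, 12.78, 50.43, 3.20, 56.23, 13.94, 24.98,
      64.52, 22.39, 16.36, 27.71, 61.86, 35.88, 15.37, 2.89]

# induct row: NV.F, V.F and V.CM.F timings, in seconds.
induct_nv_f, induct_v_f, induct_v_cm_f = 66.47, 88.56, 63.92

print("Geo.Mean NV: %.2f" % geometric_mean(nv))
print("induct, NV.F -> V.F:   %+.1f%%" % percent_change(induct_nv_f, induct_v_f))
print("induct, V.F -> V.CM.F: %+.1f%%" % percent_change(induct_v_f, induct_v_cm_f))
```

These are single-run timings, so small differences in the computed percentages should not be over-interpreted.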


