This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



Re: [Patch, fortran] Inline DOT_PRODUCT


On Tue, Feb 28, 2006 at 12:09:09AM +0100, Paul Thomas wrote:
> Jakub,
> 
> >Is GCC 4.2 libgfortran.so expected to be ABI incompatible with GCC 4.1
> >compiled fortran code?
> >If not, then you shouldn't be removing any exported functions from
> >libgfortran.
> > 
> >
> That's a good question - I believe the compatibility has already been 
> compromised but I am not sure. In many ways, I would be happier not 
> removing the existing library functions; at least for the time being.

IMHO 4.1 is still an experimental release, so just remove dot_product
from trunk/libgfortran. Not from 4.1.1 though.

Perhaps with 4.2 we'll be ready for a "real" release that follows the
usual regression-fixes-only policy, and at that point we should be more
careful about breaking the API/ABI. That might also be the right time
to introduce symbol versioning.

As for your patch itself, it's ok for trunk and 4.1 once it opens,
with one small nitpick: remove the

  f->value.function.name =
    gfc_get_string (PREFIX("dot_product_%c%d"), gfc_type_letter (f->ts.type),
		    f->ts.kind);

assignment from gfc_resolve_dot_product, since it is no longer needed
and could otherwise confuse somebody.
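
To make it concrete what that assignment did: it built the name of the
libgfortran routine to call from the result type and kind. Below is a
minimal stand-alone sketch of that formatting. It assumes PREFIX prepends
"_gfortran_" as in gfortran.h, and dot_product_symbol is a hypothetical
helper standing in for gfc_get_string plus gfc_type_letter; it is not
code from the front end.

```c
#include <stdio.h>

/* Assumption: mirrors the PREFIX macro in gfortran.h, which prepends the
   libgfortran symbol prefix to a string literal.  */
#define PREFIX(x) "_gfortran_" x

/* Hypothetical helper: formats the library symbol name for a given type
   letter (e.g. 'i', 'r', 'c', 'l') and kind into buf.  Returns what
   snprintf returns, i.e. the length the full name needs.  */
static int
dot_product_symbol (char *buf, size_t len, char type_letter, int kind)
{
  return snprintf (buf, len, PREFIX ("dot_product_%c%d"), type_letter, kind);
}
```

For a REAL(8) result, calling dot_product_symbol (buf, sizeof buf, 'r', 8)
leaves "_gfortran_dot_product_r8" in buf. With DOT_PRODUCT expanded inline
nothing looks that name up anymore, which is why the assignment can go.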

I did some benchmarking as well, and it turns out that with the right
compile options (vectorization enabled), performance for large arrays
is not that much worse than ddot from GOTO BLAS: about 1.7 vs. 2.4
Gflop/s at array length 1024.

"default" options:

phi:~/src/gfortran/test/pr26025-blas-dot-matmul% make gdef
gfortran -O2 -o gdef dot-bench.f90 -lgoto -lpthread
phi:~/src/gfortran/test/pr26025-blas-dot-matmul% ./gdef
DOT_PRODUCT test, results in Gflop/s
  array length      BLAS          DOT_PROD       inline

       4            0.15            0.57            0.80
       8            0.42            0.94            0.94
      16            0.49            0.62            0.62
      32            0.78            0.73            0.73
      64            1.21            0.80            0.80
     128            1.64            0.84            0.84
     256            1.91            0.86            0.86
     512            2.18            0.87            0.87
    1024            2.35            0.87            0.87
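
For reading the numbers in a table like this one: dot-bench.f90 is not
included here, so the following is only a sketch of how such a Gflop/s
figure is usually derived, assuming the conventional count of one
multiply and one add per element (2*n flops per dot product). The
function names are illustrative, not taken from the benchmark.

```c
#include <stddef.h>

/* Reference dot product: one multiply and one add per element.  */
static double
dot (const double *a, const double *b, size_t n)
{
  double sum = 0.0;
  for (size_t i = 0; i < n; i++)
    sum += a[i] * b[i];
  return sum;
}

/* Gflop/s for `reps` dot products of length n that took `seconds` of
   wall-clock time, counting 2*n floating-point operations each.  */
static double
gflops (size_t n, size_t reps, double seconds)
{
  return 2.0 * (double) n * (double) reps / seconds / 1e9;
}
```

For example, one million dot products of length 1000 completed in two
seconds would be reported as 1.0 Gflop/s.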


x87 vs. SSE2 floating point doesn't really make any difference on K8:

phi:~/src/gfortran/test/pr26025-blas-dot-matmul% gfortran -O3 -ffast-math -funroll-loops -march=k8 -std=f95 -Wall dot-bench.f90 -lgoto -lpthread
phi:~/src/gfortran/test/pr26025-blas-dot-matmul% ./a.out
DOT_PRODUCT test, results in Gflop/s
  array length      BLAS          DOT_PROD       inline

       4            0.18            0.90            0.83
       8            0.42            1.03            1.02
      16            0.49            1.13            1.12
      32            0.77            1.08            1.07
      64            1.20            0.97            0.96
     128            1.63            0.84            0.83
     256            1.91            0.86            0.85
     512            2.18            0.87            0.87
    1024            2.35            0.87            0.87

phi:~/src/gfortran/test/pr26025-blas-dot-matmul% make gopt
gfortran -O3 -ffast-math -funroll-loops -mfpmath=sse -msse2 -march=k8 -std=f95 -Wall -o gopt dot-bench.f90 -lgoto -lpthread
phi:~/src/gfortran/test/pr26025-blas-dot-matmul% ./gopt
DOT_PRODUCT test, results in Gflop/s
  array length      BLAS          DOT_PROD       inline

       4            0.19            0.81            0.80
       8            0.41            1.01            0.97
      16            0.49            1.11            1.12
      32            0.77            1.07            1.04
      64            1.20            0.96            0.95
     128            1.63            0.84            0.83
     256            1.91            0.86            0.86
     512            2.19            0.87            0.87
    1024            2.34            0.88            0.87


And finally, with vectorization, performance for small arrays drops,
while for larger ones it improves considerably:

phi:~/src/gfortran/test/pr26025-blas-dot-matmul% make gvect
gfortran -O3 -ffast-math -funroll-loops -mfpmath=sse -msse2 -march=k8 -ftree-vectorize -std=f95 -Wall -o gvect dot-bench.f90 -lgoto -lpthread
phi:~/src/gfortran/test/pr26025-blas-dot-matmul% ./gvect
DOT_PRODUCT test, results in Gflop/s
  array length      BLAS          DOT_PROD       inline

       4            0.14            0.41            0.49
       8            0.40            0.89            0.86
      16            0.86            1.25            1.20
      32            1.30            1.46            1.41
      64            1.72            1.59            1.56
     128            2.05            1.66            1.65
     256            2.18            1.63            1.64
     512            2.34            1.69            1.69
    1024            2.43            1.72            1.72
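
A note on why -ffast-math matters for the vectorized numbers: turning
the reduction into vector code changes the order in which the products
are added, which strict IEEE semantics does not permit. The sketch
below writes out by hand roughly what a 2-wide (SSE2, double precision)
vectorized reduction computes. It is an illustration under the
assumption that the benchmark uses double precision, not the code GCC
actually generates, and dot_two_lanes is a made-up name.

```c
#include <stddef.h>

/* Dot product shaped like a 2-wide vectorized reduction: two independent
   partial sums (one per vector lane) plus a scalar tail.  */
static double
dot_two_lanes (const double *a, const double *b, size_t n)
{
  double s0 = 0.0, s1 = 0.0;
  size_t i = 0;

  /* Main loop: each iteration handles one "vector" of two elements.  */
  for (; i + 1 < n; i += 2)
    {
      s0 += a[i] * b[i];
      s1 += a[i + 1] * b[i + 1];
    }

  /* Combine the lanes, then handle the leftover element when n is odd.  */
  double sum = s0 + s1;
  for (; i < n; i++)
    sum += a[i] * b[i];
  return sum;
}
```

The lane combination and the tail loop are fixed costs per call, which
is one plausible reason the length-4 and length-8 rows get slower with
vectorization while the longer arrays gain.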


-- 
Janne Blomqvist
