This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: Inline DOT_PRODUCT revisited - to libgfortran/m4 gurus
- From: Paul Thomas <paulthomas2 at wanadoo dot fr>
- To: Steven Bosscher <stevenb dot gcc at gmail dot com>
- Cc: gcc-patches at gcc dot gnu dot org, "'fortran at gcc dot gnu dot org'" <fortran at gcc dot gnu dot org>, Dominique Dhumieres <dominiq at lps dot ens dot fr>
- Date: Sun, 26 Feb 2006 16:19:13 +0100
- Subject: Re: Inline DOT_PRODUCT revisited - to libgfortran/m4 gurus
- References: <43730120.7030808@wanadoo.fr> <200602252337.39067.steven@gcc.gnu.org>
Steven,
How are you? It's good to hear from you.
See the original message from November 2005 here:
http://gcc.gnu.org/ml/gcc-patches/2005-11/msg00686.html
> On Thursday 10 November 2005 09:13, Paul Thomas wrote:
> > an inline version of DOT_PRODUCT, which is never slower than the
> > original library version and is very much faster for small vectors.
>
> Amazingly your patch still applied almost cleanly, and it seems to do
> what it is supposed to do. The test case you had for dot_product was
> useless because it got optimized to an empty loop (which for whatever
> reason was not removed itself). The new test case is attached below.
> As is the updated patch, which is just a re-diff.
The empty loop was forced and the timing subtracted from the time to do
the loop with dot_product in it. I agree that the difference is small
but it was all part of a cunning plan.....
.... the correct outcome of which is attached, with some results
below (obtained on the system on which the regtesting is being carried
out). It can be seen that gfortran-4.1 (with library dot_product) begins
to win out for larger arrays but that gfortran-4.2 (with inline) is a
clear winner for small arrays; i.e. the inline version has a smaller
overhead but seems to be slightly less well optimised for each product
and sum. I will submit with figures obtained on a quiet system.
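
As a rough illustration of what "inline" means here (a minimal sketch
only, not the code the patch generates; the names dot_sketch and
dot_inline are made up for this example), DOT_PRODUCT for real vectors
reduces to a multiply-and-accumulate loop with no library call in the
way:

```fortran
! Hypothetical illustration: these names do not come from the patch.
module dot_sketch
  implicit none
contains
  ! Explicit multiply-and-accumulate loop, roughly what an inline
  ! expansion of DOT_PRODUCT amounts to for real arrays.
  function dot_inline (x, y) result (s)
    real(4), intent(in) :: x(:), y(:)
    real(4) :: s
    integer :: i
    s = 0.0
    do i = 1, min (size (x), size (y))
      s = s + x(i) * y(i)
    end do
  end function dot_inline
end module dot_sketch

program demo
  use dot_sketch
  implicit none
  real(4) :: x(4), y(4)
  x = (/ 1.0, 2.0, 3.0, 4.0 /)
  y = x
  ! Both values should agree (30.0 for this input).
  print *, dot_inline (x, y), dot_product (x, y)
end program demo
```

For a four-element vector the loop body runs only four times, so any
fixed per-call cost would be expected to dominate the timing.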
I agree that the patch should be applied - I was a bit surprised that
folk lost interest at the time and that my call for a bit of help in
expunging all memory of dot_product from the library went unheeded. I
have made the necessary changes to the library, seem to have regenerated
it correctly and am in the midst of regtesting. I am not entirely sure
that I'll have time to write the changelog entries this evening but will
commit a definitive patch tomorrow morning.
Paul
[prt@localhost dot_product]# /svn-4.2/bin/gfortran -O3 -fdump-tree-original dottest.f90;./a.out
DOT_PRODUCT test
array length    time(ns)
           4       32.80
           8       16.40
          16      115.60
          32      219.00
          64      427.80
         128      818.00
         256     1715.70
         512     2783.90
        1024     6160.70
[prt@localhost dot_product]# ifort -O3 -fdump-tree-original dottest.f90;./a.out
ifort: Command line warning: ignoring unknown option '-fdump-tree-original'
DOT_PRODUCT test
array length    time(ns)
           4       21.90
           8       31.40
          16       95.80
          32      188.70
          64      342.40
         128      713.00
         256     1285.90
         512     2768.60
        1024     5358.10
[prt@localhost dot_product]# export LD_LIBRARY_PATH=/svn-4.1/lib:/opt/intel/fc/9.0/lib
[prt@localhost dot_product]# /svn-4.1/bin/gfortran -O3 -fdump-tree-original dottest.f90;./a.out
DOT_PRODUCT test
array length    time(ns)
           4       71.00
           8       77.40
          16      153.20
          32      189.50
          64      388.10
         128      704.40
         256     1414.40
         512     2904.90
        1024     5477.70
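! Time DOT_PRODUCT for vector lengths 4 to 1024.
  implicit real(8) (x-z)
  integer(8), parameter :: n = 10000000    ! repetitions per length
  real(8) :: t1 = 0.0, t2 = 0.0, t3 = 0.0
  real(8) :: dt1
  real(4) :: xin(1024), yin(1024), z(n)
  integer(8) :: i, j, m
  xin = (/(dble(j),j=1,1024)/)
  yin = xin
  print '(a/a/)', "DOT_PRODUCT test", " array length time(ns)"
  do l = 2,10
    m = 2**l
    t1 = tim ()
    ! Timed loop: n calls of dot_product on vectors of length m.
    do i = 1, n
      z(i) = dot_product (xin(1:m), yin(1:m))
    end do
    t2 = tim ()
    ! Reference loop: the same stores without dot_product.
    do i = 1, n
      z(i) = 0d0
    end do
    t3 = tim ()
    ! (t2 - t1) - (t3 - t2): time per call in ns, net of loop overhead.
    dt1 = 1e9_8 * (2d0*t2 - t1 - t3) / (dble (n))
    print '(5x,i4,12x,f8.2,12x,f8.2)', m, dt1
  end do
contains
! Do not use secnds - various compilers do not support it.
  real(8) function tim ()
    character(12) :: ddate, dtime, zone
    integer(4) :: elems (8)
    call date_and_time (ddate, dtime, zone, elems)
    ! elems(5:8) = hour, minute, second, millisecond. The hour term
    ! avoids a wrap on the hour; the value still wraps at midnight.
    tim = 3600.0_8 * elems(5) + 60.0_8 * elems(6) + elems(7) &
          + 0.001_8 * elems(8)
  end function tim
end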
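
A possible alternative to tim (), offered only as a sketch: the
standard SYSTEM_CLOCK intrinsic returns a tick count and its rate, so
it avoids adding up the wall-clock fields that date_and_time returns,
which wrap around. The names timer_sketch and tim2 are hypothetical,
and the 64-bit integer arguments are an assumption (current gfortran
accepts them; other compilers may need default integers).

```fortran
! Hypothetical replacement for tim (); not part of the attached test.
module timer_sketch
  implicit none
contains
  ! Seconds derived from SYSTEM_CLOCK ticks.
  real(8) function tim2 ()
    integer(8) :: ticks, rate
    call system_clock (ticks, rate)
    if (rate > 0) then
      tim2 = real (ticks, 8) / real (rate, 8)
    else
      tim2 = 0.0_8   ! rate of zero means no clock is available
    end if
  end function tim2
end module timer_sketch

program demo
  use timer_sketch
  implicit none
  real(8) :: t1, t2
  t1 = tim2 ()
  t2 = tim2 ()
  print *, "elapsed (s):", t2 - t1
end program demo
```

The tick counter itself wraps at its maximum count, so this is still
not a guarantee against a negative interval, only a less likely one.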