Posted in: http://gcc.gnu.org/ml/fortran/2005-10/msg00443.html

I have been investigating the relatively poor performance of gfortran for some of the Polyhedron Benchmark Tests (www.polyhedron.com). I already discussed a couple of days ago how test_fpu.f90 exposed some weakness in the dependency analysis. I am developing a patch that will do somewhat more than the "draft patch" discussed there.

As posted on the Wiki (http://gcc.gnu.org/wiki/GFortranResults), two real offenders are induct.f90 and kepler.f90 (I have confirmed this in an ifc/gfc comparison that I will post tonight or tomorrow). As mentioned there, profiling indicates that the intrinsic dot_product is taking >50% of the time. Subsequently I have confirmed this by the simple expedient of adding a repeat copy of the section of code that calls dot_product. The difference is of the same order as the difference between gfc and DF6.0 execution times.

It turns out that gfc is slow because it is making temporary array descriptors for the actual arguments of dot_product. Since these are only of length 13, making the temporaries slows gfc down a lot. This can be confirmed as follows:

    real, dimension(12) :: x, y
    real :: z
    do i = 1, 10000000
      z = dot_product(x,y)
    end do
    end

takes 0.15s under DF6.0 and 45.5s for gfc! When rewritten as

    real, dimension(:), pointer :: x, y
    real :: z
    allocate (x(12), y(12))
    do i = 1, 10000000
      z = dot_product (x,y)
    end do
    end

the time increases slightly for DF6.0, to 0.27s. gfc now comes in with a creditable 0.39s.

The code within the loop for both versions appears below. Apparently the allocation of the descriptor structures and the assignments to them cause the enormous slow-down.

I think that the lesson is that constant array references need to be taken out of loops or their use should automatically generate a pointer. I rather like the latter because I suspect it to be more easily implementable.
Paul Thomas

Non_pointer version

    if (i <= 10000000) {
      while (1) {
        {
          logical4 D.573;

          {
            struct array1_real4 parm.1;
            struct array1_real4 parm.0;

            parm.0.dtype = 281;
            parm.0.dim[0].lbound = 1;
            parm.0.dim[0].ubound = 12;
            parm.0.dim[0].stride = 1;
            parm.0.data = (void *) (real4[0:] *) &x[0];
            parm.0.offset = 0;
            parm.1.dtype = 281;
            parm.1.dim[0].lbound = 1;
            parm.1.dim[0].ubound = 12;
            parm.1.dim[0].stride = 1;
            parm.1.data = (void *) (real4[0:] *) &y[0];
            parm.1.offset = 0;
            z = _gfortran_dot_product_r4 (&parm.0, &parm.1);
          }
          L.1:;
          D.573 = i == 10000000;
          i = i + 1;
          if (D.573) goto L.2; else (void) 0;
        }
      }
    } else {
      (void) 0;
    }
    L.2:;

and for the pointer version

    if (i <= 10000000) {
      while (1) {
        {
          logical4 D.573;

          z = _gfortran_dot_product_r4 (&x, &y);
          L.1:;
          D.573 = i == 10000000;
          i = i + 1;
          if (D.573) goto L.2; else (void) 0;
        }
      }
    } else {
      (void) 0;
    }
    L.2:;
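The difference between the two dumps can be imitated in plain C. The sketch below is only an illustration: `array1_real4` is a simplified, hypothetical stand-in for the descriptor type named in the dump (its real layout is not shown in this thread), `dot_product_r4` is a stand-in for `_gfortran_dot_product_r4`, and the names `make_descriptor`, `dot_loop_rebuild` and `dot_loop_hoisted` are invented for this example. The first loop fills both descriptors on every iteration, as in the non-pointer dump; the second fills them once before the loop, which is the transformation suggested in the report.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified, hypothetical rank-1 descriptor; field names follow the dump. */
typedef struct {
    float *data;
    ptrdiff_t offset;
    int dtype;
    struct { ptrdiff_t stride, lbound, ubound; } dim[1];
} array1_real4;

/* Stand-in for the library routine: walks both arrays via their descriptors. */
static float dot_product_r4(const array1_real4 *a, const array1_real4 *b)
{
    ptrdiff_t n = a->dim[0].ubound - a->dim[0].lbound + 1;
    float sum = 0.0f;
    for (ptrdiff_t k = 0; k < n; k++)
        sum += a->data[k * a->dim[0].stride] * b->data[k * b->dim[0].stride];
    return sum;
}

/* The same stores the dump performs for parm.0 and parm.1. */
static void make_descriptor(array1_real4 *d, float *base, ptrdiff_t n)
{
    assert(n > 0);
    d->dtype = 281;            /* value copied from the dump */
    d->dim[0].lbound = 1;
    d->dim[0].ubound = n;
    d->dim[0].stride = 1;
    d->data = base;
    d->offset = 0;
}

/* Descriptors rebuilt inside the loop (non-pointer dump). */
float dot_loop_rebuild(float *x, float *y, ptrdiff_t n, long iters)
{
    float z = 0.0f;
    for (long i = 0; i < iters; i++) {
        array1_real4 parm0, parm1;
        make_descriptor(&parm0, x, n);
        make_descriptor(&parm1, y, n);
        z = dot_product_r4(&parm0, &parm1);
    }
    return z;
}

/* Descriptors built once, outside the loop (the proposed hoisting). */
float dot_loop_hoisted(float *x, float *y, ptrdiff_t n, long iters)
{
    array1_real4 parm0, parm1;
    float z = 0.0f;
    make_descriptor(&parm0, x, n);
    make_descriptor(&parm1, y, n);
    for (long i = 0; i < iters; i++)
        z = dot_product_r4(&parm0, &parm1);
    return z;
}
```

Both functions return the same value for the same inputs; only the number of descriptor stores differs. An optimizing C compiler may well hoist or inline these stores by itself, so this sketch shows the shape of the transformation rather than reproducing the timings quoted above.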
Confirmed (note IE sucks).
I wonder if we could get the aliasing mechanism to say that this array descriptor is not changed and move the stores out of the loop.
Subject: RE: Temporary constant array descriptors being declared at wrong binding level.

Andrew,

It turns out that the real overhead is the function call. I posted a patch to inline DOT_PRODUCT, which performs better than the library version. Resubmitting this patch is on my list of things to do; it needs a changeover from inline to library at a vector length of ~16-32. I need to study this as a function of platform and array reference type.

Best regards

Paul

> -----Original Message-----
> From: pinskia at gcc dot gnu dot org [mailto:gcc-bugzilla@gcc.gnu.org]
> Sent: Thursday, 19 January 2006 17:12
> To: THOMAS Paul Richard 169137
> Subject: [Bug fortran/24520] Temporary constant array descriptors being
> declared at wrong binding level.
>
> ------- Comment #2 from pinskia at gcc dot gnu dot org 2006-01-19 16:11 -------
> I wonder if we could get the aliasing mechanism to say that this array
> descriptor is not changed and move the stores out of the loop.
>
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=24520
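The inline-to-library changeover mentioned above can be pictured with a run-time analogue in C. This is a sketch under stated assumptions, not the posted patch: the threshold `DOT_INLINE_MAX` is a hypothetical value picked from the quoted ~16-32 range, and `dot_inline`, `dot_library`, `dot_dispatch` and the call counter `dot_library_calls` are names invented for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical changeover length; the thread only says ~16-32 and that
   the right value still has to be studied per platform. */
#define DOT_INLINE_MAX 16

/* Counts out-of-line calls so the dispatch decision can be observed. */
int dot_library_calls = 0;

/* Stand-in for the out-of-line library routine. */
float dot_library(const float *a, const float *b, size_t n)
{
    float sum = 0.0f;
    dot_library_calls++;
    for (size_t k = 0; k < n; k++)
        sum += a[k] * b[k];
    return sum;
}

/* Short loop intended to be expanded at the call site. */
static inline float dot_inline(const float *a, const float *b, size_t n)
{
    float sum = 0.0f;
    for (size_t k = 0; k < n; k++)
        sum += a[k] * b[k];
    return sum;
}

/* Short vectors take the inline path, longer ones the library call. */
float dot_dispatch(const float *a, const float *b, size_t n)
{
    assert(a != NULL && b != NULL);
    return (n <= DOT_INLINE_MAX) ? dot_inline(a, b, n)
                                 : dot_library(a, b, n);
}
```

With this threshold, a call on 8 elements never reaches `dot_library`, while a call on 32 elements does. Where the crossover should really sit is exactly the open question raised in the comment.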
You can expose the bug now with:

    real, dimension(12) :: x, y
    real :: z
    do i = 1, 10000000
      z = g(x,y)
    end do
    print *, x
    contains
      function g(x, y)
        real, dimension(:) :: x, y
        real g
        x = x + y
        g = 0.0  ! define the function result; z is never read afterwards
      end function
    end
Yes, it is not quite as spectacular as before but present nonetheless. By comparing pointer and non-pointer cases, I measure an overhead of 12 +/- 7 ns on a 2.4 GHz PIV. I have no idea why the error is so large, but it bobs around according to the size of the array; e.g. for array size N = 1 it is 19 ns, for N = 16 it is 16 ns, whilst N = 4 is only hit for 6 ns.

In preparing the array TRANSFER intrinsic, I have learned more about parameter passing than I like to think about. *sigh* I think it might be an easy matter to promote the case of a constant descriptor up to the procedure scope. It has been pushed onto the TODO stack.

Paul
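For reference, a per-call overhead of this kind is just the difference between the two total run times divided by the iteration count. The helper below is a minimal sketch of that arithmetic; the function name `per_call_overhead_ns` and its parameter names are invented for this example and do not come from the thread.

```c
#include <assert.h>

/* Per-call overhead in nanoseconds, from the total run times (in seconds)
   of the descriptor-building version and the pointer version of a loop. */
double per_call_overhead_ns(double t_descriptor_s, double t_pointer_s,
                            long iterations)
{
    assert(iterations > 0);
    return (t_descriptor_s - t_pointer_s) * 1e9 / (double) iterations;
}
```

Applied to the gfc timings in the original report (45.5 s and 0.39 s over 10,000,000 iterations), this gives roughly 4.5 microseconds per call, far above the 12 +/- 7 ns measured here.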
All the issues with dot product have been sorted, as far as I know. Paul