For a calculation involving multidimensional array multiplication followed by a sum along the first dimension, GCC performs the steps separately: the element-by-element array multiplication is completed first, and the function gfortran_sum_r8 is then called to calculate the sum.

A better process would be to keep an accumulator updated as the element-by-element array multiplication is carried out. This has the following benefits:
i. The gfortran_sum_r8 call is eliminated.
ii. There is no longer a need for a temporary array to hold the array multiplication result.

subroutine sum_test(Rx,Ry,Rz,nx,ny)
  implicit none
  integer(kind=kind(1)), intent(in) :: nx,ny
  real(kind=kind(1.0d0)), dimension(nx,ny), intent(in) :: Rx,Ry
  real(kind=kind(1.0d0)), dimension(ny), intent(out) :: Rz

  Rz = sum(Rx * Ry, 1)
end subroutine sum_test

Other relevant information:

1. Compile flags: -O3 -ffast-math -m64 -march=amdfam10

2. gfortran version:
gfortran -v
Using built-in specs.
Target: x86_64-unknown-linux-gnu
Configured with: /tmp/src/gcc-4.3.0/configure --prefix=/opt/amd/gcc-4.3.0 --enable-languages=c,c++,fortran --enable-stage1-checking --with-as=/opt/amd/gcc-4.3.0/bin/as --with-ld=/opt/amd/gcc-4.3.0/bin/ld --with-mpfr=/tmp/install/mpfr-2.3.0 --with-gmp=/tmp/install/gmp-4.2.2
Thread model: posix
gcc version 4.3.1 20080312 (prerelease) (GCC)

3. model name: AMD Phenom(tm) 8650 Triple-Core Processor

4. flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt rdtscp lm 3dnowext 3dnow constant_tsc pni cx16 popcnt lahf_lm cmp_legacy svm extapic cr8_legacy altmovcr8 abm sse4a misalignsse 3dnowprefetch osvw
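For illustration, the accumulator form requested in the report can be written by hand roughly as follows. This is only a sketch of the intended result, not what gfortran emits or how the compiler would implement it; the subroutine name sum_test_fused and the local variables acc, i and j are hypothetical and do not appear in the original test case.

```fortran
! Hand-fused equivalent of Rz = sum(Rx * Ry, 1).
! sum_test_fused, acc, i and j are illustrative names only.
subroutine sum_test_fused(Rx,Ry,Rz,nx,ny)
  implicit none
  integer(kind=kind(1)), intent(in) :: nx,ny
  real(kind=kind(1.0d0)), dimension(nx,ny), intent(in) :: Rx,Ry
  real(kind=kind(1.0d0)), dimension(ny), intent(out) :: Rz
  real(kind=kind(1.0d0)) :: acc
  integer(kind=kind(1)) :: i,j

  do j = 1, ny
    ! One accumulator per column; no temporary array for Rx * Ry.
    acc = 0.0d0
    do i = 1, nx
      ! Multiply and accumulate in the same pass over the first dimension.
      acc = acc + Rx(i,j) * Ry(i,j)
    end do
    Rz(j) = acc
  end do
end subroutine sum_test_fused
```

The inner loop runs over the first (contiguous) dimension, so it reads Rx and Ry in storage order. With -ffast-math the compiler may reorder the additions, so the result can differ from the library call in the last bits.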
Confirmed. The middle-end array work will address this in a generic way.
Is this something for the FE to do?
This is not a job for the FE.
> This is not a job for the FE.

How could the middle-end do the job if __gfortran_sum_r8 is not inlined/scalarized (see pr43829)?
OK, I thought you meant that this would be something for a separate Fortran front end optimization pass. Expanding SUM differently is a job for the FE, yes.
I believe just gfc_conv_intrinsic_arith needs to be adjusted so that it also handles se->ss case, at least for optimize && !optimize_size. Currently it just handles the case where those intrinsics return a scalar.
(In reply to comment #4)
> (see pr43829)
>
> I think it is a duplicate of (or close to) pr43829.

Marked as depending on it so that I don't forget it.
So, are you going to take care of this?
(In reply to comment #8)
> So, are you going to take care of this?
>

Sure.
(In reply to comment #7)
> (In reply to comment #4)
> > (see pr43829)
> >
> > I think it is a duplicate of (or close to) pr43829.
> Marked as depending on it so that I don't forget it.

This is fixed for the 4.7.0 version. Closing.

*** This bug has been marked as a duplicate of bug 43829 ***