[Bug fortran/36840] New: Fortran complex array multiply missed optimization

Tue Jul 15 18:04:00 GMT 2008

A complex array multiply is compiled to scalar SSE instructions (mulsd,
addsd, subsd) instead of packed instructions.

    subroutine complex_mult_test(Iy, Ix, nx)
    implicit none
      integer(kind=kind(1)), intent(in) :: nx
      complex(kind=kind((1.0d0,1.0d0))), dimension(nx), intent(inout) :: Iy
      complex(kind=kind((1.0d0,1.0d0))), dimension(nx), intent(in) :: Ix

      Iy = Iy * Ix
    end subroutine complex_mult_test

   Code produced by GCC for the loop body:
        movsd  0x8(%rsi),%xmm3
        movsd  (%rdi),%xmm5
        inc    %rax
        movsd  0x8(%rdi),%xmm4
        movsd  (%rsi),%xmm2
        add    $0x10,%rsi
        movapd %xmm3,%xmm1
        mulsd  %xmm5,%xmm3
        movapd %xmm2,%xmm0
        mulsd  %xmm4,%xmm1
        mulsd  %xmm5,%xmm0
        mulsd  %xmm4,%xmm2
        subsd  %xmm1,%xmm0
        addsd  %xmm3,%xmm2
        movsd  %xmm0,(%rdi)
        movsd  %xmm2,0x8(%rdi)
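
   For reference, the scalar sequence above corresponds roughly to the
   following per-element computation: four multiplies, one subtract and one
   add. This is only an illustrative C sketch; the function name and the
   interleaved {re, im} array layout are assumptions made for the example and
   are not taken from the compiler output.

```c
#include <stddef.h>

/* Illustrative scalar complex multiply, iy[i] = iy[i] * ix[i].
 * Assumption: each array holds n complex values stored as interleaved
 * {re, im} doubles, which matches the 16-byte stride seen in the loop. */
void complex_mult_scalar(double *iy, const double *ix, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        double x0 = iy[2 * i], y0 = iy[2 * i + 1];   /* element of Iy */
        double x1 = ix[2 * i], y1 = ix[2 * i + 1];   /* element of Ix */

        iy[2 * i]     = x0 * x1 - y0 * y1;   /* two mulsd + subsd */
        iy[2 * i + 1] = x0 * y1 + x1 * y0;   /* two mulsd + addsd */
    }
}
```

   Each operation works on one double at a time, so half of every 128-bit
   register goes unused.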

   A complex multiply is (x0,y0)*(x1,y1)=(x0*x1-y0*y1,x0*y1+x1*y0).
   This could be implemented using packed instructions.
   The following instructions would be useful:
   i.  movhpd, movddup, shufpd to arrange the data properly.
   ii. mulpd to do two multiplies at once.
   iii. addsubpd to combine the addition and subtraction.
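
   To make the data flow concrete, here is a small portable C model of how
   the instructions listed above could be combined. It is a sketch of the
   lane arithmetic only, not real SSE code: the v2d type and all helper names
   are hypothetical, and each helper models just the one case of the
   instruction that is needed here. Note that addsubpd subtracts in the low
   lane and adds in the high lane, so the x1 products must be the destination
   operand.

```c
#include <assert.h>

/* Two-lane model of an SSE register of packed doubles; lane[0] is the low half. */
typedef struct { double lane[2]; } v2d;

/* movddup: broadcast one double into both lanes. */
static v2d movddup(double x) { v2d r = {{x, x}}; return r; }

/* shufpd $1 with the same register as both operands: swap the two lanes. */
static v2d swap_lanes(v2d a) { v2d r = {{a.lane[1], a.lane[0]}}; return r; }

/* mulpd: lane-wise multiply. */
static v2d mulpd(v2d a, v2d b)
{
    v2d r = {{a.lane[0] * b.lane[0], a.lane[1] * b.lane[1]}};
    return r;
}

/* addsubpd: low lane is dst - src, high lane is dst + src. */
static v2d addsubpd(v2d dst, v2d src)
{
    v2d r = {{dst.lane[0] - src.lane[0], dst.lane[1] + src.lane[1]}};
    return r;
}

/* a = (x0,y0) and b = (x1,y1), each stored as {re, im}. */
static v2d packed_complex_mul(const double a[2], const double b[2])
{
    v2d xy = {{a[0], a[1]}};                        /* x0,    y0    */
    v2d p  = mulpd(movddup(b[0]), xy);              /* x1*x0, x1*y0 */
    v2d q  = mulpd(movddup(b[1]), swap_lanes(xy));  /* y1*y0, y1*x0 */
    return addsubpd(p, q);          /* x0*x1-y0*y1, x1*y0+x0*y1 */
}

/* Compare the lane model with the textbook formula for one input pair. */
static void self_check(void)
{
    const double a[2] = {1.0, 2.0};
    const double b[2] = {3.0, 4.0};
    v2d r = packed_complex_mul(a, b);
    assert(r.lane[0] == a[0] * b[0] - a[1] * b[1]);
    assert(r.lane[1] == a[0] * b[1] + b[0] * a[1]);
}
```

   For (1,2)*(3,4) the model produces (-5,10), which matches the formula.
   Two mulpd and one addsubpd replace the four mulsd, one subsd and one
   addsd of the scalar version.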

   Hand coding gives 9 instructions:
        movupd  (%rdi),%xmm2         //xmm2: x0,y0
        movddup (%rsi),%xmm0         //xmm0: x1,x1
        mulpd %xmm2,%xmm0            //xmm0: x1*x0,x1*y0
        movddup 0x8(%rsi),%xmm1      //xmm1: y1,y1
        shufpd $0x1,%xmm2,%xmm2      //xmm2: y0,x0
        mulpd %xmm2,%xmm1            //xmm1: y0*y1,x0*y1
        addsubpd %xmm1,%xmm0         //xmm0: x0*x1-y0*y1,x1*y0+x0*y1
        movlpd %xmm0,(%rdi)
        movhpd %xmm0,0x8(%rdi)

Other relevant information:
1. Compile flags: -O3 -ffast-math -m64 -march=amdfam10

2. gfortran version: gfortran -v
Using built-in specs.
Target: x86_64-unknown-linux-gnu
Configured with: /tmp/src/gcc-4.3.0/configure --prefix=/opt/amd/gcc-4.3.0
--enable-languages=c,c++,fortran --enable-stage1-checking
--with-as=/opt/amd/gcc-4.3.0/bin/as --with-ld=/opt/amd/gcc-4.3.0/bin/ld
--with-mpfr=/tmp/install/mpfr-2.3.0 --with-gmp=/tmp/install/gmp-4.2.2
Thread model: posix
gcc version 4.3.1 20080312 (prerelease) (GCC)

3. model name: AMD Phenom(tm) 8650 Triple-Core Processor
4. flags     : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov
pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt rdtscp lm
3dnowext 3dnow constant_tsc pni cx16 popcnt lahf_lm cmp_legacy svm extapic
cr8_legacy altmovcr8 abm sse4a misalignsse 3dnowprefetch osvw

-- 
           Summary: Fortran complex array multiply missed optimization
           Product: gcc
           Version: 4.3.1
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: fortran
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: rajiv dot adhikary at amd dot com

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36840