[Bug fortran/36840] New: Fortran complex array multiply missed optimization
rajiv dot adhikary at amd dot com
gcc-bugzilla@gcc.gnu.org
Tue Jul 15 18:04:00 GMT 2008
A complex array multiply is compiled to scalar instructions instead of
packed instructions. Test case:
subroutine complex_mult_test(Iy, Ix, nx)
implicit none
integer(kind=kind(1)), intent(in) :: nx
complex(kind=kind((1.0d0,1.0d0))), dimension(nx), intent(inout) :: Iy
complex(kind=kind((1.0d0,1.0d0))), dimension(nx), intent(in) :: Ix
Iy = Iy * Ix
end subroutine complex_mult_test
Code produced by GCC inside the loop body:
movsd 0x8(%rsi),%xmm3
movsd (%rdi),%xmm5
inc %rax
movsd 0x8(%rdi),%xmm4
movsd (%rsi),%xmm2
add $0x10,%rsi
movapd %xmm3,%xmm1
mulsd %xmm5,%xmm3
movapd %xmm2,%xmm0
mulsd %xmm4,%xmm1
mulsd %xmm5,%xmm0
mulsd %xmm4,%xmm2
subsd %xmm1,%xmm0
addsd %xmm3,%xmm2
movsd %xmm0,(%rdi)
movsd %xmm2,0x8(%rdi)
A complex multiply is (x0,y0)*(x1,y1) = (x0*x1-y0*y1, x0*y1+x1*y0).
This could be implemented using packed instructions.
The following instructions would be useful:
i. movhpd, movddup, shufpd to arrange the data properly.
ii. mulpd to do two multiplies at once.
iii. addsubpd to combine the addition and subtraction.
Hand coding, we get 9 instructions:
movupd (%rdi),%xmm2 //xmm2: x0,y0
movddup (%rsi),%xmm0 //xmm0: x1,x1
mulpd %xmm2,%xmm0 //xmm0: x1*x0,x1*y0
movddup 0x8(%rsi),%xmm1 //xmm1: y1,y1
shufpd $0x1,%xmm2,%xmm2 //xmm2: y0,x0
mulpd %xmm2,%xmm1 //xmm1: y0*y1,x0*y1
addsubpd %xmm1,%xmm0 //xmm0: x0*x1-y0*y1,x1*y0+x0*y1
movlpd %xmm0,(%rdi) //store real part
movhpd %xmm0,0x8(%rdi) //store imaginary part
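The data flow of the packed scheme can be checked with a lane-by-lane model.
The C sketch below is not compiler output and not part of the test case. The
type v2d and all helper names are hypothetical, chosen only to mirror the
instruction mnemonics. It assumes the documented addsubpd behaviour: the low
lane of the result is destination minus source and the high lane is
destination plus source, so the register holding x0*x1 has to be the
destination operand.

```c
/* Two-lane model of an SSE register: lo = bits 0..63, hi = bits 64..127. */
typedef struct { double lo, hi; } v2d;

/* movddup: broadcast one double into both lanes. */
static v2d movddup(double a) { v2d r = { a, a }; return r; }

/* shufpd $0x1 with the same register as source and destination:
   the two lanes are swapped. */
static v2d swap_lanes(v2d a) { v2d r = { a.hi, a.lo }; return r; }

/* mulpd: lane-wise multiply. */
static v2d mulpd(v2d a, v2d b)
{
    v2d r = { a.lo * b.lo, a.hi * b.hi };
    return r;
}

/* addsubpd: subtract in the low lane, add in the high lane. */
static v2d addsubpd(v2d dst, v2d src)
{
    v2d r = { dst.lo - src.lo, dst.hi + src.hi };
    return r;
}

/* (x0,y0)*(x1,y1) using only the packed operations modelled above.
   a = (x0,y0) plays the role of Iy(i), b = (x1,y1) the role of Ix(i). */
static v2d packed_cmul(v2d a, v2d b)
{
    v2d t0 = mulpd(movddup(b.lo), a);              /* (x1*x0, x1*y0) */
    v2d t1 = mulpd(movddup(b.hi), swap_lanes(a));  /* (y1*y0, y1*x0) */
    return addsubpd(t0, t1);      /* (x0*x1 - y0*y1, x1*y0 + x0*y1) */
}
```

For example, packed_cmul applied to (1,2) and (3,4) gives (-5,10), which
matches (1+2i)*(3+4i) = -5+10i.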
Other relevant information:
1. Compile flags: -O3 -ffast-math -m64 -march=amdfam10
2. gfortran version: gfortran -v
Using built-in specs.
Target: x86_64-unknown-linux-gnu
Configured with: /tmp/src/gcc-4.3.0/configure --prefix=/opt/amd/gcc-4.3.0
--enable-languages=c,c++,fortran --enable-stage1-checking
--with-as=/opt/amd/gcc-4.3.0/bin/as --with-ld=/opt/amd/gcc-4.3.0/bin/ld
--with-mpfr=/tmp/install/mpfr-2.3.0 --with-gmp=/tmp/install/gmp-4.2.2
Thread model: posix
gcc version 4.3.1 20080312 (prerelease) (GCC)
3. model name: AMD Phenom(tm) 8650 Triple-Core Processor
4. flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov
pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt rdtscp lm
3dnowext 3dnow constant_tsc pni cx16 popcnt lahf_lm cmp_legacy svm extapic
cr8_legacy altmovcr8 abm sse4a misalignsse 3dnowprefetch osvw
--
Summary: Fortran complex array multiply missed optimization
Product: gcc
Version: 4.3.1
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: fortran
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: rajiv dot adhikary at amd dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36840