[Bug fortran/78611] New: -march=native makes code 3x slower

Wed Nov 30 11:58:00 GMT 2016

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78611

            Bug ID: 78611
           Summary: -march=native makes code 3x slower
           Product: gcc
           Version: 6.2.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: fortran
          Assignee: unassigned at gcc dot gnu.org
          Reporter: pepalogik at seznam dot cz
  Target Milestone: ---

Created attachment 40199
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=40199&action=edit
Source code, include files, and inputs

Hi,

I first encountered this problem with version 5.4.0. I then installed 6.2.0,
and the behavior is unchanged. Details are below, and a test case is attached.

jenda@VivoBook ~/Bug reports/gfortran/6/PhSh1 $ gfortran-6 -v
Using built-in specs.
COLLECT_GCC=gfortran-6
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-linux-gnu/6/lto-wrapper
Target: x86_64-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Ubuntu
6.2.0-3ubuntu11~16.04' --with-bugurl=file:///usr/share/doc/gcc-6/README.Bugs
--enable-languages=c,ada,c++,java,go,d,fortran,objc,obj-c++ --prefix=/usr
--program-suffix=-6 --enable-shared --enable-linker-build-id
--libexecdir=/usr/lib --without-included-gettext --enable-threads=posix
--libdir=/usr/lib --enable-nls --with-sysroot=/ --enable-clocale=gnu
--enable-libstdcxx-debug --enable-libstdcxx-time=yes
--with-default-libstdcxx-abi=new --enable-gnu-unique-object
--disable-vtable-verify --enable-libmpx --enable-plugin --with-system-zlib
--disable-browser-plugin --enable-java-awt=gtk --enable-gtk-cairo
--with-java-home=/usr/lib/jvm/java-1.5.0-gcj-6-amd64/jre --enable-java-home
--with-jvm-root-dir=/usr/lib/jvm/java-1.5.0-gcj-6-amd64
--with-jvm-jar-dir=/usr/lib/jvm-exports/java-1.5.0-gcj-6-amd64
--with-arch-directory=amd64 --with-ecj-jar=/usr/share/java/eclipse-ecj.jar
--enable-objc-gc --enable-multiarch --disable-werror --with-arch-32=i686
--with-abi=m64 --with-multilib-list=m32,m64,mx32 --enable-multilib
--with-tune=generic --enable-checking=release --build=x86_64-linux-gnu
--host=x86_64-linux-gnu --target=x86_64-linux-gnu
Thread model: posix
gcc version 6.2.0 20160901 (Ubuntu 6.2.0-3ubuntu11~16.04)
jenda@VivoBook ~/Bug reports/gfortran/6/PhSh1 $ gfortran-6 phsh1.f -std=legacy -I. -o default/phsh1
jenda@VivoBook ~/Bug reports/gfortran/6/PhSh1 $ cd default/
jenda@VivoBook ~/Bug reports/gfortran/6/PhSh1/default $ time ./phsh1 < ../bmtz
 Slab or Bulk calculation?
 input 1 for Slab or 0 for Bulk
 Input the MTZ value from the substrate calculation

real    72m51.345s
user    72m48.584s
sys     0m0.968s
jenda@VivoBook ~/Bug reports/gfortran/6/PhSh1/default $ cd ..
jenda@VivoBook ~/Bug reports/gfortran/6/PhSh1 $ gfortran-6 phsh1.f -std=legacy -I. -march=native -o march/phsh1
jenda@VivoBook ~/Bug reports/gfortran/6/PhSh1 $ cd march/
jenda@VivoBook ~/Bug reports/gfortran/6/PhSh1/march $ time ./phsh1 < ../bmtz
 Slab or Bulk calculation?
 input 1 for Slab or 0 for Bulk
 Input the MTZ value from the substrate calculation

real    217m56.080s
user    217m52.092s
sys     0m1.096s

As shown, the code compiled with -march=native is about 3x slower (roughly
218 vs. 73 minutes of wall-clock time). The outputs of both builds are
identical, so this is purely a performance issue. Adding -O3 does not help
much.

My CPU is an Intel(R) Core(TM) i3-3217U CPU @ 1.80GHz, with these flags:
fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush
dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm constant_tsc
arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu
pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm pcid
sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer xsave avx f16c lahf_lm epb
tpr_shadow vnmi flexpriority ept vpid fsgsbase smep erms xsaveopt dtherm ida
arat pln pts
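
To put a number on the slowdown, the two "real" timings from the transcript
above can be converted to seconds and divided. This is only a minimal Python
sketch for the arithmetic; the helper name to_seconds is hypothetical and is
not part of the attached test case.

```python
def to_seconds(stamp):
    """Convert a `time`-style stamp such as '72m51.345s' to seconds."""
    minutes, seconds = stamp.rstrip("s").split("m")
    return int(minutes) * 60 + float(seconds)

# "real" timings copied from the two runs in the transcript
default_s = to_seconds("72m51.345s")    # built without -march=native
native_s = to_seconds("217m56.080s")    # built with -march=native

ratio = native_s / default_s
print(f"{ratio:.2f}")  # prints 2.99
```

So the -march=native build takes just under three times as long as the
default build on this input.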

The code is an old, single-threaded Fortran 77 program that calculates crystal
potentials. The profiler shows that almost all of the run time is spent in
subroutine MTZ.