This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.
[Bug libfortran/78379] Processor-specific versions for matmul
- From: "jvdelisle at gcc dot gnu.org" <gcc-bugzilla at gcc dot gnu dot org>
- To: gcc-bugs at gcc dot gnu dot org
- Date: Thu, 17 Nov 2016 19:57:21 +0000
- Subject: [Bug libfortran/78379] Processor-specific versions for matmul
- Auto-submitted: auto-generated
- References: <bug-78379-4@http.gcc.gnu.org/bugzilla/>
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78379
Jerry DeLisle <jvdelisle at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |jvdelisle at gcc dot gnu.org
--- Comment #3 from Jerry DeLisle <jvdelisle at gcc dot gnu.org> ---
I did apply your second patch:
I do not get any improvement; the results are actually worse than with current
trunk, so I am missing something. This is the same machine I used for the
results shown in PR 51119. It does have AVX.
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov
pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb
rdtscp lm constant_tsc rep_good nopl nonstop_tsc extd_apicid aperfmperf
eagerfpu pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 popcnt aes xsave
avx f16c lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse
3dnowprefetch osvw ibs xop skinit wdt lwp fma4 tce nodeid_msr tbm topoext
perfctr_core perfctr_nb cpb hw_pstate vmmcall bmi1 arat npt lbrv svm_lock
nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter
pfthreshold
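The flags line above is in the format of Linux's /proc/cpuinfo. As a minimal sketch for checking the same features programmatically on another test machine (Linux only; the FEATURES tuple is an illustrative assumption, not the list the patch actually tests), the flags can be parsed like this. A flag reported by the kernel does not by itself guarantee that libgfortran's runtime dispatch picks the matching kernel.

```python
from pathlib import Path

# SIMD/FMA features a processor-specific matmul *might* key on.
# This tuple is an illustrative assumption, not taken from the patch.
FEATURES = ("sse2", "avx", "avx2", "fma", "fma4", "xop", "avx512f")


def parse_flags(cpuinfo_text: str) -> set[str]:
    """Return the feature flags from the first 'flags' line of /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()  # no flags line found


def simd_report(flags: set[str]) -> dict[str, bool]:
    """Map each feature of interest to whether the CPU reports it."""
    return {name: name in flags for name in FEATURES}


if __name__ == "__main__" and Path("/proc/cpuinfo").exists():
    text = Path("/proc/cpuinfo").read_text()
    for name, present in simd_report(parse_flags(text)).items():
        print(f"{name:8s} {'yes' if present else 'no'}")
```

Fed the flags shown above, this reports avx, fma, fma4 and xop as present and avx2 and avx512f as absent.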
$ gfc -static-libgfortran -finline-matmul-limit=0 -Ofast -o compare_mavx compare.f90
$ ./a.out
=========================================================
================ MEASURED GIGAFLOPS =
=========================================================
                Matmul                           Matmul
                fixed                 Matmul     variable
 Size  Loops    explicit   refMatmul  assumed    explicit
=========================================================
    2   2000    5.043      0.045      0.091      0.150
    4   2000    1.417      0.235      0.353      0.325
    8   2000    2.016      0.634      0.862      2.021
   16   2000    5.332      2.834      2.239      2.929
   32   2000    6.169      3.496      1.931      3.289
   64   2000    2.656      2.836      2.655      2.657
  128   2000    2.898      3.286      2.901      2.901
  256    477    3.157      3.429      3.156      3.157
  512     59    3.082      2.356      3.133      3.126
 1024      7    3.102      1.363      3.144      3.136
 2048      1    3.099      1.685      3.144      3.140