[patch, libfortran] AMD-specific versions of library matmul
Jerry DeLisle
jvdelisle@charter.net
Thu May 25 14:11:00 GMT 2017
On 05/25/2017 03:45 AM, Thomas Koenig wrote:
> Hello world,
>
> the attached patch speeds up the library version of matmul for AMD chips
> by selecting AVX128 instructions and, depending on which instructions
> are supported, either FMA3 (aka FMA) or FMA4.
>
> Jerry tested this on his AMD systems, and found a speedup vs. the
> current code of around 10%.
>
> I have been unable to test this on a Ryzen system (the new compile farm
> machines won't accept my login yet). From the benchmarks I have read,
> this method should also work fairly well on a Ryzen.
>
> So, OK for trunk?
Yes, OK. Maybe test Ryzen first?
I just confirmed access to the Ryzen machines so I plan to get set up and test
there.
Time to start looking under the hood.
cat /proc/cpuinfo reports the following flags:
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36
clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm
constant_tsc rep_good nopl nonstop_tsc extd_apicid aperfmperf eagerfpu pni
pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c
rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse
3dnowprefetch osvw skinit wdt tce topoext perfctr_core perfctr_nb bpext
perfctr_l2 mwaitx hw_pstate vmmcall fsgsbase bmi1 avx2 smep bmi2 rdseed adx smap
clflushopt sha_ni xsaveopt xsavec xgetbv1 xsaves clzero irperf arat npt lbrv
svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter
pfthreshold avic overflow_recov succor smca
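
As a rough illustration of the selection the patch description mentions (AVX128 with either FMA3 or FMA4, depending on what the chip supports), here is a minimal Python sketch that reads a flags line like the one above and reports which variant would apply. This is not the libgfortran code: the function names, the variant labels, and the choice to check FMA3 before FMA4 are all assumptions made for illustration, and the AMD vendor check is left out.

```python
def parse_flags(cpuinfo_text):
    """Return the set of CPU feature flags found in /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def pick_matmul_variant(flags):
    """Pick a matmul variant label from a set of CPU flags.

    The labels are hypothetical; they only mirror the idea in the patch
    description (AVX128 plus FMA3 or FMA4), not the actual dispatch logic.
    """
    if "avx" not in flags:
        return "generic"
    # Assumption: prefer FMA3 (reported by Linux as "fma") when both exist.
    if "fma" in flags:
        return "avx128_fma3"
    if "fma4" in flags:
        return "avx128_fma4"
    return "generic"
```

Passing the contents of /proc/cpuinfo to parse_flags and the result to pick_matmul_variant would give "avx128_fma3" for the flags listed above, since they include avx and fma but not fma4.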