This is the mail archive of the
gcc-bugs@gcc.gnu.org
mailing list for the GCC project.
[Bug libfortran/78379] Processor-specific versions for matmul
- From: "jvdelisle at gcc dot gnu.org" <gcc-bugzilla at gcc dot gnu dot org>
- To: gcc-bugs at gcc dot gnu dot org
- Date: Tue, 22 Nov 2016 20:53:41 +0000
- Subject: [Bug libfortran/78379] Processor-specific versions for matmul
- Auto-submitted: auto-generated
- References: <bug-78379-4@http.gcc.gnu.org/bugzilla/>
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78379
--- Comment #20 from Jerry DeLisle <jvdelisle at gcc dot gnu.org> ---
(In reply to Thomas Koenig from comment #18)
> Created attachment 40119 [details]
> Version that works (AVX only)
>
> Here is a version that should only do AVX stuff on Intel processors.
> Optimization for other processor types could come later.
This is interesting. This patch works fine on the AMD processors I tested.
Looking at the disassembly, the vanilla matmul does use the xmm registers but
does not use any vector instructions. Peak with this is about 9.3 gflops.
With -mavx and -mprefer-avx128 the peak is 10.0 gflops, or about a 7.5%
improvement.
I think we should get this patch committed, and then we can work on the AMD
side. I know Steve is running an FX-series AMD processor. Once this patch goes
in, I will give it a spin there. The FX parts are clearly better than this
generation of APU, which is more focused on using the on-chip GPU features
(which are pretty good). We will also want to keep an eye on the Zen-based
processors, which I expect will behave more like Intel regarding the vector
instructions (well, we will see).