This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.



[Bug libfortran/78379] Processor-specific versions for matmul


https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78379

--- Comment #20 from Jerry DeLisle <jvdelisle at gcc dot gnu.org> ---
(In reply to Thomas Koenig from comment #18)
> Created attachment 40119 [details]
> Version that works (AVX only)
> 
> Here is a version that should only do AVX stuff on Intel processors.
> Optimization for other processor types could come later.

This is interesting. This patch works fine on the AMD processors I tested.
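
For readers following along, the idea of taking the AVX path only on Intel can be sketched roughly as below. This is a minimal illustration and not the code in the attachment: the names matmul_generic, matmul_avx, matmul_fn and select_matmul are hypothetical, and it assumes GCC's x86 builtins __builtin_cpu_init, __builtin_cpu_is and __builtin_cpu_supports together with the target function attribute.

```c
#include <stddef.h>

/* Hypothetical names throughout; this is not libgfortran's actual code. */
typedef void (*matmul_fn)(size_t n, const double *a, const double *b, double *c);

/* Baseline n x n multiply (row-major), built with the default ISA. */
static void matmul_generic(size_t n, const double *a, const double *b, double *c)
{
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++) {
            double s = 0.0;
            for (size_t k = 0; k < n; k++)
                s += a[i * n + k] * b[k * n + j];
            c[i * n + j] = s;
        }
}

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
/* Same source, but the compiler may emit AVX code for this one function. */
__attribute__((target("avx")))
static void matmul_avx(size_t n, const double *a, const double *b, double *c)
{
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++) {
            double s = 0.0;
            for (size_t k = 0; k < n; k++)
                s += a[i * n + k] * b[k * n + j];
            c[i * n + j] = s;
        }
}
#endif

/* Pick a version at run time: AVX only on Intel CPUs that report AVX. */
static matmul_fn select_matmul(void)
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    if (__builtin_cpu_is("intel") && __builtin_cpu_supports("avx"))
        return matmul_avx;
#endif
    return matmul_generic;
}
```

A caller would do something like `matmul_fn f = select_matmul(); f(n, a, b, c);`. Relaxing the vendor check (for example, also accepting AMD parts) is the kind of follow-up tuning discussed in this thread.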

Looking at the disassembly, the vanilla matmul does use the xmm registers but
does not use any vector instructions. Peak with this is about 9.3 gflops.

With -mavx and -mprefer-avx128 the peak is 10.0 gflops or about 7.5%
improvement.

I think we should get this patch committed and then we can work on the AMD
side. I know Steve is running an FX series AMD processor. Once this patch goes
in, I will give it a spin there. The FX are clearly better than this generation
of APU, which is more focused on using the on-chip GPU features (which are
pretty good).
We will also want to keep an eye on the Zen-based processors, which I expect
will behave more like Intel regarding the vector instructions (well, we will
see anyway).
