This is the mail archive of the
gcc-bugs@gcc.gnu.org
mailing list for the GCC project.
[Bug libfortran/78379] Processor-specific versions for matmul
- From: "jvdelisle at gcc dot gnu.org" <gcc-bugzilla at gcc dot gnu dot org>
- To: gcc-bugs at gcc dot gnu dot org
- Date: Tue, 22 Nov 2016 20:53:41 +0000
- Subject: [Bug libfortran/78379] Processor-specific versions for matmul
- Auto-submitted: auto-generated
- References: <bug-78379-4@http.gcc.gnu.org/bugzilla/>
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78379
--- Comment #20 from Jerry DeLisle <jvdelisle at gcc dot gnu.org> ---
(In reply to Thomas Koenig from comment #18)
> Created attachment 40119 [details]
> Version that works (AVX only)
>
> Here is a version that should only do AVX stuff on Intel processors.
> Optimization for other processor types could come later.
This is interesting. This patch works fine on the AMD processors I tested.
Looking at the disassembly, the vanilla matmul does use the xmm registers but
does not use any vector instructions. Peak with this is about 9.3 gflops.
With -mavx and -mprefer-avx128 the peak is 10.0 gflops, or about a 7.5%
improvement.
I think we should get this patch committed, and then we can work on the AMD
side. I know Steve is running an FX-series AMD processor. Once this patch goes
in, I will give it a spin there. The FX parts are clearly better than this
generation of APU, which is more focused on using the on-chip GPU features
(which are pretty good). We will also want to keep an eye on the Zen-based
processors, which I expect will behave more like Intel regarding the vector
instructions (well, we will see).