[patch, fortran] Enable FMA for AVX2 and AVX512F for matmul

Jakub Jelinek jakub@redhat.com
Thu Mar 2 08:43:00 GMT 2017


On Wed, Mar 01, 2017 at 10:00:08PM +0100, Thomas Koenig wrote:
> @@ -101,7 +93,7 @@
>  `static void
>  'matmul_name` ('rtype` * const restrict retarray, 
>  	'rtype` * const restrict a, 'rtype` * const restrict b, int try_blas,
> -	int blas_limit, blas_call gemm) __attribute__((__target__("avx2")));
> +	int blas_limit, blas_call gemm) __attribute__((__target__("avx2,fma")));
>  static' include(matmul_internal.m4)dnl
>  `#endif /* HAVE_AVX2 */
>  

I guess the question here is if there are any CPUs that have AVX2 but don't
have FMA3.  If there are none, then this is not controversial, if there are
some, it depends on how widely they are used compared to ones that have both
AVX2 and FMA3.  Going just from our -march= bitsets, it seems if there is
PTA_AVX2, then there is also PTA_FMA: haswell, broadwell, skylake, skylake-avx512, knl,
bdver4, znver1, there are CPUs that have just PTA_AVX and not PTA_AVX2 and
still have PTA_FMA: bdver2, bdver3 (but that is not relevant to this patch).

> @@ -110,7 +102,7 @@
>  `static void
>  'matmul_name` ('rtype` * const restrict retarray, 
>  	'rtype` * const restrict a, 'rtype` * const restrict b, int try_blas,
> -	int blas_limit, blas_call gemm) __attribute__((__target__("avx512f")));
> +	int blas_limit, blas_call gemm) __attribute__((__target__("avx512f,fma")));
>  static' include(matmul_internal.m4)dnl
>  `#endif  /* HAVE_AVX512F */
>  

I think this change is not needed, because the EVEX encoded
VFMADD???[SP][DS] instructions etc. are in AVX512F ISA, not in FMA3 ISA
(which has just the VEX encoded ones).
Which is why I'm seeing the fmas in my libgfortran even without your patch.
Thus I think you should remove this from your patch.

> @@ -147,7 +141,8 @@
>  #endif  /* HAVE_AVX512F */
>  
>  #ifdef HAVE_AVX2
> -      	  if (__cpu_model.__cpu_features[0] & (1 << FEATURE_AVX2))
> +      	  if ((__cpu_model.__cpu_features[0] & (1 << FEATURE_AVX2))
> +	     && (__cpu_model.__cpu_features[0] & (1 << FEATURE_FMA)))
>  	    {
>  	      matmul_p = matmul_'rtype_code`_avx2;
>  	      goto tailcall;

and this too.

	Jakub



More information about the Gcc-patches mailing list