[PATCH] Enable GCC support for AMX
Hongyu Wang
wwwhhhyyy333@gmail.com
Fri Sep 18 08:31:55 GMT 2020
Hi Kirill,
Very Appreciated for your review again
I just update the patch with adding XSAVE dependency and use
__builtin_cpu_supports for runtime test.
Re-based on Sept. 15 trunk and tested with sde. Kindly PING.
Hongyu Wang <wwwhhhyyy333@gmail.com> 于2020年9月12日周六 上午1:00写道:
> Hi
>
> Thanks for your review, and sorry for the late reply. It took a while
> to finish the runtime test.
>
> > > diff --git a/gcc/config.gcc b/gcc/config.gcc
> > > index 797f0ad5edd..d0e59e86a5c 100644
> > > --- a/gcc/config.gcc
> > > +++ b/gcc/config.gcc
> > > @@ -412,7 +412,7 @@ i[34567]86-*-*)
> > > waitpkgintrin.h cldemoteintrin.h
> avx512bf16vlintrin.h
> > > avx512bf16intrin.h enqcmdintrin.h
> serializeintrin.h
> > > avx512vp2intersectintrin.h
> avx512vp2intersectvlintrin.h
> > > - tsxldtrkintrin.h"
> > > + tsxldtrkintrin.h amxtileintrin.h amxint8intrin.h
> amxbf16intrin.h"
> >
> > Line more than 80 chars.
> >
> > > ;;
> > > x86_64-*-*)
> > > cpu_type=i386
> > > @@ -447,7 +447,7 @@ x86_64-*-*)
> > > waitpkgintrin.h cldemoteintrin.h
> avx512bf16vlintrin.h
> > > avx512bf16intrin.h enqcmdintrin.h
> serializeintrin.h
> > > avx512vp2intersectintrin.h
> avx512vp2intersectvlintrin.h
> > > - tsxldtrkintrin.h"
> > > + tsxldtrkintrin.h amxtileintrin.h amxint8intrin.h
> amxbf16intrin.h"
> >
> > Ditto.
>
> Changed.
>
> >
> > > diff --git a/gcc/config/i386/amxbf16intrin.h
> b/gcc/config/i386/amxbf16intrin.h
> > > new file mode 100644
> > > index 00000000000..df0e2262d50
> > > --- /dev/null
> > > +++ b/gcc/config/i386/amxbf16intrin.h
> > > @@ -0,0 +1,25 @@
> > > +#if !defined _IMMINTRIN_H_INCLUDED
> > > +#error "Never use <amxbf16intrin.h> directly; include <immintrin.h>
> instead."
> > > +#endif
> > > +
> > > +#ifndef _AMXBF16INTRIN_H_INCLUDED
> > > +#define _AMXBF16INTRIN_H_INCLUDED
> > > +
> > > +#if !defined(__AMX_BF16__)
> > > +#pragma GCC push_options
> > > +#pragma GCC target("amx-bf16")
> > > +#define __DISABLE_AMX_BF16__
> > > +#endif /* __AMX_BF16__ */
> > > +
> > > +#if defined(__x86_64__) && defined(__AMX_BF16__)
> > > +#define _tile_dpbf16ps(dst,src1,src2)
> \
> > > + __asm__ volatile\
> > > + ("{tdpbf16ps\t%%tmm"#src2", %%tmm"#src1",
> %%tmm"#dst"|tdpbf16ps\t%%tmm"#dst", %%tmm"#src1", %%tmm"#src2"}" ::)
> > > +#endif
> >
> > I hope in future we'll replace it with unspecs at least...
>
> Currently we think it is redundant to add builtins with just const int
> parameters, which are supposed to be replaced in the future.
>
> >
> > > diff --git a/gcc/config/i386/i386.opt b/gcc/config/i386/i386.opt
> > > index c9f7195d423..9389dc24948 100644
> > > diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> > > index bca8c856dc8..a46e31f5862 100644
> > > --- a/gcc/doc/invoke.texi
> > > +++ b/gcc/doc/invoke.texi
> > > @@ -1357,6 +1357,7 @@ See RS/6000 and PowerPC Options.
> > > -mvpclmulqdq -mavx512bitalg -mmovdiri -mmovdir64b
> -mavx512vpopcntdq @gol
> > > -mavx5124fmaps -mavx512vnni -mavx5124vnniw -mprfchw -mrdpid @gol
> > > -mrdseed -msgx -mavx512vp2intersect -mserialize -mtsxldtrk@gol
> > > +-mamx-tile -mamx-int8 -mamx-bf16@gol
> >
> > Add space please.
>
> Changed.
>
> >
> > > diff --git a/gcc/testsuite/gcc.target/i386/amxbf16-asmintel-2.c
> b/gcc/testsuite/gcc.target/i386/amxbf16-asmintel-2.c
> > > new file mode 100644
> > > index 00000000000..605a44df3f8
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.target/i386/amxbf16-asmintel-2.c
> > > @@ -0,0 +1,4 @@
> > > +/* { dg-do assemble { target { ! ia32 } } } */
> > > +/* { dg-options "-O2 -mamx-bf16 -masm=intel" } */
> > > +/* { dg-require-effective-target amx_bf16 } */
> > > +#include"amxbf16-asmintel-1.c"
> >
> > I didn't get it. We ususally use second tescase to actually execute
> > it and (well, a little) verify that semantics is ok. E.g. that
> > operands order is correct. Could you please do that?
> > This applies to all *-2.c cases.
> > I've checked and looks like public SDE simulator supports AMX.
> >
>
> Added runtime test. Tested and passed under SDE.
>
> Also, we adjust the intrinsic call to accept macro parameters.
>
> Updated patch.
>
> > --
> > K
> > Hello,
> >
> > On 03 сен 08:17, H.J. Lu wrote:
> > > On Thu, Sep 3, 2020 at 8:08 AM Kirill Yukhin via Gcc-patches
> > > <gcc-patches@gcc.gnu.org> wrote:
> > > >
> > > > Hello,
> > > >
> > > > On 06 июл 09:58, Hongyu Wang via Gcc-patches wrote:
> > > > > Hi:
> > > > >
> > > > > This patch is about to support Intel Advanced Matrix Extensions
> (AMX)
> > > > > which will be enabled in GLC.
> > > > >
> > > > > AMX is a new 64-bit programming paradigm consisting of two
> > > > > compo nents: a set of 2-dimensional registers (tiles) representing
> > > > > sub-arrays from a larger 2-dimensional memory image,
> > > > > and an accelerator able to operate on tiles
> > > > >
> > > > > Supported instructions are
> > > > >
> > > > >
> AMX-TILE:ldtilecfg/sttilecfg/tileloadd/tileloaddt1/tilezero/tilerelease
> > > > > AMX-INT8:tdpbssd/tdpbsud/tdpbusd/tdpbuud
> > > > > AMX-BF16:tdpbf16ps
> > > > >
> > > > > The intrinsics adopts constant tile register number as its input
> parameters.
> > > > >
> > > > > For detailed information, please refer to
> > > > >
> https://software.intel.com/content/dam/develop/public/us/en/documents/architecture-instruction-set-extensions-programming-reference.pdf
> > > > >
> > > > > Bootstrap ok, regression test on i386/x86 backend is ok.
> > > > >
> > > > > OK for master?
> > > >
> > > > I was trying to apply your patch to recent master and got
> > > > compilation error:
> > > >
> > > > g++ -std=gnu++11 -fno-PIE -c -g -O2 -DIN_GCC -fno-exceptions
> -fno-rtti -fasynchronous-unwind-tables -W -Wall -Wno-narrowi
> > > > ng -Wwrite-strings -Wcast-qual -Wmissing-format-attribute
> -Woverloaded-virtual -pedantic -Wno-long-long -Wno-variadic-macros -Wn
> > > > o-overlength-strings -fno-common -DHAVE_CONFIG_H -I. -I.
> -I/export/kyukhin/gcc/src/gcc -I/export/kyukhin/gcc/src/gcc/. -I/expor
> > > > t/kyukhin/gcc/src/gcc/../include
> -I/export/kyukhin/gcc/src/gcc/../libcpp/include
> -I/export/kyukhin/gcc/src/gcc/../libdecnumber
> > > > -I/export/kyukhin/gcc/src/gcc/../libdecnumber/bid -I../libdecnumber
> -I/export/kyukhin/gcc/src/gcc/../libbacktrace -o i386-opti
> > > > ons.o -MT i386-options.o -MMD -MP -MF ./.deps/i386-options.TPo
> /export/kyukhin/gcc/src/gcc/config/i386/i386-options.c
> > > > /export/kyukhin/gcc/src/gcc/config/i386/i386-options.c: In function
> ‘bool ix86_option_override_internal(bool, gcc_options*, gcc_
> > > > options*)’:
> > > > /export/kyukhin/gcc/src/gcc/config/i386/i386-options.c:2263:41:
> error: ‘PTA_AMX_TILE’ was not declared in this scope
> > > > if (((processor_alias_table[i].flags & PTA_AMX_TILE) != 0)
> > > > ^
> > > > /export/kyukhin/gcc/src/gcc/config/i386/i386-options.c:2267:41:
> error: ‘PTA_AMX_INT8’ was not declared in this scope
> > > > if (((processor_alias_table[i].flags & PTA_AMX_INT8) != 0)
> > > > ^
> > > > /export/kyukhin/gcc/src/gcc/config/i386/i386-options.c:2271:41:
> error: ‘PTA_AMX_BF16’ was not declared in this scope
> > > > if (((processor_alias_table[i].flags & PTA_AMX_BF16) != 0)
> > > >
> > > > Could you please fix that?
> > >
> > > Here is the rebased patch against
> > >
> > > commit 3c219134152f645103f2fcd50735b177ccd76cde
> > > Author: Jonathan Wakely <jwakely@redhat.com>
> > > Date: Thu Sep 3 12:38:50 2020 +0100
> > >
> > > libstdc++: Optimise GCD algorithms
> > >
> > > Thanks.
> > >
> > > --
> > > H.J.
> >
> > > diff --git a/gcc/config.gcc b/gcc/config.gcc
> > > index 797f0ad5edd..d0e59e86a5c 100644
> > > --- a/gcc/config.gcc
> > > +++ b/gcc/config.gcc
> > > @@ -412,7 +412,7 @@ i[34567]86-*-*)
> > > waitpkgintrin.h cldemoteintrin.h
> avx512bf16vlintrin.h
> > > avx512bf16intrin.h enqcmdintrin.h
> serializeintrin.h
> > > avx512vp2intersectintrin.h
> avx512vp2intersectvlintrin.h
> > > - tsxldtrkintrin.h"
> > > + tsxldtrkintrin.h amxtileintrin.h amxint8intrin.h
> amxbf16intrin.h"
> >
> > Line more than 80 chars.
> >
> > > ;;
> > > x86_64-*-*)
> > > cpu_type=i386
> > > @@ -447,7 +447,7 @@ x86_64-*-*)
> > > waitpkgintrin.h cldemoteintrin.h
> avx512bf16vlintrin.h
> > > avx512bf16intrin.h enqcmdintrin.h
> serializeintrin.h
> > > avx512vp2intersectintrin.h
> avx512vp2intersectvlintrin.h
> > > - tsxldtrkintrin.h"
> > > + tsxldtrkintrin.h amxtileintrin.h amxint8intrin.h
> amxbf16intrin.h"
> >
> > Ditto.
> >
> > > diff --git a/gcc/config/i386/amxbf16intrin.h
> b/gcc/config/i386/amxbf16intrin.h
> > > new file mode 100644
> > > index 00000000000..df0e2262d50
> > > --- /dev/null
> > > +++ b/gcc/config/i386/amxbf16intrin.h
> > > @@ -0,0 +1,25 @@
> > > +#if !defined _IMMINTRIN_H_INCLUDED
> > > +#error "Never use <amxbf16intrin.h> directly; include <immintrin.h>
> instead."
> > > +#endif
> > > +
> > > +#ifndef _AMXBF16INTRIN_H_INCLUDED
> > > +#define _AMXBF16INTRIN_H_INCLUDED
> > > +
> > > +#if !defined(__AMX_BF16__)
> > > +#pragma GCC push_options
> > > +#pragma GCC target("amx-bf16")
> > > +#define __DISABLE_AMX_BF16__
> > > +#endif /* __AMX_BF16__ */
> > > +
> > > +#if defined(__x86_64__) && defined(__AMX_BF16__)
> > > +#define _tile_dpbf16ps(dst,src1,src2)
> \
> > > + __asm__ volatile\
> > > + ("{tdpbf16ps\t%%tmm"#src2", %%tmm"#src1",
> %%tmm"#dst"|tdpbf16ps\t%%tmm"#dst", %%tmm"#src1", %%tmm"#src2"}" ::)
> > > +#endif
> >
> > I hope in future we'll replace it with unspecs at least...
> >
> > > diff --git a/gcc/config/i386/i386.opt b/gcc/config/i386/i386.opt
> > > index c9f7195d423..9389dc24948 100644
> > > diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> > > index bca8c856dc8..a46e31f5862 100644
> > > --- a/gcc/doc/invoke.texi
> > > +++ b/gcc/doc/invoke.texi
> > > @@ -1357,6 +1357,7 @@ See RS/6000 and PowerPC Options.
> > > -mvpclmulqdq -mavx512bitalg -mmovdiri -mmovdir64b
> -mavx512vpopcntdq @gol
> > > -mavx5124fmaps -mavx512vnni -mavx5124vnniw -mprfchw -mrdpid @gol
> > > -mrdseed -msgx -mavx512vp2intersect -mserialize -mtsxldtrk@gol
> > > +-mamx-tile -mamx-int8 -mamx-bf16@gol
> >
> > Add space please.
> >
> > > diff --git a/gcc/testsuite/gcc.target/i386/amxbf16-asmintel-2.c
> b/gcc/testsuite/gcc.target/i386/amxbf16-asmintel-2.c
> > > new file mode 100644
> > > index 00000000000..605a44df3f8
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.target/i386/amxbf16-asmintel-2.c
> > > @@ -0,0 +1,4 @@
> > > +/* { dg-do assemble { target { ! ia32 } } } */
> > > +/* { dg-options "-O2 -mamx-bf16 -masm=intel" } */
> > > +/* { dg-require-effective-target amx_bf16 } */
> > > +#include"amxbf16-asmintel-1.c"
> >
> > I didn't get it. We ususally use second tescase to actually execute
> > it and (well, a little) verify that semantics is ok. E.g. that
> > operands order is correct. Could you please do that?
> > This applies to all *-2.c cases.
> > I've checked and looks like public SDE simulator supports AMX.
> >
> > --
> > K
>
--
Regards,
Hongyu, Wang
-------------- next part --------------
A non-text attachment was scrubbed...
Name: GCC_AMX_support_v3.patch
Type: text/x-patch
Size: 65206 bytes
Desc: not available
URL: <https://gcc.gnu.org/pipermail/gcc-patches/attachments/20200918/ec1e5d72/attachment-0001.bin>
More information about the Gcc-patches
mailing list