[PATCH][GCC][AArch64] Dot Product SIMD patterns [Patch (5/8)]

Thu Oct 12 15:49:00 GMT 2017

> -----Original Message-----
> From: Richard Earnshaw (lists) [mailto:Richard.Earnshaw@arm.com]
> Sent: 12 October 2017 13:58
> To: Tamar Christina; James Greenhalgh
> Cc: gcc-patches@gcc.gnu.org; nd; Marcus Shawcroft
> Subject: Re: [PATCH][GCC][AArch64] Dot Product SIMD patterns [Patch
> (5/8)]
> 
> On 06/10/17 13:45, Tamar Christina wrote:
> > Hi All,
> >
> > This is a respin with the feedback suggested.
> >
> > Regtested on arm-none-eabi, armeb-none-eabi, aarch64-none-elf and
> > aarch64_be-none-elf with no issues found.
> >
> > Ok for trunk?
> >
> > gcc/
> > 2017-10-06  Tamar Christina  <tamar.christina@arm.com>
> >
> >         * config/aarch64/aarch64-builtins.c
> >         (aarch64_types_quadopu_lane_qualifiers): New.
> >         (TYPES_QUADOPU_LANE): New.
> >         * config/aarch64/aarch64-simd.md (aarch64_<sur>dot<vsi2qi>): New.
> >         (<sur>dot_prod<vsi2qi>, aarch64_<sur>dot_lane<vsi2qi>): New.
> >         (aarch64_<sur>dot_laneq<vsi2qi>): New.
> >         * config/aarch64/aarch64-simd-builtins.def (sdot, udot): New.
> >         (sdot_lane, udot_lane, sdot_laneq, udot_laneq): New.
> > >         * config/aarch64/iterators.md (sur): Add UNSPEC_SDOT,
> > >         UNSPEC_UDOT.
> >         (Vdottype, DOTPROD): New.
> >         (sur): Add SDOT and UDOT.
> 
> OK if this passes a native bootstrap.

Bootstrapped on aarch64-none-linux-gnu with no issues.

Thanks,
Tamar

> 
> R.
> > ________________________________________
> > From: Tamar Christina
> > Sent: Tuesday, September 5, 2017 7:42:40 PM
> > To: James Greenhalgh
> > Cc: gcc-patches@gcc.gnu.org; nd; Richard Earnshaw; Marcus Shawcroft
> > Subject: Re: [PATCH][GCC][AArch64] Dot Product SIMD patterns [Patch
> > (5/8)]
> >
> >>
> >> ________________________________________
> >> From: James Greenhalgh <james.greenhalgh@arm.com>
> >> Sent: Monday, September 4, 2017 12:01 PM
> >> To: Tamar Christina
> >> Cc: gcc-patches@gcc.gnu.org; nd; Richard Earnshaw; Marcus Shawcroft
> >> Subject: Re: [PATCH][GCC][AArch64] Dot Product SIMD patterns [Patch
> >> (5/8)]
> >>
> >> On Fri, Sep 01, 2017 at 02:22:17PM +0100, Tamar Christina wrote:
> >>> Hi All,
> >>>
> >>> This patch adds the instructions for Dot Product to AArch64 along
> >>> with the intrinsics and vectorizer pattern.
> >>>
> >>> Armv8.2-a dot product supports 8-bit element values both signed and
> >>> unsigned.
> >>>
> >>> Dot product is available from Armv8.2-a onwards.
> >>>
> >>> Regtested and bootstrapped on aarch64-none-elf and no issues.
> >>>
> >>> Ok for trunk?
> >>>
> >>> gcc/
> >>> 2017-09-01  Tamar Christina  <tamar.christina@arm.com>
> >>>
> >>>       * config/aarch64/aarch64-builtins.c
> >>>       (aarch64_types_quadopu_lane_qualifiers): New.
> >>>       (TYPES_QUADOPU_LANE): New.
> >>>       * config/aarch64/aarch64-simd.md (aarch64_<sur>dot<dot_mode>): New.
> >>>       (<sur>dot_prod<dot_mode>, aarch64_<sur>dot_lane<dot_mode>): New.
> >>>       (aarch64_<sur>dot_laneq<dot_mode>): New.
> >>>       * config/aarch64/aarch64-simd-builtins.def (sdot, udot): New.
> >>>       (sdot_lane, udot_lane, sdot_laneq, udot_laneq): New.
> >>>       * config/aarch64/iterators.md (UNSPEC_SDOT, UNSPEC_UDOT): New.
> >>>       (DOT_MODE, dot_mode, Vdottype, DOTPROD): New.
> >>>       (sur): Add SDOT and UDOT.
> >>>
> >>> --
> >>
> >>> diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
> >>> index f3e084f8778d70c82823b92fa80ff96021ad26db..21d46c84ab317c2d62afdf8c48117886aaf483b0 100644
> >>> --- a/gcc/config/aarch64/aarch64-simd.md
> >>> +++ b/gcc/config/aarch64/aarch64-simd.md
> >>> @@ -386,6 +386,87 @@
> >>>  }
> >>>  )
> >>>
> >>> +;; These instructions map to the __builtins for the Dot Product operations.
> >>> +(define_insn "aarch64_<sur>dot<dot_mode>"
> >>> +  [(set (match_operand:VS 0 "register_operand" "=w")
> >>> +     (unspec:VS [(match_operand:VS 1 "register_operand" "0")
> >>> +                 (match_operand:<DOT_MODE> 2 "register_operand" "w")
> >>> +                 (match_operand:<DOT_MODE> 3 "register_operand" "w")]
> >>> +             DOTPROD))]
> >>> +  "TARGET_DOTPROD"
> >>> +  "<sur>dot\\t%0.<Vtype>, %2.<Vdottype>, %3.<Vdottype>"
> >>> +  [(set_attr "type" "neon_dot")]
> >>
> >> Would there be a small benefit in modelling this as:
> >>
> >>   [(set (match_operand:VS 0 "register_operand" "=w")
> >>         (add:VS (match_operand:VS 1 "register_operand" "0")
> >>                 (unspec:VS [(match_operand:<DOT_MODE> 2 "register_operand" "w")
> >>                             (match_operand:<DOT_MODE> 3 "register_operand" "w")]
> >>                            DOTPROD)))]
> >>
> >
> > Maybe, I can't think of anything at the moment, but it certainly won't hurt.
> >
> >>
> >>> +)
> >>> +
> >>> +;; These expands map to the Dot Product optab the vectorizer checks for.
> >>> +;; The auto-vectorizer expects a dot product builtin that also does an
> >>> +;; accumulation into the provided register.
> >>> +;; Given the following pattern
> >>> +;;
> >>> +;; for (i=0; i<len; i++) {
> >>> +;;     c = a[i] * b[i];
> >>> +;;     r += c;
> >>> +;; }
> >>> +;; return result;
> >>> +;;
> >>> +;; This can be auto-vectorized to
> >>> +;; r  = a[0]*b[0] + a[1]*b[1] + a[2]*b[2] + a[3]*b[3];
> >>> +;;
> >>> +;; given enough iterations.  However the vectorizer can keep unrolling the loop
> >>> +;; r += a[4]*b[4] + a[5]*b[5] + a[6]*b[6] + a[7]*b[7];
> >>> +;; r += a[8]*b[8] + a[9]*b[9] + a[10]*b[10] + a[11]*b[11];
> >>> +;; ...
> >>> +;;
> >>> +;; and so the vectorizer provides r, in which the result has to be accumulated.
> >>> +(define_expand "<sur>dot_prod<dot_mode>"
> >>> +  [(set (match_operand:VS 0 "register_operand")
> >>> +     (unspec:VS [(match_operand:<DOT_MODE> 1 "register_operand")
> >>> +                 (match_operand:<DOT_MODE> 2 "register_operand")
> >>> +                 (match_operand:VS 3 "register_operand")]
> >>> +             DOTPROD))]
> >>
> >> This is just an expand that always ends in a DONE, so doesn't need
> >> the full description here, just:
> >>
> >>   [(match_operand:VS 0 "register_operand")
> >>    (match_operand:<DOT_MODE> 1 "register_operand")
> >>    (match_operand:<DOT_MODE> 2 "register_operand")
> >>    (match_operand:VS 3 "register_operand")]
> >
> > > Yes, but I use the unspec so that the <sur> iterator generates both the
> > > signed and unsigned versions of the optab.
> >
> >>
> >>> diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md
> >>> index cceb57525c7aa44933419bd317b1f03a7b76f4c4..533c12cca916669195e9b094527ee0de31542b12 100644
> >>> --- a/gcc/config/aarch64/iterators.md
> >>> +++ b/gcc/config/aarch64/iterators.md
> >>> @@ -354,6 +354,8 @@
> >>>      UNSPEC_SQRDMLSH     ; Used in aarch64-simd.md.
> >>>      UNSPEC_FMAXNM       ; Used in aarch64-simd.md.
> >>>      UNSPEC_FMINNM       ; Used in aarch64-simd.md.
> >>> +    UNSPEC_SDOT              ; Used in aarch64-simd.md.
> >>> +    UNSPEC_UDOT              ; Used in aarch64-simd.md.
> >>>  ])
> >>>
> >>>  ;; ------------------------------------------------------------------
> >>> @@ -810,6 +812,13 @@
> >>>  (define_mode_attr vsi2qi [(V2SI "v8qi") (V4SI "v16qi")])
> >>>  (define_mode_attr VSI2QI [(V2SI "V8QI") (V4SI "V16QI")])
> >>>
> >>> +;; Mapping attribute for Dot Product input modes based on result
> mode.
> >>> +(define_mode_attr DOT_MODE [(V2SI "V8QI") (V4SI "V16QI")])
> >>> +(define_mode_attr dot_mode [(V2SI "v8qi") (V4SI "v16qi")])
> >>
> >> Are these not identical to the two lines above in the context?
> >>
> >>>  (define_mode_attr vsi2qi [(V2SI "v8qi") (V4SI "v16qi")])
> >>>  (define_mode_attr VSI2QI [(V2SI "V8QI") (V4SI "V16QI")])
> >>
> >> Thanks,
> >> James
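
For readers following the thread, here is a minimal C sketch of the accumulating reduction described in the pattern comment of the quoted patch. It is not code from the patch and makes no claim about what GCC actually emits. The function names (dot_scalar, udot_model, dot_blocked) are hypothetical and exist only for illustration. udot_model is a plain scalar model of an unsigned dot product over one 128-bit vector, assuming that each of the four 32-bit lanes accumulates the products of its own four byte pairs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Scalar reference: the reduction loop from the pattern comment. */
uint32_t dot_scalar(const uint8_t *a, const uint8_t *b, size_t len)
{
    uint32_t r = 0;
    for (size_t i = 0; i < len; i++)
        r += (uint32_t)a[i] * b[i];
    return r;
}

/* Hypothetical model of one unsigned dot product step on 16 byte pairs.
   Each 32-bit lane of acc accumulates four adjacent byte products, so the
   accumulator is both an input and the output.  */
void udot_model(uint32_t acc[4], const uint8_t a[16], const uint8_t b[16])
{
    for (int lane = 0; lane < 4; lane++)
        for (int k = 0; k < 4; k++)
            acc[lane] += (uint32_t)a[4 * lane + k] * b[4 * lane + k];
}

/* The blocked shape of the loop: keep accumulating into the same four
   lanes, then sum the lanes once at the end.  */
uint32_t dot_blocked(const uint8_t *a, const uint8_t *b, size_t len)
{
    assert(len % 16 == 0); /* sketch only: no scalar tail handling */
    uint32_t acc[4] = {0, 0, 0, 0};
    for (size_t i = 0; i < len; i += 16)
        udot_model(acc, a + i, b + i);
    return acc[0] + acc[1] + acc[2] + acc[3];
}
```

Because the accumulator is carried from one step to the next, the blocked version needs only one final reduction across lanes, which is why the optab takes the running sum as an operand.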