This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH] Improve V?TImode shifts (PR target/82370)


On 24 Oct 19:14, Uros Bizjak wrote:
> On Tue, Oct 24, 2017 at 4:46 PM, Jakub Jelinek <jakub@redhat.com> wrote:
> > On Tue, Oct 24, 2017 at 05:44:44AM -0700, H.J. Lu wrote:
> >> > What I can see from config/atom.md:
> >> > ;; if palignr or psrldq
> >> > (define_insn_reservation  "atom_sseishft_2" 1
> >> >   (and (eq_attr "cpu" "atom")
> >> >        (and (eq_attr "type" "sseishft")
> >> >             (and (eq_attr "atom_unit" "sishuf")
> >> >                  (match_operand 2 "immediate_operand"))))
> >> >   "atom-simple-0")
> >> >
> >> > This leads back to initial commit of atom.md.
> >> > So, discrimination of psrldq and pslldq looks intentional.
> >> >
> >> > On the over hand, I see in Software Optimization Guide, Table 14-2 that
> >> > PSRLDQ and PSLLDQ occupy same line which directs both insns to port-0 (p 14-18).
> >> > So, looking from that point, definition for PSLLDQ which allow either of port-0
> >> > and port-1 looks wrong (atom-simple-either reservation).
> >> >
> >> > In absence of other information, I'd play on safe side and leave things as they
> >> > occur right now.
> >> >
> >>
> >> I prefer to leave atom.md ASIS.  As for (set_attr "atom_unit"
> >> "sishuf"), it was added
> >> for
> >>
> >> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=44615
> >>
> >> You can drop (set_attr "atom_unit" "sishuf") if gcc.target/i386/sse2-vec-2a.c
> >> still compiles.
> >
> > No, it was added earlier than that, that PR was about insns with psrldq with
> > implicit immediate (which don't have a CONST_INT operands[2]).  This insn
> > does have it, the testcase passes regardless of whether sishuf or other is
> > used, it is purely a tuning thing.
> >
> > In any case, here is an updated patch that just preserves the status quo
> > (psrldq having the sishuf unit, pslldq not) using a simple code attribute.
> 
> Agner Fog's tables confirm Jakub's observation:
> 
> PSLL/RL/RAW/D/Q (x)mm,(x)mm 2 FP0 5 5
> PSLL/RL/RAW/D/Q (x)xmm,i 1 FP0 1 1
> PSLL/RLDQ xmm,i 1 FP0 1 1
> 
> I fail to see how could left and right shifts use different units.
> Since the test passes, let's change pslldq to use sishuf unit. There
> is no better alternative from the list of units.
Then I bet your patch is OK for main trunk.

--
Thanks, K
> 
> > 2017-10-24  Jakub Jelinek  <jakub@redhat.com>
> >
> >         PR target/82370
> >         * config/i386/sse.md (VIMAX_AVX2): Remove V4TImode.
> >         (VIMAX_AVX2_AVX512BW, VIMAX_AVX512VL): New mode iterators.
> >         (vec_shl_<mode>): Remove unused expander.
> >         (avx512bw_<shift_insn><mode>3): New define_insn.
> >         (atom_shift_unit): New code iterator.
> >         (<sse2_avx2>_ashl<mode>3, <sse2_avx2>_lshr<mode>3): Replaced by ...
> >         (<sse2_avx2>_<shift_insn><mode>3): ... this.  New define_insn.
> >
> >         * gcc.target/i386/pr82370.c: New test.
> 
> OK with the change od pslldq's unit to sishuf.
> 
> Thanks,
> Uros.
> 
> > --- gcc/config/i386/sse.md.jj   2017-10-20 16:30:35.286208652 +0200
> > +++ gcc/config/i386/sse.md      2017-10-24 16:29:54.848934888 +0200
> > @@ -371,10 +371,17 @@ (define_mode_iterator V16FI
> >    [V16SF V16SI])
> >
> >  ;; ??? We should probably use TImode instead.
> > -(define_mode_iterator VIMAX_AVX2
> > +(define_mode_iterator VIMAX_AVX2_AVX512BW
> >    [(V4TI "TARGET_AVX512BW") (V2TI "TARGET_AVX2") V1TI])
> >
> > -;; ??? This should probably be dropped in favor of VIMAX_AVX2.
> > +;; Suppose TARGET_AVX512BW as baseline
> > +(define_mode_iterator VIMAX_AVX512VL
> > +  [V4TI (V2TI "TARGET_AVX512VL") (V1TI "TARGET_AVX512VL")])
> > +
> > +(define_mode_iterator VIMAX_AVX2
> > +  [(V2TI "TARGET_AVX2") V1TI])
> > +
> > +;; ??? This should probably be dropped in favor of VIMAX_AVX2_AVX512BW.
> >  (define_mode_iterator SSESCALARMODE
> >    [(V4TI "TARGET_AVX512BW") (V2TI "TARGET_AVX2") TI])
> >
> > @@ -10778,9 +10785,9 @@ (define_insn "<shift_insn><mode>3<mask_n
> >     (set_attr "mode" "<sseinsnmode>")])
> >
> >
> > -(define_expand "vec_shl_<mode>"
> > +(define_expand "vec_shr_<mode>"
> >    [(set (match_dup 3)
> > -       (ashift:V1TI
> > +       (lshiftrt:V1TI
> >          (match_operand:VI_128 1 "register_operand")
> >          (match_operand:SI 2 "const_0_to_255_mul_8_operand")))
> >     (set (match_operand:VI_128 0 "register_operand") (match_dup 4))]
> > @@ -10791,48 +10798,26 @@ (define_expand "vec_shl_<mode>"
> >    operands[4] = gen_lowpart (<MODE>mode, operands[3]);
> >  })
> >
> > -(define_insn "<sse2_avx2>_ashl<mode>3"
> > -  [(set (match_operand:VIMAX_AVX2 0 "register_operand" "=x,v")
> > -       (ashift:VIMAX_AVX2
> > -        (match_operand:VIMAX_AVX2 1 "register_operand" "0,v")
> > -        (match_operand:SI 2 "const_0_to_255_mul_8_operand" "n,n")))]
> > -  "TARGET_SSE2"
> > +(define_insn "avx512bw_<shift_insn><mode>3"
> > +  [(set (match_operand:VIMAX_AVX512VL 0 "register_operand" "=v")
> > +       (any_lshift:VIMAX_AVX512VL
> > +        (match_operand:VIMAX_AVX512VL 1 "nonimmediate_operand" "vm")
> > +        (match_operand:SI 2 "const_0_to_255_mul_8_operand" "n")))]
> > +  "TARGET_AVX512BW"
> >  {
> >    operands[2] = GEN_INT (INTVAL (operands[2]) / 8);
> > -
> > -  switch (which_alternative)
> > -    {
> > -    case 0:
> > -      return "pslldq\t{%2, %0|%0, %2}";
> > -    case 1:
> > -      return "vpslldq\t{%2, %1, %0|%0, %1, %2}";
> > -    default:
> > -      gcc_unreachable ();
> > -    }
> > +  return "vp<vshift>dq\t{%2, %1, %0|%0, %1, %2}";
> >  }
> > -  [(set_attr "isa" "noavx,avx")
> > -   (set_attr "type" "sseishft")
> > +  [(set_attr "type" "sseishft")
> >     (set_attr "length_immediate" "1")
> > -   (set_attr "prefix_data16" "1,*")
> > -   (set_attr "prefix" "orig,vex")
> > +   (set_attr "prefix" "maybe_evex")
> >     (set_attr "mode" "<sseinsnmode>")])
> >
> > -(define_expand "vec_shr_<mode>"
> > -  [(set (match_dup 3)
> > -       (lshiftrt:V1TI
> > -        (match_operand:VI_128 1 "register_operand")
> > -        (match_operand:SI 2 "const_0_to_255_mul_8_operand")))
> > -   (set (match_operand:VI_128 0 "register_operand") (match_dup 4))]
> > -  "TARGET_SSE2"
> > -{
> > -  operands[1] = gen_lowpart (V1TImode, operands[1]);
> > -  operands[3] = gen_reg_rtx (V1TImode);
> > -  operands[4] = gen_lowpart (<MODE>mode, operands[3]);
> > -})
> > +(define_code_attr atom_shift_unit [(ashift "*") (lshiftrt "sishuf")])
> >
> > -(define_insn "<sse2_avx2>_lshr<mode>3"
> > +(define_insn "<sse2_avx2>_<shift_insn><mode>3"
> >    [(set (match_operand:VIMAX_AVX2 0 "register_operand" "=x,v")
> > -       (lshiftrt:VIMAX_AVX2
> > +       (any_lshift:VIMAX_AVX2
> >          (match_operand:VIMAX_AVX2 1 "register_operand" "0,v")
> >          (match_operand:SI 2 "const_0_to_255_mul_8_operand" "n,n")))]
> >    "TARGET_SSE2"
> > @@ -10842,9 +10827,9 @@ (define_insn "<sse2_avx2>_lshr<mode>3"
> >    switch (which_alternative)
> >      {
> >      case 0:
> > -      return "psrldq\t{%2, %0|%0, %2}";
> > +      return "p<vshift>dq\t{%2, %0|%0, %2}";
> >      case 1:
> > -      return "vpsrldq\t{%2, %1, %0|%0, %1, %2}";
> > +      return "vp<vshift>dq\t{%2, %1, %0|%0, %1, %2}";
> >      default:
> >        gcc_unreachable ();
> >      }
> > @@ -10852,7 +10837,7 @@ (define_insn "<sse2_avx2>_lshr<mode>3"
> >    [(set_attr "isa" "noavx,avx")
> >     (set_attr "type" "sseishft")
> >     (set_attr "length_immediate" "1")
> > -   (set_attr "atom_unit" "sishuf")
> > +   (set_attr "atom_unit" "<atom_shift_unit>")
> >     (set_attr "prefix_data16" "1,*")
> >     (set_attr "prefix" "orig,vex")
> >     (set_attr "mode" "<sseinsnmode>")])
> > --- gcc/testsuite/gcc.target/i386/pr82370.c.jj  2017-10-24 16:22:16.665464886 +0200
> > +++ gcc/testsuite/gcc.target/i386/pr82370.c     2017-10-24 16:22:16.665464886 +0200
> > @@ -0,0 +1,18 @@
> > +/* PR target/82370 */
> > +/* { dg-do compile } */
> > +/* { dg-options "-O2 -mavx512vl -mavx512bw -masm=att" } */
> > +/* { dg-final { scan-assembler-times "vpslldq\[ \t]\+\\\$5, \\(%\[a-z0-9,]*\\), %xmm\[0-9]\+" 1 } } */
> > +/* { dg-final { scan-assembler-times "vpsrldq\[ \t]\+\\\$5, \\(%\[a-z0-9,]*\\), %xmm\[0-9]\+" 1 } } */
> > +/* { dg-final { scan-assembler-times "vpslldq\[ \t]\+\\\$5, \\(%\[a-z0-9,]*\\), %ymm\[0-9]\+" 1 } } */
> > +/* { dg-final { scan-assembler-times "vpsrldq\[ \t]\+\\\$5, \\(%\[a-z0-9,]*\\), %ymm\[0-9]\+" 1 } } */
> > +/* { dg-final { scan-assembler-times "vpslldq\[ \t]\+\\\$5, \\(%\[a-z0-9,]*\\), %zmm\[0-9]\+" 1 } } */
> > +/* { dg-final { scan-assembler-times "vpsrldq\[ \t]\+\\\$5, \\(%\[a-z0-9,]*\\), %zmm\[0-9]\+" 1 } } */
> > +
> > +#include <x86intrin.h>
> > +
> > +__m512i f1 (__m512i *x) { return _mm512_bslli_epi128 (*x, 5); }
> > +__m512i f2 (__m512i *x) { return _mm512_bsrli_epi128 (*x, 5); }
> > +__m256i f3 (__m256i *x) { return _mm256_bslli_epi128 (*x, 5); }
> > +__m256i f4 (__m256i *x) { return _mm256_bsrli_epi128 (*x, 5); }
> > +__m128i f5 (__m128i *x) { return _mm_bslli_si128 (*x, 5); }
> > +__m128i f6 (__m128i *x) { return _mm_bsrli_si128 (*x, 5); }
> >
> >
> >         Jakub


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]