[PATCH] i386: Generate standard floating point scalar operation patterns

Tue May 21 15:54:00 GMT 2019

On Wed, May 15, 2019 at 2:29 PM Richard Sandiford
<richard.sandiford@arm.com> wrote:
>
> "H.J. Lu" <hjl.tools@gmail.com> writes:
> > On Thu, Feb 7, 2019 at 9:49 AM H.J. Lu <hjl.tools@gmail.com> wrote:
> >>
> >> Standard scalar operation patterns which preserve the rest of the vector
> >> look like
> >>
> >>      (vec_merge:V2DF
> >>        (vec_duplicate:V2DF
> >>          (op:DF (vec_select:DF (reg/v:V2DF 85 [ x ])
> >>                 (parallel [ (const_int 0 [0])]))
> >>          (reg:DF 87)))
> >>        (reg/v:V2DF 85 [ x ])
> >>        (const_int 1 [0x1]))
> >>
> >> Add such patterns to the i386 backend and convert VEC_CONCAT patterns to
> >> standard scalar operation patterns.
>
> It looks like there's some variety in the patterns used, e.g.:
>
> (define_insn "<sse>_vm<code><mode>3<mask_scalar_name><round_saeonly_scalar_name>"
>   [(set (match_operand:VF_128 0 "register_operand" "=x,v")
>         (vec_merge:VF_128
>           (smaxmin:VF_128
>             (match_operand:VF_128 1 "register_operand" "0,v")
>             (match_operand:VF_128 2 "vector_operand" "xBm,<round_saeonly_scalar_constraint>"))
>          (match_dup 1)
>          (const_int 1)))]
>   "TARGET_SSE"
>   "@
>    <maxmin_float><ssescalarmodesuffix>\t{%2, %0|%0, %<iptr>2}
>    v<maxmin_float><ssescalarmodesuffix>\t{<round_saeonly_scalar_mask_op3>%2, %1, %0<mask_scalar_operand3>|%0<mask_scalar_operand3>, %1, %<iptr>2<round_saeonly_scalar_mask_op3>}"
>   [(set_attr "isa" "noavx,avx")
>    (set_attr "type" "sse")
>    (set_attr "btver2_sse_attr" "maxmin")
>    (set_attr "prefix" "<round_saeonly_scalar_prefix>")
>    (set_attr "mode" "<ssescalarmode>")])
>
> makes the operand a full vector operation, which seems simpler.

This pattern is used to implement scalar smaxmin intrinsics.

> The above would then be:
>
>       (vec_merge:V2DF
>         (op:V2DF
>           (reg:V2DF 85)
>           (vec_duplicate:V2DF (reg:DF 87)))
>         (reg/v:V2DF 85 [ x ])
>         (const_int 1 [0x1]))
>
> I guess technically the two have different faulting behaviour though,
> since the smaxmin gets applied to all elements, not just element 0.

This is the issue.  We don't use the correct mode for scalar instructions:

---
#include <immintrin.h>

__m128d
foo1 (__m128d x, double *p)
{
  __m128d y = _mm_load_sd (p);
  return _mm_max_pd (x, y);
}
---

movq (%rdi), %xmm1
maxpd %xmm1, %xmm0
ret

Here is the updated patch to add standard floating point scalar
operation patterns to the i386 backend.  Then we can do

---
#include <immintrin.h>

extern __inline __m128d __attribute__((__gnu_inline__,
__always_inline__, __artificial__))
_new_mm_max_pd (__m128d __A, __m128d __B)
{
  __A[0] = __A[0] > __B[0] ? __A[0] : __B[0];
  return __A;
}

__m128d
foo2 (__m128d x, double *p)
{
  __m128d y = _mm_load_sd (p);
  return _new_mm_max_pd (x, y);
}
---

maxsd (%rdi), %xmm0
ret
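
For reference, the element-0 update can be tried outside the intrinsic
headers with a plain generic vector type (GCC's __m128d is itself
declared as a 16-byte vector of double).  This is a minimal sketch:
max_elt0 is a hypothetical name, and I am not claiming anything here
about which instruction a given compiler picks for it.

```c
typedef double v2df __attribute__ ((vector_size (16)));

/* Replace element 0 of A with the larger of a[0] and b[0]; element 1
   of A is preserved.  When the comparison is false (including when
   either input is a NaN) b[0] is chosen.  */
static v2df
max_elt0 (v2df a, v2df b)
{
  a[0] = a[0] > b[0] ? a[0] : b[0];
  return a;
}
```

For a = { 1.0, 10.0 } and b = { 3.0, 20.0 } this returns { 3.0, 10.0 }.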

We should use generic vector operations to implement i386 intrinsics
as much as we can.

> The patch seems very specific.  E.g. why just PLUS, MINUS, MULT and DIV?

This patch only adds +, -, *, /, > and <.  We can add more if there
are testcases for them.
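
As a rough illustration of the source forms in question, written with
the generic vector extension (the *_elt0 names are hypothetical, and
whether each one ends up as a single scalar SSE instruction depends on
the patch and the optimization level):

```c
typedef double v2df __attribute__ ((vector_size (16)));

/* Each helper updates element 0 of X and leaves element 1 untouched.  */

static v2df
add_elt0 (v2df x, double y)
{
  x[0] = x[0] + y;
  return x;
}

static v2df
sub_elt0 (v2df x, double y)
{
  x[0] = x[0] - y;
  return x;
}

static v2df
mul_elt0 (v2df x, double y)
{
  x[0] = x[0] * y;
  return x;
}

static v2df
div_elt0 (v2df x, double y)
{
  x[0] = x[0] / y;
  return x;
}

/* The > and < cases are the conditional-expression forms of max and min.  */

static v2df
gt_elt0 (v2df x, double y)
{
  x[0] = x[0] > y ? x[0] : y;
  return x;
}

static v2df
lt_elt0 (v2df x, double y)
{
  x[0] = x[0] < y ? x[0] : y;
  return x;
}
```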

> Thanks,
> Richard
>
>
> >>
> >> gcc/
> >>
> >>         PR target/54855
> >>         * simplify-rtx.c (simplify_binary_operation_1): Convert
> >>         VEC_CONCAT patterns to standard scalar operation
> >>         patterns.
> >>         * config/i386/sse.md (*<sse>_vm<plusminus_insn><mode>3): New.
> >>         (*<sse>_vm<multdiv_mnemonic><mode>3): Likewise.
> >>
> >> gcc/testsuite/
> >>
> >>         PR target/54855
> >>         * gcc.target/i386/pr54855-1.c: New test.
> >>         * gcc.target/i386/pr54855-2.c: Likewise.
> >>         * gcc.target/i386/pr54855-3.c: Likewise.
> >>         * gcc.target/i386/pr54855-4.c: Likewise.
> >>         * gcc.target/i386/pr54855-5.c: Likewise.
> >>         * gcc.target/i386/pr54855-6.c: Likewise.
> >>         * gcc.target/i386/pr54855-7.c: Likewise.
> >
> > PING:
> >
> > https://gcc.gnu.org/ml/gcc-patches/2019-02/msg00398.html

Thanks.

-- 
H.J.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0001-i386-Generate-standard-floating-point-scalar-operati.patch
Type: application/x-patch
Size: 14321 bytes
Desc: not available
URL: <http://gcc.gnu.org/pipermail/gcc-patches/attachments/20190521/19ac269b/attachment.bin>