[PATCH] Allow {nearby,r}int{,f} vectorization on x86 with sse4.1 and later (PR target/93078)

Jakub Jelinek jakub@redhat.com
Sat Dec 28 11:57:00 GMT 2019


On Sat, Dec 28, 2019 at 11:48:12AM +0100, Uros Bizjak wrote:
> On Sat, Dec 28, 2019 at 10:33 AM Jakub Jelinek <jakub@redhat.com> wrote:
> >
> > Hi!
> >
> > In i386.md, we have nearbyint<mode>2 and rint<mode>2 patterns that expand
> > SF/DF/XF mode patterns to rounding instructions.  For pre-sse4.1 that is
> > done using XFmode and so inappropriate for vectorization, but for sse4.1
> > and later we can just use the {,v}{round,rndscale}p{s,d} instructions
> > when we emit {,v}rounds{s,d} for SF/DF mode.
> 
> In i386-builtins.c, ix86_builtin_vectorized_function, we already have:
> 
> --cut here--
>     CASE_CFN_RINT:
>       /* The round insn does not trap on denormals.  */
>       if (flag_trapping_math || !TARGET_SSE4_1)
>         break;
> 
>       if (out_mode == DFmode && in_mode == DFmode)
>         {
>           if (out_n == 2 && in_n == 2)
>             return ix86_get_builtin (IX86_BUILTIN_RINTPD);
>           else if (out_n == 4 && in_n == 4)
>             return ix86_get_builtin (IX86_BUILTIN_RINTPD256);
>         }
>       if (out_mode == SFmode && in_mode == SFmode)
>         {
>           if (out_n == 4 && in_n == 4)
>             return ix86_get_builtin (IX86_BUILTIN_RINTPS);
>           else if (out_n == 8 && in_n == 8)
>             return ix86_get_builtin (IX86_BUILTIN_RINTPS256);
>         }
>       break;
> --cut here--

Ok, I'll test removing that code; nothing in the headers seems to use
those builtins.

> which is converting rint functions to corresponding x86 builtin. If we
> want to go through generic path, then the above code is probably
> redundant and should be removed together with corresponding builtins.
> OTOH, the existing code also bails out for flag_trapping_math, so this
> condition should also be considered in named expanders.

The conditions are:
(define_expand "nearbyint<mode>2"
  [(use (match_operand:MODEF 0 "register_operand"))
   (use (match_operand:MODEF 1 "nonimmediate_operand"))]
  "(TARGET_USE_FANCY_MATH_387
    && (!(SSE_FLOAT_MODE_P (<MODE>mode) && TARGET_SSE_MATH)
          || TARGET_MIX_SSE_I387)
    && !flag_trapping_math)
   || (TARGET_SSE4_1 && TARGET_SSE_MATH)"
and:
(define_expand "rint<mode>2"
  [(use (match_operand:MODEF 0 "register_operand"))
   (use (match_operand:MODEF 1 "nonimmediate_operand"))]
  "TARGET_USE_FANCY_MATH_387
   || (SSE_FLOAT_MODE_P (<MODE>mode) && TARGET_SSE_MATH)"
Only nearbyint tests flag_trapping_math, and only for the pre-sse4.1 case.
With sse4.1 it is enabled regardless of that; it just depends on
TARGET_SSE_MATH, but I think for vectorization we don't really test that,
as vectorization is always done in sse*.
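
To make the difference between the two conditions easier to see, here is a
small standalone C model of them.  It is only a sketch: struct target_model
and the two functions are made-up names for illustration, not GCC internals,
and each field merely stands in for the macro or flag named in its comment.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the target macros and flags that appear in
   the two expander conditions; not real GCC data structures.  */
struct target_model
{
  bool use_fancy_math_387;  /* TARGET_USE_FANCY_MATH_387 */
  bool sse_float_mode;      /* SSE_FLOAT_MODE_P (<MODE>mode) */
  bool sse_math;            /* TARGET_SSE_MATH */
  bool mix_sse_i387;        /* TARGET_MIX_SSE_I387 */
  bool sse4_1;              /* TARGET_SSE4_1 */
  bool trapping_math;       /* flag_trapping_math */
};

/* Mirrors the nearbyint<mode>2 condition quoted above.  */
bool
nearbyint_expander_enabled (const struct target_model *t)
{
  return (t->use_fancy_math_387
          && (!(t->sse_float_mode && t->sse_math) || t->mix_sse_i387)
          && !t->trapping_math)
         || (t->sse4_1 && t->sse_math);
}

/* Mirrors the rint<mode>2 condition quoted above.  */
bool
rint_expander_enabled (const struct target_model *t)
{
  return t->use_fancy_math_387
         || (t->sse_float_mode && t->sse_math);
}
```

In this model, with sse4_1 and sse_math both set,
nearbyint_expander_enabled returns true whatever trapping_math is, and
rint_expander_enabled never looks at trapping_math at all.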

	Jakub


