[GCC][PATCH][mid-end] Optimize x * copysign (1.0, y) [Patch (1/2)]

Mon Jun 12 16:27:00 GMT 2017

Richard Biener <rguenther@suse.de> writes:
> On Mon, 12 Jun 2017, Tamar Christina wrote:
>> Hi All,
>> 
>> this patch implements an optimization rewriting
>> 
>> x * copysign (1.0, y) and 
>> x * copysign (-1.0, y) 
>> 
>> to:
>> 
>> x ^ (y & (1 << sign_bit_position))
>> 
>> This is done by creating a special builtin during matching and generating the
>> appropriate instructions during expand. This new builtin is called XORSIGN.
>> 
>> The expansion of xorsign depends on whether the backend has an appropriate
>> optab available. If this is not the case then we fall back to a modified
>> version of the existing copysign which does not take the abs value of the
>> first argument.
>> 
>> This patch is a revival of a previous patch
>> https://gcc.gnu.org/ml/gcc-patches/2015-10/msg00069.html
>> 
>> Bootstrapped on both aarch64-none-linux-gnu and x86_64 with no issues.
>> Regression tested on aarch64-none-linux-gnu with no regressions.
>> 
>> Ok for trunk?
>
> Without looking at the patch, a few comments.
>
> First, nowadays please add an internal function instead of builtins.
> You can even take advantage of Richard's work to directly tie those
> to optabs (he might want to chime in to tell you how).  You don't need
> the fortran FE changes in that case.

Yeah, it should just be a case of adding:

DEF_INTERNAL_OPTAB_FN (XORSIGN, ECF_CONST, xorsign, binary)

to internal-fn.def.  The supposedly useful thing about this is that it
automatically extends to vectors, so you shouldn't need the xorsign
vector builtins or the aarch64_builtin_vectorized_function change.

However, we don't yet support SLP vectorisation of internal functions.
I have a patch for that that I've been looking for an excuse to post
(at the moment I think it only helps SVE).  If this goes in I can
post it as a follow-on.
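
For reference, the scalar operation that the xorsign optab would stand
for can be sketched in plain C.  This is only an illustration of the
rewrite described in the patch, assuming IEEE single precision with the
sign in bit 31 and ordinary (non-NaN) inputs; xorsign_f32,
mul_copysign_f32 and check_xorsign_f32 are hypothetical names, not
anything in the patch or in GCC:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: flip the sign of X when Y has its sign bit set,
   using integer operations only, i.e. x ^ (y & (1 << 31)).  */
static float
xorsign_f32 (float x, float y)
{
  uint32_t xi, yi;

  memcpy (&xi, &x, sizeof xi);
  memcpy (&yi, &y, sizeof yi);
  xi ^= yi & (UINT32_C (1) << 31);
  memcpy (&x, &xi, sizeof x);
  return x;
}

/* The source-level form that the match pattern starts from.  */
static float
mul_copysign_f32 (float x, float y)
{
  return x * __builtin_copysignf (1.0f, y);
}

/* Spot checks that both forms agree on a few ordinary values.  */
static void
check_xorsign_f32 (void)
{
  assert (xorsign_f32 (2.5f, -1.0f) == -2.5f);
  assert (xorsign_f32 (-2.5f, -3.0f) == 2.5f);
  assert (xorsign_f32 (2.5f, 3.0f) == mul_copysign_f32 (2.5f, 3.0f));
  assert (xorsign_f32 (-6.3f, -5.6f) == mul_copysign_f32 (-6.3f, -5.6f));
}
```

Since copysign ignores the sign of its first argument, the -1.0 variant
reduces to the same operation.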

In:

> diff --git a/gcc/testsuite/gcc.dg/vec-xorsign_exec.c b/gcc/testsuite/gcc.dg/vec-xorsign_exec.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..f8c8befd336c7f2743a1621d3b0f53d78bab9df7
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vec-xorsign_exec.c
> @@ -0,0 +1,53 @@
> +/* { dg-do run } */
> +/* { dg-options "-O2 -ftree-vectorize -fdump-tree-vect-details" } */
> +/* { dg-additional-options "-march=armv8-a" { target { aarch64*-*-* } } }*/
> +
> +extern void abort ();
> +
> +#define N 16
> +float a[N] = {-0.1f, -3.2f, -6.3f, -9.4f,
> +	      -12.5f, -15.6f, -18.7f, -21.8f,
> +	      24.9f, 27.1f, 30.2f, 33.3f,
> +	      36.4f, 39.5f, 42.6f, 45.7f};
> +float b[N] = {-1.2f, 3.4f, -5.6f, 7.8f,
> +	      -9.0f, 1.0f, -2.0f, 3.0f,
> +	      -4.0f, -5.0f, 6.0f, 7.0f,
> +	      -8.0f, -9.0f, 10.0f, 11.0f};
> +float r[N];
> +
> +float ad[N] = {-0.1fd,  -3.2d,  -6.3d,  -9.4d,
> +               -12.5d, -15.6d, -18.7d, -21.8d,
> +                24.9d,  27.1d,  30.2d,  33.3d,
> +                36.4d,  39.5d,  42.6d, 45.7d};
> +float bd[N] = {-1.2d,  3.4d, -5.6d,  7.8d,
> +               -9.0d,  1.0d, -2.0d,  3.0d,
> +               -4.0d, -5.0d,  6.0d,  7.0d,
> +               -8.0d, -9.0d, 10.0d, 11.0d};
> +float rd[N];

Looks like these last three were meant to be doubles.
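
Something along these lines is what I'd have expected for the double
half.  It is only a sketch and rests on a few assumptions of mine: the
arrays are declared double with plain literals, the builtin is
__builtin_copysign (the double variant has no suffix), the check
compares rd rather than r, and assert stands in for the abort calls
just to keep it short.  check_double_half is a hypothetical name:

```c
#include <assert.h>

#define N 16

/* Assumed correction: double arrays initialised with double literals.  */
double ad[N] = {-0.1,  -3.2,  -6.3,  -9.4,
		-12.5, -15.6, -18.7, -21.8,
		24.9,  27.1,  30.2,  33.3,
		36.4,  39.5,  42.6,  45.7};
double bd[N] = {-1.2,  3.4, -5.6,  7.8,
		-9.0,  1.0, -2.0,  3.0,
		-4.0, -5.0,  6.0,  7.0,
		-8.0, -9.0, 10.0, 11.0};
double rd[N];

/* Hypothetical helper: compute the double results, then check them.  */
static void
check_double_half (void)
{
  int i;

  for (i = 0; i < N; i++)
    rd[i] = ad[i] * __builtin_copysign (1.0, bd[i]);

  /* check results:  */
  for (i = 0; i < N; i++)
    assert (rd[i] == ad[i] * __builtin_copysign (1.0, bd[i]));
}
```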

> +
> +int
> +main (void)
> +{
> +  int i;
> +
> +  for (i = 0; i < N; i++)
> +    r[i] = a[i] * _builtin_copysignf (1.0f, b[i]);
> +
> +  /* check results:  */
> +  for (i = 0; i < N; i++)
> +    if (r[i] != a[i] * __builtin_copysignf (1.0f, b[i]))
> +      abort ();
> +
> +  for (i = 0; i < N; i++)
> +    rd[i] = ad[i] * _builtin_copysignd (1.0d, bd[i]);
> +
> +  /* check results:  */
> +  for (i = 0; i < N; i++)
> +    if (r[i] != ad[i] * __builtin_copysignd (1.0d, bd[i]))
> +      abort ();
> +
> +
> +  return 0;
> +}
> +
> +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */

Why does only one loop get vectorised?

Thanks,
Richard