[GCC][PATCH][mid-end] Optimize x * copysign (1.0, y) [Patch (1/2)]
Richard Sandiford
richard.sandiford@linaro.org
Mon Jun 12 16:27:00 GMT 2017
Richard Biener <rguenther@suse.de> writes:
> On Mon, 12 Jun 2017, Tamar Christina wrote:
>> Hi All,
>>
>> this patch implements an optimization rewriting
>>
>> x * copysign (1.0, y) and
>> x * copysign (-1.0, y)
>>
>> to:
>>
>> x ^ (y & (1 << sign_bit_position))
>>
>> This is done by creating a special builtin during matching and generating the
>> appropriate instructions during expand. This new builtin is called XORSIGN.
>>
>> The expansion of xorsign depends on whether the backend has an appropriate
>> optab available. If this is not the case then we use a modified version of
>> the existing copysign which does not take the abs value of the first
>> argument as a fallback.
>>
>> This patch is a revival of a previous patch
>> https://gcc.gnu.org/ml/gcc-patches/2015-10/msg00069.html
>>
>> Bootstrapped on both aarch64-none-linux-gnu and x86_64 with no issues.
>> Regression done on aarch64-none-linux-gnu and no regressions.
>>
>> Ok for trunk?
>
> Without looking at the patch a few comments.
>
> First, nowadays please add an internal function instead of builtins.
> You can even take advantage of Richard's work to directly tie those
> to optabs (he might want to chime in to tell you how). You don't need
> the fortran FE changes in that case.

Yeah, it should just be a case of adding:

  DEF_INTERNAL_OPTAB_FN (XORSIGN, ECF_CONST, xorsign, binary)

to internal-fn.def. The supposedly useful thing about this is that it
automatically extends to vectors, so you shouldn't need the xorsign
vector builtins or the aarch64_builtin_vectorized_function change.
However, we don't yet support SLP vectorisation of internal functions.
I have a patch for that that I've been looking for an excuse to post
(at the moment I think it only helps SVE). If this goes in I can
post it as a follow-on.
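
As a side note on the rewrite quoted above: a minimal standalone sketch of
the x ^ (y & sign bit) form, assuming float is a 32-bit IEEE-754 single.
The helper names xorsign_bits and xorsign_mismatches are made up for
illustration and are not part of the patch.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper, not from the patch: computes
   x ^ (y & sign_bit) on the raw bits of two floats.
   Assumes float is a 32-bit IEEE-754 single.  */
static float
xorsign_bits (float x, float y)
{
  uint32_t xb, yb;

  memcpy (&xb, &x, sizeof xb);
  memcpy (&yb, &y, sizeof yb);
  /* Flip the sign of x exactly when the sign bit of y is set.  */
  xb ^= yb & UINT32_C (0x80000000);
  memcpy (&x, &xb, sizeof x);
  return x;
}

/* Hypothetical self-check: returns the number of sample inputs for which
   the bit form disagrees with x * copysign (1.0, y).  */
static int
xorsign_mismatches (void)
{
  static const float xs[] = { 0.5f, -3.25f, 24.9f, -21.8f };
  static const float ys[] = { 1.0f, -1.0f, 7.0f, -9.0f };
  int bad = 0;
  unsigned int i, j;

  for (i = 0; i < sizeof xs / sizeof xs[0]; i++)
    for (j = 0; j < sizeof ys / sizeof ys[0]; j++)
      if (xorsign_bits (xs[i], ys[j])
	  != xs[i] * __builtin_copysignf (1.0f, ys[j]))
	bad++;
  return bad;
}
```

Since copysign ignores the sign of its first argument, the same form covers
the copysign (-1.0, y) case as well.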

In:

> diff --git a/gcc/testsuite/gcc.dg/vec-xorsign_exec.c b/gcc/testsuite/gcc.dg/vec-xorsign_exec.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..f8c8befd336c7f2743a1621d3b0f53d78bab9df7
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vec-xorsign_exec.c
> @@ -0,0 +1,53 @@
> +/* { dg-do run } */
> +/* { dg-options "-O2 -ftree-vectorize -fdump-tree-vect-details" } */
> +/* { dg-additional-options "-march=armv8-a" { target { aarch64*-*-* } } }*/
> +
> +extern void abort ();
> +
> +#define N 16
> +float a[N] = {-0.1f, -3.2f, -6.3f, -9.4f,
> + -12.5f, -15.6f, -18.7f, -21.8f,
> + 24.9f, 27.1f, 30.2f, 33.3f,
> + 36.4f, 39.5f, 42.6f, 45.7f};
> +float b[N] = {-1.2f, 3.4f, -5.6f, 7.8f,
> + -9.0f, 1.0f, -2.0f, 3.0f,
> + -4.0f, -5.0f, 6.0f, 7.0f,
> + -8.0f, -9.0f, 10.0f, 11.0f};
> +float r[N];
> +
> +float ad[N] = {-0.1fd, -3.2d, -6.3d, -9.4d,
> + -12.5d, -15.6d, -18.7d, -21.8d,
> + 24.9d, 27.1d, 30.2d, 33.3d,
> + 36.4d, 39.5d, 42.6d, 45.7d};
> +float bd[N] = {-1.2d, 3.4d, -5.6d, 7.8d,
> + -9.0d, 1.0d, -2.0d, 3.0d,
> + -4.0d, -5.0d, 6.0d, 7.0d,
> + -8.0d, -9.0d, 10.0d, 11.0d};
> +float rd[N];
Looks like these last three were meant to be doubles.
> +
> +int
> +main (void)
> +{
> + int i;
> +
> + for (i = 0; i < N; i++)
> + r[i] = a[i] * _builtin_copysignf (1.0f, b[i]);
> +
> + /* check results: */
> + for (i = 0; i < N; i++)
> + if (r[i] != a[i] * __builtin_copysignf (1.0f, b[i]))
> + abort ();
> +
> + for (i = 0; i < N; i++)
> + rd[i] = ad[i] * _builtin_copysignd (1.0d, bd[i]);
> +
> + /* check results: */
> + for (i = 0; i < N; i++)
> + if (r[i] != ad[i] * __builtin_copysignd (1.0d, bd[i]))
> + abort ();
> +
> +
> + return 0;
> +}
> +
> +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
Why does only one loop get vectorised?
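
On the doubles point above: a rough sketch of how the double half of the
test could look, assuming the intent was double arrays whose results are
checked against rd, and using __builtin_copysign (the double variant of the
builtin). check_double_half is a hypothetical name for illustration, and
this says nothing about how many loops should be reported as vectorised.

```c
#define N 16

static double ad[N] = {-0.1, -3.2, -6.3, -9.4,
		       -12.5, -15.6, -18.7, -21.8,
		       24.9, 27.1, 30.2, 33.3,
		       36.4, 39.5, 42.6, 45.7};
static double bd[N] = {-1.2, 3.4, -5.6, 7.8,
		       -9.0, 1.0, -2.0, 3.0,
		       -4.0, -5.0, 6.0, 7.0,
		       -8.0, -9.0, 10.0, 11.0};
static double rd[N];

/* Hypothetical helper: returns 0 if every rd[i] matches the expected
   value, 1 otherwise.  */
static int
check_double_half (void)
{
  int i;

  for (i = 0; i < N; i++)
    rd[i] = ad[i] * __builtin_copysign (1.0, bd[i]);

  /* Compare against rd, the array the loop above actually wrote.  */
  for (i = 0; i < N; i++)
    if (rd[i] != ad[i] * __builtin_copysign (1.0, bd[i]))
      return 1;

  return 0;
}
```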

Thanks,
Richard