This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.
Re: [PATCH][GCC][AArch64] Vectorise __builtin_signbit on aarch64
- From: Richard Sandiford <richard.sandiford@arm.com>
- To: Przemyslaw Wirkus <Przemyslaw.Wirkus@arm.com>
- Cc: "gcc-patches@gcc.gnu.org" <gcc-patches@gcc.gnu.org>, nd <nd@arm.com>, Richard Earnshaw <Richard.Earnshaw@arm.com>, James Greenhalgh <James.Greenhalgh@arm.com>, Marcus Shawcroft <Marcus.Shawcroft@arm.com>
- Date: Fri, 22 Mar 2019 11:18:50 +0000
- Subject: Re: [PATCH][GCC][AArch64] Vectorise __builtin_signbit on aarch64
- References: <VI1PR0801MB2062E38C8E95A28B004FCFF4E4420@VI1PR0801MB2062.eurprd08.prod.outlook.com>
Hi,
Przemyslaw Wirkus <Przemyslaw.Wirkus@arm.com> writes:
> Hi all,
>
> Vectorise __builtin_signbit (v4sf) with an unsigned shift right vector
> instruction.
>
> Bootstrapped and tested on aarch64-none-linux-gnu.
>
> Assembly output for:
> $ aarch64-elf-gcc -S -O3 signbitv4sf.c -dp
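[The test source is not included in the message; a hypothetical reconstruction of a loop with the shape the vectoriser would target, assuming `in` and `out` are global arrays of 1024 elements to match the 4096-byte trip count in the assembly below:]

```c
#define N 1024  /* 1024 * 4 bytes == 4096, matching "cmp x0, 4096" */

float in[N];
int out[N];

void
foo (void)
{
  for (int i = 0; i < N; i++)
    out[i] = __builtin_signbit (in[i]);
}
```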
>
> Before patch:
>
> foo:
> adrp x3, in // 37 [c=4 l=4] *movdi_aarch64/12
> adrp x2, out // 40 [c=4 l=4] *movdi_aarch64/12
> add x3, x3, :lo12:in // 39 [c=4 l=4] add_losym_di
> add x2, x2, :lo12:out // 42 [c=4 l=4] add_losym_di
> mov x0, 0 // 3 [c=4 l=4] *movdi_aarch64/3
> .p2align 3,,7
> .L2:
> ldr w1, [x3, x0] // 10 [c=16 l=4] *zero_extendsidi2_aarch64/1
> and w1, w1, -2147483648 // 11 [c=4 l=4] andsi3/1
> str w1, [x2, x0] // 16 [c=4 l=4] *movsi_aarch64/8
> add x0, x0, 4 // 17 [c=4 l=4] *adddi3_aarch64/0
> cmp x0, 4096 // 19 [c=4 l=4] cmpdi/1
> bne .L2 // 20 [c=4 l=4] condjump
> ret // 50 [c=0 l=4] *do_return
>
> After patch:
>
> foo:
> adrp x2, in // 36 [c=4 l=4] *movdi_aarch64/12
> adrp x1, out // 39 [c=4 l=4] *movdi_aarch64/12
> add x2, x2, :lo12:in // 38 [c=4 l=4] add_losym_di
> add x1, x1, :lo12:out // 41 [c=4 l=4] add_losym_di
> mov x0, 0 // 3 [c=4 l=4] *movdi_aarch64/3
> .p2align 3,,7
> .L2:
> ldr q0, [x2, x0] // 10 [c=8 l=4] *aarch64_simd_movv4sf/0
> ushr v0.4s, v0.4s, 31 // 11 [c=12 l=4] aarch64_simd_lshrv4si
> str q0, [x1, x0] // 15 [c=4 l=4] *aarch64_simd_movv4si/2
> add x0, x0, 16 // 16 [c=4 l=4] *adddi3_aarch64/0
> cmp x0, 4096 // 18 [c=4 l=4] cmpdi/1
> bne .L2 // 19 [c=4 l=4] condjump
> ret // 49 [c=0 l=4] *do_return
>
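[A scalar sketch of why both sequences are correct: `__builtin_signbit` only promises a nonzero result when the sign bit is set, so the masked encoding (before) and the shifted encoding (after) are interchangeable. The helper names below are illustrative, not from the patch:]

```c
#include <stdint.h>
#include <string.h>

/* Pre-patch scalar code keeps the raw IEEE-754 sign bit:
   "and w1, w1, -2147483648" -> result is 0 or 0x80000000.  */
static uint32_t
sign_mask (float x)
{
  uint32_t bits;
  memcpy (&bits, &x, sizeof bits);  /* bit-cast without aliasing UB */
  return bits & 0x80000000u;
}

/* Post-patch vector code shifts the sign bit down to bit 0:
   "ushr v0.4s, v0.4s, 31" -> result is 0 or 1 per lane.  */
static uint32_t
sign_shift (float x)
{
  uint32_t bits;
  memcpy (&bits, &x, sizeof bits);
  return bits >> 31;
}
```

Both return nonzero exactly for negative inputs (including -0.0), which is all `signbit` guarantees.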
> Thanks,
> Przemyslaw
>
> gcc/ChangeLog:
>
> 2019-03-20 Przemyslaw Wirkus <przemyslaw.wirkus@arm.com>
>
> * config/aarch64/aarch64-builtins.c
> (aarch64_builtin_vectorized_function): Add CASE_CFN_SIGNBIT.
> * config/aarch64/aarch64-simd-builtins.def (signbit): Extend
> to V4SF mode.
> * config/aarch64/aarch64-simd.md (signbitv4sf2): New expander.
I think it'd be better to add a new IFN_SIGNBIT internal function
that maps to signbit_optab. That way the compiler will know what
the vector function does and there'll be no need to add a new
built-in function.
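[For context, such an internal function would live in gcc/internal-fn.def. A sketch following the pattern of the existing unary floating-point entries; the exact macro and ECF flags here are assumptions, not something stated in this thread:]

```c
/* Hypothetical entry in gcc/internal-fn.def: ties IFN_SIGNBIT to
   signbit_optab so the vectoriser can expand it directly, with no
   target-specific built-in needed.  */
DEF_INTERNAL_FLT_FN (SIGNBIT, ECF_CONST | ECF_NOTHROW, signbit, unary)
```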
Thanks,
Richard