[PATCH][Arm] Auto-vectorization for MVE: vsub

Mon Sep 7 07:20:46 GMT 2020

Hi Ramana,

On 8/21/20 10:33 PM, Ramana Radhakrishnan wrote:
> On Mon, Aug 17, 2020 at 7:42 PM Dennis Zhang <Dennis.Zhang@arm.com> wrote:
>>
>>
>> Hi all,
>>
>> This patch enables MVE vsub instructions for auto-vectorization.
>> It adds RTL templates for MVE vsub instructions using 'minus' instead of
>> unspec expression to make the instructions recognizable for vectorization.
>> MVE target is added in sub<mode>3 optab. The sub<mode>3 optab is
>> modified to use a mode iterator that selects available modes for various
>> targets correspondingly.
>> MVE vector modes are enabled in arm_preferred_simd_mode in arm.c to
>> support vectorization.
>>
>> This patch also fixes 'vreinterpretq_*.c' MVE intrinsic tests. The tests
>> generate wrong instruction numbers because of unexpected icf optimization.
>> This bug is exposed by the MVE vector modes enabled in this patch,
>> therefore it is corrected in this patch to avoid test failures.
>>
>> MVE instructions are documented here:
>> https://developer.arm.com/architectures/instruction-sets/simd-isas/helium/helium-intrinsics
>>
> 
> Hi Dennis,
> 
> Thanks for this patch . However a quick read suggests  at first glance
> that it could do with some refactoring or indeed further breaking
> down.
> 
> 1. The refactor for TARGET_NEON_IWWMMXT and friends which I don't get
> the motivation for obviously on a quick read. I'll try and read that
> again. Please document why these complex TARGET_ macros exist and how
> they are expected to be used in the machine description and what they
> are indicated to do.

Thanks for the questions.
The macros are used in the iterators as conditions to enable modes 
separately for different targets. The reason to define these macros is 
to make the iterators short.
And about why using conditions for the iterators, the aim is to put 
different modes in a single expander. Otherwise the expander would 
repeat several times for different sets of modes supported by different 
targets.

> 2. It seems odd that we would have
>   "&& ((<MODE>mode != V2SFmode && <MODE>mode != V4SFmode)
> +    || flag_unsafe_math_optimizations))" apply to TARGET_NEON but not
> apply this to TARGET_MVE_FLOAT in the sub<mode>3 expander. The point
> is that if it isn't safe to vectorize a subtract for Neon, why is it
> safe to do the same for MVE ? This was done in 2010 by Julian to fix
> PR target/43703 - isn't this applicable on MVE as well ?

I agree with this after investigation. I've add 
flag_unsafe_math_optimizations fot MVE_FLOAT target.

> 3. I'm also going to quibble a bit about the use of VSEL as the name
> of an iterator as that conflates it with the instruction vsel and it's
> not obvious what's going on here.

I have changed the name to VNIM_COND, which means NONE, IWWMMXT and MVE 
according to conditions.
I've add comments to document the aim of the iterator.
Please let me know if you think it needs further fix.

> 
> 
>> This patch also fixes 'vreinterpretq_*.c' MVE intrinsic tests. The tests
>> generate wrong instruction numbers because of unexpected icf optimization.
>> This bug is exposed by the MVE vector modes enabled in this patch,
>> therefore it is corrected in this patch to avoid test failures.
>>
> 
> I'm a bit confused as to why this got exposed because of the new MVE
> vector modes exposed by this patch.

The aim of the tests is only to check the reinterpret intrinsics working 
well.
However the two functions in each test contain icf optimization pattern 
and then the second function is folded due to same code. The icf pattern 
is not expected but to make the test pass, the author only checked the 
instruction count for the first function.
With my patch that enables MVE vector modes in arm_preferred_simd_mode, 
the estimated code size is smaller so that the code is inlined from the 
first function back to the second one in inlining optimization after icf 
optimization. Then the instruction count changes.
Because the icf is not the expected pattern to be tested but causes 
above mentioned issues, -fno-ipa-icf is used to avoid unstable 
instruction count in these tests.

> 
>> The patch is regtested for arm-none-eabi and bootstrapped for
>> arm-none-linux-gnueabihf.
>>
> Bootstrapped and regression tested for arm-none-linux-gnueabihf with a
> --with-fpu=neon in the configuration ?

Yes, for arm-none-linux-gnueabihf bootstrap there is --with-fpu=neon.
Should I test it without this configuration?

The new patch is attached.
I updated the comments for the iterator and the macros.

Many thanks!
Dennis

gcc/ChangeLog:

2020-08-27  Dennis Zhang  <dennis.zhang@arm.com>

	* config/arm/arm.c (arm_preferred_simd_mode): Enable MVE vector modes.
	* config/arm/arm.h (TARGET_NEON_IWMMXT): New macro.
	(TARGET_NEON_IWMMXT_MVE, TARGET_NEON_IWMMXT_MVE_FP): Likewise.
	(TARGET_NEON_MVE_HFP): Likewise.
	* config/arm/iterators.md (VNIM_COND): New mode iterator to enable
	modes according to corresponding targets.
	* config/arm/mve.md (mve_vsubq<mode>): New entry for vsub instruction
	using expression 'minus'.
	(mve_vsubq_f<mode>): Use minus instead of VSUBQ_F unspec.
	* config/arm/neon.md (sub<mode>3): Removed here. Integrated in the
	sub<mode>3 in vec-common.md
	* config/arm/vec-common.md (sub<mode>3): Enable MVE target. Use VSEL
	to select available modes. Exclude TARGET_NEON_FP16INST from
	TARGET_NEON statement. Intergrate TARGET_NEON_FP16INST which is
	originally in neon.md.

gcc/testsuite/ChangeLog:

2020-08-27  Dennis Zhang  <dennis.zhang@arm.com>

	* gcc.target/arm/mve/intrinsics/vreinterpretq_f16.c: Use additional
	option -fno-ipa-icf and change the instruction count from 8 to 16.
	* gcc.target/arm/mve/intrinsics/vreinterpretq_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vreinterpretq_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vreinterpretq_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vreinterpretq_s64.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vreinterpretq_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vreinterpretq_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vreinterpretq_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vreinterpretq_u64.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vreinterpretq_u8.c: Likewise.
	* gcc.target/arm/mve/mve.exp: Include tests in subdir 'vect'.
	* gcc.target/arm/mve/vect/vect_sub_0.c: New test.
	* gcc.target/arm/mve/vect/vect_sub_1.c: New test.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: mve-vect-sub-20200902.patch
Type: text/x-patch
Size: 17481 bytes
Desc: mve-vect-sub-20200902.patch
URL: <https://gcc.gnu.org/pipermail/gcc-patches/attachments/20200907/6d015ad1/attachment-0001.bin>