[PATCH][Arm] Auto-vectorization for MVE: vsub
Dennis Zhang
Dennis.Zhang@arm.com
Mon Sep 7 07:20:46 GMT 2020
Hi Ramana,
On 8/21/20 10:33 PM, Ramana Radhakrishnan wrote:
> On Mon, Aug 17, 2020 at 7:42 PM Dennis Zhang <Dennis.Zhang@arm.com> wrote:
>>
>>
>> Hi all,
>>
>> This patch enables MVE vsub instructions for auto-vectorization.
>> It adds RTL templates for MVE vsub instructions using 'minus' instead of
>> unspec expression to make the instructions recognizable for vectorization.
>> MVE target is added in sub<mode>3 optab. The sub<mode>3 optab is
>> modified to use a mode iterator that selects available modes for various
>> targets correspondingly.
>> MVE vector modes are enabled in arm_preferred_simd_mode in arm.c to
>> support vectorization.
>>
>> This patch also fixes 'vreinterpretq_*.c' MVE intrinsic tests. The tests
>> generate wrong instruction numbers because of unexpected icf optimization.
>> This bug is exposed by the MVE vector modes enabled in this patch,
>> therefore it is corrected in this patch to avoid test failures.
>>
>> MVE instructions are documented here:
>> https://developer.arm.com/architectures/instruction-sets/simd-isas/helium/helium-intrinsics
>>
>
> Hi Dennis,
>
> Thanks for this patch . However a quick read suggests at first glance
> that it could do with some refactoring or indeed further breaking
> down.
>
> 1. The refactor for TARGET_NEON_IWWMMXT and friends which I don't get
> the motivation for obviously on a quick read. I'll try and read that
> again. Please document why these complex TARGET_ macros exist and how
> they are expected to be used in the machine description and what they
> are indicated to do.
Thanks for the questions.
The macros are used in the iterators as conditions to enable modes
separately for different targets. The reason to define these macros is
to make the iterators short.
And about why using conditions for the iterators, the aim is to put
different modes in a single expander. Otherwise the expander would
repeat several times for different sets of modes supported by different
targets.
> 2. It seems odd that we would have
> "&& ((<MODE>mode != V2SFmode && <MODE>mode != V4SFmode)
> + || flag_unsafe_math_optimizations))" apply to TARGET_NEON but not
> apply this to TARGET_MVE_FLOAT in the sub<mode>3 expander. The point
> is that if it isn't safe to vectorize a subtract for Neon, why is it
> safe to do the same for MVE ? This was done in 2010 by Julian to fix
> PR target/43703 - isn't this applicable on MVE as well ?
I agree with this after investigation. I've add
flag_unsafe_math_optimizations fot MVE_FLOAT target.
> 3. I'm also going to quibble a bit about the use of VSEL as the name
> of an iterator as that conflates it with the instruction vsel and it's
> not obvious what's going on here.
I have changed the name to VNIM_COND, which means NONE, IWWMMXT and MVE
according to conditions.
I've add comments to document the aim of the iterator.
Please let me know if you think it needs further fix.
>
>
>> This patch also fixes 'vreinterpretq_*.c' MVE intrinsic tests. The tests
>> generate wrong instruction numbers because of unexpected icf optimization.
>> This bug is exposed by the MVE vector modes enabled in this patch,
>> therefore it is corrected in this patch to avoid test failures.
>>
>
> I'm a bit confused as to why this got exposed because of the new MVE
> vector modes exposed by this patch.
The aim of the tests is only to check the reinterpret intrinsics working
well.
However the two functions in each test contain icf optimization pattern
and then the second function is folded due to same code. The icf pattern
is not expected but to make the test pass, the author only checked the
instruction count for the first function.
With my patch that enables MVE vector modes in arm_preferred_simd_mode,
the estimated code size is smaller so that the code is inlined from the
first function back to the second one in inlining optimization after icf
optimization. Then the instruction count changes.
Because the icf is not the expected pattern to be tested but causes
above mentioned issues, -fno-ipa-icf is used to avoid unstable
instruction count in these tests.
>
>> The patch is regtested for arm-none-eabi and bootstrapped for
>> arm-none-linux-gnueabihf.
>>
> Bootstrapped and regression tested for arm-none-linux-gnueabihf with a
> --with-fpu=neon in the configuration ?
Yes, for arm-none-linux-gnueabihf bootstrap there is --with-fpu=neon.
Should I test it without this configuration?
The new patch is attached.
I updated the comments for the iterator and the macros.
Many thanks!
Dennis
gcc/ChangeLog:
2020-08-27 Dennis Zhang <dennis.zhang@arm.com>
* config/arm/arm.c (arm_preferred_simd_mode): Enable MVE vector modes.
* config/arm/arm.h (TARGET_NEON_IWMMXT): New macro.
(TARGET_NEON_IWMMXT_MVE, TARGET_NEON_IWMMXT_MVE_FP): Likewise.
(TARGET_NEON_MVE_HFP): Likewise.
* config/arm/iterators.md (VNIM_COND): New mode iterator to enable
modes according to corresponding targets.
* config/arm/mve.md (mve_vsubq<mode>): New entry for vsub instruction
using expression 'minus'.
(mve_vsubq_f<mode>): Use minus instead of VSUBQ_F unspec.
* config/arm/neon.md (sub<mode>3): Removed here. Integrated in the
sub<mode>3 in vec-common.md
* config/arm/vec-common.md (sub<mode>3): Enable MVE target. Use VSEL
to select available modes. Exclude TARGET_NEON_FP16INST from
TARGET_NEON statement. Intergrate TARGET_NEON_FP16INST which is
originally in neon.md.
gcc/testsuite/ChangeLog:
2020-08-27 Dennis Zhang <dennis.zhang@arm.com>
* gcc.target/arm/mve/intrinsics/vreinterpretq_f16.c: Use additional
option -fno-ipa-icf and change the instruction count from 8 to 16.
* gcc.target/arm/mve/intrinsics/vreinterpretq_f32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vreinterpretq_s16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vreinterpretq_s32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vreinterpretq_s64.c: Likewise.
* gcc.target/arm/mve/intrinsics/vreinterpretq_s8.c: Likewise.
* gcc.target/arm/mve/intrinsics/vreinterpretq_u16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vreinterpretq_u32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vreinterpretq_u64.c: Likewise.
* gcc.target/arm/mve/intrinsics/vreinterpretq_u8.c: Likewise.
* gcc.target/arm/mve/mve.exp: Include tests in subdir 'vect'.
* gcc.target/arm/mve/vect/vect_sub_0.c: New test.
* gcc.target/arm/mve/vect/vect_sub_1.c: New test.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: mve-vect-sub-20200902.patch
Type: text/x-patch
Size: 17481 bytes
Desc: mve-vect-sub-20200902.patch
URL: <https://gcc.gnu.org/pipermail/gcc-patches/attachments/20200907/6d015ad1/attachment-0001.bin>
More information about the Gcc-patches
mailing list