Bug 107674 - [11/12/13 Regressions] arm: MVE codegen regressions on VCTP and vector LDR/STR instructions
Summary: [11/12/13 Regressions] arm: MVE codegen regressions on VCTP and vector LDR/ST...
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: target (show other bugs)
Version: 12.2.1
: P3 normal
Target Milestone: 11.4
Assignee: Not yet assigned to anyone
URL:
Keywords: missed-optimization
Depends on:
Blocks:
 
Reported: 2022-11-14 11:00 UTC by Stam Markianos-Wright
Modified: 2023-04-18 15:12 UTC (History)
1 user (show)

See Also:
Host:
Target: arm*
Build:
Known to work:
Known to fail:
Last reconfirmed: 2022-11-14 00:00:00


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Stam Markianos-Wright 2022-11-14 11:00:13 UTC
We've found a couple of performance regressions with Arm MVE.  These can be seen here:
https://godbolt.org/z/onPjfW4zj

* Between GCC 11 and 12 we seem to have started emitting a strange vmrs/sxth/vmsr instruction sequence after the vctp instruction.  I suspect this is something to do with the introduction of MODE_VECTOR_BOOL during that period.

* Between GCC 12 and 13 we are no longer merging the pointer increments by #16  into the ldr/strs and we have some random movs that aren't needed either.  This also happened in GCC 11, but we want to keep the improved codegen of GCC 12 here ;)

  This looks like a change in register allocation:

      Choosing alt 0 in insn 24:  (0) =w  (1) Ux  (2) Up {mve_vldrhq_z_sv8hi}
      Creating newreg=149, assigning class CORE_REGS to INC/DEC result r149
      Creating newreg=150 from oldreg=134, assigning class VPR_REG to r150
bad vs good
      Choosing alt 0 in insn 24:  (0) =w  (1) Ux  (2) Up {mve_vldrhq_z_sv8hi}
      Creating newreg=149 from oldreg=134, assigning class VPR_REG to r149

Does anyone have any further ideas on why these may have changed or how to fix them?

Thanks!
Comment 1 GCC Commits 2023-02-02 10:02:23 UTC
The master branch has been updated by Andre Simoes Dias Vieira <avieira@gcc.gnu.org>:

https://gcc.gnu.org/g:75b58e77706e8b5057770f040005950940a9a0f5

commit r13-5646-g75b58e77706e8b5057770f040005950940a9a0f5
Author: Andre Vieira <andre.simoesdiasvieira@arm.com>
Date:   Thu Feb 2 10:00:57 2023 +0000

    arm: Fix sign of MVE predicate mve_pred16_t [PR 107674]
    
    The ACLE defines mve_pred16_t as an unsigned short.  This patch makes sure GCC
    treats the predicate as an unsigned type, rather than signed.
    
    gcc/ChangeLog:
    
            PR target/107674
            * config/arm/arm-builtins.cc (arm_simd_builtin_type): Rewrite to use
            new qualifiers parameter and use unsigned short type for MVE predicate.
            (arm_init_builtin): Call arm_simd_builtin_type with qualifiers
            parameter.
            (arm_init_crypto_builtins): Likewise.
    
    gcc/testsuite/ChangeLog:
    
            PR target/107674
            * gcc.target/arm/mve/mve_vpt.c: New test.
Comment 2 GCC Commits 2023-02-02 10:02:28 UTC
The master branch has been updated by Andre Simoes Dias Vieira <avieira@gcc.gnu.org>:

https://gcc.gnu.org/g:d45ec8a732f449647afa89e46b80a4e0614ec28d

commit r13-5647-gd45ec8a732f449647afa89e46b80a4e0614ec28d
Author: Andre Vieira <andre.simoesdiasvieira@arm.com>
Date:   Thu Feb 2 10:01:13 2023 +0000

    arm: Remove unnecessary zero-extending of MVE predicates before use [PR 107674]
    
    This patch teaches GCC that zero-extending a MVE predicate from 16-bits to
    32-bits and then only using 16-bits is a no-op.
    It does so in two steps:
    - it lets gcc know that it can access any MVE predicate mode using any other MVE
    predicate mode without needing to copy it, using the TARGET_MODES_TIEABLE_P hook,
    - it teaches simplify_subreg to optimize a subreg with a vector outermode, by
    replacing this outermode with a same-sized integer mode and trying the
    avalailable optimizations, then if successful it surrounds the result with a
    subreg casting it back to the original vector outermode.
    
    gcc/ChangeLog:
    
            PR target/107674
            * config/arm/arm.cc (arm_hard_regno_mode_ok): Use new MACRO.
            (arm_modes_tieable_p): Make MVE predicate modes tieable.
            * config/arm/arm.h (VALID_MVE_PRED_MODE):  New define.
            * simplify-rtx.cc (simplify_context::simplify_subreg): Teach
            simplify_subreg to simplify subregs where the outermode is not scalar.
    
    gcc/testsuite/ChangeLog:
    
            * gcc.target/arm/mve/mve_vpt.c: Change to remove unecessary zero-extend.
Comment 3 Stam Markianos-Wright 2023-04-04 10:36:28 UTC
Thank you, Andre for fixing the Part 1 in this ticket :)

The part 2 we've found to be a regression since r13-416-g485a0ae0982abe and is also the reason why the mve_*_memory_nodes tests are currently failing.
Comment 4 GCC Commits 2023-04-06 15:36:53 UTC
The master branch has been updated by Richard Earnshaw <rearnsha@gcc.gnu.org>:

https://gcc.gnu.org/g:ddc9b5ee13cd686c8674f92d46045563c06a23ea

commit r13-7114-gddc9b5ee13cd686c8674f92d46045563c06a23ea
Author: Richard Earnshaw <rearnsha@arm.com>
Date:   Thu Apr 6 14:44:30 2023 +0100

    arm: mve: fix auto-inc generation [PR107674]
    
    My change r13-416-g485a0ae0982abe caused the compiler to stop
    generating auto-inc operations on mve loads and stores.  The fix
    is to check whether there is a replacement register available
    when in strict mode and the register is still a pseudo.
    
    gcc:
    
            PR target/107674
            * config/arm/arm.cc (arm_effective_regno): New function.
            (mve_vector_mem_operand): Use it.
Comment 5 Stam Markianos-Wright 2023-04-18 15:12:08 UTC
Thanks Richard! I believe this is now fixed. This is likely not applicable for backporting as Andre's changes included some mid-end additions and it's only a missed-optimization regression -- hence closing this ticket.