This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.
Re: [ARM] ARM NEON support part 1/7: VFPv3 support
- From: Richard Earnshaw <rearnsha at arm dot com>
- To: Julian Brown <julian at codesourcery dot com>
- Cc: GCC Patches <gcc-patches at gcc dot gnu dot org>, Paul Brook <paul at codesourcery dot com>
- Date: Thu, 07 Jun 2007 18:12:42 +0100
- Subject: Re: [ARM] ARM NEON support part 1/7: VFPv3 support
- References: <4661DE3A.20307@codesourcery.com>
On Sat, 2007-06-02 at 22:16 +0100, Julian Brown wrote:
> This series of patches adds support for ARM's "Advanced SIMD Extension"
> NEON, as well as version 3 of the VFP architecture and scheduling
> support for ARM's Cortex-A8 core. The first three patches form the bulk
> of the implementation, and the remaining four patches provide
> incremental improvements.
>
> The first patch adds support for the VFPv3 instruction set. There are
> mainly two features added, one being an extended register set for
> double-precision registers (32 up from 16), the second being new
> immediate-constant loading instructions "fconsts" and "fconstd". The
> special handling of registers D0-D7 isn't actually required for VFPv3,
> but is needed for the follow-up NEON patch.
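The "immediate-constant loading" feature only covers a small set of values. This is a minimal sketch that assumes the standard VFPv3 immediate format. In that format, fconsts/fconstd accept an 8-bit immediate that can express only ±n × 2^-r, with 16 ≤ n ≤ 31 and 0 ≤ r ≤ 7. That gives magnitudes from 0.125 to 31 and never zero. The helper `vfp3_imm8` is hypothetical and is not the code from this patch. It tests whether a double fits and, if so, builds the 8-bit index (sign, 3-bit exponent, 4-bit mantissa).

```c
#include <assert.h>

/* Hypothetical helper, not the patch's code: return the 8-bit VFPv3
   immediate (sign, 3-bit exponent, 4-bit mantissa) encoding X, or -1
   if X is not +/- n * 2^-r with 16 <= n <= 31 and 0 <= r <= 7.  */
static int
vfp3_imm8 (double x)
{
  /* Zero is never encodable, so a plain comparison gives the sign.  */
  int sign = x < 0.0;
  double mag = sign ? -x : x;

  for (int r = 0; r <= 7; r++)
    {
      /* Scaling by 2^r is exact; an overflow just lands out of range.  */
      double n = mag * (double) (1 << r);

      if (n >= 16.0 && n <= 31.0 && n == (double) (int) n)
        {
          /* Exponent field is r with its low two bits inverted;
             mantissa field is n - 16.  */
          int imm8 = (sign << 7) | ((r ^ 3) << 4) | ((int) n - 16);
          assert (imm8 >= 0 && imm8 <= 0xff);
          return imm8;
        }
    }
  return -1;   /* Must be loaded some other way, e.g. from memory.  */
}
```

For example, 1.0 (16 × 2^-4) maps to 0x70 and -1.0 maps to 0xF0. Values such as 0.1 or 0.0 return -1.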
>
> (The patch series has been tested together with no regressions,
> targeting arm-none-eabi. See the final part for further test information).
>
> OK?
>
OK, if you address the points below.
R.
> Julian
>
> ChangeLog (vfpv3-support)
>
> Julian Brown <julian@codesourcery.com>
>
> gcc/
> (arm_print_operand): Implement new code 'G' for VFPv3 floating-point
> constants, represented as a integer indices.
^^
Not needed.
> --- .pc/vfpv3-support/gcc/config/arm/aout.h 2007-06-02 13:45:47.000000000 -0700
> +++ gcc/config/arm/aout.h 2007-06-02 13:46:00.000000000 -0700
> @@ -68,6 +68,10 @@
> "s8", "s9", "s10", "s11", "s12", "s13", "s14", "s15", \
> "s16", "s17", "s18", "s19", "s20", "s21", "s22", "s23", \
> "s24", "s25", "s26", "s27", "s28", "s29", "s30", "s31", \
> + "d16", "?16", "d17", "?17", "d18", "?18", "d19", "?19", \
> + "d20", "?20", "d21", "?21", "d22", "?22", "d23", "?23", \
> + "d24", "?24", "d25", "?25", "d26", "?26", "d27", "?27", \
> + "d28", "?28", "d29", "?29", "d30", "?30", "d31", "?31", \
> "vfpcc" \
> }
I think the ?<num> registers deserve a comment.
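The requested comment could describe how hard register numbers map onto these names. The sketch below assumes the numbering visible in this patch: FIRST_VFP_REGNUM is 63, s0-s31 occupy 63-94, and d16-d31 then take two slots each from 95 to 126. In each pair, the "?NN" slot is the upper half of dNN. VFPv3 gives d16-d31 no single-precision aliases, so that slot has no architectural name of its own. `vfp_reg_name` and `LAST_HI_VFP_REGNUM` are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdio.h>

/* Numbering as in the quoted patch; LAST_HI_VFP_REGNUM is a
   hypothetical name for the top of the range.  */
#define FIRST_VFP_REGNUM    63
#define LAST_LO_VFP_REGNUM  94
#define FIRST_HI_VFP_REGNUM 95
#define LAST_HI_VFP_REGNUM  126

/* Write the assembler name for VFP hard register REGNO into BUF.  */
static void
vfp_reg_name (int regno, char *buf, size_t len)
{
  assert (regno >= FIRST_VFP_REGNUM && regno <= LAST_HI_VFP_REGNUM);

  if (regno <= LAST_LO_VFP_REGNUM)
    /* Low bank: one slot per single-precision register sN.  */
    snprintf (buf, len, "s%d", regno - FIRST_VFP_REGNUM);
  else
    {
      /* High bank: dN takes two slots; the odd slot is the "?N"
         placeholder for its upper half.  */
      int off = regno - FIRST_HI_VFP_REGNUM;
      snprintf (buf, len, "%s%d", (off & 1) ? "?" : "d", 16 + off / 2);
    }
}
```

For instance, regno 95 prints as d16, 96 as ?16, and 126 as ?31.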
> @@ -8808,6 +8913,17 @@ vfp_output_fldmd (FILE * stream, unsigne
> count++;
> }
>
> + /* FLDMD may not load more than 16 doubleword registers at a time. Split the
> + load into multiple parts if we have to handle more than 16 registers.
> + FIXME: This will increase the maximum size of the epilogue, which will
> + need altering elsewhere. */
Either this should be fixed, or this comment should be removed.
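The split in the FIXME turns a restore of COUNT consecutive D registers into ceil(COUNT / 16) loads. That count is also what an epilogue-size estimate would have to allow for. This is a hedged sketch only. `emit_fldmd_split` is hypothetical, and the pre-UAL `fldmfdd sp!, {...}` syntax is assumed for illustration rather than taken from the patch. The sketch also assumes the matching prologue pushed the higher-numbered chunk first, so the lowest-numbered chunk sits at the top of the stack.

```c
#include <assert.h>
#include <stdio.h>

#define FLDMD_MAX_REGS 16   /* limit quoted in the patch comment */

/* Hypothetical: write loads restoring dFIRST..d(FIRST+COUNT-1) into
   BUF, at most FLDMD_MAX_REGS registers per instruction, in ascending
   order.  Returns the number of instructions, or -1 if BUF is too
   small.  */
static int
emit_fldmd_split (int first, int count, char *buf, size_t len)
{
  int insns = 0;
  size_t used = 0;

  assert (count > 0 && first >= 0 && first + count <= 32 && len > 0);
  buf[0] = '\0';
  while (count > 0)
    {
      int n = count > FLDMD_MAX_REGS ? FLDMD_MAX_REGS : count;
      int w;

      if (n == 1)
        w = snprintf (buf + used, len - used, "fldmfdd\tsp!, {d%d}\n",
                      first);
      else
        w = snprintf (buf + used, len - used, "fldmfdd\tsp!, {d%d-d%d}\n",
                      first, first + n - 1);
      if (w < 0 || (size_t) w >= len - used)
        return -1;   /* Output would be truncated.  */
      used += (size_t) w;
      first += n;
      count -= n;
      insns++;
    }
  return insns;
}
```

Restoring 20 registers starting at d0, for example, produces two instructions: {d0-d15} and then {d16-d19}.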
> #define FIRST_VFP_REGNUM 63
> -#define LAST_VFP_REGNUM 94
> +#define D7_VFP_REGNUM 78 /* Registers 77 and 78 == VFP reg D7. */
> +#define LAST_VFP_REGNUM (TARGET_VFP3 ? 126 : 94)
Given that 94 is the same as LAST_LO_VFP_REGNUM (below), I think this
macro should be defined in terms of it (and another for the top of the
range).
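Here is a minimal sketch of the restructuring suggested above. It assumes one extra macro for the top of the high range, and the name `LAST_HI_VFP_REGNUM` is chosen here for illustration. In the real backend, TARGET_VFP3 is a target-flag macro. It is replaced here by a plain variable so the snippet compiles on its own.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in for the backend's TARGET_VFP3 test (assumption).  */
static bool target_vfp3;
#define TARGET_VFP3 target_vfp3

#define FIRST_VFP_REGNUM    63
#define LAST_LO_VFP_REGNUM  94   /* s31, top of the pre-VFPv3 bank */
#define FIRST_HI_VFP_REGNUM 95   /* d16 */
#define LAST_HI_VFP_REGNUM  126  /* hypothetical: upper half of d31 */

/* The top of the range now follows from the bank macros.  */
#define LAST_VFP_REGNUM \
  (TARGET_VFP3 ? LAST_HI_VFP_REGNUM : LAST_LO_VFP_REGNUM)

#define IS_VFP_REGNUM(REGNUM) \
  (((REGNUM) >= FIRST_VFP_REGNUM) && ((REGNUM) <= LAST_VFP_REGNUM))

/* The two banks must be contiguous, and d16-d31 take two slots each.  */
_Static_assert (FIRST_HI_VFP_REGNUM == LAST_LO_VFP_REGNUM + 1,
                "high bank must follow the low bank");
_Static_assert (LAST_HI_VFP_REGNUM - FIRST_HI_VFP_REGNUM + 1 == 32,
                "d16-d31 occupy two slots each");
```

Defining the limits this way means the literal 94 appears only once. The compile-time checks catch a renumbering that leaves a gap between the banks.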
> #define IS_VFP_REGNUM(REGNUM) \
> (((REGNUM) >= FIRST_VFP_REGNUM) && ((REGNUM) <= LAST_VFP_REGNUM))
>
> +/* VFP registers are split into two types: those defined by VFP versions < 3
> + have D registers overlaid on consecutive pairs of S registers. VFP version 3
> + defines 16 new D registers (d16-d31) which, for simplicity and correctness
> + in various parts of the backend, we implement as "fake" single-precision
> + registers (which would be S32-S63, but cannot be used in that way). The
> + following macros define these ranges of registers. */
> +#define LAST_LO_VFP_REGNUM 94
> +#define FIRST_HI_VFP_REGNUM 95
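The "fake single-precision registers" point can be made concrete with the kind of mode check the split implies. In the low bank, any slot can hold a single-precision value, and a double can start at an even slot. In the high bank, slots can only hold doubles, starting at the dN slot. The function `vfp_slot_ok` below is hypothetical and much simpler than a real HARD_REGNO_MODE_OK. D7_VFP_REGNUM is taken from the patch to show the d0-d7 subset.

```c
#include <assert.h>
#include <stdbool.h>

#define FIRST_VFP_REGNUM    63
#define D7_VFP_REGNUM       78   /* s15, upper half of d7 */
#define LAST_LO_VFP_REGNUM  94
#define FIRST_HI_VFP_REGNUM 95
#define LAST_HI_VFP_REGNUM  126  /* hypothetical name */

/* Hypothetical check: may a value of the given precision start at
   hard register REGNO?  */
static bool
vfp_slot_ok (int regno, bool is_double)
{
  if (regno < FIRST_VFP_REGNUM || regno > LAST_HI_VFP_REGNUM)
    return false;
  if (regno <= LAST_LO_VFP_REGNUM)
    /* Any sN is fine; dN overlays s2N and s2N+1, so a double must
       start at an even offset.  */
    return !is_double || (regno - FIRST_VFP_REGNUM) % 2 == 0;
  /* High bank: the would-be S32-S63 slots are not real registers,
     so only doubles, and only at the dN slot, never at ?N.  */
  return is_double && (regno - FIRST_HI_VFP_REGNUM) % 2 == 0;
}

/* True if REGNO lies inside d0-d7 (s0-s15).  */
static bool
vfp_in_d0_d7 (int regno)
{
  return regno >= FIRST_VFP_REGNUM && regno <= D7_VFP_REGNUM;
}
```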
> @@ -958,24 +995,33 @@ extern int arm_structure_size_boundary;
> function parameters. It is quite good to use lr since other calls may
> clobber it anyway. Allocate r0 through r3 in reverse order since r3 is
> least likely to contain a function parameter; in addition results are
> - returned in r0. */
> + returned in r0.
> + For VFP/VFPv3, allocate caller-saved registers first (D0-D7), then D16-D31,
> + then D8-D15. The reason for doing this is to attempt to reduce register
> + pressure when both single- and double-precision registers are used in a
> + function, but hopefully not force double-precision registers to be
> + callee-saved when it's not necessary. */
>
Hmm, minor point, but shouldn't the HIGH DP registers be used before
D0-D7? That should give better code if both SP and DP are needed...
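The two candidate orders can be compared by generating the VFP part of the allocation order directly. The sketch assumes the numbering from this patch: d0-d7 (s0-s15) at 63-78, d8-d15 (s16-s31) at 79-94, and d16-d31 at 95-126. It is not the patch's actual REG_ALLOC_ORDER, and `vfp_alloc_order` is hypothetical. Passing `high_first = false` gives the order described in the patch comment, and `true` gives the reviewer's suggestion.

```c
#include <assert.h>
#include <stdbool.h>

/* Append LO..HI inclusive to ORDER starting at index N; return new N.  */
static int
append_range (int *order, int n, int lo, int hi)
{
  for (int r = lo; r <= hi; r++)
    order[n++] = r;
  return n;
}

/* Hypothetical: fill ORDER (at least 64 entries) with the VFP hard
   registers in allocation order and return the count.  */
static int
vfp_alloc_order (int *order, bool high_first)
{
  int n = 0;

  if (high_first)
    n = append_range (order, n, 95, 126);   /* d16-d31, no S aliases */
  n = append_range (order, n, 63, 78);      /* d0-d7 / s0-s15 */
  if (!high_first)
    n = append_range (order, n, 95, 126);   /* d16-d31 */
  n = append_range (order, n, 79, 94);      /* d8-d15, callee-saved */
  assert (n == 64);
  return n;
}
```

Both orders keep d8-d15 last, because under the AAPCS those are the callee-saved VFP registers. The reviewer's order differs only in placing d16-d31 first. An allocator taking the first suitable free register would then place doubles there and leave s0-s15 available for single-precision values.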