This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



Re: [ARM] ARM NEON support part 1/7: VFPv3 support


On Sat, 2007-06-02 at 22:16 +0100, Julian Brown wrote:
> This series of patches adds support for ARM's "Advanced SIMD Extension" 
> NEON, as well as version 3 of the VFP architecture and scheduling 
> support for ARM's Cortex-A8 core. The first three patches form the bulk 
> of the implementation, and the remaining four patches provide 
> incremental improvements.
> 
> The first patch adds support for the VFPv3 instruction set. There are 
> mainly two features added, one being an extended register set for 
> double-precision registers (32 up from 16), the second being added 
> immediate-constant loading instructions "fconsts" and "fconstd". The 
> special handling of registers D0-D7 isn't actually required for VFPv3, 
> but is needed for the follow-up NEON patch.
> 
> (The patch series has been tested together with no regressions, 
> targeting arm-none-eabi. See final part for further test information).
> 
> OK?
> 

OK, if you address the points below.

R.

> Julian
> 
> ChangeLog (vfpv3-support)
> 
> 	Julian Brown  <julian@codesourcery.com>
> 
>      gcc/

>      (arm_print_operand): Implement new code 'G' for VFPv3 floating-point
>      constants, represented as a integer indices.
                                ^^
The "a" is not needed ("represented as integer indices").


> --- .pc/vfpv3-support/gcc/config/arm/aout.h	2007-06-02 13:45:47.000000000 -0700
> +++ gcc/config/arm/aout.h	2007-06-02 13:46:00.000000000 -0700
> @@ -68,6 +68,10 @@
>    "s8",  "s9",  "s10", "s11", "s12", "s13", "s14", "s15", \
>    "s16", "s17", "s18", "s19", "s20", "s21", "s22", "s23", \
>    "s24", "s25", "s26", "s27", "s28", "s29", "s30", "s31", \
> +  "d16", "?16", "d17", "?17", "d18", "?18", "d19", "?19", \
> +  "d20", "?20", "d21", "?21", "d22", "?22", "d23", "?23", \
> +  "d24", "?24", "d25", "?25", "d26", "?26", "d27", "?27", \
> +  "d28", "?28", "d29", "?29", "d30", "?30", "d31", "?31", \
>    "vfpcc"					   \
>  }

I think the ?<num> registers deserve a comment.
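
One possible shape for such a comment, as a sketch only. It assumes, going by the arm.h comment in this patch, that each of d16-d31 takes two consecutive register numbers and that the second slot is a placeholder with no usable single-precision meaning. The names vfp3_hi_names and hi_vfp_name are hypothetical and do not come from the patch.

```c
#include <stddef.h>

/* Hypothetical excerpt of the register-name table for register numbers
   95-126.  VFPv3's d16-d31 have no single-precision halves, but each one
   still occupies two consecutive register numbers, like d0-d15 (assumption
   drawn from the arm.h comment).  The second slot of each pair is a
   placeholder named "?<num>"; it is assumed never to reach the assembly
   output.  */
static const char *const vfp3_hi_names[] = {
  "d16", "?16", "d17", "?17", "d18", "?18", "d19", "?19",
  "d20", "?20", "d21", "?21", "d22", "?22", "d23", "?23",
  "d24", "?24", "d25", "?25", "d26", "?26", "d27", "?27",
  "d28", "?28", "d29", "?29", "d30", "?30", "d31", "?31"
};

/* Return the table name for REGNO, or NULL if REGNO is outside 95-126.  */
static const char *
hi_vfp_name (int regno)
{
  if (regno < 95 || regno > 126)
    return NULL;
  return vfp3_hi_names[regno - 95];
}
```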

> @@ -8808,6 +8913,17 @@ vfp_output_fldmd (FILE * stream, unsigne
>        count++;
>      }
>  
> +  /* FLDMD may not load more than 16 doubleword registers at a time. Split the
> +     load into multiple parts if we have to handle more than 16 registers.
> +     FIXME: This will increase the maximum size of the epilogue, which will
> +     need altering elsewhere.  */

Either this should be fixed, or this comment should be removed.
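
For reference, the splitting that the quoted comment describes could look roughly like the sketch below. It is not the patch's code: split_dreg_load, struct dreg_chunk and MAX_FLDMD_REGS are hypothetical names, and only the 16-register limit is taken from the comment.

```c
#include <stddef.h>

#define MAX_FLDMD_REGS 16  /* limit stated in the patch comment */

/* One load instruction's worth of registers: COUNT D registers from BASE.  */
struct dreg_chunk
{
  int base;
  int count;
};

/* Split a load of COUNT D registers starting at BASE into chunks of at most
   MAX_FLDMD_REGS registers.  At most MAX_CHUNKS entries are written to OUT;
   the return value is the number of chunks needed.  */
static size_t
split_dreg_load (int base, int count, struct dreg_chunk *out,
                 size_t max_chunks)
{
  size_t n = 0;

  while (count > 0)
    {
      int this_count = count > MAX_FLDMD_REGS ? MAX_FLDMD_REGS : count;

      if (n < max_chunks)
        {
          out[n].base = base;
          out[n].count = this_count;
        }
      n++;
      base += this_count;
      count -= this_count;
    }
  return n;
}
```

Each extra chunk is one more load instruction, which is presumably the epilogue-size growth the FIXME has in mind.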


>  #define FIRST_VFP_REGNUM	63
> -#define LAST_VFP_REGNUM		94
> +#define D7_VFP_REGNUM		78  /* Registers 77 and 78 == VFP reg D7.  */
> +#define LAST_VFP_REGNUM		(TARGET_VFP3 ? 126 : 94)

Given that 94 is the same as LAST_LO_VFP_REGNUM (below), I think this
macro should be defined in terms of it, with another macro for the top
of the high range (126).
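
Something along these lines, as a sketch. LAST_HI_VFP_REGNUM is a hypothetical name for the new top-of-range macro, and target_vfp3 is only a stand-in so that the fragment compiles outside GCC, where TARGET_VFP3 is defined by the backend.

```c
/* Stand-in for the backend's TARGET_VFP3 test (assumption for this sketch).  */
static int target_vfp3 = 0;
#define TARGET_VFP3 (target_vfp3)

#define FIRST_VFP_REGNUM     63
#define LAST_LO_VFP_REGNUM   94   /* Last register with an S-register view.  */
#define FIRST_HI_VFP_REGNUM  95
#define LAST_HI_VFP_REGNUM   126  /* Hypothetical name: top of d16-d31.  */

/* Defined in terms of the range macros rather than repeating 94 and 126.  */
#define LAST_VFP_REGNUM \
  (TARGET_VFP3 ? LAST_HI_VFP_REGNUM : LAST_LO_VFP_REGNUM)

#define IS_VFP_REGNUM(REGNUM) \
  (((REGNUM) >= FIRST_VFP_REGNUM) && ((REGNUM) <= LAST_VFP_REGNUM))
```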

>  #define IS_VFP_REGNUM(REGNUM) \
>    (((REGNUM) >= FIRST_VFP_REGNUM) && ((REGNUM) <= LAST_VFP_REGNUM))
>  
> +/* VFP registers are split into two types: those defined by VFP versions < 3
> +   have D registers overlaid on consecutive pairs of S registers. VFP version 3
> +   defines 16 new D registers (d16-d31) which, for simplicity and correctness
> +   in various parts of the backend, we implement as "fake" single-precision
> +   registers (which would be S32-S63, but cannot be used in that way).  The
> +   following macros define these ranges of registers.  */
> +#define LAST_LO_VFP_REGNUM	94
> +#define FIRST_HI_VFP_REGNUM	95


> @@ -958,24 +995,33 @@ extern int arm_structure_size_boundary;
>     function parameters.  It is quite good to use lr since other calls may
>     clobber it anyway.  Allocate r0 through r3 in reverse order since r3 is
>     least likely to contain a function parameter; in addition results are
> -   returned in r0.  */
> +   returned in r0.
> +   For VFP/VFPv3, allocate caller-saved registers first (D0-D7), then D16-D31,
> +   then D8-D15.  The reason for doing this is to attempt to reduce register
> +   pressure when both single- and double-precision registers are used in a
> +   function, but hopefully not force double-precision registers to be
> +   callee-saved when it's not necessary. */
>  

Hmm, minor point, but shouldn't the high DP registers (D16-D31) be used
before D0-D7, since they don't overlap any SP registers?  That should
give better code if both SP and DP are needed...





