[Patch, ARM] New feature to minimize the literal load for armv7-m target

Richard Earnshaw rearnsha@arm.com
Wed Nov 20 15:58:00 GMT 2013


On 06/11/13 06:10, Terry Guo wrote:
> Hi,
> 
> This patch intends to minimize the use of literal pools for some armv7-m
> targets that are slower at loading data from flash than at fetching
> instructions from flash. The normal literal load instruction is replaced
> by a MOVW/MOVT pair. A new option, -mslow-flash-data, is created for
> this purpose. So far this feature doesn't support PIC code or targets
> that aren't based on armv7-m.
> 
> Tested with the GCC regression suite on QEMU for cortex-m3. No new
> regressions. Is it OK for trunk?
> 
> BR,
> Terry
> 
> 2013-11-06  Terry Guo  <terry.guo@arm.com>
> 
>                  * doc/invoke.texi (-mslow-flash-data): Document new option.
>                  * config/arm/arm.opt (mslow-flash-data): New option.
>                  * config/arm/arm-protos.h
> (arm_max_const_double_inline_cost): Declare it.
>                  * config/arm/arm.h (TARGET_USE_MOVT): Always true when
> disable literal pools.
literal pools are disabled.
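
Presumably the definition ends up along these lines (the arm.h hunk for
this macro isn't quoted below, so this is only a guess at its shape):

  #define TARGET_USE_MOVT \
    (arm_arch_thumb2 && (arm_disable_literal_pool || !optimize_size))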

>                  (arm_disable_literal_pool): Declare it.
>                  * config/arm/arm.c (arm_disable_literal_pool): New
> variable.
>                  (arm_option_override): Handle new option.
>                  (thumb2_legitimate_address_p): Invalid certain address
> format.

Invalidate.  What address formats?

>                  (arm_max_const_double_inline_cost): New function.
>                  * config/arm/arm.md (types.md): Include it a little
> earlier.

Include it before ...

>                  (use_literal_pool): New attribute.
>                  (enabled): Use new attribute.
>                  (split pattern): Replace symbol+offset with MOVW/MOVT.
> 
> 

Comments inline.
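
For readers without the full submission in front of them, the
transformation at issue is roughly the following (an illustrative
sketch, not actual compiler output):

  /* Materializing a 32-bit constant on armv7-m.

     With literal pools (the default), the constant sits as data in
     the text section and is fetched with a PC-relative load:

         ldr   r0, .L2        @ data load from flash
         ...
     .L2:
         .word 0x12345678

     With -mslow-flash-data, the value is synthesized from the
     instruction stream instead; MOVW sets the low 16 bits and MOVT
     the high 16:

         movw  r0, #0x5678
         movt  r0, #0x1234  */
  unsigned int
  get_magic (void)
  {
    return 0x12345678;
  }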

> diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h
> index 1781b75..25927a1 100644
> --- a/gcc/config/arm/arm.h
> +++ b/gcc/config/arm/arm.h
> @@ -554,6 +556,9 @@ extern int arm_arch_thumb_hwdiv;
>     than core registers.  */
>  extern int prefer_neon_for_64bits;
>  
> +/* Nonzero if shouldn't use literal pool in generated code.  */
'if we shouldn't use literal pools'

> +extern int arm_disable_literal_pool;

This should be a bool; the values stored in it should be true/false, not 1/0.
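
i.e. something like:

  /* arm.h */
  extern bool arm_disable_literal_pool;

  /* arm.c */
  bool arm_disable_literal_pool = false;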

> diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
> index 78554e8..de2a9c0 100644
> --- a/gcc/config/arm/arm.c
> +++ b/gcc/config/arm/arm.c
> @@ -864,6 +864,9 @@ int arm_arch_thumb_hwdiv;
>     than core registers.  */
>  int prefer_neon_for_64bits = 0;
>  
> +/* Nonzero if shouldn't use literal pool in generated code.  */
> +int arm_disable_literal_pool = 0;

Similar comments to above.

> @@ -6348,6 +6361,25 @@ thumb2_legitimate_address_p (enum machine_mode mode, rtx x, int strict_p)
>  		  && thumb2_legitimate_index_p (mode, xop0, strict_p)));
>      }
>  
> +  /* Normally we can assign constant values to its target register without
'to target registers'

> +     the help of constant pool.  But there are cases we have to use constant
> +     pool like:
> +     1) assign a label to register.
> +     2) sign-extend a 8bit value to 32bit and then assign to register.
> +
> +     Constant pool access in format:
> +     (set (reg r0) (mem (symbol_ref (".LC0"))))
> +     will cause the use of literal pool (later in function arm_reorg).
> +     So here we mark such format as an invalid format, then compiler
'then the compiler'
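
In other words, something along these lines (a sketch of the idea, not
necessarily the patch's exact test):

  /* When literal pools are disabled, reject addresses that point into
     the constant pool; the constant must then be synthesized (e.g. with
     MOVW/MOVT) rather than loaded as data.  */
  if (arm_disable_literal_pool
      && GET_CODE (x) == SYMBOL_REF
      && CONSTANT_POOL_ADDRESS_P (x))
    return 0;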

> @@ -16114,6 +16146,18 @@ push_minipool_fix (rtx insn, HOST_WIDE_INT address, rtx *loc,
>    minipool_fix_tail = fix;
>  }
>  
> +/* Return maximum allowed cost of synthesizing a 64-bit constant VAL inline.
> +   Returns 99 if we always want to synthesize the value.  */

Needs to mention that the cost is in terms of 'insns' (see the function
below it).

> +int
> +arm_max_const_double_inline_cost ()
> +{
> +  /* Let the value get synthesized to avoid the use of literal pools.  */
> +  if (arm_disable_literal_pool)
> +    return 99;
> +
> +  return ((optimize_size || arm_ld_sched) ? 3 : 4);
> +}
> +
>  /* Return the cost of synthesizing a 64-bit constant VAL inline.
>     Returns the number of insns needed, or 99 if we don't know how to
>     do it.  */
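
(For what it's worth, the interplay between the two functions: since
arm_const_double_inline_cost returns the number of insns needed, or 99
when it doesn't know how to synthesize the value, a maximum of 99 means
any constant the compiler does know how to synthesize passes the
inline-cost test and stays out of the pool.)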

> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> index adbc45b..a5991cb 100644
> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi
> @@ -534,6 +534,7 @@ Objective-C and Objective-C++ Dialects}.
>  -mfix-cortex-m3-ldrd @gol
>  -munaligned-access @gol
>  -mneon-for-64bits @gol
> +-mslow-flash-data @gol
>  -mrestrict-it}
>  
>  @emph{AVR Options}
> @@ -12295,6 +12296,12 @@ Enables using Neon to handle scalar 64-bits operations. This is
>  disabled by default since the cost of moving data from core registers
>  to Neon is high.
>  
> +@item -mslow-flash-data
> +@opindex mslow-flash-data
> +Assume loading data from flash is slower than fetching instruction.
> +Therefore literal load is minimized for better performance.
> +This option is off by default.
> +

Needs to mention the limitation on which processors can support this,
i.e. only v7 m-profile.
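
Typical usage would then be something like (illustrative command line;
cortex-m3 matches the testing mentioned above):

  arm-none-eabi-gcc -mthumb -mcpu=cortex-m3 -mslow-flash-data -O2 -c foo.c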


R.


