[PATCH] ira: Scale save/restore costs of callee save registers with block frequency

Vladimir Makarov vmakarov@redhat.com
Thu Oct 5 16:16:17 GMT 2023


On 10/3/23 10:07, Surya Kumari Jangala wrote:
> ira: Scale save/restore costs of callee save registers with block frequency
>
> In assign_hard_reg(), when computing the costs of the hard registers, the
> cost of saving/restoring a callee-save hard register in prolog/epilog is
> taken into consideration. However, this cost is not scaled with the entry
> block frequency. Without scaling, the cost of saving/restoring is quite
> small and this can result in a callee-save register being chosen by
> assign_hard_reg() even though there are free caller-save registers
> available. Assigning a callee save register to a pseudo that is live
> in the entire function and across a call will cause shrink wrap to fail.

Thank you for addressing this part of the code.  Changes that look
obvious sometimes have unexpected results.  I remember experimenting
with different heuristics for this code a long time ago, when 32-bit x86
was the major target, and this was the best variant I found.  Since a lot
has changed since then, I decided to benchmark your change.

This change increases x86-64 spec2017 code size by 0.67% on
average.  The increase is very stable across 20 spec2017 benchmarks; only
the code for bwaves is smaller (by 0.01%).  The specfp2017 performance is
the same.  There is one positive impact: specint2017 improved by 0.6%
(8.59 vs 8.54), mainly because of improvements in xalancbmk (2.5%) and
exchange2 (5%).

So I propose to make this change only when not optimizing for code
size.  Also, please be prepared for possible testsuite failures on
other targets: some targets are overconstrained by tests expecting
specific generated code.
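
To make the proposal concrete, here is a minimal standalone sketch of
the cost computation with the scaling applied only when not optimizing
for size.  It is not GCC code: the struct, its field names, and the
optimize_for_size flag are hypothetical stand-ins for the values
assign_hard_reg () reads.  In ira-color.cc the guard would presumably
be an existing size predicate such as optimize_function_for_size_p (cfun),
but which predicate fits best is an assumption here.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the values assign_hard_reg () reads;
   these are not GCC's real types.  */
struct save_cost_inputs
{
  int load_cost;    /* models ira_memory_move_cost[mode][rclass][0] */
  int store_cost;   /* models ira_memory_move_cost[mode][rclass][1] */
  int saved_nregs;  /* callee-saved registers that would need saving */
  int nregs;        /* models hard_regno_nregs (hard_regno, mode) */
  int entry_freq;   /* models REG_FREQ_FROM_BB (ENTRY_BLOCK_PTR_FOR_FN (cfun)) */
};

/* Compute the extra cost of picking a callee-saved register.  The base
   formula follows the quoted patch; the entry-block frequency scaling
   is skipped when optimizing for size.  */
static int
callee_save_add_cost (const struct save_cost_inputs *in,
                      bool optimize_for_size)
{
  int add_cost = ((in->load_cost + in->store_cost)
                  * in->saved_nregs / in->nregs - 1);

  if (!optimize_for_size)
    add_cost *= in->entry_freq;

  return add_cost;
}
```

With load and store costs of 4 each, one register to save, and an entry
frequency of 1000, the unscaled cost is 7 and the scaled cost is 7000,
which illustrates why an unscaled save/restore cost can look cheap next
to frequency-weighted costs of the other registers.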

> 2023-10-03  Surya Kumari Jangala  <jskumari@linux.ibm.com>
>
> gcc/
> 	PR rtl-optimization/111673
> 	* ira-color.cc (assign_hard_reg): Scale save/restore costs of
> 	callee save registers with block frequency.
>
> gcc/testsuite/
> 	PR rtl-optimization/111673
> 	* gcc.target/powerpc/pr111673.c: New test.
> ---
>
> diff --git a/gcc/ira-color.cc b/gcc/ira-color.cc
> index f2e8ea34152..eb20c52310d 100644
> --- a/gcc/ira-color.cc
> +++ b/gcc/ira-color.cc
> @@ -2175,7 +2175,8 @@ assign_hard_reg (ira_allocno_t a, bool retry_p)
>   	    add_cost = ((ira_memory_move_cost[mode][rclass][0]
>   		         + ira_memory_move_cost[mode][rclass][1])
>   		        * saved_nregs / hard_regno_nregs (hard_regno,
> -							  mode) - 1);
> +							  mode) - 1)
> +			* REG_FREQ_FROM_BB (ENTRY_BLOCK_PTR_FOR_FN (cfun));
>   	    cost += add_cost;
>   	    full_cost += add_cost;
>   	  }
> diff --git a/gcc/testsuite/gcc.target/powerpc/pr111673.c b/gcc/testsuite/gcc.target/powerpc/pr111673.c
> new file mode 100644
> index 00000000000..e0c0f85460a
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/pr111673.c
> @@ -0,0 +1,17 @@
> +/* { dg-do compile { target lp64 } } */
> +/* { dg-options "-O2 -fdump-rtl-pro_and_epilogue" } */
> +
> +/* Verify there is an early return without the prolog and shrink-wrap
> +   the function. */
> +
> +int f (int);
> +int
> +advance (int dz)
> +{
> +  if (dz > 0)
> +    return (dz + dz) * dz;
> +  else
> +    return dz * f (dz);
> +}
> +
> +/* { dg-final { scan-rtl-dump-times "Performing shrink-wrapping" 1 "pro_and_epilogue" } } */
>


