[PATCH] optimize costly division in rtx_cost

Alexander Monakov amonakov@ispras.ru
Mon Jan 20 09:37:00 GMT 2020


Ping.

On Sun, 5 Jan 2020, Alexander Monakov wrote:

> Hi,
> 
> I noticed there's a costly signed 64-bit division in rtx_cost on x86 as well as
> any other target where UNITS_PER_WORD is implemented like TARGET_64BIT ? 8 : 4.
> It's also evident that rtx_cost does redundant work for a SET rtx argument.
> 
> Obviously the variable named 'factor' rarely exceeds 1, so in the majority of
> cases it can be computed with a well-predictable branch rather than a division.
> 
> This patch makes rtx_cost do the division only when the mode is wider than
> UNITS_PER_WORD, and also moves a test for a SET up front to avoid redundancy.
> No functional change.
> 
> Bootstrapped on x86_64, ok for trunk?
> 
> To illustrate the improvement this buys, for tramp3d -O2 compilation, I got
>     
>     before:
>            73887675319      instructions:u
>            72438432200      cycles:u
>              924298569      idq.ms_uops:u
>           102603799255      uops_dispatched.thread:u
>     
>     after:
>            73888371724      instructions:u
>            72386986612      cycles:u
>              802744775      idq.ms_uops:u
>           102096987220      uops_dispatched.thread:u
> 
> (this is on Sandybridge, idq.ms_uops are uops going via the microcode sequencer,
> so the unneeded division is responsible for a good fraction of them)
> 
> 	* rtlanal.c (rtx_cost): Handle a SET up front. Avoid division if the
> 	mode is not wider than UNITS_PER_WORD.
> 
> diff --git a/gcc/rtlanal.c b/gcc/rtlanal.c
> index 9a7afccefb8..c7ab86e228b 100644
> --- a/gcc/rtlanal.c
> +++ b/gcc/rtlanal.c
> @@ -4207,18 +4207,23 @@ rtx_cost (rtx x, machine_mode mode, enum rtx_code outer_code,
>    const char *fmt;
>    int total;
>    int factor;
> +  unsigned mode_size;
>  
>    if (x == 0)
>      return 0;
>  
> -  if (GET_MODE (x) != VOIDmode)
> +  if (GET_CODE (x) == SET)
> +    /* A SET doesn't have a mode, so let's look at the SET_DEST to get
> +       the mode for the factor.  */
> +    mode = GET_MODE (SET_DEST (x));
> +  else if (GET_MODE (x) != VOIDmode)
>      mode = GET_MODE (x);
>  
> +  mode_size = estimated_poly_value (GET_MODE_SIZE (mode));
> +
>    /* A size N times larger than UNITS_PER_WORD likely needs N times as
>       many insns, taking N times as long.  */
> -  factor = estimated_poly_value (GET_MODE_SIZE (mode)) / UNITS_PER_WORD;
> -  if (factor == 0)
> -    factor = 1;
> +  factor = mode_size > UNITS_PER_WORD ? mode_size / UNITS_PER_WORD : 1;
>  
>    /* Compute the default costs of certain things.
>       Note that targetm.rtx_costs can override the defaults.  */
> @@ -4243,14 +4248,6 @@ rtx_cost (rtx x, machine_mode mode, enum rtx_code outer_code,
>        /* Used in combine.c as a marker.  */
>        total = 0;
>        break;
> -    case SET:
> -      /* A SET doesn't have a mode, so let's look at the SET_DEST to get
> -	 the mode for the factor.  */
> -      mode = GET_MODE (SET_DEST (x));
> -      factor = estimated_poly_value (GET_MODE_SIZE (mode)) / UNITS_PER_WORD;
> -      if (factor == 0)
> -	factor = 1;
> -      /* FALLTHRU */
>      default:
>        total = factor * COSTS_N_INSNS (1);
>      }
> 
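As a rough standalone illustration of the "No functional change" claim above, the two ways of computing 'factor' can be compared outside of GCC. This is only a sketch under stated assumptions: the function names and the units_per_word parameter are hypothetical stand-ins (in GCC, UNITS_PER_WORD is a target macro and the size comes from estimated_poly_value), and plain unsigned arithmetic is used rather than the signed 64-bit type involved in the real code.

```c
#include <assert.h>

/* Old form: always divide, then clamp a zero result up to one.
   units_per_word is a hypothetical runtime stand-in for UNITS_PER_WORD.  */
static int
factor_with_division (unsigned mode_size, unsigned units_per_word)
{
  int factor = mode_size / units_per_word;
  if (factor == 0)
    factor = 1;
  return factor;
}

/* New form: divide only when the mode is wider than a word, so the
   common case is a compare and a well-predicted branch.  */
static int
factor_with_branch (unsigned mode_size, unsigned units_per_word)
{
  return mode_size > units_per_word ? mode_size / units_per_word : 1;
}

/* Check that both forms agree for a range of sizes and for the two
   word sizes implied by TARGET_64BIT ? 8 : 4.  */
static void
check_factor_forms_agree (void)
{
  static const unsigned word_sizes[] = { 4, 8 };
  for (unsigned w = 0; w < 2; w++)
    for (unsigned size = 0; size <= 256; size++)
      assert (factor_with_division (size, word_sizes[w])
	      == factor_with_branch (size, word_sizes[w]));
}
```

Both forms return 1 whenever the size is at most one word (including a size of 0), and the same quotient otherwise, which is why the branch can replace the unconditional division.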


