[PATCH] rs6000: inefficient 64-bit constant generation for consecutive 1-bits

Tue Sep 15 13:56:01 GMT 2020

Hi!

On Thu, Sep 10, 2020 at 04:58:03PM -0500, Peter Bergner wrote:
> Generating arbitrary 64-bit constants on POWER can take up to 5 instructions.
> However, some special constants can be generated in fewer instructions.
> One special class of constants we don't handle, is constants that have one
> set of consecutive 1-bits.  These can be generated with a "li rT,-1"
> followed by a "rldic rX,rT,SH,MB" instruction.  The following patch
> implements this idea.

Cool.
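
For reference, here is a small C model of what that two-insn sequence
computes, going by the ISA description of rldic (rotate left by SH, then
AND with MASK(MB, 63-SH)).  This is only a sketch: the names rotl64,
ibm_mask, rldic_model and rldic_model_selftest are made up for
illustration and are not GCC code.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate X left by SH bits (0 <= SH <= 63), never shifting by 64.  */
static uint64_t
rotl64 (uint64_t x, unsigned sh)
{
  return (x << sh) | (x >> ((64 - sh) & 63));
}

/* MASK(MB, ME) in IBM bit numbering (bit 0 is the MSB): ones from bit MB
   through bit ME inclusive, wrapping from bit 63 to bit 0 if MB > ME.  */
static uint64_t
ibm_mask (unsigned mb, unsigned me)
{
  uint64_t from_mb = UINT64_MAX >> mb;        /* IBM bits MB..63 */
  uint64_t to_me = UINT64_MAX << (63 - me);   /* IBM bits 0..ME */
  return mb <= me ? (from_mb & to_me) : (from_mb | to_me);
}

/* Model of "rldic RA,RS,SH,MB" (hypothetical helper name).  */
static uint64_t
rldic_model (uint64_t rs, unsigned sh, unsigned mb)
{
  return rotl64 (rs, sh) & ibm_mask (mb, 63 - sh);
}

/* With RS = -1 (from "li rT,-1") the rotate changes nothing, so the
   result is just the mask.  */
void
rldic_model_selftest (void)
{
  assert (rldic_model (UINT64_MAX, 8, 8) == UINT64_C (0x00ffffffffffff00));
  assert (rldic_model (UINT64_MAX, 40, 48) == UINT64_C (0xffffff000000ffff));
}
```

The second assertion is the wrap-around case: MB = 48 is larger than
ME = 63 - 40 = 23, so the ones run from bit 48 through bit 63 and continue
from bit 0 through bit 23.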

> +/* Helper for num_insns_constant_gpr and rs6000_emit_set_long_const.
> +   Return TRUE if VALUE contains one set of consecutive 1-bits.  Also set
> +   *SH and *MB to values needed to generate VALUE with the rldic instruction.
> +   We accept consecutive 1-bits that wrap from MSB to LSB, ex: 0xff00...00ff.
> +   Otherwise, return FALSE.  */
> +
> +static bool
> +has_consecutive_ones (unsigned HOST_WIDE_INT value, int *sh, int *mb)
> +{
> +  unsigned HOST_WIDE_INT nlz, ntz, mask;
> +  unsigned HOST_WIDE_INT allones = -1;
> +
> +  ntz = ctz_hwi (value);
> +  nlz = clz_hwi (value);
> +  mask = (allones >> nlz) & (allones << ntz);
> +  if (value == mask)
> +    {
> +      /* Compute beginning and ending bit numbers, using IBM bit numbering.  */
> +      *mb = nlz;
> +      *sh = ntz;
> +      return true;
> +    }
> +
> +  /* Check if the inverted value contains consecutive ones.  We can create
> +     that constant by basically swapping the MB and ME bit numbers.  */
> +  value = ~value;
> +  ntz = ctz_hwi (value);
> +  nlz = clz_hwi (value);
> +  mask = (allones >> nlz) & (allones << ntz);
> +  if (value == mask)
> +    {
> +      /* Compute beginning and ending bit numbers, using IBM bit numbering.  */
> +      *mb = GET_MODE_BITSIZE (DImode) - ntz;
> +      *sh = GET_MODE_BITSIZE (DImode) - nlz;
> +      return true;
> +    }
> +
> +  *sh = *mb = 0;
> +  return false;
> +}

rs6000_is_valid_shift_mask handles this already (but it requires you to
pass in the shift needed).  rs6000_is_valid_mask will handle the
consecutive-ones check itself.  rs6000_is_valid_and_mask does not get a
shift count parameter, so it cannot use rldic currently.

Could you improve one of those instead of adding a new helper?
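
To play with the SH/MB computation from the quoted helper outside of GCC,
a stand-alone sketch like the following can be used.  It is not
rs6000_is_valid_mask; the name one_run_of_ones is hypothetical, and it
assumes the GCC builtins __builtin_ctzll and __builtin_clzll in place of
ctz_hwi and clz_hwi.  It also rejects 0 and all-ones up front (an
assumption of this sketch), because those builtins are undefined for 0 and
a shift of a 64-bit value by 64 is undefined in C.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Return true if VALUE is one run of consecutive 1-bits, possibly
   wrapping from the MSB to the LSB, and set *SH and *MB for
   "li rT,-1; rldic rX,rT,SH,MB".  Hypothetical helper, not GCC code.  */
static bool
one_run_of_ones (uint64_t value, int *sh, int *mb)
{
  *sh = *mb = 0;
  if (value == 0 || value == UINT64_MAX)
    return false;

  /* A wrapping run has both the MSB and the LSB set; in that case look
     at the run of zeros in the middle instead.  */
  bool wraps = (value >> 63) != 0 && (value & 1) != 0;
  uint64_t v = wraps ? ~value : value;

  int ntz = __builtin_ctzll (v);
  int nlz = __builtin_clzll (v);
  uint64_t mask = (UINT64_MAX >> nlz) & (UINT64_MAX << ntz);
  if (v != mask)
    return false;

  if (wraps)
    {
      /* Same values as the inverted case in the quoted patch.  */
      *mb = 64 - ntz;
      *sh = 64 - nlz;
    }
  else
    {
      *mb = nlz;
      *sh = ntz;
    }
  return true;
}

void
one_run_of_ones_selftest (void)
{
  int sh, mb;
  assert (one_run_of_ones (UINT64_C (0xffffff000000ffff), &sh, &mb)
	  && sh == 40 && mb == 48);
  assert (!one_run_of_ones (UINT64_C (0x0101), &sh, &mb));
}
```

In the wrapping case the inverted value has both its top and bottom bit
clear, so ntz and nlz are at least 1 and both SH and MB stay in the range
0..63.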

> -  HOST_WIDE_INT ud1, ud2, ud3, ud4;
> +  HOST_WIDE_INT ud1, ud2, ud3, ud4, value = c;

Do not put declarations for uninitialised and initialised variables on
one line, please.

> +(define_insn "rldic"
> +  [(set (match_operand:DI 0 "gpc_reg_operand" "=r")
> +	(unspec:DI [(match_operand:DI 1 "gpc_reg_operand" "r")
> +		    (match_operand:DI 2 "u6bit_cint_operand" "n")
> +		    (match_operand:DI 3 "u6bit_cint_operand" "n")]
> +		   UNSPEC_RLDIC))]
> +  "TARGET_POWERPC64"
> +  "rldic %0,%1,%2,%3")

Don't use an unspec please.  Unspecs prohibit most optimisation.  For
example, nothing can now see what actual value is calculated here (you
can make that a bit better by using REG_EQ* notes, but it is not as good
as simply describing what the actual insns do).

> +/* { dg-final { scan-assembler "rldic r?\[0-9\]+,r?\[0-9\]+,8,8" } } */
> +/* { dg-final { scan-assembler "rldic r?\[0-9\]+,r?\[0-9\]+,24,8" } } */
> +/* { dg-final { scan-assembler "rldic r?\[0-9\]+,r?\[0-9\]+,40,8" } } */
> +/* { dg-final { scan-assembler "rldic r?\[0-9\]+,r?\[0-9\]+,40,48" } } */
> +/* { dg-final { scan-assembler "rldic r?\[0-9\]+,r?\[0-9\]+,40,23" } } */

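Please use {} quotes for the regexps (then the brackets need no backslash
escapes), and \m and \M to anchor the match at word boundaries.  \d can be
helpful, too.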
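
Something like this for the first of them (an untested sketch; the other
four would follow the same pattern):

```c
/* { dg-final { scan-assembler {\mrldic r?\d+,r?\d+,8,8\M} } } */
```

Inside {} Tcl does no substitution, so nothing needs escaping.  \m and \M
match only at the start and end of a word, so the trailing "8" cannot
match the first digit of a longer number.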

Segher