[PATCH][GCC][AArch64] optimize float immediate moves (1 /4) - infrastructure.

Thu Jul 27 15:43:00 GMT 2017

On Wed, Jul 26, 2017 at 05:00:05PM +0100, Tamar Christina wrote:
> Hi James,
> 
> I have updated the patch and have responded to your question blow.
> 
> Ok for trunk?

Ok, with a few small changes.

> > >  static bool
> > > @@ -5857,12 +5955,6 @@ aarch64_preferred_reload_class (rtx x,
> > reg_class_t regclass)
> > >        return NO_REGS;
> > >      }
> > >
> > > -  /* If it's an integer immediate that MOVI can't handle, then
> > > -     FP_REGS is not an option, so we return NO_REGS instead.  */
> > > -  if (CONST_INT_P (x) && reg_class_subset_p (regclass, FP_REGS)
> > > -      && !aarch64_simd_imm_scalar_p (x, GET_MODE (x)))
> > > -    return NO_REGS;
> > > -
> > 
> > I don't understand this relaxation could you explain what this achieves and
> > why it is safe in this patch?
> 
> Because this should be left up to the pattern to decide what should happen
> and not reload.  Leaving the check here also means you'll do a reasonably
> expensive check twice for each constant
> you can emit a move for.
> 
> Removing extra restriction on the constant classes leaves it up to
> aarch64_legitimate_constant_p to decide if if the constant can be emitted as
> a move or should be forced to memory.
> aarch64_legitimate_constant_p also calls aarch64_cannot_force_const_mem.
> 
> The documentation for TARGET_PREFERRED_RELOAD_CLASS also states: 
> 
> "One case where TARGET_PREFERRED_RELOAD_CLASS must not return rclass is if x
> is a legitimate constant which cannot be loaded into some register class. By
> returning NO_REGS you can force x into a memory location. For example, rs6000
> can load immediate values into general-purpose registers, but does not have
> an instruction for loading an immediate value into a floating-point register,
> so TARGET_PREFERRED_RELOAD_CLASS returns NO_REGS when x is a floating-point
> constant. If the constant can't be loaded into any kind of register, code
> generation will be better if TARGET_LEGITIMATE_CONSTANT_P makes the constant
> illegitimate instead of using TARGET_PREFERRED_RELOAD_CLASS. "
> 
> So it seems that not only did the original constraint not add anything, we
> may also generate better code now.

Thanks for the explanation.

> diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
> index e397ff4afa73cfbc7e192fd5686b1beff9bbbadf..fd20576d23cfdc48761f65e41762b2a5a71f3e61 100644
> --- a/gcc/config/aarch64/aarch64-protos.h
> +++ b/gcc/config/aarch64/aarch64-protos.h
> @@ -326,6 +326,8 @@ bool aarch64_emit_approx_sqrt (rtx, rtx, bool);
>  void aarch64_expand_call (rtx, rtx, bool);
>  bool aarch64_expand_movmem (rtx *);
>  bool aarch64_float_const_zero_rtx_p (rtx);
> +bool aarch64_float_const_rtx_p (rtx);
> +bool aarch64_can_const_movi_rtx_p (rtx x, machine_mode mode);

This list should be alphabetical, first by type, then by name. So I'd
expect this to fit in just above aarch64_const_vec_all_same_int_p .

>  bool aarch64_function_arg_regno_p (unsigned);
>  bool aarch64_fusion_enabled_p (enum aarch64_fusion_pairs);
>  bool aarch64_gen_movmemqi (rtx *);
> @@ -353,7 +355,6 @@ bool aarch64_regno_ok_for_base_p (int, bool);
>  bool aarch64_regno_ok_for_index_p (int, bool);
>  bool aarch64_simd_check_vect_par_cnst_half (rtx op, machine_mode mode,
>  					    bool high);
> -bool aarch64_simd_imm_scalar_p (rtx x, machine_mode mode);
>  bool aarch64_simd_imm_zero_p (rtx, machine_mode);
>  bool aarch64_simd_scalar_immediate_valid_for_move (rtx, machine_mode);
>  bool aarch64_simd_shift_imm_p (rtx, machine_mode, bool);
> @@ -488,4 +489,6 @@ std::string aarch64_get_extension_string_for_isa_flags (unsigned long,
>  
>  rtl_opt_pass *make_pass_fma_steering (gcc::context *ctxt);
>  
> +bool aarch64_reinterpret_float_as_int (rtx value, unsigned HOST_WIDE_INT *fail);

This isn't defined in common/config/aarch64-common.c so shouldn't be in
the section for functions which will be defined there. It should be in
the list above between aarch64_regno_ok_for_index_p and
aarch64_simd_check_vect_par_cnst_half.

> +
>  #endif /* GCC_AARCH64_PROTOS_H */
> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> index d753666ef67fc524260c41f36743df3649a0a98a..b1ddd77823e50e63439e497f695f3fad9bd9efc9 100644
> --- a/gcc/config/aarch64/aarch64.c
> +++ b/gcc/config/aarch64/aarch64.c
> @@ -147,6 +147,8 @@ static bool aarch64_builtin_support_vector_misalignment (machine_mode mode,
>  							 const_tree type,
>  							 int misalignment,
>  							 bool is_packed);
> +static machine_mode
> +aarch64_simd_container_mode (machine_mode mode, unsigned width);
>  
>  /* Major revision number of the ARM Architecture implemented by the target.  */
>  unsigned aarch64_architecture_version;
> @@ -4723,6 +4725,62 @@ aarch64_legitimize_address_displacement (rtx *disp, rtx *off, machine_mode mode)
>    return true;
>  }
>  
> +/* Return the binary representation of floating point constant VALUE in INTVAL.
> +   If the value cannot be converted, return false without setting INTVAL.
> +   The conversion is done in the given MODE.  */
> +bool
> +aarch64_reinterpret_float_as_int (rtx value, unsigned HOST_WIDE_INT *intval)
> +{
> +  machine_mode mode = GET_MODE (value);
> +  if (GET_CODE (value) != CONST_DOUBLE
> +      || !SCALAR_FLOAT_MODE_P (mode)
> +      || GET_MODE_BITSIZE (mode) > HOST_BITS_PER_WIDE_INT)
> +    return false;
> +
> +  unsigned HOST_WIDE_INT ival = 0;
> +
> +  /* Only support up to DF mode.  */
> +  gcc_assert (GET_MODE_BITSIZE (mode) <= GET_MODE_BITSIZE (DFmode));

I'm somewhere a bit hypothetical here, but on a machine with a 128-bit
HOST_WIDE_INT, you would hit this case and take the assert. Just make
this another case in your if statement above:

  machine_mode mode = GET_MODE (value);
  if (GET_CODE (value) != CONST_DOUBLE
      || !SCALAR_FLOAT_MODE_P (mode)
      || GET_MODE_BITSIZE (mode) > HOST_BITS_PER_WIDE_INT
      || GET_MODE_BITSIZE (mode) > GET_MODE_BITSIZE (DFmode))
    return false;

> +  long res[2];
> +  real_to_target (res,
> +		  CONST_DOUBLE_REAL_VALUE (value),
> +		  REAL_MODE_FORMAT (mode));
> +
> +  ival = zext_hwi (res[0], 32);
> +  if (GET_MODE_BITSIZE (mode) == GET_MODE_BITSIZE (DFmode))
> +    ival |= (zext_hwi (res[1], 32) << 32);
> +
> +  *intval = ival;
> +  return true;
> +}
> +

Thanks,
James