[PATCH][GCC][AArch64] optimize float immediate moves (1 /4) - infrastructure.
James Greenhalgh
james.greenhalgh@arm.com
Thu Jul 27 15:43:00 GMT 2017
On Wed, Jul 26, 2017 at 05:00:05PM +0100, Tamar Christina wrote:
> Hi James,
>
> I have updated the patch and have responded to your question blow.
>
> Ok for trunk?
Ok, with a few small changes.
> > > static bool
> > > @@ -5857,12 +5955,6 @@ aarch64_preferred_reload_class (rtx x,
> > reg_class_t regclass)
> > > return NO_REGS;
> > > }
> > >
> > > - /* If it's an integer immediate that MOVI can't handle, then
> > > - FP_REGS is not an option, so we return NO_REGS instead. */
> > > - if (CONST_INT_P (x) && reg_class_subset_p (regclass, FP_REGS)
> > > - && !aarch64_simd_imm_scalar_p (x, GET_MODE (x)))
> > > - return NO_REGS;
> > > -
> >
> > I don't understand this relaxation could you explain what this achieves and
> > why it is safe in this patch?
>
> Because this should be left up to the pattern to decide what should happen
> and not reload. Leaving the check here also means you'll do a reasonably
> expensive check twice for each constant
> you can emit a move for.
>
> Removing extra restriction on the constant classes leaves it up to
> aarch64_legitimate_constant_p to decide if if the constant can be emitted as
> a move or should be forced to memory.
> aarch64_legitimate_constant_p also calls aarch64_cannot_force_const_mem.
>
> The documentation for TARGET_PREFERRED_RELOAD_CLASS also states:
>
> "One case where TARGET_PREFERRED_RELOAD_CLASS must not return rclass is if x
> is a legitimate constant which cannot be loaded into some register class. By
> returning NO_REGS you can force x into a memory location. For example, rs6000
> can load immediate values into general-purpose registers, but does not have
> an instruction for loading an immediate value into a floating-point register,
> so TARGET_PREFERRED_RELOAD_CLASS returns NO_REGS when x is a floating-point
> constant. If the constant can't be loaded into any kind of register, code
> generation will be better if TARGET_LEGITIMATE_CONSTANT_P makes the constant
> illegitimate instead of using TARGET_PREFERRED_RELOAD_CLASS. "
>
> So it seems that not only did the original constraint not add anything, we
> may also generate better code now.
Thanks for the explanation.
> diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
> index e397ff4afa73cfbc7e192fd5686b1beff9bbbadf..fd20576d23cfdc48761f65e41762b2a5a71f3e61 100644
> --- a/gcc/config/aarch64/aarch64-protos.h
> +++ b/gcc/config/aarch64/aarch64-protos.h
> @@ -326,6 +326,8 @@ bool aarch64_emit_approx_sqrt (rtx, rtx, bool);
> void aarch64_expand_call (rtx, rtx, bool);
> bool aarch64_expand_movmem (rtx *);
> bool aarch64_float_const_zero_rtx_p (rtx);
> +bool aarch64_float_const_rtx_p (rtx);
> +bool aarch64_can_const_movi_rtx_p (rtx x, machine_mode mode);
This list should be alphabetical, first by type, then by name. So I'd
expect this to fit in just above aarch64_const_vec_all_same_int_p .
> bool aarch64_function_arg_regno_p (unsigned);
> bool aarch64_fusion_enabled_p (enum aarch64_fusion_pairs);
> bool aarch64_gen_movmemqi (rtx *);
> @@ -353,7 +355,6 @@ bool aarch64_regno_ok_for_base_p (int, bool);
> bool aarch64_regno_ok_for_index_p (int, bool);
> bool aarch64_simd_check_vect_par_cnst_half (rtx op, machine_mode mode,
> bool high);
> -bool aarch64_simd_imm_scalar_p (rtx x, machine_mode mode);
> bool aarch64_simd_imm_zero_p (rtx, machine_mode);
> bool aarch64_simd_scalar_immediate_valid_for_move (rtx, machine_mode);
> bool aarch64_simd_shift_imm_p (rtx, machine_mode, bool);
> @@ -488,4 +489,6 @@ std::string aarch64_get_extension_string_for_isa_flags (unsigned long,
>
> rtl_opt_pass *make_pass_fma_steering (gcc::context *ctxt);
>
> +bool aarch64_reinterpret_float_as_int (rtx value, unsigned HOST_WIDE_INT *fail);
This isn't defined in common/config/aarch64-common.c so shouldn't be in
the section for functions which will be defined there. It should be in
the list above between aarch64_regno_ok_for_index_p and
aarch64_simd_check_vect_par_cnst_half.
> +
> #endif /* GCC_AARCH64_PROTOS_H */
> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> index d753666ef67fc524260c41f36743df3649a0a98a..b1ddd77823e50e63439e497f695f3fad9bd9efc9 100644
> --- a/gcc/config/aarch64/aarch64.c
> +++ b/gcc/config/aarch64/aarch64.c
> @@ -147,6 +147,8 @@ static bool aarch64_builtin_support_vector_misalignment (machine_mode mode,
> const_tree type,
> int misalignment,
> bool is_packed);
> +static machine_mode
> +aarch64_simd_container_mode (machine_mode mode, unsigned width);
>
> /* Major revision number of the ARM Architecture implemented by the target. */
> unsigned aarch64_architecture_version;
> @@ -4723,6 +4725,62 @@ aarch64_legitimize_address_displacement (rtx *disp, rtx *off, machine_mode mode)
> return true;
> }
>
> +/* Return the binary representation of floating point constant VALUE in INTVAL.
> + If the value cannot be converted, return false without setting INTVAL.
> + The conversion is done in the given MODE. */
> +bool
> +aarch64_reinterpret_float_as_int (rtx value, unsigned HOST_WIDE_INT *intval)
> +{
> + machine_mode mode = GET_MODE (value);
> + if (GET_CODE (value) != CONST_DOUBLE
> + || !SCALAR_FLOAT_MODE_P (mode)
> + || GET_MODE_BITSIZE (mode) > HOST_BITS_PER_WIDE_INT)
> + return false;
> +
> + unsigned HOST_WIDE_INT ival = 0;
> +
> + /* Only support up to DF mode. */
> + gcc_assert (GET_MODE_BITSIZE (mode) <= GET_MODE_BITSIZE (DFmode));
I'm somewhere a bit hypothetical here, but on a machine with a 128-bit
HOST_WIDE_INT, you would hit this case and take the assert. Just make
this another case in your if statement above:
machine_mode mode = GET_MODE (value);
if (GET_CODE (value) != CONST_DOUBLE
|| !SCALAR_FLOAT_MODE_P (mode)
|| GET_MODE_BITSIZE (mode) > HOST_BITS_PER_WIDE_INT
|| GET_MODE_BITSIZE (mode) > GET_MODE_BITSIZE (DFmode))
return false;
> + long res[2];
> + real_to_target (res,
> + CONST_DOUBLE_REAL_VALUE (value),
> + REAL_MODE_FORMAT (mode));
> +
> + ival = zext_hwi (res[0], 32);
> + if (GET_MODE_BITSIZE (mode) == GET_MODE_BITSIZE (DFmode))
> + ival |= (zext_hwi (res[1], 32) << 32);
> +
> + *intval = ival;
> + return true;
> +}
> +
Thanks,
James
More information about the Gcc-patches
mailing list