[PATCH][ARM] Add movv4hf/v8hf expanders & later insns; disable VnHF immediates.

Mon Jan 18 13:13:00 GMT 2016

Hi Alan,

On 18/01/16 12:14, Alan Lawrence wrote:
> This fixes ICEs on armeb for float16x[48]_t vectors, e.g. in
> check_effective_target_arm_neon_fp_16_ok.
>
> At present, without the expander, moving v4hf/v8hf values around is done
> via subregs. On armeb, this ICEs because REG_CANNOT_CHANGE_MODE_P. (On arm-*,
> moving via two subregs is less efficient than one native insn!)
>
> However, adding the expanders, reveals a latent bug in the V4HF variant of
> *neon_mov, that vector constants are not handled properly in the
> neon_valid_immediate code. Hence, for now I've used a separate expander that
> disallows immediates, and disabled VnHF vectors as immediates in
> neon_valid_immediate_for_move; I'll file a PR for this.
>
> Also to fix the advsimd-intrinsics/vcombine test I had to add HF vector modes to
> the VX iterator and V_reg attribute, for vdup_n, as loading a vector of
> identical HF elements is now done by loading the scalar + vdup rather than
> forcing the vector out to the constant pool.
>
> On armeb, one of the ICEs this fixes, is in the test program for
> check_effective_target_arm_neon_fp_16_ok. This means the advsimd-intrinsics
> vcvt_f16 test now runs (and passes), and also that the other tests now run
> with neon-fp16, rather than only neon as previously (on armeb).
> This reveals that the fp16 cases of vld1_lane and vset_lane are (and were)
> failing. Since those tests would previously have failed *if fp16 had been
> passed in*, I think this is still a step forward; one can still run the tests
> with an explicit non-fp16 multilib if the old behaviour is desired.
>
> Note the previous patch removes other uses of VQXMOV (not strictly a dependency,
> generating V4HF/V8HF reinterpret patterns is harmless, they just aren't used).
>
> Bootstrapped + check-gcc on arm-none-linux-gnueabihf;
> cross-tested armeb-none-eabi.

Seems that you and Christophe have some duplicated work here?
https://gcc.gnu.org/ml/gcc-patches/2016-01/msg01031.html

Kyrill

> gcc/ChangeLog:
>
> 	* config/arm/arm.c (neon_valid_immediate): Disallow vectors of HFmode.
> 	* config/arm/iterators.md (V_HF): New.
> 	(VQXMOV): Add V8HF.
> 	(VX): Add V4HF, V8HF.
> 	(V_reg): Add cases for V4HF, V8HF.
> 	* config/arm/vec-common.md (mov<mode> V_HF): New.
> ---
>   gcc/config/arm/arm.c         |  2 ++
>   gcc/config/arm/iterators.md  |  8 ++++++--
>   gcc/config/arm/vec-common.md | 20 ++++++++++++++++++++
>   3 files changed, 28 insertions(+), 2 deletions(-)
>
> diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
> index 3276b03..4fdba38 100644
> --- a/gcc/config/arm/arm.c
> +++ b/gcc/config/arm/arm.c
> @@ -12371,6 +12371,8 @@ neon_valid_immediate (rtx op, machine_mode mode, int inverse,
>     /* Vectors of float constants.  */
>     if (GET_MODE_CLASS (mode) == MODE_VECTOR_FLOAT)
>       {
> +      if (GET_MODE_INNER (mode) == HFmode)
> +	return -1;
>         rtx el0 = CONST_VECTOR_ELT (op, 0);
>         const REAL_VALUE_TYPE *r0;
>   
> diff --git a/gcc/config/arm/iterators.md b/gcc/config/arm/iterators.md
> index 974cf51..c5db868 100644
> --- a/gcc/config/arm/iterators.md
> +++ b/gcc/config/arm/iterators.md
> @@ -59,6 +59,9 @@
>   ;; Integer and float modes supported by Neon and IWMMXT.
>   (define_mode_iterator VALL [V2DI V2SI V4HI V8QI V2SF V4SI V8HI V16QI V4SF])
>   
> +;; Vectors of half-precision floats.
> +(define_mode_iterator V_HF [V4HF V8HF])
> +
>   ;; Integer and float modes supported by Neon and IWMMXT, except V2DI.
>   (define_mode_iterator VALLW [V2SI V4HI V8QI V2SF V4SI V8HI V16QI V4SF])
>   
> @@ -99,7 +102,7 @@
>   (define_mode_iterator VQI [V16QI V8HI V4SI])
>   
>   ;; Quad-width vector modes, with TImode added, for moves.
> -(define_mode_iterator VQXMOV [V16QI V8HI V4SI V4SF V2DI TI])
> +(define_mode_iterator VQXMOV [V16QI V8HI V8HF V4SI V4SF V2DI TI])
>   
>   ;; Opaque structure types wider than TImode.
>   (define_mode_iterator VSTRUCT [EI OI CI XI])
> @@ -160,7 +163,7 @@
>   (define_mode_iterator VMDQI [V4HI V2SI V8HI V4SI])
>   
>   ;; Modes with 8-bit and 16-bit elements.
> -(define_mode_iterator VX [V8QI V4HI V16QI V8HI])
> +(define_mode_iterator VX [V8QI V4HI V4HF V16QI V8HI V8HF])
>   
>   ;; Modes with 8-bit elements.
>   (define_mode_iterator VE [V8QI V16QI])
> @@ -428,6 +431,7 @@
>   ;; Register width from element mode
>   (define_mode_attr V_reg [(V8QI "P") (V16QI "q")
>                            (V4HI "P") (V8HI  "q")
> +			 (V4HF "P") (V8HF  "q")
>                            (V2SI "P") (V4SI  "q")
>                            (V2SF "P") (V4SF  "q")
>                            (DI   "P") (V2DI  "q")
> diff --git a/gcc/config/arm/vec-common.md b/gcc/config/arm/vec-common.md
> index ce98f71..c27578a 100644
> --- a/gcc/config/arm/vec-common.md
> +++ b/gcc/config/arm/vec-common.md
> @@ -38,6 +38,26 @@
>       }
>   })
>   
> +;; This exists separately from the above pattern to exclude an immediate RHS.
> +
> +(define_expand "mov<mode>"
> +  [(set (match_operand:V_HF 0 "nonimmediate_operand" "")
> +	(match_operand:V_HF 1 "nonimmediate_operand" ""))]
> +  "TARGET_NEON
> +   || (TARGET_REALLY_IWMMXT && VALID_IWMMXT_REG_MODE (<MODE>mode))"
> +{
> +  if (can_create_pseudo_p ())
> +    {
> +      if (!REG_P (operands[0]))
> +	operands[1] = force_reg (<MODE>mode, operands[1]);
> +      else if (TARGET_NEON && CONSTANT_P (operands[1]))
> +	{
> +	  operands[1] = neon_make_constant (operands[1]);
> +	  gcc_assert (operands[1] != NULL_RTX);
> +	}
> +    }
> +})
> +
>   ;; Vector arithmetic. Expanders are blank, then unnamed insns implement
>   ;; patterns separately for IWMMXT and Neon.
>