[PATCH] RISC-V: Using merge approach to optimize repeating sequence in vec_init

Robin Dapp rdapp.gcc@gmail.com
Fri May 12 13:08:27 GMT 2023


Hi,

in general LGTM, just minor nits and comments.

> -  void set_len_and_policy (rtx len, bool force_vlmax = false)
> -    {
> -      bool vlmax_p = force_vlmax;
> -      gcc_assert (has_dest);
> +  void set_len_and_policy (rtx len, bool force_vlmax = false, bool ta_p = true,
> +			   bool ma_p = true)
> +  {
> +    bool vlmax_p = force_vlmax;
> +    gcc_assert (has_dest);

Indentation?

>      m_inner_mode = GET_MODE_INNER (mode);
> -    m_inner_size = GET_MODE_BITSIZE (m_inner_mode).to_constant ();
> +    m_inner_size = GET_MODE_BITSIZE (m_inner_mode);
> +    m_inner_units = GET_MODE_SIZE (m_inner_mode);

I find it a bit misleading to call this "units" here.  Granted, it's an inner
mode (i.e. the units are bytes), but in the context of vector modes I'm likely
to think of a vector "unit" or lane.  What about m_inner_size_bytes or
m_inner_size_units?

> +bool
> +rvv_builder::repeating_sequence_use_merge_profitable_p ()
> +{
> +  return repeating_sequence_p (0, full_nelts ().to_constant (), npatterns ())
> +	 && inner_units () <= UNITS_PER_WORD
> +	 && 3 * npatterns () < full_nelts ().to_constant ();
> +}

I appreciate the explanatory comment, and using the number of instructions
as the criterion is good for now.  In the future, given the different uarchs,
we will want a proper cost comparison.
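
To make the three conditions in the quoted check concrete, here is a minimal
stand-alone sketch.  The function name and the plain-integer parameters are
hypothetical stand-ins for the rvv_builder accessors and UNITS_PER_WORD; this
is not the patch's code, only the same boolean expression with numbers plugged
in.

```cpp
// Hypothetical model of the quoted profitability check (not GCC code).
// repeating   : stands in for repeating_sequence_p (...)
// inner_bytes : stands in for inner_units ()
// word_bytes  : stands in for UNITS_PER_WORD
// npatterns   : stands in for npatterns ()
// nelts       : stands in for full_nelts ().to_constant ()
bool
merge_profitable_p (bool repeating, unsigned inner_bytes, unsigned word_bytes,
		    unsigned npatterns, unsigned nelts)
{
  return repeating
	 && inner_bytes <= word_bytes
	 && 3 * npatterns < nelts;
}
```

For a 16-element vector of 4-byte elements with 8-byte words, two patterns
pass (3 * 2 = 6 < 16) while eight patterns do not (3 * 8 = 24).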

> +/* Get the mask for merge approach.
> +
> +     Consider such following case:
> +       {a, b, a, b, a, b, a, b, a, b, a, b, a, b, a, b}
> +     To merge "a", the mask should be 1010....
> +     To merge "a", the mask should be 0101....
> +*/

The second "To merge" line should say "b", not "a".
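
For reference, the masks the comment describes can be sketched as below.  This
is only an illustration: merge_mask_string is a hypothetical helper, and it
prints lane 0 as the leftmost character to match the "1010...." notation in
the comment, which is an assumption about ordering rather than a statement of
how the patch encodes the mask.

```cpp
#include <string>

// Hypothetical helper (not part of the patch): return the merge mask for
// the pattern element at position `index` in a sequence of `nelts` lanes
// that repeats every `npatterns` lanes.  Lane 0 is the leftmost character.
std::string
merge_mask_string (unsigned index, unsigned npatterns, unsigned nelts)
{
  std::string mask;
  for (unsigned lane = 0; lane < nelts; ++lane)
    mask += (lane % npatterns == index) ? '1' : '0';
  return mask;
}
```

With {a, b, a, b, ...} over 16 lanes, index 0 ("a") gives 1010101010101010
and index 1 ("b") gives 0101010101010101.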

> +/* Emit merge instruction.  */
> +
> +static void
> +emit_merge_op (rtx dest, rtx src1, rtx src2, rtx mask)
> +{
> +  insn_expander<8> e;
> +  machine_mode mode = GET_MODE (dest);
> +  e.set_dest_and_mask (NULL_RTX, dest, GET_MODE (mask), true, true);
> +  e.add_input_operand (src1, mode);
> +  if (VECTOR_MODE_P (GET_MODE (src2)))
> +    e.add_input_operand (src2, mode);
> +  else
> +    e.add_input_operand (src2, GET_MODE_INNER (mode));
> +
> +  e.add_input_operand (mask, GET_MODE (mask));
> +  e.set_len_and_policy (NULL_RTX, true, true, false);
> +  if (VECTOR_MODE_P (GET_MODE (src2)))
> +    e.expand (code_for_pred_merge (mode), false);
> +  else
> +    e.expand (code_for_pred_merge_scalar (mode), false);
> +}

Looks a lot like binop.  Might need another round of wrappers
soon :)

Regards
 Robin

