This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.
Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]
Other format:	[Raw text]
Re: [PATCH v4 01/10] Initial TI PRU GCC port

From: Richard Sandiford <richard dot sandiford at arm dot com>
To: Dimitar Dimitrov <dimitar at dinux dot eu>
Cc: gcc-patches at gcc dot gnu dot org
Date: Thu, 13 Sep 2018 13:02:21 +0100
Subject: Re: [PATCH v4 01/10] Initial TI PRU GCC port
References: <20180906111217.24365-1-dimitar@dinux.eu> <20180906111217.24365-2-dimitar@dinux.eu>
Dimitar Dimitrov <dimitar@dinux.eu> writes:
> +; Specialized IOR/AND patterns for matching setbit/clearbit instructions.
> +;
> +; TODO - allow clrbit and setbit to support (1 << REG) constructs
> +
> +(define_insn "clearbit_<EQD:mode><EQS0:mode>_<bitalu_zext>"
> +  [(set (match_operand:EQD 0 "register_operand"	"=r")

Nit: stray tab instead of space.

> +	(and:EQD
> +	  (zero_extend:EQD
> +	    (match_operand:EQS0 1 "register_operand" "r"))
> +	  (match_operand:EQD 2 "single_zero_operand" "n")))]
> +  ""
> +  "clr\\t%0, %1, %V2"
> +  [(set_attr "type" "alu")
> +   (set_attr "length" "4")])

Very minor suggestion, doesn't need to hold up acceptance, but: some
patterns (like this one) have an explicit length of 4, but others have
an implicit length of 4 thanks to the default value in the define_attr.
It would be more consistent to do one or the other everywhere.

Using EQD and EQS0 like this has the unfortunate effect of creating
patterns like:

   ... (zero_extend:SI (match_operand:SI ...)) ...

(which you rely on for the define_substs) and:

   ... (zero_extend:HI (match_operand:SI ...)) ...

even though these are both invalid rtl.  That said, the rtl passes
should never generate this kind of rtl (i.e. they shouldn't rely on
targets to reject it), so that's probably fine in practice.  It just
adds a bit of bloat.  I also don't have a good suggestion how to fix it
without more infrastructure.  Adding:

  GET_MODE_SIZE (<EQD>mode) >= GET_MODE_SIZE (<EQS0>mode)

should cause gencondmd to remove the cases in which the zero_extend
result is narrower than the input, but doing that everywhere would
be ugly, and would still leave the case in which the zero_extend
result is the same size as the input (which you can't remove without
breaking the define_substs).

So after all that, I think this is fine as-is.

> +(define_insn "sub_impl<EQD:mode><EQS0:mode><EQS1:mode>_<alu3_zext><alu3_zext_op1><alu3_zext_op2>"
> +  [(set (match_operand:EQD 0 "register_operand" "=r,r,r")
> +	(minus:EQD
> +	 (zero_extend:EQD
> +	  (match_operand:EQS0 1 "reg_or_ubyte_operand" "r,r,I"))
> +	 (zero_extend:EQD
> +	  (match_operand:EQS1 2 "reg_or_ubyte_operand" "r,I,r"))))]
> +  ""
> +  "@
> +   sub\\t%0, %1, %2
> +   sub\\t%0, %1, %2
> +   rsb\\t%0, %2, %1"
> +  [(set_attr "type" "alu")
> +   (set_attr "length" "4")])

By convention, subtraction patterns shouldn't accept constants for
operand 2.  Constants should instead be subtracted using an addition
of the negative.

> +(define_constraint "I"
> +  "An unsigned 8-bit constant."
> +  (and (match_code "const_int")
> +       (match_test "UBYTE_INT (ival)")))

As it stands this will reject QImode constants with the top bit set,
since const_ints are always stored in sign-extended form.  E.g. QImode
128 is stored as (const_int -128) rather than (const_int 128).

Unfortunately this is difficult to fix in a clean way, since
const_ints don't store their mode (a long-standing wart) and unlike
predicates, constraints aren't given a mode from context.  The best
way I can think of coping with it is:

a) have a separate constraint for -128...127
b) add a define_mode_attr that maps QI to the new constraint and
   HI and SI to I
c) use <EQS1:...> etc. instead of I in the match_operands

Similar comment for "J" and HImode, although you already have the
"N" as the corresponding signed constraint and so don't need a new one.

> +(define_predicate "const_1_operand"
> +  (and (match_code "const_int")
> +       (match_test "IN_RANGE (INTVAL (op), 1, 1)")))

INTVAL (op) == 1 seems more obvious.

> +(define_predicate "const_ubyte_operand"
> +  (and (match_code "const_int")
> +       (match_test "IN_RANGE (INTVAL (op), 0, 0xff)")))
> +
> +(define_predicate "const_uhword_operand"
> +  (and (match_code "const_int")
> +       (match_test "IN_RANGE (INTVAL (op), 0, 0xffff)")))

Here you can use "INTVAL (op) & GET_MODE_MASK (mode)" to avoid the
problem above, as long as you always pass a mode to these predicates.

> +(define_predicate "ctable_addr_operand"
> +  (and (match_code "const_int")
> +       (match_test ("(pru_get_ctable_base_index (INTVAL (op)) >= 0)"))))
> +
> +(define_predicate "ctable_base_operand"
> +  (and (match_code "const_int")
> +       (match_test ("(pru_get_ctable_exact_base_index (INTVAL (op)) >= 0)"))))

Redundant brackets around the match_test strings.

(I want to rip out that syntax at some point, since allowing brackets
around strings just makes the files harder for scripting tools to parse.)

> +;; Return true if OP is a text segment reference.
> +;; This is needed for program memory address expressions.  Borrowed from AVR.
> +(define_predicate "text_segment_operand"
> +  (match_code "code_label,label_ref,symbol_ref,plus,minus,const")
> +{
> +  switch (GET_CODE (op))
> +    {
> +    case CODE_LABEL:
> +      return true;
> +    case LABEL_REF :
> +      return true;
> +    case SYMBOL_REF :
> +      return SYMBOL_REF_FUNCTION_P (op);
> +    case PLUS :
> +    case MINUS :
> +      /* Assume canonical format of symbol + constant.
> +	 Fall through.  */
> +    case CONST :
> +      return text_segment_operand (XEXP (op, 0), VOIDmode);
> +    default :
> +      return false;
> +    }
> +})

This probably comes from AVR, but: no spaces before ":".

Bit surprised that we can get a CODE_LABEL rather than a LABEL_REF here.
Do you know if that triggers in practice, and if so where?

An IMO neater and slightly more robust way of writing the body is:

  poly_int64 offset:
  rtx base = strip_offset (op, &offset);
  switch (GET_CODE (base))
    {
    case LABEL_REF:
      ...as above...
    case SYMBOL_REF:
      ...as above...
    default:
      return false;
    }

with "plus" and "minus" not in the match_code list (since they should
always appear in consts if they really are text references).

> +;; Return true if OP is a load multiple operation.  It is known to be a
> +;; PARALLEL and the first section will be tested.
> +
> +(define_special_predicate "load_multiple_operation"
> +  (match_code "parallel")
> +{
> +  machine_mode elt_mode;
> +  int count = XVECLEN (op, 0);
> +  unsigned int dest_regno;
> +  rtx src_addr;
> +  int i, off;
> +
> +  /* Perform a quick check so we don't blow up below.  */
> +  if (GET_CODE (XVECEXP (op, 0, 0)) != SET
> +      || GET_CODE (SET_DEST (XVECEXP (op, 0, 0))) != REG
> +      || GET_CODE (SET_SRC (XVECEXP (op, 0, 0))) != MEM)
> +    return false;
> +
> +  dest_regno = REGNO (SET_DEST (XVECEXP (op, 0, 0)));
> +  src_addr = XEXP (SET_SRC (XVECEXP (op, 0, 0)), 0);
> +  elt_mode = GET_MODE (SET_DEST (XVECEXP (op, 0, 0)));
> +
> +  /* Check, is base, or base + displacement.  */
> +
> +  if (GET_CODE (src_addr) == REG)
> +    off = 0;
> +  else if (GET_CODE (src_addr) == PLUS
> +	   && GET_CODE (XEXP (src_addr, 0)) == REG
> +	   && GET_CODE (XEXP (src_addr, 1)) == CONST_INT)
> +    {
> +      off = INTVAL (XEXP (src_addr, 1));
> +      src_addr = XEXP (src_addr, 0);
> +    }
> +  else
> +    return false;
> +
> +  for (i = 1; i < count; i++)
> +    {
> +      rtx elt = XVECEXP (op, 0, i);
> +
> +      if (GET_CODE (elt) != SET
> +	  || GET_CODE (SET_DEST (elt)) != REG
> +	  || GET_MODE (SET_DEST (elt)) != elt_mode
> +	  || REGNO (SET_DEST (elt)) != dest_regno + i
> +	  || GET_CODE (SET_SRC (elt)) != MEM
> +	  || GET_MODE (SET_SRC (elt)) != elt_mode
> +	  || GET_CODE (XEXP (SET_SRC (elt), 0)) != PLUS
> +	  || ! rtx_equal_p (XEXP (XEXP (SET_SRC (elt), 0), 0), src_addr)
> +	  || GET_CODE (XEXP (XEXP (SET_SRC (elt), 0), 1)) != CONST_INT
> +	  || INTVAL (XEXP (XEXP (SET_SRC (elt), 0), 1))
> +	     != off + i * GET_MODE_SIZE (elt_mode))
> +	return false;
> +    }
> +
> +  return true;
> +})

This may not matter in practice, but the downside of testing PLUS and
CONST_INT explicitly is that the predicate won't match something like:

  (set (reg R1) (mem (plus (reg RB) (const_int -4))))
  (set (reg R2) (mem (reg RB)))
  (set (reg R3) (mem (plus (reg RB) (const_int 4))))
  ...

Using strip_offset on the addresses would avoid this.

> +;; Return true if OP is a store multiple operation.  It is known to be a
> +;; PARALLEL and the first section will be tested.
> +
> +(define_special_predicate "store_multiple_operation"
> +  (match_code "parallel")
> +{
> +  machine_mode elt_mode;
> +  int count = XVECLEN (op, 0);
> +  unsigned int src_regno;
> +  rtx dest_addr;
> +  int i, off;
> +
> +  /* Perform a quick check so we don't blow up below.  */
> +  if (GET_CODE (XVECEXP (op, 0, 0)) != SET
> +      || GET_CODE (SET_DEST (XVECEXP (op, 0, 0))) != MEM
> +      || GET_CODE (SET_SRC (XVECEXP (op, 0, 0))) != REG)
> +    return false;
> +
> +  src_regno = REGNO (SET_SRC (XVECEXP (op, 0, 0)));
> +  dest_addr = XEXP (SET_DEST (XVECEXP (op, 0, 0)), 0);
> +  elt_mode = GET_MODE (SET_SRC (XVECEXP (op, 0, 0)));
> +
> +  /* Check, is base, or base + displacement.  */
> +
> +  if (GET_CODE (dest_addr) == REG)
> +    off = 0;
> +  else if (GET_CODE (dest_addr) == PLUS
> +	   && GET_CODE (XEXP (dest_addr, 0)) == REG
> +	   && GET_CODE (XEXP (dest_addr, 1)) == CONST_INT)
> +    {
> +      off = INTVAL (XEXP (dest_addr, 1));
> +      dest_addr = XEXP (dest_addr, 0);
> +    }
> +  else
> +    return false;
> +
> +  for (i = 1; i < count; i++)
> +    {
> +      rtx elt = XVECEXP (op, 0, i);
> +
> +      if (GET_CODE (elt) != SET
> +	  || GET_CODE (SET_SRC (elt)) != REG
> +	  || GET_MODE (SET_SRC (elt)) != elt_mode
> +	  || REGNO (SET_SRC (elt)) != src_regno + i
> +	  || GET_CODE (SET_DEST (elt)) != MEM
> +	  || GET_MODE (SET_DEST (elt)) != elt_mode
> +	  || GET_CODE (XEXP (SET_DEST (elt), 0)) != PLUS
> +	  || ! rtx_equal_p (XEXP (XEXP (SET_DEST (elt), 0), 0), dest_addr)
> +	  || GET_CODE (XEXP (XEXP (SET_DEST (elt), 0), 1)) != CONST_INT
> +	  || INTVAL (XEXP (XEXP (SET_DEST (elt), 0), 1))
> +	     != off + i * GET_MODE_SIZE (elt_mode))
> +	return false;
> +    }
> +  return true;
> +})

Same comment here.

> +/* Callback for walk_gimple_seq that checks TP tree for TI ABI compliance.  */
> +static tree
> +check_op_callback (tree *tp, int *walk_subtrees, void *data)
> +{
> +  struct walk_stmt_info *wi = (struct walk_stmt_info *) data;
> +
> +  if (RECORD_OR_UNION_TYPE_P (*tp) || TREE_CODE (*tp) == ENUMERAL_TYPE)
> +    {
> +      /* Forward declarations have NULL tree type.  Skip them.  */
> +      if (TREE_TYPE (*tp) == NULL)
> +	return NULL;
> +    }
> +
> +  /* TODO - why C++ leaves INTEGER_TYPE forward declarations around?  */
> +  if (TREE_TYPE (*tp) == NULL)
> +    return NULL;
> +
> +  const tree type = TREE_TYPE (*tp);
> +
> +  /* Direct function calls are allowed, obviously.  */
> +  if (TREE_CODE (*tp) == ADDR_EXPR && TREE_CODE (type) == POINTER_TYPE)
> +    {
> +      const tree ptype = TREE_TYPE (type);
> +      if (TREE_CODE (ptype) == FUNCTION_TYPE)
> +	return NULL;
> +    }

This seems like a bit of a dangerous heuristic.  Couldn't it also cause
us to skip things like:

   (void *) func

?  (OK, that's dubious C, but we do support it.)

Maybe it would be better to check something like:

  gcall *call = dyn_cast <gcall *> (gsi_stmt (wi->gsi));
  if (call
      && tp == gimple_call_fn_ptr (call)
      && gimple_call_fndecl (call))
    return NULL;

instead (completely untested).

> +/* Implements the "pragma CTABLE_ENTRY" pragma.  This pragma takes a
> +   CTABLE index and an address, and instructs the compiler that
> +   LBCO/SBCO can be used on that base address.
> +
> +   WARNING: Only immediate constant addresses are currently supported.  */
> +static void
> +pru_pragma_ctable_entry (cpp_reader * reader ATTRIBUTE_UNUSED)
> +{
> +  tree ctable_index, base_addr;
> +  enum cpp_ttype type;
> +
> +  type = pragma_lex (&ctable_index);
> +  if (type == CPP_NUMBER)
> +    {
> +      type = pragma_lex (&base_addr);
> +      if (type == CPP_NUMBER)
> +	{
> +	  unsigned int i = tree_to_uhwi (ctable_index);
> +	  unsigned HOST_WIDE_INT base = tree_to_uhwi (base_addr);

You should test tree_fits_uhwi_p as well as type == CPP_NUMBER,
otherwise this could ICE for large values.  Also, "i" should be
"unsigned HOST_WIDE_INT", so that silly indices like 0x100000000
don't get silently truncated to 0.

> +extern const char * pru_output_sign_extend (rtx *);
> +extern const char * pru_output_signed_cbranch (rtx *, bool);
> +extern const char * pru_output_signed_cbranch_ubyteop2 (rtx *, bool);
> +extern const char * pru_output_signed_cbranch_zeroop2 (rtx *, bool);

Nit: should be no space between "*" and the function name.

> +#ifdef TREE_CODE
> +bool pru_return_in_memory (const_tree type, const_tree fntype ATTRIBUTE_UNUSED);
> +#endif /* TREE_CODE */

Nit: ATTRIBUTE_UNUSED doesn't have any meaning for prototypes,
and could easily be forgotten if the implementation does end up looking
at fntype in future.

> +/* Return the bytes needed to compute the frame pointer from the current
> +   stack pointer.  */
> +static int
> +pru_compute_frame_layout (void)
> +{
> +  int regno;
> +  HARD_REG_SET *save_mask;
> +  int total_size;
> +  int var_size;
> +  int out_args_size;
> +  int save_reg_size;
> +
> +  if (cfun->machine->initialized)
> +    return cfun->machine->total_size;
[...]
> +  cfun->machine->initialized = reload_completed;

Rather than this, please convert to the new TARGET_COMPUTE_FRAME_LAYOUT hook,
which is called only when necessary, and so doesn't need to protect against
repeated evaluations.

> +  /* Attach a note indicating what happened.  */
> +  if (reg_note_rtx == NULL_RTX)
> +    reg_note_rtx = copy_rtx (op0_adjust);
> +  if (kind != REG_NOTE_MAX)
> +    add_reg_note (insn, kind, reg_note_rtx);

Nit: would be better to copy only if kind != REG_NOTE_MAX.

> +  /* Ok, save this bunch.  */
> +  addr = gen_rtx_PLUS (Pmode, stack_pointer_rtx,
> +		       gen_int_mode (*sp_offset, Pmode));

Better to use plus_constant (which is shorter and also copies with a
zero offset).

> +/* Emit function prologue.  */
> +void
> +pru_expand_prologue (void)
> +{
> +  int regno_start;
> +  int total_frame_size;
> +  int sp_offset;      /* Offset from base_reg to final stack value.  */
> +  int save_regs_base; /* Offset from base_reg to register save area.  */
> +  int save_offset;    /* Temporary offset to currently saved register group.  */
> +  rtx insn;
> +
> +  total_frame_size = pru_compute_frame_layout ();
> +
> +  if (flag_stack_usage_info)
> +    current_function_static_stack_size = total_frame_size;
> +
> +  /* Decrement the stack pointer.  */
> +  if (!UBYTE_INT (total_frame_size))
> +    {
> +      /* We need an intermediary point, this will point at the spill block.  */
> +      insn = pru_add_to_sp (cfun->machine->save_regs_offset
> +			     - total_frame_size,
> +			     REG_NOTE_MAX);
> +      save_regs_base = 0;
> +      sp_offset = -cfun->machine->save_regs_offset;
> +    }
> +  else if (total_frame_size)
> +    {
> +      insn = emit_insn (gen_sub2_insn (stack_pointer_rtx,
> +				       gen_int_mode (total_frame_size,
> +						     Pmode)));

As above, constant amounts should be subtracted using an add2_insn with
the negative amount: insns that subtract a const_int aren't canonical.

> +  regno_start = 0;
> +  save_offset = save_regs_base;
> +  do {
> +      regno_start = xbbo_next_reg_cluster (regno_start, &save_offset, true);
> +  } while (regno_start >= 0);

Nit: GNU formatting would be:

  do
    regno_start = xbbo_next_reg_cluster (regno_start, &save_offset, true);
  while (regno_start >= 0);

Same for the copy in the epilogue.

> +  if (frame_pointer_needed)
> +    {
> +      int fp_save_offset = save_regs_base + cfun->machine->fp_save_offset;
> +      pru_add3_frame_adjust (hard_frame_pointer_rtx,
> +				    stack_pointer_rtx,
> +				    fp_save_offset,
> +				    REG_NOTE_MAX,
> +				    NULL_RTX);

Indentation looks off.

> +    }
> +
> +  if (sp_offset)
> +      pru_add_to_sp (sp_offset, REG_FRAME_RELATED_EXPR);

Same here (and for the epilogue code).

> +  else if (!UBYTE_INT (total_frame_size))
> +    {
> +      pru_add_to_sp (cfun->machine->save_regs_offset,
> +			    REG_CFA_ADJUST_CFA);

Same here (sorry!).

> +/* Implement TARGET_MODES_TIEABLE_P.  */
> +
> +static bool
> +pru_modes_tieable_p (machine_mode mode1, machine_mode mode2)
> +{
> +  return (mode1 == mode2
> +	  || (GET_MODE_SIZE (mode1) <= 4 && GET_MODE_SIZE (mode2) <= 4));
> +}

I think this should have a comment explaining why this implementation
is needed.

> +/* Implement TARGET_HARD_REGNO_MODE_OK.  */
> +
> +static bool
> +pru_hard_regno_mode_ok (unsigned int regno, machine_mode mode)
> +{
> +  switch (GET_MODE_SIZE (mode))
> +    {
> +    case 1: return true;
> +    case 2: return (regno % 4) <= 2;
> +    case 4: return (regno % 4) == 0;
> +    default: return (regno % 4) == 0; /* Not sure why VOIDmode is passed.  */
> +    }
> +}

The comment made me think that the default case was only there for
VOIDmode, but it looks from the rest of the port like you do support
modes bigger than 4 units.  Might be worth writing in a way that makes
that more obvious.

But yeah, passing VOIDmode is a bug.

> +/* Implement `TARGET_HARD_REGNO_SCRATCH_OK'.
> +   Returns true if REGNO is safe to be allocated as a scratch
> +   register (for a define_peephole2) in the current function.  */
> +
> +static bool
> +pru_hard_regno_scratch_ok (unsigned int regno)
> +{
> +  /* Don't allow hard registers that might be part of the frame pointer.
> +     Some places in the compiler just test for [HARD_]FRAME_POINTER_REGNUM
> +     and don't handle a frame pointer that spans more than one register.  */
> +
> +  if ((!reload_completed || frame_pointer_needed)
> +      && ((regno >= HARD_FRAME_POINTER_REGNUM
> +	   && regno <= HARD_FRAME_POINTER_REGNUM + 3)
> +	  || (regno >= FRAME_POINTER_REGNUM
> +	      && regno <= FRAME_POINTER_REGNUM + 3)))
> +    {
> +      return false;
> +    }
> +
> +  return true;
> +}
> +
> +
> +/* Worker function for `HARD_REGNO_RENAME_OK'.
> +   Return nonzero if register OLD_REG can be renamed to register NEW_REG.  */
> +
> +int
> +pru_hard_regno_rename_ok (unsigned int old_reg,
> +			  unsigned int new_reg)
> +{
> +  /* Don't allow hard registers that might be part of the frame pointer.
> +     Some places in the compiler just test for [HARD_]FRAME_POINTER_REGNUM
> +     and don't care for a frame pointer that spans more than one register.  */
> +  if ((!reload_completed || frame_pointer_needed)
> +      && ((old_reg >= HARD_FRAME_POINTER_REGNUM
> +	   && old_reg <= HARD_FRAME_POINTER_REGNUM + 3)
> +	  || (old_reg >= FRAME_POINTER_REGNUM
> +	      && old_reg <= FRAME_POINTER_REGNUM + 3)
> +	  || (new_reg >= HARD_FRAME_POINTER_REGNUM
> +	      && new_reg <= HARD_FRAME_POINTER_REGNUM + 3)
> +	  || (new_reg >= FRAME_POINTER_REGNUM
> +	      && new_reg <= FRAME_POINTER_REGNUM + 3)))
> +    {
> +      return 0;
> +    }
> +
> +  return 1;
> +}

I realise you've borrowed this approach from AVR, but it would really be
good to fix the target-independent code instead of hacking around it
in target code.

That said, given that this is what AVR already does, it would be unfair
to insist on it before accepting the port.

GNU style is to have no braces around single statements.

> +/* Compute a (partial) cost for rtx X.  Return true if the complete
> +   cost has been computed, and false if subexpressions should be
> +   scanned.  In either case, *TOTAL contains the cost result.  */
> +static bool
> +pru_rtx_costs (rtx x, machine_mode mode ATTRIBUTE_UNUSED,
> +	       int outer_code, int opno ATTRIBUTE_UNUSED,
> +	       int *total, bool speed ATTRIBUTE_UNUSED)
> +{
> +  const int code = GET_CODE (x);
> +
> +  switch (code)
> +    {
> +      case CONST_INT:

Nit: case statements should be indented by the same amount as the switch "{".

> +    case SET:
> +	{
> +	  /* A SET doesn't have a mode, so let's look at the SET_DEST to get
> +	     the mode for the factor.  */
> +	  mode = GET_MODE (SET_DEST (x));
> +
> +	  if (GET_MODE_SIZE (mode) <= GET_MODE_SIZE (SImode)
> +	      && (GET_CODE (SET_SRC (x)) == ZERO_EXTEND
> +		  || outer_code == ZERO_EXTEND))

The outer_code == ZERO_EXTEND seems redundant: SETs can never be
embedded in ZERO_EXTENDs.

> +	    {
> +	      *total = 0;
> +	    }
> +	  else
> +	    {
> +	      /* SI move has the same cost as a QI move.  */
> +	      int factor = GET_MODE_SIZE (mode) / GET_MODE_SIZE (SImode);
> +	      if (factor == 0)
> +		factor = 1;
> +	      *total = factor * COSTS_N_INSNS (1);
> +	    }

Could you explain this a bit more?  It looks like a pure QImode operation
gets a cost of 1 insn but an SImode operation zero-extended from QImode
gets a cost of 0.

> +      case PLUS:
> +	{
> +	  rtx op0 = XEXP (x, 0);
> +	  rtx op1 = XEXP (x, 1);
> +	  if (outer_code == MEM
> +	      && ((REG_P (op0) && reg_or_ubyte_operand (op1, VOIDmode))
> +		  || (REG_P (op1) && reg_or_ubyte_operand (op0, VOIDmode))

This second check shouldn't be necessary: (plus (const_int ...) (reg ...))
is invalid.

> +		  || (ctable_addr_operand (op0, VOIDmode) && op1 == NULL_RTX)
> +		  || (ctable_addr_operand (op1, VOIDmode) && op0 == NULL_RTX)

(plus ...) should never have null operands.

> +		  || (ctable_base_operand (op0, VOIDmode) && REG_P (op1))
> +		  || (ctable_base_operand (op1, VOIDmode) && REG_P (op0))))
> +	    {
> +	      /* CTABLE or REG base addressing - PLUS comes for free.  */
> +	      *total = COSTS_N_INSNS (0);
> +	      return true;
> +	    }
> +	  else
> +	    {
> +	      *total = COSTS_N_INSNS (1);
> +	      return false;
> +	    }
> +	}

[...]

> +/* Implement TARGET_PREFERRED_RELOAD_CLASS.  */
> +static reg_class_t
> +pru_preferred_reload_class (rtx x ATTRIBUTE_UNUSED, reg_class_t regclass)
> +{
> +  return regclass == NO_REGS ? GENERAL_REGS : regclass;
> +}

This looks odd: PREFERRED_RELOAD_CLASS should return a subset of the
original class rather than add new registers.  Can you remember why
it was needed?

> +  /* Floating-point comparisons.  */
> +  eqsf_libfunc = init_one_libfunc ("__pruabi_eqf");
> +  nesf_libfunc = init_one_libfunc ("__pruabi_neqf");
> +  lesf_libfunc = init_one_libfunc ("__pruabi_lef");
> +  ltsf_libfunc = init_one_libfunc ("__pruabi_ltf");
> +  gesf_libfunc = init_one_libfunc ("__pruabi_gef");
> +  gtsf_libfunc = init_one_libfunc ("__pruabi_gtf");
> +  eqdf_libfunc = init_one_libfunc ("__pruabi_eqd");
> +  nedf_libfunc = init_one_libfunc ("__pruabi_neqd");
> +  ledf_libfunc = init_one_libfunc ("__pruabi_led");
> +  ltdf_libfunc = init_one_libfunc ("__pruabi_ltd");
> +  gedf_libfunc = init_one_libfunc ("__pruabi_ged");
> +  gtdf_libfunc = init_one_libfunc ("__pruabi_gtd");
> +
> +  set_optab_libfunc (eq_optab, SFmode, NULL);
> +  set_optab_libfunc (ne_optab, SFmode, "__pruabi_neqf");
> +  set_optab_libfunc (gt_optab, SFmode, NULL);
> +  set_optab_libfunc (ge_optab, SFmode, NULL);
> +  set_optab_libfunc (lt_optab, SFmode, NULL);
> +  set_optab_libfunc (le_optab, SFmode, NULL);
> +  set_optab_libfunc (unord_optab, SFmode, "__pruabi_unordf");
> +  set_optab_libfunc (eq_optab, DFmode, NULL);
> +  set_optab_libfunc (ne_optab, DFmode, "__pruabi_neqd");
> +  set_optab_libfunc (gt_optab, DFmode, NULL);
> +  set_optab_libfunc (ge_optab, DFmode, NULL);
> +  set_optab_libfunc (lt_optab, DFmode, NULL);
> +  set_optab_libfunc (le_optab, DFmode, NULL);

So you need to pass NULL for the ordered comparisons because the ABI
routines return a different value from the one GCC expects?  Might be
worth a comment if so.

> +/* Emit comparison instruction if necessary, returning the expression
> +   that holds the compare result in the proper mode.  Return the comparison
> +   that should be used in the jump insn.  */
> +
> +rtx
> +pru_expand_fp_compare (rtx comparison, machine_mode mode)
> +{
> +  enum rtx_code code = GET_CODE (comparison);
> +  rtx op0 = XEXP (comparison, 0);
> +  rtx op1 = XEXP (comparison, 1);
> +  rtx cmp;
> +  enum rtx_code jump_code = code;
> +  machine_mode op_mode = GET_MODE (op0);
> +  rtx_insn *insns;
> +  rtx libfunc;
> +
> +  gcc_assert (op_mode == DFmode || op_mode == SFmode);
> +
> +  if (code == UNGE)
> +    {
> +      code = LT;
> +      jump_code = EQ;
> +    }
> +  else if (code == UNLE)
> +    {
> +      code = GT;
> +      jump_code = EQ;
> +    }

This is only safe if the LT and GT libcalls don't raise exceptions for
unordered inputs.  I'm guessing they don't, but it's probably worth a
comment if so.

> +/* Return the sign bit position for given OP's mode.  */
> +static int
> +sign_bit_position (const rtx op)
> +{
> +  const int sz = GET_MODE_SIZE (GET_MODE (op));
> +
> +  return  sz * 8 - 1;
> +}

Nit: excess space after "return".

> +  gcc_assert (bufi > 0);
> +  gcc_assert ((unsigned int)bufi < sizeof (buf));

Nit: missing space after "(unsigned int)".  Other instances later.

> +  /* Determine the comparison operators for positive and negative operands.  */
> +  if (code == LT)
> +      cmp_opstr = "qblt";
> +  else if (code == LE)
> +      cmp_opstr = "qble";
> +  else
> +      gcc_unreachable ();

Nit: indented too far.  Same for gtge.

> +/* Output asm code for a signed-compare conditional branch.
> +
> +   If IS_NEAR is true, then QBBx instructions may be used for reaching
> +   the destination label.  Otherwise JMP is used, at the expense of
> +   increased code size.  */
> +const char *
> +pru_output_signed_cbranch (rtx *operands, bool is_near)
> +{
> +  enum rtx_code code = GET_CODE (operands[0]);
> +
> +  if (code == LT || code == LE)
> +    return pru_output_ltle_signed_cbranch (operands, is_near);
> +  else if (code == GT || code == GE)
> +    return pru_output_gtge_signed_cbranch (operands, is_near);
> +  else
> +      gcc_unreachable ();

gcc_unreachable indented too far.

> +/*
> +   Optimized version of pru_output_signed_cbranch for constant second
> +   operand.  */
> +
> +const char *
> +pru_output_signed_cbranch_ubyteop2 (rtx *operands, bool is_near)

Excess newline after "/*".  Same for later routines.

> +/* Return true if register REGNO is a valid base register.
> +   STRICT_P is true if REG_OK_STRICT is in effect.  */
> +
> +bool
> +pru_regno_ok_for_base_p (int regno, bool strict_p)
> +{
> +  if (!HARD_REGISTER_NUM_P (regno))
> +    {
> +      if (!strict_p)
> +	return true;
> +
> +      if (!reg_renumber)
> +	return false;
> +
> +      regno = reg_renumber[regno];

reg_renumber is a reload-only thing and shouldn't be needed/used on
LRA targets.

> +/* Recognize a CTABLE base address.  Return CTABLE entry index, or -1 if
> + * base was not found in the pragma-filled pru_ctable.  */
> +int
> +pru_get_ctable_exact_base_index (unsigned HOST_WIDE_INT caddr)

Nit: no leading "*" on second line.

> +/* Return true if the address expression formed by BASE + OFFSET is
> +   valid.  */
> +static bool
> +pru_valid_addr_expr_p (machine_mode mode, rtx base, rtx offset, bool strict_p)
> +{
> +  if (!strict_p && base != NULL_RTX && GET_CODE (base) == SUBREG)
> +    base = SUBREG_REG (base);
> +  if (!strict_p && offset != NULL_RTX && GET_CODE (offset) == SUBREG)
> +    offset = SUBREG_REG (offset);

As above, this shouldn't need to cope with null base and offset, since
PLUS can never have null operands.

> +  if (REG_P (base)
> +      && pru_regno_ok_for_base_p (REGNO (base), strict_p)
> +      && (offset == NULL_RTX
> +	  || (CONST_INT_P (offset)
> +	      && pru_valid_const_ubyte_offset (mode, INTVAL (offset)))
> +	  || (REG_P (offset)
> +	      && pru_regno_ok_for_index_p (REGNO (offset), strict_p))))
> +    {
> +      /*     base register + register offset
> +       * OR  base register + UBYTE constant offset.  */
> +      return true;
> +    }
> +  else if (REG_P (base)
> +	   && pru_regno_ok_for_index_p (REGNO (base), strict_p)
> +	   && (offset != NULL_RTX && ctable_base_operand (offset, VOIDmode)))
> +    {
> +      /*     base CTABLE constant base + register offset
> +       * Note: GCC always puts the register as a first operand of PLUS.  */
> +      return true;
> +    }
> +  else if (CONST_INT_P (base)
> +	   && offset == NULL_RTX
> +	   && (ctable_addr_operand (base, VOIDmode)))
> +    {
> +      /*     base CTABLE constant base + UBYTE constant offset.  */
> +      return true;

This address should simply be a (const_int ...), with no containing (plus ...).

> +  switch (GET_CODE (op))
> +    {
> +    case REG:
> +      if (letter == 0)
> +	{
> +	  fprintf (file, "%s", pru_asm_regname (op));
> +	  return;
> +	}
> +      else if (letter == 'b')
> +	{
> +	  gcc_assert (REGNO (op) <= LAST_NONIO_GP_REG);
> +	  fprintf (file, "r%d.b%d", REGNO (op) / 4, REGNO (op) % 4);
> +	  return;
> +	}
> +      else if (letter == 'F')
> +	{
> +	  gcc_assert (REGNO (op) <= LAST_NONIO_GP_REG);
> +	  gcc_assert (REGNO (op) % 4 == 0);
> +	  fprintf (file, "r%d", REGNO (op) / 4);
> +	  return;

This routine needs to use output_operand_lossage instead of an assert
for anything that could conceivably come from inline asms.  E.g. the
assert for REGNO (op) % 4 == 0 would cause in an ICE if the user
(wrongly) used %F with a byte register.

This is true even for undocumented prefix letters, since we don't give
a specific error for using undocumented features and could still reach
this point for them.

> +    case CONST_DOUBLE:
> +      if (letter == 0)
> +	{
> +	  long val;
> +
> +	  if (GET_MODE (op) != SFmode)
> +	    fatal_insn ("internal compiler error.  Unknown mode:", op);

Same here.  DOUBLE_TYPE_SIZE is 64, so an inline asm like:

  asm ("# %0" :: "F" (1.0));

could get this internal compiler error.

> +    case SUBREG:
> +    case MEM:
> +      if (letter == 0)
> +	{
> +	  output_address (VOIDmode, op);
> +	  return;
> +	}

Does the SUBREG case ever hit?  They shouldn't appear as operands
at this late stage.

> +/* Implement TARGET_PRINT_OPERAND_ADDRESS.  */
> +static void
> +pru_print_operand_address (FILE *file, machine_mode mode, rtx op)
> +{
> +  if (GET_CODE (op) != REG && CONSTANT_ADDRESS_P (op)
> +      && text_segment_operand (op, VOIDmode))
> +    {
> +      fprintf (stderr, "Unexpectred text address?\n");
> +      debug_rtx (op);
> +      gcc_unreachable ();
> +    }

CONSTANT_ADDRESS_P makes the test for REG redundant.  This too could
trigger for inline asms and should be reported as an error (if that's
the right thing to do) rather than an ICE.

> +    case PLUS:
> +      {
> +	int base;
> +	rtx op0 = XEXP (op, 0);
> +	rtx op1 = XEXP (op, 1);
> +
> +	if (REG_P (op0) && CONST_INT_P (op1)
> +	    && pru_get_ctable_exact_base_index (INTVAL (op1)) >= 0)
> +	  {
> +	    base = pru_get_ctable_exact_base_index (INTVAL (op1));
> +	    fprintf (file, "%d, %s", base, pru_asm_regname (op0));
> +	    return;
> +	  }
> +	else if (REG_P (op1) && CONST_INT_P (op0)
> +		 && pru_get_ctable_exact_base_index (INTVAL (op0)) >= 0)
> +	  {
> +	    base = pru_get_ctable_exact_base_index (INTVAL (op0));
> +	    fprintf (file, "%d, %s", base, pru_asm_regname (op1));
> +	    return;
> +	  }

As above it's a bug if this last case ever occurs.

> +	else if (REG_P (op0) && CONSTANT_P (op1))
> +	  {
> +	    fprintf (file, "%s, ", pru_asm_regname (op0));
> +	    output_addr_const (file, op1);
> +	    return;
> +	  }
> +	else if (REG_P (op1) && CONSTANT_P (op0))
> +	  {
> +	    fprintf (file, "%s, ", pru_asm_regname (op1));
> +	    output_addr_const (file, op0);
> +	    return;
> +	  }

Same here.

> +	else if (REG_P (op1) && REG_P (op0))
> +	  {
> +	    fprintf (file, "%s, %s", pru_asm_regname (op0),
> +				     pru_asm_regname (op1));
> +	    return;
> +	  }
> +      }
> +      break;
> +
> +    case REG:
> +      fprintf (file, "%s, 0", pru_asm_regname (op));
> +      return;
> +
> +    case MEM:
> +      {
> +	rtx base = XEXP (op, 0);
> +	pru_print_operand_address (file, mode, base);
> +	return;
> +      }
> +    default:
> +      break;
> +    }
> +
> +  fprintf (stderr, "Missing way to print address\n");
> +  debug_rtx (op);
> +  gcc_unreachable ();

Same comment about ICEing on user inline asms.

> +/* Implement TARGET_ASM_FILETARGET_ASM_FILE_START_START.  */

Typo.

> +/* Helper function to get the starting storage HW register for an argument,
> +   or -1 if it must be passed on stack.  The cum_v state is not changed.  */
> +static int
> +pru_function_arg_regi (cumulative_args_t cum_v,
> +		       machine_mode mode, const_tree type,
> +		       bool named)
> +{
> +  CUMULATIVE_ARGS *cum = get_cumulative_args (cum_v);
> +  size_t argsize = pru_function_arg_size (mode, type);
> +  size_t i, bi;
> +  int regi = -1;
> +
> +  if (!pru_arg_in_reg_bysize (argsize))
> +    return -1;
> +
> +  if (!named)
> +    return -1;
> +
> +  /* Find the first available slot that fits.  Yes, that's the PRU ABI.  */
> +  for (i = 0; regi < 0 && i < ARRAY_SIZE (cum->regs_used); i++)
> +    {
> +      if (mode == BLKmode)
> +	{
> +	  /* Structs are passed, beginning at a full register.  */
> +	  if ((i % 4) != 0)
> +	    continue;
> +	}

The comment doesn't seem to match the code: a struct like:

  struct s { short x; };

will have HImode rather than BLKmode.

It's usually better to base ABI stuff on the type where possible,
since that corresponds directly to the source language, while modes
are more of an internal GCC thing.

> +  for (; count < 2; count++)
> +      emit_insn_before (gen_nop (), last_insn);

Nit: indented too far.

> +/* Code for converting doloop_begins and doloop_ends into valid
> +   PRU instructions.  Idea and code snippets borrowed from mep port.
> +
> +   A doloop_begin is just a placeholder:
> +
> +	$count = unspec ($count)
> +
> +   where $count is initially the number of iterations.
> +   doloop_end has the form:
> +
> +	if (--$count == 0) goto label
> +
> +   The counter variable is private to the doloop insns, nothing else
> +   relies on its value.
> +
> +   There are three cases, in decreasing order of preference:
> +
> +      1.  A loop has exactly one doloop_begin and one doloop_end.
> +	 The doloop_end branches to the first instruction after
> +	 the doloop_begin.
> +
> +	 In this case we can replace the doloop_begin with a LOOP
> +	 instruction and remove the doloop_end.  I.e.:
> +
> +		$count1 = unspec ($count1)
> +	    label:
> +		...
> +		if (--$count2 != 0) goto label
> +
> +	  becomes:
> +
> +		LOOP end_label,$count1
> +	    label:
> +		...
> +	    end_label:
> +		# end loop
> +
> +      2.  As for (1), except there are several doloop_ends.  One of them
> +	 (call it X) falls through to a label L.  All the others fall
> +	 through to branches to L.
> +
> +	 In this case, we remove X and replace the other doloop_ends
> +	 with branches to the LOOP label.  For example:
> +
> +		$count1 = unspec ($count1)
> +	    label:
> +		...
> +		if (--$count1 != 0) goto label
> +	    end_label:
> +		...
> +		if (--$count2 != 0) goto label
> +		goto end_label
> +
> +	 becomes:
> +
> +		LOOP end_label,$count1
> +	    label:
> +		...
> +	    end_label:
> +		# end repeat
> +		...
> +		goto end_label
> +
> +      3.  The fallback case.  Replace doloop_begins with:
> +
> +		$count = $count
> +
> +	 Replace doloop_ends with the equivalent of:
> +
> +		$count = $count - 1
> +		if ($count != 0) goto loop_label
> +
> +	 */

Ooh, I remember this :-)  Nice to see the code live on in some form.

> +  pru_builtins[PRU_BUILTIN_DELAY_CYCLES]
> +    = add_builtin_function ( "__delay_cycles", void_ftype_longlong,
> +			    PRU_BUILTIN_DELAY_CYCLES, BUILT_IN_MD, NULL,
> +			    NULL_TREE);

Nit: stray space before "__delay_cycles".

> +/* Emit a sequence of one or more delay_cycles_X insns, in order to generate
> +   code that delays exactly ARG cycles.  */
> +
> +static rtx
> +pru_expand_delay_cycles (rtx arg)
> +{
> +  HOST_WIDE_INT c, n;
> +
> +  if (GET_CODE (arg) != CONST_INT)
> +    {
> +      error ("%<__delay_cycles%> only takes constant arguments");
> +      return NULL_RTX;
> +    }
> +
> +  c = INTVAL (arg);
> +
> +  if (HOST_BITS_PER_WIDE_INT > 32)

This is always true now.

> +/* TI ABI implementation is not feature enough (e.g. function pointers are
> +   not supported), so we cannot list it as a multilib variant.  To prevent
> +   misuse from users, do not link any of the standard libraries.  */

Maybe s/feature enough/feature-complete enough/ (already correct in the
t-pru comment).

> +/* Basic characteristics of PRU registers:
> +
> +   Regno  Name
> +   0      r0		  Caller Saved.  Also used as a static chain register.
> +   1      r1		  Caller Saved.  Also used as a temporary by function
> +			  profiler and function prologue/epilogue.
> +   2      r2       sp	  Stack Pointer
> +   3*     r3.w0    ra	  Return Address (16-bit)
> +   4      r4       fp	  Frame Pointer
> +   5-13   r5-r13	  Callee Saved Registers
> +   14-29  r14-r29	  Register Arguments.  Caller Saved Registers.
> +   14-15  r14-r15	  Return Location
> +   30     r30		  Special I/O register.  Not used by compiler.
> +   31     r31		  Special I/O register.  Not used by compiler.
> +
> +   32     loop_cntr	  Internal register used as a counter by LOOP insns
> +
> +   33     pc		  Not an actual register
> +
> +   34     fake_fp	  Fake Frame Pointer (always eliminated)
> +   35     fake_ap	  Fake Argument Pointer (always eliminated)
> +   36			  First Pseudo Register
> +
> +   The definitions for all the hard register numbers are located in pru.md.
> +*/

Might be worth clarifying this to say that there are four GCC registers
per PRU register, so e.g. the first pseudo register is actually 144 rather
than 36.

> +/* Tests for various kinds of constants used in the PRU port.  */
> +#define SHIFT_INT(X) ((X) >= 0 && (X) <= 31)
> +
> +#define UHWORD_INT(X) (IN_RANGE ((X), 0, 0xffff))
> +#define SHWORD_INT(X) (IN_RANGE ((X), -32768, 32767))
> +#define UBYTE_INT(X) (IN_RANGE ((X), 0, 0xff))

Would be good to use IN_RANGE for SHIFT_INT too, since it avoids the
double evaluation of the macro argument.

> +/* Say that the epilogue uses the return address register.  Note that
> +   in the case of sibcalls, the values "used by the epilogue" are
> +   considered live at the start of the called function.  */
> +#define EPILOGUE_USES(REGNO) (epilogue_completed &&	      \
> +			      (((REGNO) == RA_REGNO)	      \
> +			       || (REGNO) == (RA_REGNO + 1)))

The "&&" should be on the next line.

> +#define __pru_name_R(X)  X".b0", X".b1", X".b2", X".b3"

Would be better not to use a leading "__", since that's in the
implementation namespace.  How about just PRU_NAME_R?

> +#define __pru_overlap_R(X)	      \
> +  { "r" #X	, X * 4	    ,  4 },   \
> +  { "r" #X ".w0", X * 4 + 0 ,  2 },   \
> +  { "r" #X ".w1", X * 4 + 1 ,  2 },   \
> +  { "r" #X ".w2", X * 4 + 2 ,  2 }

Same sort of thing here.

> +; Not recommended.  Please use %0 instead!
> +(define_mode_attr regwidth [(QI ".b0") (HI ".w0") (SI "")])

Looks like you've successfully done this transition :-)  I couldn't
see any other references to "regwidth", so might as well just delete it.

> +; I cannot think of any reason for the core to pass a 64-bit symbolic
> +; constants.  Hence simplify the rule and handle only numeric constants.

Not sure what you mean by "the core" here: the core compiler?
If so, then yeah, by default the only modes that a symbolic address can
have are Pmode and ptr_mode, which are both SImode here.  So I don't
think this is a simplification so much as expected behaviour.

> +; load_multiple pattern(s).
> +;
> +; ??? Due to reload problems with replacing registers inside match_parallel
> +; we currently support load_multiple/store_multiple only after reload.
> +;
> +; Idea taken from the s390 port.
> +
> +(define_expand "load_multiple"
> +  [(match_par_dup 3 [(set (match_operand 0 "" "")
> +			  (match_operand 1 "" ""))
> +		     (use (match_operand 2 "" ""))])]
> +  "reload_completed"
> +{
> +  machine_mode mode;
> +  int regno;
> +  int count;
> +  rtx from;
> +  int i, off;
> +
> +  /* Support only loading a constant number of fixed-point registers from
> +     memory.  */
> +  if (GET_CODE (operands[2]) != CONST_INT
> +      || GET_CODE (operands[1]) != MEM
> +      || GET_CODE (operands[0]) != REG)
> +    FAIL;
> +
> +  count = INTVAL (operands[2]);
> +  regno = REGNO (operands[0]);
> +  mode = GET_MODE (operands[0]);
> +  if (mode != QImode)
> +    FAIL;
> +
> +  operands[3] = gen_rtx_PARALLEL (VOIDmode, rtvec_alloc (count));
> +  if (!can_create_pseudo_p ())

This is always true if reload_completed, so I think the "else" arm
is dead code.

> +    {
> +      if (GET_CODE (XEXP (operands[1], 0)) == REG)
> +	{
> +	  from = XEXP (operands[1], 0);
> +	  off = 0;
> +	}
> +      else if (GET_CODE (XEXP (operands[1], 0)) == PLUS
> +	       && GET_CODE (XEXP (XEXP (operands[1], 0), 0)) == REG
> +	       && GET_CODE (XEXP (XEXP (operands[1], 0), 1)) == CONST_INT)
> +	{
> +	  from = XEXP (XEXP (operands[1], 0), 0);
> +	  off = INTVAL (XEXP (XEXP (operands[1], 0), 1));
> +	}
> +      else
> +	FAIL;

Could be simplified using split_offset.

Same comments for store_multiple.

> +;; Arithmetic Operations
> +
> +(define_expand "add<mode>3"
> +  [(set (match_operand:QISI 0 "register_operand"	    "=r,r,r")
> +	(plus:QISI (match_operand:QISI 1 "register_operand" "%r,r,r")
> +		 (match_operand:QISI 2 "nonmemory_operand"  "r,I,M")))]
> +  ""
> +  ""
> +  [(set_attr "type" "alu")])

define_expands shouldn't have constraints or attributes (or at least,
they have no effect).  So this should just be:

(define_expand "add<mode>3"
  [(set (match_operand:QISI 0 "register_operand")
	(plus:QISI (match_operand:QISI 1 "register_operand")
		   (match_operand:QISI 2 "nonmemory_operand")))])

Same for the rest of the file.

> +(define_insn "subdi3"
> +  [(set (match_operand:DI 0 "register_operand"		    "=&r,&r,&r")
> +	(minus:DI (match_operand:DI 1 "register_operand"    "r,r,I")
> +		 (match_operand:DI 2 "reg_or_ubyte_operand" "r,I,r")))]
> +  ""
> +  "@
> +   sub\\t%F0, %F1, %F2\;suc\\t%N0, %N1, %N2
> +   sub\\t%F0, %F1, %2\;suc\\t%N0, %N1, 0
> +   rsb\\t%F0, %F2, %1\;rsc\\t%N0, %N2, 0"
> +  [(set_attr "type" "alu,alu,alu")
> +   (set_attr "length" "8,8,8")])

As above, this shouldn't handle operand 2 being constant; that should
always go through adddi3 instead.

> +; LRA cannot cope with clobbered op2, hence the scratch register.
> +(define_insn "ashr<mode>3"
> +  [(set (match_operand:QISI 0 "register_operand"	    "=&r,r")
> +	  (ashiftrt:QISI
> +	    (match_operand:QISI 1 "register_operand"	    "0,r")
> +	    (match_operand:QISI 2 "reg_or_const_1_operand"  "r,P")))
> +   (clobber (match_scratch:QISI 3			    "=r,X"))]
> +  ""
> +  "@
> +   mov %3, %2\;ASHRLP%=:\;qbeq ASHREND%=, %3, 0\; sub %3, %3, 1\; lsr\\t%0, %0, 1\; qbbc ASHRLP%=, %0, (%S0 * 8) - 2\; set %0, %0, (%S0 * 8) - 1\; jmp ASHRLP%=\;ASHREND%=:
> +   lsr\\t%0, %1, 1\;qbbc LSIGN%=, %0, (%S0 * 8) - 2\;set %0, %0, (%S0 * 8) - 1\;LSIGN%=:"
> +  [(set_attr "type" "complex,alu")
> +   (set_attr "length" "28,4")])

What do you mean by LRA not coping?  What did you try originally and
what went wrong?

> +(define_insn "<code>di3"
> +  [(set (match_operand:DI 0 "register_operand"		"=&r,&r")
> +	  (LOGICAL_BITOP:DI
> +	    (match_operand:DI 1 "register_operand"	"%r,r")
> +	    (match_operand:DI 2 "reg_or_ubyte_operand"	"r,I")))]
> +  ""
> +  "@
> +   <logical_bitop_asm>\\t%F0, %F1, %F2\;<logical_bitop_asm>\\t%N0, %N1, %N2
> +   <logical_bitop_asm>\\t%F0, %F1, %2\;<logical_bitop_asm>\\t%N0, %N1, 0"
> +  [(set_attr "type" "alu,alu")
> +   (set_attr "length" "8,8")])
> +
> +
> +(define_insn "one_cmpldi2"
> +  [(set (match_operand:DI 0 "register_operand"		"=r")
> +	(not:DI (match_operand:DI 1 "register_operand"	"r")))]
> +  ""
> +{
> +  /* careful with overlapping source and destination regs.  */
> +  gcc_assert (GP_REG_P (REGNO (operands[0])));
> +  gcc_assert (GP_REG_P (REGNO (operands[1])));
> +  if (REGNO (operands[0]) == (REGNO (operands[1]) + 4))
> +    return "not\\t%N0, %N1\;not\\t%F0, %F1";
> +  else
> +    return "not\\t%F0, %F1\;not\\t%N0, %N1";
> +}
> +  [(set_attr "type" "alu")
> +   (set_attr "length" "8")])

Do you see any code improvements from defining these patterns?
These days I'd expect you'd get better code by letting the target-independent
code split them up into SImode operations.

> +;; Multiply instruction.  Idea for fixing registers comes from the AVR backend.
> +
> +(define_expand "mulsi3"
> +  [(set (match_operand:SI 0 "register_operand" "")
> +	(mult:SI (match_operand:SI 1 "register_operand" "")
> +		 (match_operand:SI 2 "register_operand" "")))]
> +  ""
> +{
> +  emit_insn (gen_mulsi3_fixinp (operands[0], operands[1], operands[2]));
> +  DONE;
> +})
> +
> +
> +(define_expand "mulsi3_fixinp"
> +  [(set (reg:SI 112) (match_operand:SI 1 "register_operand" ""))
> +   (set (reg:SI 116) (match_operand:SI 2 "register_operand" ""))
> +   (set (reg:SI 104) (mult:SI (reg:SI 112) (reg:SI 116)))
> +   (set (match_operand:SI 0 "register_operand" "") (reg:SI 104))]
> +  ""
> +{
> +})

This seems slightly dangerous since there's nothing to guarantee that
those registers aren't already live at the point of expansion.
The more usual way (and IMO correct way) would be for:

> +(define_insn "*mulsi3_prumac"
> +  [(set (reg:SI 104) (mult:SI (reg:SI 112) (reg:SI 116)))]
> +  ""
> +  "nop\;xin\\t0, r26, 4"
> +  [(set_attr "type" "alu")
> +   (set_attr "length" "8")])

to have register classes and constraints for these three registers,
like e.g. the x86 port does for %cx etc.

It would be good if you could try that.  On the other hand, if AVR
already does this and it worked in practice then I guess it's not
something that should hold up the port.

> +(define_expand "doloop_end"
> +  [(use (match_operand 0 "nonimmediate_operand" ""))
> +   (use (label_ref (match_operand 1 "" "")))]
> +  "TARGET_OPT_LOOP"
> +  "if (GET_CODE (operands[0]) == REG && GET_MODE (operands[0]) == QImode)
> +     FAIL;
> +   pru_emit_doloop (operands, 1);
> +   DONE;
> +  ")

Would be more consistent to use { ... } instead of "..." to quote the code.

> +@item ctable_entry @var{index} @var{constant_address}
> +@cindex pragma, ctable_entry
> +Specifies that the given PRU CTABLE entry at @var{index} has a value
> +@var{constant_address}.  This enables GCC to emit LBCO/SBCO instructions
> +when the load/store address is known and can be addressed with some CTABLE
> +entry.  Example:

Maybe "...that the PRU CTABLE entry given by @var{index} has the value...".
Think "For example:" is used more than "Example:".

> @@ -23444,6 +23449,56 @@ the offset with a symbol reference to a canary in the TLS block.
>  @end table
>  
>  
> +@node PRU Options
> +@subsection PRU Options
> +@cindex PRU Options
> +
> +These command-line options are defined for PRU target:
> +
> +@table @gcctabopt
> +@item -minrt
> +@opindex minrt
> +Enable the use of a minimum runtime environment---no static
> +initializers or constructors.  Results in significant code size
> +reduction of final ELF binaries.

Up to you, but this read to me as "-m"+"inrt".  If this option is already
in use then obviously keep it, but if it's new then it might be worth
calling it -mminrt instead, for consistency with -mmcu.

Maybe s/---/, with no support for/?

Maybe "Using this option can significantly reduce the size of the
final ELF binary."

It might also be worth emphasising that this option only has an effect
when linking, and that the compiler doesn't (AFAICT) enforce that static
initialisers and constructors aren't used when -minrt is passed.

> +@item -mmcu=@var{mcu}
> +@opindex mmcu
> +Specify the PRU MCU variant to use.  Check Newlib for exact list of options.

s/for exact/for the exact/.
Maybe s/list of options/list of supported MCUs/?

> +@item -mno-relax
> +@opindex mno-relax
> +Pass on (or do not pass on) the @option{-mrelax} command-line option
> +to the linker.

This "Pass on (or do not pass on)" construct is only used when GCC
supports two contrasting options.  It looks like the port really
does only support for -mno-relax.  Also, the linker option is
--relax rather than -mrelax.  So maybe something like:

Make GCC pass @option{--no-relax} command-line option to the linker
instead of the @option{--relax} option.

The mno-relax entry in pru.opt should have RejectNegative, to prevent
"mno-no-relax" (unless something already prevents that these days).

> +@item -mabi=@var{variant}
> +@opindex mabi
> +Specify the ABI variant to output code for.  Permissible values are @samp{gnu}
> +for GCC, and @samp{ti} for fully conformant TI ABI.  These are the differences:
>
> +@table @samp
> +@item Function Pointer Size
> +TI ABI specifies that function (code) pointers are 16-bit, whereas GCC
> +supports only 32-bit data and code pointers.
>
> +@item Optional Return Value Pointer
> +Function return values larger than 64 bits are passed by using a hidden
> +pointer as the first argument of the function.  TI ABI, though, mandates that
> +the pointer can be NULL in case the caller is not using the returned value.
> +GCC always passes and expects a valid return value pointer.
> +
> +@end table
> +
> +The current @samp{mabi=ti} implementation simply raises a compile error
> +when any of the above code constructs is detected.

It might be better to reword this so that there's a clearer distinction
between "gnu" the GNU ABI for PRU on the one hand and what GCC the
compiler generates on the other.  Maybe after "Specify the ABI variant
to output code for.":

@option{-mbi=ti} selects the unmodified TI ABI while @option{-mbi=gnu}
selects a GNU variant that copes more naturally with certain GCC
assumptions.

and then refer to the "GNU ABI" instead of "GCC" when describing the
differences.

Also, this doesn't mention that -mabi=ti implies -mno-relax.
Would be good to mention that and explain why.

> +@item M
> +An integer constant in the range [-255;0].

Think [-255, 0] is more common.

> +
> +@item T
> +A text segment (program memory) constant label.
> +
> +@item Z
> +Integer constant zero.

Just checking, but did you deliberately leave out N?  That's fine if so,
but the docstring in constraints.md didn't have an "@internal" marker to
indicate it was internal-only.

Despite the long reply (sorry), this looks in really good shape to me FWIW.
Thanks for the submission.

Richard
Follow-Ups:
- Re: [PATCH v4 01/10] Initial TI PRU GCC port
  - From: Dimitar Dimitrov
References:
- [PATCH v4 00/10] New backend for the TI PRU processor
  - From: Dimitar Dimitrov
- [PATCH v4 01/10] Initial TI PRU GCC port
  - From: Dimitar Dimitrov
Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]