[PATCH v4 01/10] Initial TI PRU GCC port
Dimitar Dimitrov
dimitar@dinux.eu
Sat Sep 22 10:23:00 GMT 2018
On Thursday, 9/13/2018 13:02:21 EEST Richard Sandiford wrote:
> Dimitar Dimitrov <dimitar@dinux.eu> writes:
> > +(define_insn
> > "sub_impl<EQD:mode><EQS0:mode><EQS1:mode>_<alu3_zext><alu3_zext_op1><alu3
> > _zext_op2>" + [(set (match_operand:EQD 0 "register_operand" "=r,r,r")
> > + (minus:EQD
> > + (zero_extend:EQD
> > + (match_operand:EQS0 1 "reg_or_ubyte_operand" "r,r,I"))
> > + (zero_extend:EQD
> > + (match_operand:EQS1 2 "reg_or_ubyte_operand" "r,I,r"))))]
> > + ""
> > + "@
> > + sub\\t%0, %1, %2
> > + sub\\t%0, %1, %2
> > + rsb\\t%0, %2, %1"
> > + [(set_attr "type" "alu")
> > + (set_attr "length" "4")])
>
> By convention, subtraction patterns shouldn't accept constants for
> operand 2. Constants should instead be subtracted using an addition
> of the negative.
Understood. I will remove second alternative. But I will leave the third one
since it enables an important optimization:
unsigned test(unsigned a)
{
return 10-a;
}
RTL:
(insn 6 3 7 2 (set (reg:SI 152)
(minus:SI (const_int 10 [0xa])
(reg/v:SI 151 [ a ]))) "test.c":430 -1
(nil))
Assembly:
test:
rsb r14, r14, 10
ret
>
> > +(define_constraint "I"
> > + "An unsigned 8-bit constant."
> > + (and (match_code "const_int")
> > + (match_test "UBYTE_INT (ival)")))
>
> As it stands this will reject QImode constants with the top bit set,
> since const_ints are always stored in sign-extended form. E.g. QImode
> 128 is stored as (const_int -128) rather than (const_int 128).
>
> Unfortunately this is difficult to fix in a clean way, since
> const_ints don't store their mode (a long-standing wart) and unlike
> predicates, constraints aren't given a mode from context. The best
> way I can think of coping with it is:
>
> a) have a separate constraint for -128...127
> b) add a define_mode_attr that maps QI to the new constraint and
> HI and SI to I
> c) use <EQS1:...> etc. instead of I in the match_operands
>
> Similar comment for "J" and HImode, although you already have the
> "N" as the corresponding signed constraint and so don't need a new one.
Thank you. This strategy worked for QImode. I will include the changes in my
next patch.
Since PRU ALU operations do not have 16-bit constant operands, there is no
need to add define_mode_attr for HImode. The "mov" pattern already handles the
"N" [-32768, 32767] constraint.
>
>
> > +;; Return true if OP is a text segment reference.
> > +;; This is needed for program memory address expressions. Borrowed from
> > AVR. +(define_predicate "text_segment_operand"
> > + (match_code "code_label,label_ref,symbol_ref,plus,minus,const")
> > +{
> > + switch (GET_CODE (op))
> > + {
> > + case CODE_LABEL:
> > + return true;
> > + case LABEL_REF :
> > + return true;
> > + case SYMBOL_REF :
> > + return SYMBOL_REF_FUNCTION_P (op);
> > + case PLUS :
> > + case MINUS :
> > + /* Assume canonical format of symbol + constant.
> > + Fall through. */
> > + case CONST :
> > + return text_segment_operand (XEXP (op, 0), VOIDmode);
> > + default :
> > + return false;
> > + }
> > +})
>
> This probably comes from AVR, but: no spaces before ":".
>
> Bit surprised that we can get a CODE_LABEL rather than a LABEL_REF here.
> Do you know if that triggers in practice, and if so where?
Indeed, CODE_LABEL case is never reached. I'll leave gcc_unreachable here.
> An IMO neater and slightly more robust way of writing the body is:
> poly_int64 offset:
> rtx base = strip_offset (op, &offset);
> switch (GET_CODE (base))
>
> {
>
> case LABEL_REF:
> ...as above...
>
> case SYMBOL_REF:
> ...as above...
>
> default:
> return false;
>
> }
>
> with "plus" and "minus" not in the match_code list (since they should
> always appear in consts if they really are text references).
The "plus" and "minus" are needed when handling code labels as values. Take
for example the following construct:
int x = &&lab1 - &&lab0;
lab1:
...
lab2:
My TARGET_ASM_INTEGER callback uses the text_segment_operand predicate. In the
above case it is passed the following RTL expression:
(minus:SI (label_ref/v:SI 20)
(label_ref/v:SI 27))
I need to detect text labels so that I annotate them with %pmem:
.4byte %pmem(.L4-(.L2))
Instead of the incorrect:
.4byte .L3-(.L2)
> > +;; Return true if OP is a load multiple operation. It is known to be a
> > +;; PARALLEL and the first section will be tested.
> > +
> > +(define_special_predicate "load_multiple_operation"
> > + (match_code "parallel")
> > +{
> > + machine_mode elt_mode;
> > + int count = XVECLEN (op, 0);
> > + unsigned int dest_regno;
> > + rtx src_addr;
> > + int i, off;
> > +
> > + /* Perform a quick check so we don't blow up below. */
> > + if (GET_CODE (XVECEXP (op, 0, 0)) != SET
> > + || GET_CODE (SET_DEST (XVECEXP (op, 0, 0))) != REG
> > + || GET_CODE (SET_SRC (XVECEXP (op, 0, 0))) != MEM)
> > + return false;
> > +
> > + dest_regno = REGNO (SET_DEST (XVECEXP (op, 0, 0)));
> > + src_addr = XEXP (SET_SRC (XVECEXP (op, 0, 0)), 0);
> > + elt_mode = GET_MODE (SET_DEST (XVECEXP (op, 0, 0)));
> > +
> > + /* Check, is base, or base + displacement. */
> > +
> > + if (GET_CODE (src_addr) == REG)
> > + off = 0;
> > + else if (GET_CODE (src_addr) == PLUS
> > + && GET_CODE (XEXP (src_addr, 0)) == REG
> > + && GET_CODE (XEXP (src_addr, 1)) == CONST_INT)
> > + {
> > + off = INTVAL (XEXP (src_addr, 1));
> > + src_addr = XEXP (src_addr, 0);
> > + }
> > + else
> > + return false;
> > +
> > + for (i = 1; i < count; i++)
> > + {
> > + rtx elt = XVECEXP (op, 0, i);
> > +
> > + if (GET_CODE (elt) != SET
> > + || GET_CODE (SET_DEST (elt)) != REG
> > + || GET_MODE (SET_DEST (elt)) != elt_mode
> > + || REGNO (SET_DEST (elt)) != dest_regno + i
> > + || GET_CODE (SET_SRC (elt)) != MEM
> > + || GET_MODE (SET_SRC (elt)) != elt_mode
> > + || GET_CODE (XEXP (SET_SRC (elt), 0)) != PLUS
> > + || ! rtx_equal_p (XEXP (XEXP (SET_SRC (elt), 0), 0), src_addr)
> > + || GET_CODE (XEXP (XEXP (SET_SRC (elt), 0), 1)) != CONST_INT
> > + || INTVAL (XEXP (XEXP (SET_SRC (elt), 0), 1))
> > + != off + i * GET_MODE_SIZE (elt_mode))
> > + return false;
> > + }
> > +
> > + return true;
> > +})
>
> This may not matter in practice, but the downside of testing PLUS and
> CONST_INT explicitly is that the predicate won't match something like:
>
> (set (reg R1) (mem (plus (reg RB) (const_int -4))))
> (set (reg R2) (mem (reg RB)))
> (set (reg R3) (mem (plus (reg RB) (const_int 4))))
> ...
>
> Using strip_offset on the addresses would avoid this.
It does not matter for PRU since all offsets are unsigned integers. But I will
switch to strip_offset to make the code cleaner.
> > +/* Callback for walk_gimple_seq that checks TP tree for TI ABI
> > compliance. */ +static tree
> > +check_op_callback (tree *tp, int *walk_subtrees, void *data)
> > +{
> > + struct walk_stmt_info *wi = (struct walk_stmt_info *) data;
> > +
> > + if (RECORD_OR_UNION_TYPE_P (*tp) || TREE_CODE (*tp) == ENUMERAL_TYPE)
> > + {
> > + /* Forward declarations have NULL tree type. Skip them. */
> > + if (TREE_TYPE (*tp) == NULL)
> > + return NULL;
> > + }
> > +
> > + /* TODO - why C++ leaves INTEGER_TYPE forward declarations around? */
> > + if (TREE_TYPE (*tp) == NULL)
> > + return NULL;
> > +
> > + const tree type = TREE_TYPE (*tp);
> > +
> > + /* Direct function calls are allowed, obviously. */
> > + if (TREE_CODE (*tp) == ADDR_EXPR && TREE_CODE (type) == POINTER_TYPE)
> > + {
> > + const tree ptype = TREE_TYPE (type);
> > + if (TREE_CODE (ptype) == FUNCTION_TYPE)
> > + return NULL;
> > + }
>
> This seems like a bit of a dangerous heuristic. Couldn't it also cause
> us to skip things like:
>
> (void *) func
>
> ? (OK, that's dubious C, but we do support it.)
The cast yields a "data pointer", which is 32 bits for both types of ABI.
Hence it is safe to skip "(void *) func".
The TI ABI's 16 bit function pointers become a problem when they change the
layout of structures and function argument registers.
> > +/* Implement TARGET_MODES_TIEABLE_P. */
> > +
> > +static bool
> > +pru_modes_tieable_p (machine_mode mode1, machine_mode mode2)
> > +{
> > + return (mode1 == mode2
> > + || (GET_MODE_SIZE (mode1) <= 4 && GET_MODE_SIZE (mode2) <= 4));
> > +}
>
> I think this should have a comment explaining why this implementation
> is needed.
Sorry, this is stale. I will altogether remove the callback, and will use the
default "true" implementation. PRU registers are fairly orthogonal.
A few years ago I got confused and considered this a requirement for correct
64 bit operations.
>
> > + {
> > + *total = 0;
> > + }
> > + else
> > + {
> > + /* SI move has the same cost as a QI move. */
> > + int factor = GET_MODE_SIZE (mode) / GET_MODE_SIZE (SImode);
> > + if (factor == 0)
> > + factor = 1;
> > + *total = factor * COSTS_N_INSNS (1);
> > + }
>
> Could you explain this a bit more? It looks like a pure QImode operation
> gets a cost of 1 insn but an SImode operation zero-extended from QImode
> gets a cost of 0.
I have unintentionally bumped QI cost while trying to make zero-extends cheap.
I will fix by using the following simple logic:
case SET:
mode = GET_MODE (SET_DEST (x));
/* SI move has the same cost as a QI move. Moves larger than
64 bits are costly. */
int factor = CEIL (GET_MODE_SIZE (mode), GET_MODE_SIZE (SImode));
*total = factor * COSTS_N_INSNS (1);
>
> [...]
>
> > +/* Implement TARGET_PREFERRED_RELOAD_CLASS. */
> > +static reg_class_t
> > +pru_preferred_reload_class (rtx x ATTRIBUTE_UNUSED, reg_class_t regclass)
> > +{
> > + return regclass == NO_REGS ? GENERAL_REGS : regclass;
> > +}
>
> This looks odd: PREFERRED_RELOAD_CLASS should return a subset of the
> original class rather than add new registers. Can you remember why
> it was needed?
I'm not sure what target code is supposed to return when NO_REGS is passed
here. I saw how other ports handle NO_REGS (c-sky, m32c, nios2, rl78). So I
followed suite.
Here is a backtrace of LRA when NO_REGS is passed:
0xf629ae pru_preferred_reload_class
../../gcc/gcc/config/pru/pru.c:788
0xa3d6e8 process_alt_operands
../../gcc/gcc/lra-constraints.c:2600
0xa3ef7d curr_insn_transform
../../gcc/gcc/lra-constraints.c:3889
0xa4301e lra_constraints(bool)
../../gcc/gcc/lra-constraints.c:4906
0xa2726c lra(_IO_FILE*)
../../gcc/gcc/lra.c:2446
0x9c97a9 do_reload
../../gcc/gcc/ira.c:5469
0x9c97a9 execute
../../gcc/gcc/ira.c:5653
>
> > + /* Floating-point comparisons. */
> > + eqsf_libfunc = init_one_libfunc ("__pruabi_eqf");
> > + nesf_libfunc = init_one_libfunc ("__pruabi_neqf");
> > + lesf_libfunc = init_one_libfunc ("__pruabi_lef");
> > + ltsf_libfunc = init_one_libfunc ("__pruabi_ltf");
> > + gesf_libfunc = init_one_libfunc ("__pruabi_gef");
> > + gtsf_libfunc = init_one_libfunc ("__pruabi_gtf");
> > + eqdf_libfunc = init_one_libfunc ("__pruabi_eqd");
> > + nedf_libfunc = init_one_libfunc ("__pruabi_neqd");
> > + ledf_libfunc = init_one_libfunc ("__pruabi_led");
> > + ltdf_libfunc = init_one_libfunc ("__pruabi_ltd");
> > + gedf_libfunc = init_one_libfunc ("__pruabi_ged");
> > + gtdf_libfunc = init_one_libfunc ("__pruabi_gtd");
> > +
> > + set_optab_libfunc (eq_optab, SFmode, NULL);
> > + set_optab_libfunc (ne_optab, SFmode, "__pruabi_neqf");
> > + set_optab_libfunc (gt_optab, SFmode, NULL);
> > + set_optab_libfunc (ge_optab, SFmode, NULL);
> > + set_optab_libfunc (lt_optab, SFmode, NULL);
> > + set_optab_libfunc (le_optab, SFmode, NULL);
> > + set_optab_libfunc (unord_optab, SFmode, "__pruabi_unordf");
> > + set_optab_libfunc (eq_optab, DFmode, NULL);
> > + set_optab_libfunc (ne_optab, DFmode, "__pruabi_neqd");
> > + set_optab_libfunc (gt_optab, DFmode, NULL);
> > + set_optab_libfunc (ge_optab, DFmode, NULL);
> > + set_optab_libfunc (lt_optab, DFmode, NULL);
> > + set_optab_libfunc (le_optab, DFmode, NULL);
>
> So you need to pass NULL for the ordered comparisons because the ABI
> routines return a different value from the one GCC expects? Might be
> worth a comment if so.
Yes, I will comment.
>
> > +/* Emit comparison instruction if necessary, returning the expression
> > + that holds the compare result in the proper mode. Return the
> > comparison + that should be used in the jump insn. */
> > +
> > +rtx
> > +pru_expand_fp_compare (rtx comparison, machine_mode mode)
> > +{
> > + enum rtx_code code = GET_CODE (comparison);
> > + rtx op0 = XEXP (comparison, 0);
> > + rtx op1 = XEXP (comparison, 1);
> > + rtx cmp;
> > + enum rtx_code jump_code = code;
> > + machine_mode op_mode = GET_MODE (op0);
> > + rtx_insn *insns;
> > + rtx libfunc;
> > +
> > + gcc_assert (op_mode == DFmode || op_mode == SFmode);
> > +
> > + if (code == UNGE)
> > + {
> > + code = LT;
> > + jump_code = EQ;
> > + }
> > + else if (code == UNLE)
> > + {
> > + code = GT;
> > + jump_code = EQ;
> > + }
>
> This is only safe if the LT and GT libcalls don't raise exceptions for
> unordered inputs. I'm guessing they don't, but it's probably worth a
> comment if so.
Yes, I will comment.
>
> > +/* Return true if register REGNO is a valid base register.
> > + STRICT_P is true if REG_OK_STRICT is in effect. */
> > +
> > +bool
> > +pru_regno_ok_for_base_p (int regno, bool strict_p)
> > +{
> > + if (!HARD_REGISTER_NUM_P (regno))
> > + {
> > + if (!strict_p)
> > + return true;
> > +
> > + if (!reg_renumber)
> > + return false;
> > +
> > + regno = reg_renumber[regno];
>
> reg_renumber is a reload-only thing and shouldn't be needed/used on
> LRA targets.
I will remove the snippet. It's a remnant from the PRU's reload days.
>
> > + if (REG_P (base)
> > + && pru_regno_ok_for_base_p (REGNO (base), strict_p)
> > + && (offset == NULL_RTX
> > + || (CONST_INT_P (offset)
> > + && pru_valid_const_ubyte_offset (mode, INTVAL (offset)))
> > + || (REG_P (offset)
> > + && pru_regno_ok_for_index_p (REGNO (offset), strict_p))))
> > + {
> > + /* base register + register offset
> > + * OR base register + UBYTE constant offset. */
> > + return true;
> > + }
> > + else if (REG_P (base)
> > + && pru_regno_ok_for_index_p (REGNO (base), strict_p)
> > + && (offset != NULL_RTX && ctable_base_operand (offset, VOIDmode)))
> > + {
> > + /* base CTABLE constant base + register offset
> > + * Note: GCC always puts the register as a first operand of PLUS.
> > */ + return true;
> > + }
> > + else if (CONST_INT_P (base)
> > + && offset == NULL_RTX
> > + && (ctable_addr_operand (base, VOIDmode)))
> > + {
> > + /* base CTABLE constant base + UBYTE constant offset. */
> > + return true;
>
> This address should simply be a (const_int ...), with no containing (plus
> ...).
Yes, my bad. Will remove this snippet.
> > + case SUBREG:
> > + case MEM:
> > + if (letter == 0)
> > + {
> > + output_address (VOIDmode, op);
> > + return;
> > + }
>
> Does the SUBREG case ever hit? They shouldn't appear as operands
> at this late stage.
No, it does not hit. I will put gcc_unreachable().
> > +/* Helper function to get the starting storage HW register for an
> > argument, + or -1 if it must be passed on stack. The cum_v state is
> > not changed. */ +static int
> > +pru_function_arg_regi (cumulative_args_t cum_v,
> > + machine_mode mode, const_tree type,
> > + bool named)
> > +{
> > + CUMULATIVE_ARGS *cum = get_cumulative_args (cum_v);
> > + size_t argsize = pru_function_arg_size (mode, type);
> > + size_t i, bi;
> > + int regi = -1;
> > +
> > + if (!pru_arg_in_reg_bysize (argsize))
> > + return -1;
> > +
> > + if (!named)
> > + return -1;
> > +
> > + /* Find the first available slot that fits. Yes, that's the PRU ABI.
> > */ + for (i = 0; regi < 0 && i < ARRAY_SIZE (cum->regs_used); i++)
> > + {
> > + if (mode == BLKmode)
> > + {
> > + /* Structs are passed, beginning at a full register. */
> > + if ((i % 4) != 0)
> > + continue;
> > + }
>
> The comment doesn't seem to match the code: a struct like:
>
> struct s { short x; };
>
> will have HImode rather than BLKmode.
I will update the comment to clarify it is only for large structs. Code should
be ok.
>
> It's usually better to base ABI stuff on the type where possible,
> since that corresponds directly to the source language, while modes
> are more of an internal GCC thing.
I will explore this option. But ABI is already defined in data sizes, which I
think neatly fit with GCC's modes.
>
> > +(define_insn "subdi3"
> > + [(set (match_operand:DI 0 "register_operand" "=&r,&r,&r")
> > + (minus:DI (match_operand:DI 1 "register_operand" "r,r,I")
> > + (match_operand:DI 2 "reg_or_ubyte_operand" "r,I,r")))]
> > + ""
> > + "@
> > + sub\\t%F0, %F1, %F2\;suc\\t%N0, %N1, %N2
> > + sub\\t%F0, %F1, %2\;suc\\t%N0, %N1, 0
> > + rsb\\t%F0, %F2, %1\;rsc\\t%N0, %N2, 0"
> > + [(set_attr "type" "alu,alu,alu")
> > + (set_attr "length" "8,8,8")])
>
> As above, this shouldn't handle operand 2 being constant; that should
> always go through adddi3 instead.
I will remove alternative 2, but will leave 3.
>
> > +; LRA cannot cope with clobbered op2, hence the scratch register.
> > +(define_insn "ashr<mode>3"
> > + [(set (match_operand:QISI 0 "register_operand" "=&r,r")
> > + (ashiftrt:QISI
> > + (match_operand:QISI 1 "register_operand" "0,r")
> > + (match_operand:QISI 2 "reg_or_const_1_operand" "r,P")))
> > + (clobber (match_scratch:QISI 3 "=r,X"))]
> > + ""
> > + "@
> > + mov %3, %2\;ASHRLP%=:\;qbeq ASHREND%=, %3, 0\; sub %3, %3, 1\;
> > lsr\\t%0, %0, 1\; qbbc ASHRLP%=, %0, (%S0 * 8) - 2\; set %0, %0, (%S0 *
> > 8) - 1\; jmp ASHRLP%=\;ASHREND%=: + lsr\\t%0, %1, 1\;qbbc LSIGN%=, %0,
> > (%S0 * 8) - 2\;set %0, %0, (%S0 * 8) - 1\;LSIGN%=:" + [(set_attr "type"
> > "complex,alu")
> > + (set_attr "length" "28,4")])
>
> What do you mean by LRA not coping? What did you try originally and
> what went wrong?
Better assembly could be generated if the shift count register was used for a
loop counter. Pseudo code:
while (op2--)
op0 >>= 1;
I originally tried to clobber operand 2:
(define_insn "ashr<mode>3"
[(set (match_operand:QISI 0 "register_operand" "=&r,r")
(ashiftrt:QISI
(match_operand:QISI 1 "register_operand" "0,r")
(match_operand:QISI 2 "reg_or_const_1_operand" "+r,P"))
)]
But with the above pattern operand 2 was not clobbered. Its value was deemed
untouched (i.e. live across the pattern). So I came up with clobbering a
separate register to fix this, at the expense of slightly bigger generated
code.
>
> > +(define_insn "<code>di3"
> > + [(set (match_operand:DI 0 "register_operand" "=&r,&r")
> > + (LOGICAL_BITOP:DI
> > + (match_operand:DI 1 "register_operand" "%r,r")
> > + (match_operand:DI 2 "reg_or_ubyte_operand" "r,I")))]
> > + ""
> > + "@
> > + <logical_bitop_asm>\\t%F0, %F1, %F2\;<logical_bitop_asm>\\t%N0, %N1,
> > %N2 + <logical_bitop_asm>\\t%F0, %F1, %2\;<logical_bitop_asm>\\t%N0,
> > %N1, 0" + [(set_attr "type" "alu,alu")
> > + (set_attr "length" "8,8")])
> > +
> > +
> > +(define_insn "one_cmpldi2"
> > + [(set (match_operand:DI 0 "register_operand" "=r")
> > + (not:DI (match_operand:DI 1 "register_operand" "r")))]
> > + ""
> > +{
> > + /* careful with overlapping source and destination regs. */
> > + gcc_assert (GP_REG_P (REGNO (operands[0])));
> > + gcc_assert (GP_REG_P (REGNO (operands[1])));
> > + if (REGNO (operands[0]) == (REGNO (operands[1]) + 4))
> > + return "not\\t%N0, %N1\;not\\t%F0, %F1";
> > + else
> > + return "not\\t%F0, %F1\;not\\t%N0, %N1";
> > +}
> > + [(set_attr "type" "alu")
> > + (set_attr "length" "8")])
>
> Do you see any code improvements from defining these patterns?
> These days I'd expect you'd get better code by letting the
> target-independent code split them up into SImode operations.
Yes, these patterns improve code size and speed.
Looking at expand_binop(), DI logical ops are split into WORD mode, which in
PRU's peculiar case is QI. So without those patterns, a 64 bit IOR generates 8
QI instructions:
or r0.b0, r14.b0, r16.b0
or r0.b1, r14.b1, r16.b1
or r0.b2, r14.b2, r16.b2
or r0.b3, r14.b3, r16.b3
or r1.b0, r15.b0, r17.b0
or r1.b1, r15.b1, r17.b1
or r1.b2, r15.b2, r17.b2
or r1.b3, r15.b3, r17.b3
Whereas with patterns defined, it is only 2 SImode instructions:
or r0, r14, r16
or r1, r15, r17
>
> > +;; Multiply instruction. Idea for fixing registers comes from the AVR
> > backend. +
> > +(define_expand "mulsi3"
> > + [(set (match_operand:SI 0 "register_operand" "")
> > + (mult:SI (match_operand:SI 1 "register_operand" "")
> > + (match_operand:SI 2 "register_operand" "")))]
> > + ""
> > +{
> > + emit_insn (gen_mulsi3_fixinp (operands[0], operands[1], operands[2]));
> > + DONE;
> > +})
> > +
> > +
> > +(define_expand "mulsi3_fixinp"
> > + [(set (reg:SI 112) (match_operand:SI 1 "register_operand" ""))
> > + (set (reg:SI 116) (match_operand:SI 2 "register_operand" ""))
> > + (set (reg:SI 104) (mult:SI (reg:SI 112) (reg:SI 116)))
> > + (set (match_operand:SI 0 "register_operand" "") (reg:SI 104))]
> > + ""
> > +{
> > +})
>
> This seems slightly dangerous since there's nothing to guarantee that
> those registers aren't already live at the point of expansion.
>
> The more usual way (and IMO correct way) would be for:
> > +(define_insn "*mulsi3_prumac"
> > + [(set (reg:SI 104) (mult:SI (reg:SI 112) (reg:SI 116)))]
> > + ""
> > + "nop\;xin\\t0, r26, 4"
> > + [(set_attr "type" "alu")
> > + (set_attr "length" "8")])
>
> to have register classes and constraints for these three registers,
> like e.g. the x86 port does for %cx etc.
>
> It would be good if you could try that. On the other hand, if AVR
> already does this and it worked in practice then I guess it's not
> something that should hold up the port.
This suggestion worked. It was a matter of correct predicates.
Here is what I will put in the next patch version:
(define_predicate "pru_muldst_operand"
(and (match_code "reg")
(ior (match_test "REGNO_REG_CLASS (REGNO (op)) == MULDST_REGS")
(match_test "REGNO (op) >= FIRST_PSEUDO_REGISTER"))))
...
(define_register_constraint "Rmd0" "MULDST_REGS"
"@internal
The multiply destination register.")
...
(define_insn "mulsi3"
[(set (match_operand:SI 0 "pru_muldst_operand" "=Rmd0")
(mult:SI (match_operand:SI 1 "pru_mulsrc0_operand" "Rms0")
(match_operand:SI 2 "pru_mulsrc1_operand" "Rms1")))]
>
> > @@ -23444,6 +23449,56 @@ the offset with a symbol reference to a canary in
> > the TLS block.>
> > @end table
> >
> > +@node PRU Options
> > +@subsection PRU Options
> > +@cindex PRU Options
> > +
> > +These command-line options are defined for PRU target:
> > +
> > +@table @gcctabopt
> > +@item -minrt
> > +@opindex minrt
> > +Enable the use of a minimum runtime environment---no static
> > +initializers or constructors. Results in significant code size
> > +reduction of final ELF binaries.
>
> Up to you, but this read to me as "-m"+"inrt". If this option is already
> in use then obviously keep it, but if it's new then it might be worth
> calling it -mminrt instead, for consistency with -mmcu.
I assumed this is a standard naming since MSP430 already uses it. I've seen
the option being utilized by pru-gcc users, so let's keep it that way.
> > +@item T
> > +A text segment (program memory) constant label.
> > +
> > +@item Z
> > +Integer constant zero.
>
> Just checking, but did you deliberately leave out N? That's fine if so,
> but the docstring in constraints.md didn't have an "@internal" marker to
> indicate it was internal-only.
I did not know about "@internal. I will annotate appropriately.
>
> Despite the long reply (sorry), this looks in really good shape to me FWIW.
> Thanks for the submission.
I take this sentence as an indication that the PRU port is welcome for
inclusion into GCC. I'll work on fixing the comments and will send a new patch
version.
Thank you for the in-depth review and valuable suggestions.
Regards,
Dimitar
More information about the Gcc-patches
mailing list