[PATCH 2/6] RISC-V Port: gcc

Tue Jan 31 01:21:00 GMT 2017

On Fri, Jan 20, 2017 at 10:41 PM, Richard Henderson <rth@redhat.com> wrote:
> On 01/11/2017 06:30 PM, Palmer Dabbelt wrote:
>>
>> +(define_register_constraint "f" "TARGET_HARD_FLOAT ? FP_REGS : NO_REGS"
>> +  "A floating-point register (if available).")
>> +
>
>
> I know this is the Traditional Way, but I do wonder if using the new enable
> attribute on the alternatives would be better.  No need to change change;
> just something that makes me wonder.

Yeah, we could look into that later.

>
>> +(define_constraint "Q"
>> +  "@internal"
>> +  (match_operand 0 "const_arith_operand"))
>
>
> How does this differ from "I"?
>
>> +
>> +(define_memory_constraint "W"
>> +  "@internal
>> +   A memory address based on a member of @code{BASE_REG_CLASS}."
>> +  (and (match_code "mem")
>> +       (match_operand 0 "memory_operand")))
>
>
> How does this differ (materially) from "m"?  Or from just
>
>   (match_operand 0 "memory_operand")
>
> for that matter?
>
>
>> +;;........................
>> +;; DI -> SI optimizations
>> +;;........................
>> +
>> +;; Simplify (int)(a + 1), etc.
>> +(define_peephole2
>> +  [(set (match_operand:DI 0 "register_operand")
>> +       (match_operator:DI 4 "modular_operator"
>> +         [(match_operand:DI 1 "register_operand")
>> +          (match_operand:DI 2 "arith_operand")]))
>> +   (set (match_operand:SI 3 "register_operand")
>> +       (truncate:SI (match_dup 0)))]
>> +  "TARGET_64BIT && (REGNO (operands[0]) == REGNO (operands[3]) ||
>> peep2_reg_dead_p (2, operands[0]))
>> +   && (GET_CODE (operands[4]) != ASHIFT || (CONST_INT_P (operands[2]) &&
>> INTVAL (operands[2]) < 32))"
>> +  [(set (match_dup 3)
>> +         (truncate:SI
>> +            (match_op_dup:DI 4
>> +              [(match_operand:DI 1 "register_operand")
>> +               (match_operand:DI 2 "arith_operand")])))])
>
>
> I take from this that combine + ree fail to do the job?
>
> I must say I'm less than thrilled about the use of truncate instead of
> simply properly representing the sign-extensions that are otherwise required
> by the ABI.  RISCV is unlike MIPS in that the 32-bit operations (addw et al)
> do not require sign-extended inputs in order to produce a correct output
> value.
>
> Consider Alpha as a counter-example to MIPS, where the ABI requires
> sign-extensions at function edges and the ISA requires sign-extensions for
> comparisons.

We've revamped the port to use TRULY_NOOP_TRUNCATION.

>>
>>
>> +(define_insn "*local_pic_storesi<mode>"
>> +  [(set (mem:ANYF (match_operand 0 "absolute_symbolic_operand" ""))
>> +       (match_operand:ANYF 1 "register_operand" "f"))
>> +   (clobber (reg:SI T0_REGNUM))]
>> +  "TARGET_HARD_FLOAT && !TARGET_64BIT && USE_LOAD_ADDRESS_MACRO
>> (operands[0])"
>> +  "<store>\t%1,%0,t0"
>> +  [(set (attr "length") (const_int 8))])
>
>
> Use match_scratch so that you need not hard-code T0.
> And likewise for the other pic stores.
>
>> +(define_predicate "const_0_operand"
>> +  (and (match_code "const_int,const_double,const_vector")
>> +       (match_test "op == CONST0_RTX (GET_MODE (op))")))
>
>
> Probably want to support const_wide_int.
>
>> +  /* Don't handle multi-word moves this way; we don't want to introduce
>> +     the individual word-mode moves until after reload.  */
>> +  if (GET_MODE_SIZE (mode) > UNITS_PER_WORD)
>> +    return false;
>
>
> Why not?  Do the subreg passes not do a good job?  I would expect it to do
> well on a target like RISCV.
>
>> +(define_predicate "move_operand"
>> +  (match_operand 0 "general_operand")
>
>
> Normally this is handled by TARGET_LEGITIMATE_CONSTANT_P.
>
>> +/* Target CPU builtins.  */
>> +#define TARGET_CPU_CPP_BUILTINS()                                      \
>> +  do                                                                   \
>> +    {                                                                  \
>> +      builtin_define ("__riscv");                                      \
>
>
> Perhaps better to move this to riscv-c.c?
>
>> +      if (TARGET_MUL)                                                  \
>> +       builtin_define ("__riscv_mul");                                 \
>> +      if (TARGET_DIV)                                                  \
>> +       builtin_define ("__riscv_div");                                 \
>> +      if (TARGET_DIV && TARGET_MUL)                                    \
>> +       builtin_define ("__riscv_muldiv");                              \
>
>
> Out of curiosity, why are these split when the M extension doesn't do so?

Implementations that have MUL but not DIV cannot claim conformance to
the M extension, but they do exist, and it's easy enough for us to
support them.  AFAIK, they are all FPGA soft cores, where MUL maps
very cheaply to the DSP macros, but DIV is relatively more expensive.

>
>> +/* Define this macro if it is advisable to hold scalars in registers
>> +   in a wider mode than that declared by the program.  In such cases,
>> +   the value is constrained to be within the bounds of the declared
>> +   type, but kept valid in the wider mode.  The signedness of the
>> +   extension may differ from that of the type.  */
>> +
>> +#define PROMOTE_MODE(MODE, UNSIGNEDP, TYPE)    \
>> +  if (GET_MODE_CLASS (MODE) == MODE_INT                \
>> +      && GET_MODE_SIZE (MODE) < 4)             \
>> +    {                                          \
>> +      (MODE) = Pmode;                          \
>> +    }
>
>
> I think you want to force unsignedp false for SImode when extending to
> DImode.
>
>> +/* a0-a7, t0-a6, fa0-fa7, and ft0-ft11 are volatile across calls.
>> +   The call RTLs themselves clobber ra.  */
>> +
>> +#define CALL_USED_REGISTERS                                            \
>> +{ /* General registers.  */                                            \
>> +  1, 0, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1,                      \
>> +  1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1,                      \
>> +  /* Floating-point registers.  */                                     \
>> +  1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1,                      \
>> +  1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1,                      \
>> +  /* Others.  */                                                       \
>> +  1, 1                                                                 \
>> +}
>> +
>> +#define CALL_REALLY_USED_REGISTERS                                     \
>> +{ /* General registers.  */                                            \
>> +  1, 0, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1,                      \
>> +  1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1,                      \
>> +  /* Floating-point registers.  */                                     \
>> +  1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1,                      \
>> +  1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1,                      \
>> +  /* Others.  */                                                       \
>> +  1, 1                                                                 \
>> +}
>
>
> You shouldn't have to replicate these if they're the same.
>
>> +/* Return coprocessor number from register number.  */
>> +
>> +#define COPNUM_AS_CHAR_FROM_REGNUM(REGNO)                              \
>> +  (COP0_REG_P (REGNO) ? '0' : COP2_REG_P (REGNO) ? '2'                 \
>> +   : COP3_REG_P (REGNO) ? '3' : '?')
>> +
>
>
> MIPS cruft.
>
>> +typedef struct {
>> +  /* Number of integer registers used so far, up to
>> MAX_ARGS_IN_REGISTERS. */
>> +  unsigned int num_gprs;
>> +
>> +  /* Number of floating-point registers used so far, likewise.  */
>> +  unsigned int num_fprs;
>> +} CUMULATIVE_ARGS;
>
>
> Related to something I mentioned for patch 1: why isn't this just a single
> int?
>
>
>> +(define_c_enum "unspec" [
>> +  ;; GP manipulation.
>> +  UNSPEC_EH_RETURN
>> +
>> +  ;; Symbolic accesses.  The order of this list must match that of
>> +  ;; enum riscv_symbol_type in riscv-protos.h.
>> +  UNSPEC_ADDRESS_FIRST
>> +  UNSPEC_PCREL
>> +  UNSPEC_LOAD_GOT
>> +  UNSPEC_TLS
>> +  UNSPEC_TLS_LE
>> +  UNSPEC_TLS_IE
>> +  UNSPEC_TLS_GD
>> +
>> +  UNSPEC_AUIPC
>> +
>> +  ;; Register save and restore.
>> +  UNSPEC_GPR_SAVE
>> +  UNSPEC_GPR_RESTORE
>> +
>> +  ;; Blockage and synchronisation.
>> +  UNSPEC_BLOCKAGE
>> +  UNSPEC_FENCE
>> +  UNSPEC_FENCE_I
>> +])
>
>
> Some of these are unspec_volatile, and so should be in define_c_enum
> "unspecv".
>
>> +(define_expand "add<mode>3"
>> +  [(set (match_operand:GPR 0 "register_operand")
>> +       (plus:GPR (match_operand:GPR 1 "register_operand")
>> +                 (match_operand:GPR 2 "arith_operand")))]
>> +  "")
>> +
>> +(define_insn "*addsi3"
>> +  [(set (match_operand:SI 0 "register_operand" "=r,r")
>> +       (plus:SI (match_operand:GPR 1 "register_operand" "r,r")
>> +                 (match_operand:GPR2 2 "arith_operand" "r,Q")))]
>> +  ""
>> +  { return TARGET_64BIT ? "addw\t%0,%1,%2" : "add\t%0,%1,%2"; }
>> +  [(set_attr "type" "arith")
>> +   (set_attr "mode" "SI")])
>
>
> Err... mismatched input modes?  That's wrong.
>
>> +
>> +(define_insn "*adddi3"
>> +  [(set (match_operand:DI 0 "register_operand" "=r,r")
>> +       (plus:DI (match_operand:DI 1 "register_operand" "r,r")
>> +                 (match_operand:DI 2 "arith_operand" "r,Q")))]
>> +  "TARGET_64BIT"
>> +  "add\t%0,%1,%2"
>> +  [(set_attr "type" "arith")
>> +   (set_attr "mode" "DI")])
>
>
> Remove the addsi3 weirdness and it doesn't appear that you need the
> define_expand.
>
> Likewise for sub and mul.
>
>> +(define_expand "<u>mulditi3"
>> +  [(set (match_operand:TI 0 "register_operand")
>> +       (mult:TI (any_extend:TI (match_operand:DI 1 "register_operand"))
>> +                (any_extend:TI (match_operand:DI 2
>> "register_operand"))))]
>> +  "TARGET_MUL && TARGET_64BIT"
>> +{
>> +  rtx low = gen_reg_rtx (DImode);
>> +  emit_insn (gen_muldi3 (low, operands[1], operands[2]));
>> +
>> +  rtx high = gen_reg_rtx (DImode);
>> +  emit_insn (gen_<u>muldi3_highpart (high, operands[1], operands[2]));
>> +
>> +  emit_move_insn (gen_lowpart (DImode, operands[0]), low);
>> +  emit_move_insn (gen_highpart (DImode, operands[0]), high);
>> +  DONE;
>> +})
>
>
> FWIW, the manual recommends the sequence MULHU; MUL if you need both the
> high and low parts.  Whether that implies that you keep those together as a
> unit until final output, or whether you simply emit them in the other order,
> one would need to consult an actual hardware manual.

The manual means to suggest they should be kept together.  However, no
implementations I know of take advantage of this fusion opportunity,
and so scheduling them apart is actually preferable.  Once such an
implementation exists, we can make this decision consequent to -mtune.

>>
>>
>> +(define_insn "floatsidf2"
>> +  [(set (match_operand:DF 0 "register_operand" "=f")
>> +       (float:DF (match_operand:SI 1 "reg_or_0_operand" "rJ")))]
>> +  "TARGET_DOUBLE_FLOAT"
>> +  "fcvt.d.w\t%0,%z1"
>> +  [(set_attr "type"    "fcvt")
>> +   (set_attr "mode"    "DF")
>> +   (set_attr "cnv_mode"        "I2D")])
>> +
>> +
>> +(define_insn "floatdidf2"
>> +  [(set (match_operand:DF 0 "register_operand" "=f")
>> +       (float:DF (match_operand:DI 1 "reg_or_0_operand" "rJ")))]
>> +  "TARGET_64BIT && TARGET_DOUBLE_FLOAT"
>> +  "fcvt.d.l\t%0,%z1"
>> +  [(set_attr "type"    "fcvt")
>> +   (set_attr "mode"    "DF")
>> +   (set_attr "cnv_mode"        "I2D")])
>> +
>
>
> Merge with GPR?  And likewise for other fp conversions.
>
>> +;; Instructions for adding the low 16 bits of an address to a register.
>
>
> paste-o: s/16/12/
>
>> +(define_insn "*low<mode>"
>> +  [(set (match_operand:P 0 "register_operand" "=r")
>> +       (lo_sum:P (match_operand:P 1 "register_operand" "r")
>> +                 (match_operand:P 2 "symbolic_operand" "")))]
>> +  ""
>> +  "add\t%0,%1,%R2"
>
>
> Really add, not addi?
>
>> +;; Unlike most other insns, the move insns can't be split with '
>> +;; different predicates, because register spilling and other parts of
>> +;; the compiler, have memoized the insn number already.
>
>
> This comment is incorrect.  In general you can't split non-moves this way
> either.
>
>> +;; HImode constant generation; see riscv_move_integer for details.
>> +;; si+si->hi without truncation is legal because of
>> TRULY_NOOP_TRUNCATION.
>> +
>> +(define_insn "add<mode>hi3"
>> +  [(set (match_operand:HI 0 "register_operand" "=r,r")
>> +       (plus:HI (match_operand:HISI 1 "register_operand" "r,r")
>> +                 (match_operand:HISI 2 "arith_operand" "r,Q")))]
>> +  ""
>> +  { return TARGET_64BIT ? "addw\t%0,%1,%2" : "add\t%0,%1,%2"; }
>> +  [(set_attr "type" "arith")
>> +   (set_attr "mode" "HI")])
>> +
>> +(define_insn "xor<mode>hi3"
>> +  [(set (match_operand:HI 0 "register_operand" "=r,r")
>> +       (xor:HI (match_operand:HISI 1 "register_operand" "r,r")
>> +                 (match_operand:HISI 2 "arith_operand" "r,Q")))]
>> +  ""
>> +  "xor\t%0,%1,%2"
>> +  [(set_attr "type" "logical")
>> +   (set_attr "mode" "HI")])
>
>
> I see from the comment you want to use these for HImode constant generation.
> However, I do wonder if it wouldn't be better to name these such that they
> don't get used during normal rtl expansion.
>
> Why do you support HI and QImode moves into FP registers?
>
>> +;; Expand in-line code to clear the instruction cache between operand[0]
>> and
>> +;; operand[1].
>> +(define_expand "clear_cache"
>> +  [(match_operand 0 "pmode_register_operand")
>> +   (match_operand 1 "pmode_register_operand")]
>> +  ""
>> +  "
>> +{
>> +  emit_insn (gen_fence_i ());
>> +  DONE;
>> +}")
>
>
> Do you need a FENCE before the FENCE.I?

It's actually not clear to me what the semantics of clear_cache are
for multiprocessors.  Can you shed some light?

If thread A does modifies code then sets a flag, then thread B reads
the flag and executes a FENCE.I, then thread A needs a FENCE before
setting the flag and thread B needs a fence before the FENCE.I.  But,
is it not the software's responsibility to insert both fences, rather
than assuming one of the fences is folded into clear_cache?

>
>> +(define_insn "call_value_multiple_internal"
>> +  [(set (match_operand 0 "register_operand" "")
>> +       (call (mem:SI (match_operand 1 "call_insn_operand" "l,S"))
>> +             (match_operand 2 "" "")))
>> +   (set (match_operand 3 "register_operand" "")
>> +       (call (mem:SI (match_dup 1))
>> +             (match_dup 2)))
>
>
> Any reason for this?  Your return value registers are sequential.  The
> normal thing to do is just use e.g. (reg:TI 10).

I think we'd need different patterns for mixed int/FP struct returns
(which use a0 and fa0) if we took that approach.

>
>> +    case MEMMODEL_SEQ_CST:
>> +    case MEMMODEL_SYNC_SEQ_CST:
>> +    case MEMMODEL_ACQ_REL:
>> +      return "fence rw,rw";
>> +    case MEMMODEL_ACQUIRE:
>> +    case MEMMODEL_SYNC_ACQUIRE:
>> +    case MEMMODEL_CONSUME:
>> +      return "fence r,rw";
>> +    case MEMMODEL_RELEASE:
>> +    case MEMMODEL_SYNC_RELEASE:
>> +      return "fence rw,w";
>
>
> The memory models say whether any memory operation can cross the barrier,
> not what kind of operations might cross.  I think you have to use rw,rw
> always.
>
>
> r~