This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



Re: [PATCH 2/6] Andes nds32: machine description of nds32 porting (2).


On 9/14/13 7:40 PM, Richard Sandiford wrote:
> Some comments for part 2.
> 

The following are our revisions for part 2.

> Chung-Ju Wu <jasonwucj@gmail.com> writes:
>> +;; Include intrinsic functions definition.
>> +(include "nds32.intrinsic.md")
>> +
>> +;; Include block move for nds32 multiple load/store behavior.
>> +(include "nds32.multiple.md")
>> +
>> +;; Include DImode/DFmode operations.
>> +(include "nds32.doubleword.md")
>> +
>> +;; Include peephole patterns.
>> +(include "nds32.peephole2.md")
> 
> Usual gcc style is to use "-" rather than "." as a word separator in
> filenames.
> 

Fixed accordingly.

>> +(define_insn "*store_si"
>> +  [(set (match_operand:SI 0 "memory_operand"   "=U45, U33, U37, U45, m")
>> +	(match_operand:SI 1 "register_operand" "   l,   l,   l,   d, r"))]
>> +  ""
> 
> Loads, stores, register moves and constant moves should normally be in
> the same pattern, so that anything operating on constraints can see all
> the alternatives at once.  This might not be as important for LRA as it
> was for reload, but it still seems like good practice.
> 

Thanks for the suggestion.  Now we combine all the loads, stores,
register moves, and constant moves into a single pattern.
It passes the regression test cleanly.

>> +(define_insn "*mov<mode>"
>> +  [(set (match_operand:QIHISI 0 "register_operand" "=r, m, r")
>> +	(match_operand:QIHISI 1 "register_operand" " r, r, m"))]
>> +  ""
>> +{
>> +  switch (which_alternative)
>> +    {
>> +    case 0:
>> +      if (get_attr_length (insn) == 2)
>> +	return "mov55\t%0, %1";
>> +      else
>> +	return "ori\t%0, %1, 0";
>> +    case 1:
>> +      return nds32_output_32bit_store (operands, <byte>);
>> +    case 2:
>> +      return nds32_output_32bit_load (operands, <byte>);
>> +
>> +    default:
>> +      gcc_unreachable ();
>> +    }
>> +}
>> +  [(set_attr "type" "alu,store,load")
>> +   (set_attr "enabled" "1")
>> +   (set_attr_alternative "length"
>> +     [
>> +       ;; Alternative 0
>> +       (if_then_else (match_test "TARGET_16_BIT")
>> +		     (const_int 2)
>> +		     (const_int 4))
>> +       ;; Alternative 1
>> +       (const_int 4)
>> +       ;; Alternative 2
>> +       (const_int 4)
>> +     ])])
> 
> The style used in the load and store patterns was:
> 
> (define_insn "*mov<mode>"
>   [(set (match_operand:QIHISI 0 "register_operand" "=r, r, m, r")
> 	(match_operand:QIHISI 1 "register_operand" " r, r, r, m"))]
>   ""
> {
>   switch (which_alternative)
>     {
>     case 0:
>       return "mov55\t%0, %1";
>     case 1:
>       return "ori\t%0, %1, 0";
>     case 2:
>       return nds32_output_32bit_store (operands, <byte>);
>     case 3:
>       return nds32_output_32bit_load (operands, <byte>);
>     default:
>       gcc_unreachable ();
>     }
> }
>   [(set_attr "type" "alu,alu,store,load")
>    (set_attr "length" "2,4,4,4")])
> 
> which seems neater.  Did you try that but find that it didn't work here?
> 
> Same comment for other instructions where:
> 
>        (if_then_else (match_test "TARGET_16_BIT")
> 		     (const_int 2)
> 		     (const_int 4))
> 
> occurs (except for the special case of relaxable branch instructions,
> where using the if_then_else is good).
> 

Thanks for the comment.
I simplify the patterns as you suggested.

>> +;; We use nds32_symbolic_operand to limit that only CONST/SYMBOL_REF/LABEL_REF
>> +;; are able to match such instruction template.
>> +(define_insn "*move_addr"
>> +  [(set (match_operand:SI 0 "register_operand"       "=l, r")
>> +	(match_operand:SI 1 "nds32_symbolic_operand" " i, i"))]
>> +  ""
>> +  "la\t%0, %1"
>> +  [(set_attr "type" "move")
>> +   (set_attr "length"  "8")])
>> +
>> +
>> +(define_insn "*sethi"
>> +  [(set (match_operand:SI 0 "register_operand"           "=r")
>> +	(high:SI (match_operand:SI 1 "immediate_operand" " i")))]
>> +  ""
>> +{
>> +  return "sethi\t%0, hi20(%1)";
>> +}
>> +  [(set_attr "type" "alu")
>> +   (set_attr "length" "4")])
>> +
>> +
>> +(define_insn "*lo_sum"
>> +  [(set (match_operand:SI 0 "register_operand"             "=r")
>> +	(lo_sum:SI (match_operand:SI 1 "register_operand"  " 0")
>> +		   (match_operand:SI 2 "immediate_operand" " i")))]
>> +  ""
>> +  "ori\t%0, %1, lo12(%2)"
>> +  [(set_attr "type" "alu")
>> +   (set_attr "length" "4")])
> 
> Very minor, but "nds32_symbolic_operand" seems like a better choice for
> *sethi and *lo_sum too, since (high ...) and (lo_sum ...) shouldn't be
> used for const_ints.
> 

Modified accordingly.

> Any pass would be in its rights to fuse a *sethi and *lo_sum pair back
> into a single *move_addr.  Is that something you want to allow?
> (That's a genuine question rather than a review comment btw.)
> 

In most cases, when we want to load an address into a register,
it matches the *move_addr pattern and generates the "la" pseudo
instruction, which the assembler expands into "sethi + ori".

But we also need the *sethi and *lo_sum patterns so that we can
construct full-addressing-mode data loads/stores like:

  (set (reg_a) (high symbol))
  (set (reg_t) (mem (lo_sum reg_a symbol)))
or
  (set (reg_a) (high symbol))
  (set (mem (lo_sum reg_a symbol) (reg_t)))

In my experience, such construction is performed
during the reload phase.  For example, given this code fragment:

  int a;
  void foo () { a = 10; }

Compiling this sample with '-O1 -fdump-rtl-all', we get:

[a.c.213r.ira]
(insn 7 6 0 2 (set (mem/c:SI (symbol_ref:SI ("a")  <var_decl 0xb75200b8 a>) [0 a+0 S4 A32])
        (const_int 10 [0xa])) a.c:2 27 {*movsi}
     (nil))

[a.c.214r.reload]
(insn 11 6 13 2 (set (reg:SI 0 $r0 [42])
        (high:SI (symbol_ref:SI ("a")  <var_decl 0xb75200b8 a>))) a.c:2 29 {*sethi}
     (nil))
(insn 13 11 12 2 (set (reg:SI 0 $r0 [42])
        (reg:SI 0 $r0 [42])) a.c:2 27 {*movsi}
     (expr_list:REG_DEAD (reg:SI 0 $r0 [42])
        (nil)))
(insn 12 13 14 2 (set (reg:SI 1 $r1 [43])
        (const_int 10 [0xa])) a.c:2 27 {*movsi}
     (nil))
(insn 14 12 7 2 (set (reg:SI 1 $r1 [43])
        (reg:SI 1 $r1 [43])) a.c:2 27 {*movsi}
     (expr_list:REG_DEAD (reg:SI 1 $r1 [43])
        (nil)))
(insn 7 14 10 2 (set (mem/c:SI (lo_sum:SI (reg:SI 0 $r0 [42])
                (symbol_ref:SI ("a")  <var_decl 0xb75200b8 a>)) [0 a+0 S4 A32])
        (reg:SI 1 $r1 [43])) a.c:2 27 {*movsi}
     (expr_list:REG_DEAD (reg:SI 1 $r1 [43])
        (expr_list:REG_DEAD (reg:SI 0 $r0 [42])
            (nil))))

> Is the "0" constraint on the *lo_sum really necessary?  It looks from
> the later OR patterns as though this form of ORI allows the source and
> destination registers to be different.
> 

Fixed accordingly.

>> +;; Zero extension instructions.
>> +
>> +(define_expand "zero_extend<mode>si2"
>> +  [(set (match_operand:SI 0 "general_operand" "")
>> +	(zero_extend:SI (match_operand:QIHI 1 "general_operand" "")))]
>> +  ""
>> +{
>> +  rtx tmp_reg;
>> +
>> +  /* We need to make sure operands[1] is a register.  */
>> +  if (!REG_P (operands[1]))
>> +    operands[1] = force_reg (GET_MODE (operands[1]), operands[1]);
> 
> Why do you need this?  It looks like the architecture has zero-extending loads.
> 

You are right.  Such behavior is unnecessary, so I...

>> +
>> +  /* If the pattern is "(mem X) <- (zero_extend (reg Y))",
>> +     we create two rtx patterns:
>> +       (reg:SI K) <- (zero_extend:SI (reg Y))
>> +       (mem:SI X) <- (reg:SI K)
>> +     The first rtx will be matched by '*zero_extend<mode>si2_reg' template,
>> +     and the second rtx will be matched by mov naming pattern.  */
>> +  if (MEM_P (operands[0]))
>> +    {
>> +      tmp_reg = gen_reg_rtx (SImode);
>> +
>> +      emit_insn (gen_zero_extend<mode>si2 (tmp_reg, operands[1]));
>> +      emit_insn (gen_movsi (operands[0], tmp_reg));
>> +
>> +      DONE;
>> +    }
>> +})
>> +
>> +(define_insn "*zero_extend<mode>si2_reg"
>> +  [(set (match_operand:SI 0 "register_operand"                   "=w, r")
>> +	(zero_extend:SI (match_operand:QIHI 1 "register_operand" " w, r")))]
>> +  ""
>> +{
>> +  switch (which_alternative)
>> +    {
>> +    case 0:
>> +      return "ze<size>33\t%0, %1";
>> +    case 1:
>> +      return "ze<size>\t%0, %1";
>> +
>> +    default:
>> +      gcc_unreachable ();
>> +    }
>> +}
>> +  [(set_attr "type"   "alu,alu")
>> +   (set_attr "length" "  2,  4")])
>> +
>> +(define_insn "*zero_extend<mode>si2_load"
>> +  [(set (match_operand:SI 0 "register_operand"                 "=  l, *r")
>> +	(zero_extend:SI (match_operand:QIHI 1 "memory_operand" " U33,  m")))]
>> +  ""
>> +{
>> +  if (which_alternative == 0)
>> +    return nds32_output_16bit_load (operands, <byte>);
>> +  else
>> +    return nds32_output_32bit_load (operands, <byte>);
>> +}
>> +  [(set_attr "length" "2, 4")
>> +   (set_attr "type" "load,load")])
> 
> Here too it's better to have a single pattern with both the register
> and memory alternatives.  It ought to be possible to define
> "zero_extend<mode>si2" directly as a define_insn rather than a
> define_expand:
> 
> (define_insn "zero_extend<mode>si2"
>   [(set (match_operand:SI 0 "register_operand" "w,w,r,*r")
> 	(zero_extend:SI (match_operand:QIHI 1 "nonimmediate_operand" "w,r,U33,m")))]
>   ...
> 
> The target-independent code will then handle memory destinations.
> 
> Same comments for "extend<mode>si2".
> 

... follow your suggestion and use a single pattern for zero_extend.
I also made the corresponding change to the extend<mode>si2 pattern.

>> +;; Arithmetic instructions.
>> +
>> +(define_expand "addsi3"
>> +  [(set (match_operand:SI 0 "register_operand" "")
>> +	(plus:SI (match_operand:SI 1 "register_operand" "")
>> +		 (match_operand:SI 2 "nds32_nonmemory_nonsymbol_operand" "")))]
>> +  ""
>> +{
>> +  if (GET_CODE (operands[2]) == CONST_INT)
>> +    operands[2] = gen_int_mode (INTVAL (operands[2]), SImode);
>> +})
> 
> This looks like it's papering over a bug elsewhere.  Any CONST_INT passed
> into to gen_addsi3 must already be correct for SImode.  If you find callers
> where that isn't true, we need to fix them.  Also, any incorrect constants
> are usually filtered out by the predicate.
> 

Indeed.  After some investigation, it turned out this was papering over
an earlier bug where we called gen_addsi3 directly with an incorrect
const_int.  That cannot happen in the current implementation, so I removed it.

>> +(define_insn "*add<mode>3"
>> +  [(set (match_operand:QIHISI 0 "register_operand"                      "=   d,    l,  d, l,    k,    l,    r, r")
>> +	(plus:QIHISI (match_operand:QIHISI 1 "register_operand"         "    0,    l, %0, l,    0,    k,    r, r")
>> +		     (match_operand:QIHISI 2 "nds32_reg_or_int_operand" " Iu05, Iu03,  r, l, Is10, Iu06, Is15, r")))]
>> +  ""
>> +  "@
>> +  addi45\t%0, %2
>> +  addi333\t%0, %1, %2
>> +  add45\t%0, %2
>> +  add333\t%0, %1, %2
>> +  addi10.sp\t%2
>> +  addri36.sp\t%0, %2
>> +  addi\t%0, %1, %2
>> +  add\t%0, %1, %2"
>> +  [(set_attr "type"   "alu,alu,alu,alu,alu,alu,alu,alu")
>> +   (set_attr "length" "  2,  2,  2,  2,  2,  2,  4,  4")])
> 
> The predicates in the define_expand and define_insn are different.
> They should usually be the same.
> 
> Without the gen_int_mode, this too could be defined directly as
> a define_insn, without a separate define_expand.
> 

Thanks for the suggestion.
Now I define a single "add<mode>3" pattern for these operations.

>> +(define_expand "subsi3"
>> +  [(set (match_operand:SI 0 "register_operand" "")
>> +	(minus:SI (match_operand:SI 1 "nds32_rimm15s_operand" "")
>> +		 (match_operand:SI 2 "nds32_rimm15s_operand" "")))]
>> +  ""
>> +  ""
>> +)
> 
> Operand 2 shouldn't allow immediates.  They should all go via the
> add optab instead.
> 

Thanks for the comment.
Now I define the sub<mode>3 pattern without the Iu05/Iu03 immediates:

(define_insn "sub<mode>3"
  [(set (match_operand:QIHISI 0 "register_operand"                    "=d, l,    r, r")
        (minus:QIHISI (match_operand:QIHISI 1 "nds32_rimm15s_operand" " 0, l, Is15, r")
                      (match_operand:QIHISI 2 "register_operand"      " r, l,    r, r")))]
  ...

I also created two new constraints used in the add<mode>3 pattern:

(define_insn "add<mode>3"
  [(set (match_operand:QIHISI 0 "register_operand"                   "=   d,    l, ...")
        (plus:QIHISI (match_operand:QIHISI 1 "register_operand"      "    0,    l, ...")
                     (match_operand:QIHISI 2 "nds32_rimm15s_operand" " In05, In03, ...")))]
  ...
{
  ...
    case 0:
      operands[2] = gen_int_mode (-INTVAL (operands[2]), SImode);
      return "subi45\t%0, %2";
    case 1:
      operands[2] = gen_int_mode (-INTVAL (operands[2]), SImode);
      return "subi333\t%0, %1, %2";
  ...
}

where In05 covers the range -31 to 0, and In03 covers the range -7 to 0.

>> +
>> +(define_insn "*sub<mode>3"
>> +  [(set (match_operand:QIHISI 0 "register_operand"                    "=   d,    l, d, l,    r, r")
>> +	(minus:QIHISI (match_operand:QIHISI 1 "nds32_rimm15s_operand" "    0,    l, 0, l, Is15, r")
>> +		      (match_operand:QIHISI 2 "nds32_rimm15s_operand" " Iu05, Iu03, r, l,    r, r")))]
>> +  ""
>> +  "@
>> +  subi45\t%0, %2
>> +  subi333\t%0, %1, %2
>> +  sub45\t%0, %2
>> +  sub333\t%0, %1, %2
>> +  subri\t%0, %2, %1
>> +  sub\t%0, %1, %2"
>> +  [(set_attr "type"   "alu,alu,alu,alu,alu,alu")
>> +   (set_attr "length" "  2,  2,  2,  2,  4,  4")])
> 
> Here too a direct define_insn seems better than a define_expand/define_insn
> pair.
> 

Modified accordingly.

>> +(define_expand "andsi3"
>> +  [(set (match_operand:SI         0 "register_operand" "")
>> +	(and:SI (match_operand:SI 1 "register_operand" "")
>> +		(match_operand:SI 2 "general_operand"  "")))]
>> +  ""
>> +{
>> +  /* If operands[2] is const_int,
>> +     we might be able to use other more efficient instructions.  */
>> +  if (GET_CODE (operands[2]) == CONST_INT)
>> +    {
>> +      int mask = INTVAL (operands[2]);
>> +
>> +      if (mask == 255)
>> +	{
>> +	  /* ($r0 & 0xff)  ==>  (zeb $r0, $r0) */
>> +	  operands[1] = convert_to_mode (QImode, operands[1], 1);
>> +	  emit_insn (gen_zero_extendqisi2 (operands[0], operands[1]));
>> +	  DONE;
>> +	}
>> +      else if (mask == 65535)
>> +	{
>> +	  /* ($r0 & 0xffff)  ==>  (zeh $r0, $r0) */
>> +	  operands[1] = convert_to_mode (HImode, operands[1], 1);
>> +	  emit_insn (gen_zero_extendhisi2 (operands[0], operands[1]));
>> +	  DONE;
>> +	}
>> +    }
>> +})
> 
> It looks like the associated "*andsi3" insn also has a case for zeb.
> That's usually the better approach.  Please consider adding a zeh case
> to the "*andsi3" alternatives too and removing the code above.
> 
> With that change, the define_expand and define_insn could be fused.
> 

Thanks for the comment.  Modified accordingly.

>> +;; For iorsi3 naming pattern, we have to use define_expand first,
>> +;; and then design different anonymous patterns so that it can
>> +;; simply set different instruction length according to ISA.
>> +(define_expand "iorsi3"
>> +  [(set (match_operand:SI 0 "register_operand"         "")
>> +	(ior:SI (match_operand:SI 1 "register_operand" "")
>> +		(match_operand:SI 2 "general_operand"  "")))]
>> +  ""
>> +  ""
>> +)
>> +
>> +;; This is the iorsi3 pattern for V3/V3M ISA,
>> +;; which DOES HAVE 'or33' instruction.
>> +;; So we can identify 'or Rt3,Ra3,Rb3' case and set its length to be 2.
>> +(define_insn "*iorsi3"
>> +  [(set (match_operand:SI 0 "register_operand"         "= w, r,    r,    r")
>> +	(ior:SI (match_operand:SI 1 "register_operand" " %0, r,    r,    r")
>> +		(match_operand:SI 2 "general_operand"  "  w, r, Iu15, Ie15")))]
>> +  ""
>> +{
>> +  int one_position;
>> +
>> +  switch (which_alternative)
>> +    {
>> +    case 0:
>> +      return "or33\t%0, %2";
>> +    case 1:
>> +      return "or\t%0, %1, %2";
>> +    case 2:
>> +      return "ori\t%0, %1, %2";
>> +    case 3:
>> +      /* If we reach this alternative,
>> +         it must pass the nds32_can_use_bset_p() test,
>> +         so that we can guarantee there is only one 1-bit
>> +         within the immediate value.  */
>> +      for (one_position = 31; one_position >= 0; one_position--)
>> +	{
>> +	  if ((INTVAL (operands[2]) & (1 << one_position)) != 0)
>> +	    {
>> +	      /* Found the 1-bit position.  */
>> +	      operands[2] = GEN_INT (one_position);
>> +	      break;
>> +	    }
>> +	}
>> +      return "bset\t%0, %1, %2";
>> +
>> +    default:
>> +      gcc_unreachable ();
>> +    }
>> +}
>> +  [(set_attr "type"   "alu,alu,alu,alu")
>> +   (set_attr "length" "  2,  4,  4,  4")])
> 
> I don't understand the comment above the define_expand, sorry.
> This too looks like a case where "iorsi3" should just be a define_insn,
> with no define_expand.
> 

The define_expand is legacy code; it is not necessary
in the current design.
Thanks for the catch.  Fixed accordingly.

> Case 3 could use exact_log2.
> 

Modified accordingly.

> Same comments for xor. 

Modified accordingly.

> (You might be able to use code iterators
> and have a single set of patterns for or and xor, including the
> shifting variants.)
> 

Thanks for the suggestion, but I prefer not to use code iterators
because we may add different variants for 'or' and 'xor' separately.

>> +;; For negsi2 naming pattern, we have to use define_expand first,
>> +;; and then design different anonymous patterns so that it can
>> +;; output assembly code according to ISA.
>> +(define_expand "negsi2"
>> +  [(set (match_operand:SI 0 "register_operand"         "")
>> +	(neg:SI (match_operand:SI 1 "register_operand" "")))]
>> +  ""
>> +  ""
>> +)
>> +
>> +;; Note that there is NO 'neg33' instruction for V2 ISA.
>> +;; So 'subri A,B,0' (its semantic is 'A = 0 - B')
>> +;; is the only option for V2 ISA.
>> +(define_insn "*negsi2"
>> +  [(set (match_operand:SI 0 "register_operand"         "=w, r")
>> +	(neg:SI (match_operand:SI 1 "register_operand" " w, r")))]
>> +  ""
>> +  "@
>> +   neg33\t%0, %1
>> +   subri\t%0, %1, 0"
>> +  [(set_attr "type"   "alu,alu")
>> +   (set_attr "length" "  2,  4")])
> 
> Here too the define_expand seems redundant.  Same for one_cmplsi2.
> 

Thanks for the catch.
I modified the negsi2 and one_cmplsi2 patterns accordingly.

> (Looks like you already define the shift instructions directly though,
> thanks.)
> 
>> +;; Shift instructions.
>> +
>> +(define_insn "ashlsi3"
>> +  [(set (match_operand:SI 0 "register_operand"            "=   l,    r, r")
>> +	(ashift:SI (match_operand:SI 1 "register_operand" "    l,    r, r")
>> +		   (match_operand:SI 2 "general_operand"  " Iu03, Iu05, r")))]
> 
> Operand 2 doesn't allow memory, so nonmemory_operand would be better
> than general_operand.  Both are correct, but nonmemory_operand is
> tighter and so forces the pre-RA optimisers to treat the load as
> a separate instruction.
> 
> Same for the other shift instructions.
> 

Thanks for the suggestion.
It does help optimization in some cases.
Modified accordingly.

>> +(define_expand "mov<mode>cc"
>> +  [(set (match_operand:QIHI 0 "register_operand" "")
>> +	(if_then_else:QIHI (match_operand 1 "comparison_operator" "")
>> +			   (match_operand:QIHI 2 "register_operand" "")
>> +			   (match_operand:QIHI 3 "register_operand" "")))]
>> +  "TARGET_CMOV"
>> +{
>> +  rtx insn;
>> +
>> +  /* For QImode and HImode conditional move,
>> +     make them to be SImode behavior.  */
>> +  operands[0] = simplify_gen_subreg (SImode, operands[0], <MODE>mode, 0);
>> +  operands[2] = simplify_gen_subreg (SImode, operands[2], <MODE>mode, 0);
>> +  operands[3] = simplify_gen_subreg (SImode, operands[3], <MODE>mode, 0);
>> +
>> +  insn = gen_movsicc (operands[0], operands[1], operands[2], operands[3]);
>> +
>> +  if (!insn)
>> +    FAIL;
>> +
>> +  emit_insn (insn);
>> +  DONE;
>> +})
> 
> It'd be better to handle QI, HI and SI using a single template if possible.
> Subregs are harder to optimise than plain registers.
> 

Thanks for clarifying it.
I decided to remove the QImode/HImode patterns and keep only the movsicc pattern.

>> +
>> +(define_insn "cmovz"
>> +  [(set (match_operand:SI 0 "register_operand"                      "=r, r")
>> +        (if_then_else:SI (eq (match_operand:SI 1 "register_operand" " r, r")
>> +			     (const_int 0))
>> +			 (match_operand:SI 2 "register_operand"     " r, 0")
>> +			 (match_operand:SI 3 "register_operand"     " 0, r")))]
>> +  "TARGET_CMOV"
>> +  "@
>> +   cmovz\t%0, %2, %1
>> +   cmovz\t%0, %3, %1"
>> +  [(set_attr "type" "move")
>> +   (set_attr "length"  "4")])
>> +
>> +(define_insn "cmovn"
>> +  [(set (match_operand:SI 0 "register_operand"                      "=r, r")
>> +	(if_then_else:SI (ne (match_operand:SI 1 "register_operand" " r, r")
>> +			     (const_int 0))
>> +			 (match_operand:SI 2 "register_operand"     " r, 0")
>> +			 (match_operand:SI 3 "register_operand"     " 0, r")))]
>> +  "TARGET_CMOV"
>> +  "@
>> +   cmovn\t%0, %2, %1
>> +   cmovn\t%0, %3, %1"
>> +  [(set_attr "type" "move")
>> +   (set_attr "length"  "4")])
>> +
>> +(define_insn_and_split "*movsicc"
>> +  [(set (match_operand:SI 0 "register_operand"                     "=r, r")
>> +	(if_then_else:SI (match_operator 1 "nds32_equality_comparison_operator"
>> +			   [(match_operand:SI 2 "register_operand" " r, r")
>> +			    (const_int 0)])
>> +			 (match_operand:SI 3 "register_operand"    " 0, r")
>> +			 (match_operand:SI 4 "register_operand"    " r, 0")))]
>> +  "TARGET_CMOV"
>> +  "#"
>> +  "reload_completed"
>> +  [(pc)]
>> +{
>> +  enum rtx_code code = GET_CODE (operands[1]);
>> +  rtx then_op = operands[3];
>> +  rtx else_op = operands[4];
>> +  rtx tmp;
>> +
>> +  /* According to the implementation in "movsicc" naming pattern,
>> +     if we make transformation in which the comparison code is EQ,
>> +     the desired target is at "else" part position semantically.
>> +     Now it is the time (after reload_completed) to physically
>> +     swap it to "then" part position.  */
>> +  if (code == EQ)
>> +    {
>> +      tmp     = then_op;
>> +      then_op = else_op;
>> +      else_op = tmp;
>> +    }
>> +
>> +  /* Choosing cmovz or cmovn is based on reload phase result.
>> +     After reload phase, one source operand will use
>> +     the same register as result operand.
>> +     We can use cmovz/cmovn to catch the other source operand
>> +     which has different register.
>> +     So We check register number to determine using cmovz or cmovn.  */
>> +  if (REGNO(then_op) == REGNO(operands[0]))
>> +    emit_insn (gen_cmovz (operands[0], operands[2], else_op, operands[0]));
>> +  else if (REGNO(else_op) == REGNO(operands[0]))
>> +    emit_insn (gen_cmovn (operands[0], operands[2], then_op, operands[0]));
>> +  else
>> +    gcc_unreachable ();
>> +
>> +  DONE;
>> +})
> 
> I don't really see off-hand how the third instruction would match in its
> define_insn form, since the earlier instructions ought to match first.
> And it looks from first glance like the splitter is working around
> a bug in the first two instructions.  E.g. shouldn't the first pattern be:
> 
> (define_insn "cmovz"
>   [(set (match_operand:SI 0 "register_operand"                      "=r, r")
>         (if_then_else:SI (eq (match_operand:SI 1 "register_operand" " r, r")
> 			     (const_int 0))
> 			 (match_operand:SI 2 "register_operand"     " r, 0")
> 			 (match_operand:SI 3 "register_operand"     " 0, r")))]
>   "TARGET_CMOV"
>   "@
>    cmovz\t%0, %2, %1
>    cmovn\t%0, %3, %1"
>   [(set_attr "type" "move")
>    (set_attr "length"  "4")])
> 
> with the second alternative being "cmovn" rather than "cmovz"?
> With a similar change to the "cmovn" pattern, the define_insn_and_split
> ought to be unnecessary.
> 

You are ABSOLUTELY right!
We followed your suggestion and modified the "cmovz" and "cmovn" patterns.
With that change we can happily discard the define_insn_and_split part!
Thank you very much!! :)

>> +	  /* We want to plus 1 into the integer value
>> +	     of operands[2] to create 'slt' instruction.
>> +	     This calculation is performed on the host machine,
>> +	     which may use 64-bit integers.
>> +	     So the meaning of the calculation result may be
>> +	     different from the 32-bit nds32 target.
>> +
>> +	     For example:
>> +	       0x7fffffff + 0x1 -> 0x80000000,
>> +	       this value is POSITIVE on 64-bit machine,
>> +	       but the expected value on 32-bit nds32 target
>> +	       should be NEGATIVE value.
>> +
>> +	     Hence, instead of using GEN_INT(), we use gen_int_mode() to
>> +	     explicitly create SImode constant rtx.  */
>> +	  operands[2] = gen_int_mode (INTVAL (operands[2]) + 1, SImode);
> 
> The comment seems unnecessary.  gen_int_mode is better than GEN_INT
> whereever it can be used.  It's new uses of GEN_INT that deserve
> comments :-)
> 

May I keep this comment for other engineers? :p

Some people may not be able to tell the difference yet,
and I would like to keep this comment so that others can
refer to an example of why gen_int_mode() is better
than GEN_INT() wherever it can be used. :)

>> +	  /* We want to plus 1 into the integer value
>> +	     of operands[2] to create 'slt' instruction.
>> +	     This calculation is performed on the host machine,
>> +	     which may use 64-bit integers.
>> +	     So the meaning of the calculation result may be
>> +	     different from the 32-bit nds32 target.
>> +
>> +	     For example:
>> +	       0x7fffffff + 0x1 -> 0x80000000,
>> +	       this value is POSITIVE on 64-bit machine,
>> +	       but the expected value on 32-bit nds32 target
>> +	       should be NEGATIVE value.
>> +
>> +	     Hence, instead of using GEN_INT(), we use gen_int_mode() to
>> +	     explicitly create SImode constant rtx.  */
>> +	  operands[2] = gen_int_mode (INTVAL (operands[2]) + 1, SImode);
>> +
>> +	  if (code == LE)
>> +	    {
>> +	      /* LE, use slts instruction */
>> +	      emit_insn (gen_slts_compare (tmp_reg, operands[1], operands[2]));
>> +	    }
>> +	  else
>> +	    {
>> +	      /* LEU, use slt instruction */
>> +	      emit_insn (gen_slt_compare  (tmp_reg, operands[1], operands[2]));
>> +	    }
> 
> Same here, but (le:SI X INT_MAX) isn't the same as (lt:SI X INT_MIN).
> I'm not sure we're guaranteed to have optimised away all those cases
> by this point, but at least an assert would be good.
> 

Thanks.  I added an assert for the case where the code is LE and
the original operands[2] is INT_MAX.

>> +;; Subroutine call instruction returning no value.
>> +;;   operands[0]: It should be a mem RTX whose address is
>> +;;                the address of the function.
>> +;;   operands[1]: It is the number of bytes of arguments pushed as a const_int.
>> +;;   operands[2]: It is the number of registers used as operands.
>> +
>> +(define_expand "call"
>> +  [(parallel [(call (match_operand 0 "memory_operand" "")
>> +		    (match_operand 1 "general_operand" ""))
> 
> Might as well just use (match_operand 1 "" ""), like the define_insns do.
> 
> FWIW you can drop trailing ""s from match_operands.
> 

Modified accordingly.

>> +(define_expand "prologue" [(const_int 0)]
>> +  ""
>> +{
>> +  /* Note that only under V3/V3M ISA, we could use v3push prologue.  */
>> +  if (TARGET_V3PUSH)
>> +    nds32_expand_prologue_v3push();
>> +  else
>> +    nds32_expand_prologue();
>> +  DONE;
>> +})
> 
> Space before "()".  Same for the epilogue pattern.
> 

Fixed accordingly.

>> +;; nop instruction.
>> +
>> +(define_insn "nop"
>> +  [(const_int 0)]
>> +  ""
>> +  "nop16"
>> +  [(set_attr "type"   "misc")
>> +   (set_attr "length"    "2")])
> 
> What happens for !TARGET_16BIT?  Won't the alternative be disabled there?
> 

Oh my god~ Thank you for catching this latent bug!
It is fixed in the revised patch. :)

>> +  /* The En4 encoding string of the instruction is
>> +     in the bitwise fashion of following: "fp gp lp sp".  */
>> +  const char *En4_str[16] =
> 
> The optimisers would probably pick this up anyway, but:
> 
>   static const char *const En4_str[16] =
> 
> explicitly says that this is an array that should live in read-only storage.
> 

Fixed accordingly.

>> +  /* Create RbRe_str string.
>> +     Note that we need to output ',' character if there exists En4 field.  */
>> +  if (REGNO (operands[0]) != SP_REGNUM && REGNO (operands[1]) != SP_REGNUM)
>> +      RbRe_str = (INTVAL (operands[2]) != 0) ? "%0, %1, " : "%0, %1";
>> +  else
>> +      RbRe_str = "";
> 
> Excess indentation.  GNU style is not to wrap "INTVAL (operands[2]) != 0"
> in brackets.
> 
> Same comments for pop.
> 

Fixed accordingly.


> Overall it looks good to me FWIW.
> 
> Thanks,
> Richard
> 

A revised patch for nds32.md is attached.
Thanks for all of your review comments! :)

Best regards,
jasonwucj


Attachment: 2-nds32-backend-md-part2.v3.revised.patch
Description: Text document

