This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.
On 9/14/13 7:40 PM, Richard Sandiford wrote:
> Some comments for part 2.

The following are our revisions for part 2.

> Chung-Ju Wu <jasonwucj@gmail.com> writes:
>> +;; Include intrinsic functions definition.
>> +(include "nds32.intrinsic.md")
>> +
>> +;; Include block move for nds32 multiple load/store behavior.
>> +(include "nds32.multiple.md")
>> +
>> +;; Include DImode/DFmode operations.
>> +(include "nds32.doubleword.md")
>> +
>> +;; Include peephole patterns.
>> +(include "nds32.peephole2.md")
>
> Usual gcc style is to use "-" rather than "." as a word separator in
> filenames.

Fix it accordingly.

>> +(define_insn "*store_si"
>> +  [(set (match_operand:SI 0 "memory_operand"   "=U45, U33, U37, U45, m")
>> +        (match_operand:SI 1 "register_operand" "   l,   l,   l,   d, r"))]
>> +  ""
>
> Loads, stores, register moves and constant moves should normally be in
> the same pattern, so that anything operating on constraints can see all
> the alternatives at once.  This might not be as important for LRA as it
> was for reload, but it still seems like good practice.

Thanks for the suggestion.  Now we combine all the loads, stores,
register moves, and constant moves into a single pattern.  It works
perfectly during the regression test.
>> +(define_insn "*mov<mode>"
>> +  [(set (match_operand:QIHISI 0 "register_operand" "=r, m, r")
>> +        (match_operand:QIHISI 1 "register_operand" " r, r, m"))]
>> +  ""
>> +{
>> +  switch (which_alternative)
>> +    {
>> +    case 0:
>> +      if (get_attr_length (insn) == 2)
>> +        return "mov55\t%0, %1";
>> +      else
>> +        return "ori\t%0, %1, 0";
>> +    case 1:
>> +      return nds32_output_32bit_store (operands, <byte>);
>> +    case 2:
>> +      return nds32_output_32bit_load (operands, <byte>);
>> +
>> +    default:
>> +      gcc_unreachable ();
>> +    }
>> +}
>> +  [(set_attr "type" "alu,store,load")
>> +   (set_attr "enabled" "1")
>> +   (set_attr_alternative "length"
>> +     [
>> +       ;; Alternative 0
>> +       (if_then_else (match_test "TARGET_16_BIT")
>> +                     (const_int 2)
>> +                     (const_int 4))
>> +       ;; Alternative 1
>> +       (const_int 4)
>> +       ;; Alternative 2
>> +       (const_int 4)
>> +     ])])
>
> The style used in the load and store patterns was:
>
> (define_insn "*mov<mode>"
>   [(set (match_operand:QIHISI 0 "register_operand" "=r, r, m, r")
>         (match_operand:QIHISI 1 "register_operand" " r, r, r, m"))]
>   ""
> {
>   switch (which_alternative)
>     {
>     case 0:
>       return "mov55\t%0, %1";
>     case 1:
>       return "ori\t%0, %1, 0";
>     case 2:
>       return nds32_output_32bit_store (operands, <byte>);
>     case 3:
>       return nds32_output_32bit_load (operands, <byte>);
>     default:
>       gcc_unreachable ();
>     }
> }
>   [(set_attr "type" "alu,alu,store,load")
>    (set_attr "length" "2,4,4,4")])
>
> which seems neater.  Did you try that but find that it didn't work here?
>
> Same comment for other instructions where:
>
>   (if_then_else (match_test "TARGET_16_BIT")
>                 (const_int 2)
>                 (const_int 4))
>
> occurs (except for the special case of relaxable branch instructions,
> where using the if_then_else is good).

Thanks for the comment.  I simplify the patterns as you suggested.

>> +;; We use nds32_symbolic_operand to limit that only CONST/SYMBOL_REF/LABEL_REF
>> +;; are able to match such instruction template.
>> +(define_insn "*move_addr"
>> +  [(set (match_operand:SI 0 "register_operand"       "=l, r")
>> +        (match_operand:SI 1 "nds32_symbolic_operand" " i, i"))]
>> +  ""
>> +  "la\t%0, %1"
>> +  [(set_attr "type" "move")
>> +   (set_attr "length" "8")])
>> +
>> +
>> +(define_insn "*sethi"
>> +  [(set (match_operand:SI 0 "register_operand" "=r")
>> +        (high:SI (match_operand:SI 1 "immediate_operand" " i")))]
>> +  ""
>> +{
>> +  return "sethi\t%0, hi20(%1)";
>> +}
>> +  [(set_attr "type" "alu")
>> +   (set_attr "length" "4")])
>> +
>> +
>> +(define_insn "*lo_sum"
>> +  [(set (match_operand:SI 0 "register_operand" "=r")
>> +        (lo_sum:SI (match_operand:SI 1 "register_operand"  " 0")
>> +                   (match_operand:SI 2 "immediate_operand" " i")))]
>> +  ""
>> +  "ori\t%0, %1, lo12(%2)"
>> +  [(set_attr "type" "alu")
>> +   (set_attr "length" "4")])
>
> Very minor, but "nds32_symbolic_operand" seems like a better choice for
> *sethi and *lo_sum too, since (high ...) and (lo_sum ...) shouldn't be
> used for const_ints.

Modify it accordingly.

> Any pass would be in its rights to fuse a *sethi and *lo_sum pair back
> into a single *move_addr.  Is that something you want to allow?
> (That's a genuine question rather than a review comment btw.)

In most cases, when we want to load an address into a register, it
matches the *move_addr pattern and generates the "la" pseudo
instruction, which is then expanded into "sethi + ori" instructions by
the assembler.  But we also need the *sethi and *lo_sum patterns so that
we can construct a full addressing-mode data load/store like:

  (set (reg_a) (high symbol))
  (set (reg_t) (mem (lo_sum reg_a symbol)))

or

  (set (reg_a) (high symbol))
  (set (mem (lo_sum reg_a symbol)) (reg_t))

In my experience, such construction is performed during the reload phase.
For example, given this code fragment:

  int a;
  void foo () { a = 10; }

compiling with '-O1 -fdump-rtl-all', we have:

[a.c.213r.ira]
  (insn 7 6 0 2 (set (mem/c:SI (symbol_ref:SI ("a") <var_decl 0xb75200b8 a>) [0 a+0 S4 A32])
          (const_int 10 [0xa])) a.c:2 27 {*movsi}
       (nil))

[a.c.214r.reload]
  (insn 11 6 13 2 (set (reg:SI 0 $r0 [42])
          (high:SI (symbol_ref:SI ("a") <var_decl 0xb75200b8 a>))) a.c:2 29 {*sethi}
       (nil))
  (insn 13 11 12 2 (set (reg:SI 0 $r0 [42])
          (reg:SI 0 $r0 [42])) a.c:2 27 {*movsi}
       (expr_list:REG_DEAD (reg:SI 0 $r0 [42])
          (nil)))
  (insn 12 13 14 2 (set (reg:SI 1 $r1 [43])
          (const_int 10 [0xa])) a.c:2 27 {*movsi}
       (nil))
  (insn 14 12 7 2 (set (reg:SI 1 $r1 [43])
          (reg:SI 1 $r1 [43])) a.c:2 27 {*movsi}
       (expr_list:REG_DEAD (reg:SI 1 $r1 [43])
          (nil)))
  (insn 7 14 10 2 (set (mem/c:SI (lo_sum:SI (reg:SI 0 $r0 [42])
                  (symbol_ref:SI ("a") <var_decl 0xb75200b8 a>)) [0 a+0 S4 A32])
          (reg:SI 1 $r1 [43])) a.c:2 27 {*movsi}
       (expr_list:REG_DEAD (reg:SI 1 $r1 [43])
          (expr_list:REG_DEAD (reg:SI 0 $r0 [42])
             (nil))))

> Is the "0" constraint on the *lo_sum really necessary?  It looks from
> the later OR patterns as though this form of ORI allows the source and
> destination registers to be different.

Fix it accordingly.

>> +;; Zero extension instructions.
>> +
>> +(define_expand "zero_extend<mode>si2"
>> +  [(set (match_operand:SI 0 "general_operand" "")
>> +        (zero_extend:SI (match_operand:QIHI 1 "general_operand" "")))]
>> +  ""
>> +{
>> +  rtx tmp_reg;
>> +
>> +  /* We need to make sure operands[1] is a register.  */
>> +  if (!REG_P (operands[1]))
>> +    operands[1] = force_reg (GET_MODE (operands[1]), operands[1]);
>
> Why do you need this?  It looks like the architecture has zero-extending
> loads.

You are right.  Such behavior is unnecessary, so I...
>> +
>> +  /* If the pattern is "(mem X) <- (zero_extend (reg Y))",
>> +     we create two rtx patterns:
>> +       (reg:SI K) <- (zero_extend:SI (reg Y))
>> +       (mem:SI X) <- (reg:SI K)
>> +     The first rtx will be matched by '*zero_extend<mode>si2_reg' template,
>> +     and the second rtx will be matched by mov naming pattern.  */
>> +  if (MEM_P (operands[0]))
>> +    {
>> +      tmp_reg = gen_reg_rtx (SImode);
>> +
>> +      emit_insn (gen_zero_extend<mode>si2 (tmp_reg, operands[1]));
>> +      emit_insn (gen_movsi (operands[0], tmp_reg));
>> +
>> +      DONE;
>> +    }
>> +})
>> +
>> +(define_insn "*zero_extend<mode>si2_reg"
>> +  [(set (match_operand:SI 0 "register_operand" "=w, r")
>> +        (zero_extend:SI (match_operand:QIHI 1 "register_operand" " w, r")))]
>> +  ""
>> +{
>> +  switch (which_alternative)
>> +    {
>> +    case 0:
>> +      return "ze<size>33\t%0, %1";
>> +    case 1:
>> +      return "ze<size>\t%0, %1";
>> +
>> +    default:
>> +      gcc_unreachable ();
>> +    }
>> +}
>> +  [(set_attr "type" "alu,alu")
>> +   (set_attr "length" " 2, 4")])
>> +
>> +(define_insn "*zero_extend<mode>si2_load"
>> +  [(set (match_operand:SI 0 "register_operand" "=  l, *r")
>> +        (zero_extend:SI (match_operand:QIHI 1 "memory_operand" " U33, m")))]
>> +  ""
>> +{
>> +  if (which_alternative == 0)
>> +    return nds32_output_16bit_load (operands, <byte>);
>> +  else
>> +    return nds32_output_32bit_load (operands, <byte>);
>> +}
>> +  [(set_attr "length" "2, 4")
>> +   (set_attr "type" "load,load")])
>
> Here too it's better to have a single pattern with both the register
> and memory alternatives.  It ought to be possible to define
> "zero_extend<mode>si2" directly as a define_insn rather than a
> define_expand:
>
> (define_insn "zero_extend<mode>si2"
>   [(set (match_operand:SI 0 "register_operand" "w,w,r,*r")
>         (zero_extend:SI (match_operand:QIHI 1 "nonimmediate_operand" "w,r,U33,m")))]
>   ...
>
> The target-independent code will then handle memory destinations.
>
> Same comments for "extend<mode>si2".

...
follow your suggestion to have a single pattern for zero_extend.  I also
make similar changes for the extend<mode>si2 pattern.

>> +;; Arithmetic instructions.
>> +
>> +(define_expand "addsi3"
>> +  [(set (match_operand:SI 0 "register_operand" "")
>> +        (plus:SI (match_operand:SI 1 "register_operand" "")
>> +                 (match_operand:SI 2 "nds32_nonmemory_nonsymbol_operand" "")))]
>> +  ""
>> +{
>> +  if (GET_CODE (operands[2]) == CONST_INT)
>> +    operands[2] = gen_int_mode (INTVAL (operands[2]), SImode);
>> +})
>
> This looks like it's papering over a bug elsewhere.  Any CONST_INT passed
> into gen_addsi3 must already be correct for SImode.  If you find callers
> where that isn't true, we need to fix them.  Also, any incorrect constants
> are usually filtered out by the predicate.

Indeed.  After doing some investigation, it was designed to cover a
previous bug where we used gen_addsi3 directly but passed an incorrect
const_int.  That should not happen in the current implementation, so I
remove it.

>> +(define_insn "*add<mode>3"
>> +  [(set (match_operand:QIHISI 0 "register_operand"                   "=   d,    l,  d, l,    k,    l,    r, r")
>> +        (plus:QIHISI (match_operand:QIHISI 1 "register_operand"      "    0,    l, %0, l,    0,    k,    r, r")
>> +                     (match_operand:QIHISI 2 "nds32_reg_or_int_operand" " Iu05, Iu03,  r, l, Is10, Iu06, Is15, r")))]
>> +  ""
>> +  "@
>> +  addi45\t%0, %2
>> +  addi333\t%0, %1, %2
>> +  add45\t%0, %2
>> +  add333\t%0, %1, %2
>> +  addi10.sp\t%2
>> +  addri36.sp\t%0, %2
>> +  addi\t%0, %1, %2
>> +  add\t%0, %1, %2"
>> +  [(set_attr "type" "alu,alu,alu,alu,alu,alu,alu,alu")
>> +   (set_attr "length" " 2, 2, 2, 2, 2, 2, 4, 4")])
>
> The predicates in the define_expand and define_insn are different.
> They should usually be the same.
>
> Without the gen_int_mode, this too could be defined directly as
> a define_insn, without a separate define_expand.

Thanks for the suggestion.  Now I define a single "add<mode>3" pattern
for these operations.
>> +(define_expand "subsi3"
>> +  [(set (match_operand:SI 0 "register_operand" "")
>> +        (minus:SI (match_operand:SI 1 "nds32_rimm15s_operand" "")
>> +                  (match_operand:SI 2 "nds32_rimm15s_operand" "")))]
>> +  ""
>> +  ""
>> +)
>
> Operand 2 shouldn't allow immediates.  They should all go via the
> add optab instead.

Thanks for the comment.  Now I define the sub<mode>3 pattern without the
Iu05/Iu03 immediates:

  (define_insn "sub<mode>3"
    [(set (match_operand:QIHISI 0 "register_operand"                    "=d, l,    r, r")
          (minus:QIHISI (match_operand:QIHISI 1 "nds32_rimm15s_operand" " 0, l, Is15, r")
                        (match_operand:QIHISI 2 "register_operand"      " r, l,    r, r")))]
    ...

And I also create two new constraints used in the add<mode>3 pattern:

  (define_insn "add<mode>3"
    [(set (match_operand:QIHISI 0 "register_operand"                   "=   d,    l, ...")
          (plus:QIHISI (match_operand:QIHISI 1 "register_operand"      "    0,    l, ...")
                       (match_operand:QIHISI 2 "nds32_rimm15s_operand" " In05, In03, ...")))]
    ...
  {
    ...
    case 0:
      operands[2] = gen_int_mode (-INTVAL (operands[2]), SImode);
      return "subi45\t%0, %2";
    case 1:
      operands[2] = gen_int_mode (-INTVAL (operands[2]), SImode);
      return "subi333\t%0, %1, %2";
    ...
  }

where In05 is in the range -31 ~ 0, and In03 is in the range -7 ~ 0.

>> +
>> +(define_insn "*sub<mode>3"
>> +  [(set (match_operand:QIHISI 0 "register_operand"                    "=   d,    l, d, l,    r, r")
>> +        (minus:QIHISI (match_operand:QIHISI 1 "nds32_rimm15s_operand" "    0,    l, 0, l, Is15, r")
>> +                      (match_operand:QIHISI 2 "nds32_rimm15s_operand" " Iu05, Iu03, r, l,    r, r")))]
>> +  ""
>> +  "@
>> +  subi45\t%0, %2
>> +  subi333\t%0, %1, %2
>> +  sub45\t%0, %2
>> +  sub333\t%0, %1, %2
>> +  subri\t%0, %2, %1
>> +  sub\t%0, %1, %2"
>> +  [(set_attr "type" "alu,alu,alu,alu,alu,alu")
>> +   (set_attr "length" " 2, 2, 2, 2, 4, 4")])
>
> Here too a direct define_insn seems better than a define_expand/define_insn
> pair.

Modify it accordingly.
>> +(define_expand "andsi3"
>> +  [(set (match_operand:SI 0 "register_operand" "")
>> +        (and:SI (match_operand:SI 1 "register_operand" "")
>> +                (match_operand:SI 2 "general_operand" "")))]
>> +  ""
>> +{
>> +  /* If operands[2] is const_int,
>> +     we might be able to use other more efficient instructions.  */
>> +  if (GET_CODE (operands[2]) == CONST_INT)
>> +    {
>> +      int mask = INTVAL (operands[2]);
>> +
>> +      if (mask == 255)
>> +        {
>> +          /* ($r0 & 0xff) ==> (zeb $r0, $r0) */
>> +          operands[1] = convert_to_mode (QImode, operands[1], 1);
>> +          emit_insn (gen_zero_extendqisi2 (operands[0], operands[1]));
>> +          DONE;
>> +        }
>> +      else if (mask == 65535)
>> +        {
>> +          /* ($r0 & 0xffff) ==> (zeh $r0, $r0) */
>> +          operands[1] = convert_to_mode (HImode, operands[1], 1);
>> +          emit_insn (gen_zero_extendhisi2 (operands[0], operands[1]));
>> +          DONE;
>> +        }
>> +    }
>> +})
>
> It looks like the associated "*andsi3" insn also has a case for zeb.
> That's usually the better approach.  Please consider adding a zeh case
> to the "*andsi3" alternatives too and removing the code above.
>
> With that change, the define_expand and define_insn could be fused.

Thanks for the comment.  Modify it accordingly.

>> +;; For iorsi3 naming pattern, we have to use define_expand first,
>> +;; and then design different anonymous patterns so that it can
>> +;; simply set different instruction length according to ISA.
>> +(define_expand "iorsi3"
>> +  [(set (match_operand:SI 0 "register_operand" "")
>> +        (ior:SI (match_operand:SI 1 "register_operand" "")
>> +                (match_operand:SI 2 "general_operand" "")))]
>> +  ""
>> +  ""
>> +)
>> +
>> +;; This is the iorsi3 pattern for V3/V3M ISA,
>> +;; which DOES HAVE 'or33' instruction.
>> +;; So we can identify 'or Rt3,Ra3,Rb3' case and set its length to be 2.
>> +(define_insn "*iorsi3"
>> +  [(set (match_operand:SI 0 "register_operand"         "= w, r,    r,    r")
>> +        (ior:SI (match_operand:SI 1 "register_operand" " %0, r,    r,    r")
>> +                (match_operand:SI 2 "general_operand"  "  w, r, Iu15, Ie15")))]
>> +  ""
>> +{
>> +  int one_position;
>> +
>> +  switch (which_alternative)
>> +    {
>> +    case 0:
>> +      return "or33\t%0, %2";
>> +    case 1:
>> +      return "or\t%0, %1, %2";
>> +    case 2:
>> +      return "ori\t%0, %1, %2";
>> +    case 3:
>> +      /* If we reach this alternative,
>> +         it must pass the nds32_can_use_bset_p() test,
>> +         so that we can guarantee there is only one 1-bit
>> +         within the immediate value.  */
>> +      for (one_position = 31; one_position >= 0; one_position--)
>> +        {
>> +          if ((INTVAL (operands[2]) & (1 << one_position)) != 0)
>> +            {
>> +              /* Found the 1-bit position.  */
>> +              operands[2] = GEN_INT (one_position);
>> +              break;
>> +            }
>> +        }
>> +      return "bset\t%0, %1, %2";
>> +
>> +    default:
>> +      gcc_unreachable ();
>> +    }
>> +}
>> +  [(set_attr "type" "alu,alu,alu,alu")
>> +   (set_attr "length" " 2, 4, 4, 4")])
>
> I don't understand the comment above the define_expand, sorry.
> This too looks like a case where "iorsi3" should just be a define_insn,
> with no define_expand.

The define_expand is legacy code.  It should not be necessary in the
current design.  Thanks for the catch.  Fix it accordingly.

> Case 3 could use exact_log2.

Modify it accordingly.

> Same comments for xor.

Modify it accordingly.

> (You might be able to use code iterators
> and have a single set of patterns for or and xor, including the
> shifting variants.)

Thanks for the suggestion.  But I prefer not to use code iterators
because we may add different variants for 'or' and 'xor' separately.

>> +;; For negsi2 naming pattern, we have to use define_expand first,
>> +;; and then design different anonymous patterns so that it can
>> +;; output assembly code according to ISA.
>> +(define_expand "negsi2"
>> +  [(set (match_operand:SI 0 "register_operand" "")
>> +        (neg:SI (match_operand:SI 1 "register_operand" "")))]
>> +  ""
>> +  ""
>> +)
>> +
>> +;; Note that there is NO 'neg33' instruction for V2 ISA.
>> +;; So 'subri A,B,0' (its semantic is 'A = 0 - B')
>> +;; is the only option for V2 ISA.
>> +(define_insn "*negsi2"
>> +  [(set (match_operand:SI 0 "register_operand" "=w, r")
>> +        (neg:SI (match_operand:SI 1 "register_operand" " w, r")))]
>> +  ""
>> +  "@
>> +  neg33\t%0, %1
>> +  subri\t%0, %1, 0"
>> +  [(set_attr "type" "alu,alu")
>> +   (set_attr "length" " 2, 4")])
>
> Here too the define_expand seems redundant.  Same for one_cmplsi2.

Thanks for the catch.  I modify the negsi2 and one_cmplsi2 patterns
accordingly.

> (Looks like you already define the shift instructions directly though,
> thanks.)
>
>> +;; Shift instructions.
>> +
>> +(define_insn "ashlsi3"
>> +  [(set (match_operand:SI 0 "register_operand"            "=   l,    r, r")
>> +        (ashift:SI (match_operand:SI 1 "register_operand" "    l,    r, r")
>> +                   (match_operand:SI 2 "general_operand"  " Iu03, Iu05, r")))]
>
> Operand 2 doesn't allow memory, so nonmemory_operand would be better
> than general_operand.  Both are correct, but nonmemory_operand is
> tighter and so forces the pre-RA optimisers to treat the load as
> a separate instruction.
>
> Same for the other shift instructions.

Thanks for the suggestion.  It does help optimization for some cases.
Modify it accordingly.

>> +(define_expand "mov<mode>cc"
>> +  [(set (match_operand:QIHI 0 "register_operand" "")
>> +        (if_then_else:QIHI (match_operand 1 "comparison_operator" "")
>> +                           (match_operand:QIHI 2 "register_operand" "")
>> +                           (match_operand:QIHI 3 "register_operand" "")))]
>> +  "TARGET_CMOV"
>> +{
>> +  rtx insn;
>> +
>> +  /* For QImode and HImode conditional move,
>> +     make them to be SImode behavior.
>> +  */
>> +  operands[0] = simplify_gen_subreg (SImode, operands[0], <MODE>mode, 0);
>> +  operands[2] = simplify_gen_subreg (SImode, operands[2], <MODE>mode, 0);
>> +  operands[3] = simplify_gen_subreg (SImode, operands[3], <MODE>mode, 0);
>> +
>> +  insn = gen_movsicc (operands[0], operands[1], operands[2], operands[3]);
>> +
>> +  if (!insn)
>> +    FAIL;
>> +
>> +  emit_insn (insn);
>> +  DONE;
>> +})
>
> It'd be better to handle QI, HI and SI using a single template if possible.
> Subregs are harder to optimise than plain registers.

Thanks for clarifying it.  I decide to remove the QI/HI patterns and
just leave the movsicc pattern in place.

>> +
>> +(define_insn "cmovz"
>> +  [(set (match_operand:SI 0 "register_operand" "=r, r")
>> +        (if_then_else:SI (eq (match_operand:SI 1 "register_operand" " r, r")
>> +                             (const_int 0))
>> +                         (match_operand:SI 2 "register_operand" " r, 0")
>> +                         (match_operand:SI 3 "register_operand" " 0, r")))]
>> +  "TARGET_CMOV"
>> +  "@
>> +  cmovz\t%0, %2, %1
>> +  cmovz\t%0, %3, %1"
>> +  [(set_attr "type" "move")
>> +   (set_attr "length" "4")])
>> +
>> +(define_insn "cmovn"
>> +  [(set (match_operand:SI 0 "register_operand" "=r, r")
>> +        (if_then_else:SI (ne (match_operand:SI 1 "register_operand" " r, r")
>> +                             (const_int 0))
>> +                         (match_operand:SI 2 "register_operand" " r, 0")
>> +                         (match_operand:SI 3 "register_operand" " 0, r")))]
>> +  "TARGET_CMOV"
>> +  "@
>> +  cmovn\t%0, %2, %1
>> +  cmovn\t%0, %3, %1"
>> +  [(set_attr "type" "move")
>> +   (set_attr "length" "4")])
>> +
>> +(define_insn_and_split "*movsicc"
>> +  [(set (match_operand:SI 0 "register_operand" "=r, r")
>> +        (if_then_else:SI (match_operator 1 "nds32_equality_comparison_operator"
>> +                           [(match_operand:SI 2 "register_operand" " r, r")
>> +                            (const_int 0)])
>> +                         (match_operand:SI 3 "register_operand" " 0, r")
>> +                         (match_operand:SI 4 "register_operand" " r, 0")))]
>> +  "TARGET_CMOV"
>> +  "#"
>> +  "reload_completed"
>> +  [(pc)]
>> +{
>> +  enum rtx_code code = GET_CODE (operands[1]);
>> +  rtx then_op =
>> +  operands[3];
>> +  rtx else_op = operands[4];
>> +  rtx tmp;
>> +
>> +  /* According to the implementation in "movsicc" naming pattern,
>> +     if we make transformation in which the comparison code is EQ,
>> +     the desired target is at "else" part position semantically.
>> +     Now it is the time (after reload_completed) to physically
>> +     swap it to "then" part position.  */
>> +  if (code == EQ)
>> +    {
>> +      tmp = then_op;
>> +      then_op = else_op;
>> +      else_op = tmp;
>> +    }
>> +
>> +  /* Choosing cmovz or cmovn is based on reload phase result.
>> +     After reload phase, one source operand will use
>> +     the same register as result operand.
>> +     We can use cmovz/cmovn to catch the other source operand
>> +     which has different register.
>> +     So we check register number to determine using cmovz or cmovn.  */
>> +  if (REGNO (then_op) == REGNO (operands[0]))
>> +    emit_insn (gen_cmovz (operands[0], operands[2], else_op, operands[0]));
>> +  else if (REGNO (else_op) == REGNO (operands[0]))
>> +    emit_insn (gen_cmovn (operands[0], operands[2], then_op, operands[0]));
>> +  else
>> +    gcc_unreachable ();
>> +
>> +  DONE;
>> +})
>
> I don't really see off-hand how the third instruction would match in its
> define_insn form, since the earlier instructions ought to match first.
> And it looks from first glance like the splitter is working around
> a bug in the first two instructions.  E.g. shouldn't the first pattern be:
>
> (define_insn "cmovz"
>   [(set (match_operand:SI 0 "register_operand" "=r, r")
>         (if_then_else:SI (eq (match_operand:SI 1 "register_operand" " r, r")
>                              (const_int 0))
>                          (match_operand:SI 2 "register_operand" " r, 0")
>                          (match_operand:SI 3 "register_operand" " 0, r")))]
>   "TARGET_CMOV"
>   "@
>   cmovz\t%0, %2, %1
>   cmovn\t%0, %3, %1"
>   [(set_attr "type" "move")
>    (set_attr "length" "4")])
>
> with the second alternative being "cmovn" rather than "cmovz"?
> With a similar change to the "cmovn" pattern, the define_insn_and_split
> ought to be unnecessary.
I would say you are ABSOLUTELY right!  We follow your suggestion to
modify the "cmovz" and "cmovn" patterns.  With that change we can
happily discard the define_insn_and_split part!  Thank you very much!! :)

>> +  /* We want to add 1 to the integer value
>> +     of operands[2] to create the 'slt' instruction.
>> +     This calculation is performed on the host machine,
>> +     which may use 64-bit integers.
>> +     So the meaning of the calculation result may be
>> +     different from the 32-bit nds32 target.
>> +
>> +     For example:
>> +       0x7fffffff + 0x1 -> 0x80000000,
>> +     this value is POSITIVE on a 64-bit machine,
>> +     but the expected value on the 32-bit nds32 target
>> +     should be a NEGATIVE value.
>> +
>> +     Hence, instead of using GEN_INT(), we use gen_int_mode() to
>> +     explicitly create an SImode constant rtx.  */
>> +  operands[2] = gen_int_mode (INTVAL (operands[2]) + 1, SImode);
>
> The comment seems unnecessary.  gen_int_mode is better than GEN_INT
> wherever it can be used.  It's new uses of GEN_INT that deserve
> comments :-)

May I keep this comment for other engineers? :p  Some people may not be
able to tell the difference yet, and I wish to keep this comment so that
others are able to refer to a sample of why gen_int_mode() is better
than GEN_INT() wherever it can be used. :)

>> +  /* We want to add 1 to the integer value
>> +     of operands[2] to create the 'slt' instruction.
>> +     This calculation is performed on the host machine,
>> +     which may use 64-bit integers.
>> +     So the meaning of the calculation result may be
>> +     different from the 32-bit nds32 target.
>> +
>> +     For example:
>> +       0x7fffffff + 0x1 -> 0x80000000,
>> +     this value is POSITIVE on a 64-bit machine,
>> +     but the expected value on the 32-bit nds32 target
>> +     should be a NEGATIVE value.
>> +
>> +     Hence, instead of using GEN_INT(), we use gen_int_mode() to
>> +     explicitly create an SImode constant rtx.
>> +  */
>> +  operands[2] = gen_int_mode (INTVAL (operands[2]) + 1, SImode);
>> +
>> +  if (code == LE)
>> +    {
>> +      /* LE, use slts instruction */
>> +      emit_insn (gen_slts_compare (tmp_reg, operands[1], operands[2]));
>> +    }
>> +  else
>> +    {
>> +      /* LEU, use slt instruction */
>> +      emit_insn (gen_slt_compare (tmp_reg, operands[1], operands[2]));
>> +    }
>
> Same here, but (le:SI X INT_MAX) isn't the same as (lt:SI X INT_MIN).
> I'm not sure we're guaranteed to have optimised away all those cases
> by this point, but at least an assert would be good.

Thanks.  I add an assert for the case where the code is LE and the
original operands[2] is INT_MAX.

>> +;; Subroutine call instruction returning no value.
>> +;;   operands[0]: It should be a mem RTX whose address is
>> +;;                the address of the function.
>> +;;   operands[1]: It is the number of bytes of arguments pushed as a const_int.
>> +;;   operands[2]: It is the number of registers used as operands.
>> +
>> +(define_expand "call"
>> +  [(parallel [(call (match_operand 0 "memory_operand" "")
>> +                    (match_operand 1 "general_operand" ""))
>
> Might as well just use (match_operand 1 "" ""), like the define_insns do.
>
> FWIW you can drop trailing ""s from match_operands.

Modify it accordingly.

>> +(define_expand "prologue" [(const_int 0)]
>> +  ""
>> +{
>> +  /* Note that only under V3/V3M ISA, we could use v3push prologue.  */
>> +  if (TARGET_V3PUSH)
>> +    nds32_expand_prologue_v3push();
>> +  else
>> +    nds32_expand_prologue();
>> +  DONE;
>> +})
>
> Space before "()".  Same for the epilogue pattern.

Fix it accordingly.

>> +;; nop instruction.
>> +
>> +(define_insn "nop"
>> +  [(const_int 0)]
>> +  ""
>> +  "nop16"
>> +  [(set_attr "type" "misc")
>> +   (set_attr "length" "2")])
>
> What happens for !TARGET_16BIT?  Won't the alternative be disabled there?

Oh my god~  Thank you for the catch about such a latent bug!  Fix it in
the revised patch. :)

>> +  /* The En4 encoding string of the instruction is
>> +     in the bitwise fashion of following: "fp gp lp sp".
>> +  */
>> +  const char *En4_str[16] =
>
> The optimisers would probably pick this up anyway, but:
>
>   static const char *const En4_str[16] =
>
> explicitly says that this is an array that should live in read-only
> storage.

Fix it accordingly.

>> +  /* Create RbRe_str string.
>> +     Note that we need to output ',' character if there exists En4 field.  */
>> +  if (REGNO (operands[0]) != SP_REGNUM && REGNO (operands[1]) != SP_REGNUM)
>> +      RbRe_str = (INTVAL (operands[2]) != 0) ? "%0, %1, " : "%0, %1";
>> +  else
>> +      RbRe_str = "";
>
> Excess indentation.  GNU style is not to wrap "INTVAL (operands[2]) != 0"
> in brackets.
>
> Same comments for pop.

Fix it accordingly.

> Overall it looks good to me FWIW.
>
> Thanks,
> Richard

A revised patch for nds32.md is attached.  Thanks for all of your review
comments! :)

Best regards,
jasonwucj
Attachment: 2-nds32-backend-md-part2.v3.revised.patch
Description: Text document