This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.
Index Nav: | [Date Index] [Subject Index] [Author Index] [Thread Index] | |
---|---|---|
Message Nav: | [Date Prev] [Date Next] | [Thread Prev] [Thread Next] |
Other format: | [Raw text] |
Re: [PATCH] MIPS32 DSP intrinsics Thanks for your review! The latest patch is attached. It includes the changes that you pointed out and GCC document updates (invoke.texi and extend.texi). I will revise the patch, if there are issues.Thanks! Regards, Chao-ying 2005-07-07 Chao-ying Fu <fu@mips.com> * config/mips/mips-dsp.md: New file. * config/mips/mips-modes.def (V4QI, V2HI, CCDSP): New modes. * config/mips/mips.h (TARGET_CPU_CPP_BUILTINS): Add __mips_dsp. (ASM_SPEC): Map -mdsp to -mdsp in GAS. (FIRST_PSEUDO_REGISTER): Increase to 188 because of 6 DSP accumulators and 6 DSP control registers. (FIXED_REGISTERS, CALL_USED_REGISTERS, CALL_REALLY_USED_REGISTERS): Update for 12 new DSP registers. (DSP_ACC_REG_FIRST, DSP_ACC_REG_LAST, DSP_ACC_REG_NUM, AC1HI_REGNUM, AC1LO_REGNUM, AC2HI_REGNUM, AC2LO_REGNUM, AC3HI_REGNUM, AC3LO_REGNUM): New define. (DSP_ACC_REG_P, ACC_REG_P, ACC_HI_REG_P): New define. (reg_class): Add ACC_REGS. (REG_CLASS_NAMES): Add names for ACC_REGS. (REG_CLASS_CONTENTS): Update for ACC_REGS and ALL_REGS. (REG_ALLOC_ORDER): Update for 12 new DSP registers. (mips_char_to_class): Add 'a' for ACC_REGS. (UIMM6_OPERAND, IMM10_OPERAND): New define. (YA, YB, Ya, Yb, Yc, Yd, Ye, Yf): New extra two-letter constraints. (EXTRA_CONSTRAINT_Y): Add 8 extra constraints. (REGISTER_NAMES): Add names for 12 new DSP registers. * config/mips/mips.c (mips_function_type): Add types for DSP builtin functions. (mips_builtin_type): Add MIPS_BUILTIN_DIRECT_NO_TARGET and MIPS_BUILTIN_BPOSGE32. (mips_expand_builtin_direct): Add one parameter to indicate that builtin functions need to return a value. (mips_expand_builtin_bposge): New for expanding "bposge" builtin functions. (mips_regno_to_class): Add 12 new DSP registers. (mips_legitimize_move): Change to check hi/lo and six new DSP accumulators. (mips_subword): Change to check four HI registers. (mips_output_move): Output move to DSP accumulators. (override_options): Make sure -mdsp and -mips16 are not used together. Map 'a' to ACC_REGS. Enable DSP accumulators for machine modes. (mips_conditional_register_usage): Disable 6 new DSP accumulators when !TARGET_DSP. (print_operand): Add 'q' for printing DSP accumulators. (mips_cannot_change_mode_class): Check ACC_REGS. (mips_secondary_reload_class): Check ACC_REGS. (mips_vector_mode_supported_p): Enable V2HI and V4QI when TARGET_DSP. (mips_register_move_cost): Check ACC_REGS. (CODE_FOR_mips_addq_ph, CODE_FOR_mips_addq_s_ph, CODE_FOR_mips_addq_s_w, CODE_FOR_mips_addu_qb, CODE_FOR_mips_addu_s_qb, CODE_FOR_mips_subq_ph, CODE_FOR_mips_subq_s_ph, CODE_FOR_mips_subq_s_w, CODE_FOR_mips_subu_qb, CODE_FOR_mips_subu_s_qb, CODE_FOR_mips_absq_s_ph, CODE_FOR_mips_absq_s_w, CODE_FOR_mips_shll_qb, CODE_FOR_mips_shll_ph, CODE_FOR_mips_shll_s_ph, CODE_FOR_mips_shll_s_w, CODE_FOR_mips_shra_r_ph, CODE_FOR_mips_shra_r_w, CODE_FOR_mips_cmpu_eq_qb, CODE_FOR_mips_cmpu_lt_qb, CODE_FOR_mips_cmpu_le_qb, CODE_FOR_mips_cmp_eq_ph, CODE_FOR_mips_cmp_lt_ph, CODE_FOR_mips_cmp_le_ph, CODE_FOR_mips_pick_qb, CODE_FOR_mips_pick_ph, CODE_FOR_mips_bposge32): New define. (DIRECT_NO_TARGET_BUILTIN, BPOSGE_BUILTIN): New define for builtin functions. (dsp_bdesc): New for DSP builtin functions. (bdesc_arrays): Add DSP builtin function table. (mips_prepare_builtin_arg): Change for check predicate again after copy_to_mode_reg. (mips_expand_builtin): Expand new builtin types. (mips_init_builtins): Initialize new builtin types. (mips_expand_builtin_direct): Check if builtin functions need to return a value. (mips_expand_builtin_bposge): New for expanding "bposge" builtin functions. * config/mips/mips.md (UNSPEC_ADDQ .. UNSPEC_RDDSP): New unspec. (*movdi_32bit): Change 'x' to 'a' for ACC_REGS. (*movsi_internal): Change 'x' to 'a' for ACC_REGS. (*mfhilo_<mode>): Add constraints for 6 new DSP accumulators. Emit instructions for 6 new DSP accumulators. (mips-dsp.md): New include. * config/mips/mips.opt (-mdsp): New option. * config/mips/predicates.md (const_uimm6_operand, const_imm10_operand, reg_imm10_operand): New predicates. * doc/invoke.texi (-mdsp): Document new option. * doc/extend.texi (MIPS DSP Builtin Functions): New section. ----- Original Message ----- From: Richard Sandiford To: Fu, Chao-Ying Cc: Richard Sandiford ; Thekkath, Radhika ; gcc-patches@gcc.gnu.org Sent: Saturday, July 02, 2005 9:34 AM Subject: Re: [PATCH] MIPS32 DSP intrinsics "Chao-ying Fu" <fu@mips.com> writes: > I combined 17 immediate and variable versions of > intrinsics. And I used macro to reduce several > instructions. Please see the attached files. Thanks. These changes look good. > Here is the list of 17 intrinsics that can accept > immediate numbers and variables as parameters. > > v4i8 __builtin_mips_shll_qb (v4i8, imm0_7); > v2q15 __builtin_mips_shll_ph (v2q15, imm0_15); > v2q15 __builtin_mips_shll_s_ph (v2q15, imm0_15); > q31 __builtin_mips_shll_s_w (q31, imm0_31); > v4i8 __builtin_mips_shrl_qb (v4i8, imm0_7); > v2q15 __builtin_mips_shra_ph (v2q15, imm0_15); > v2q15 __builtin_mips_shra_r_ph (v2q15, imm0_15); > q31 __builtin_mips_shra_r_w (q31, imm0_31); > v4i8 __builtin_mips_repl_qb (imm0_255); > v2q15 __builtin_mips_repl_ph (imm_n512_511); > i32 __builtin_mips_extr_w (a64, imm0_31); > i32 __builtin_mips_extr_r_w (a64, imm0_31); > i32 __builtin_mips_extr_rs_w (a64, imm0_31); > i32 __builtin_mips_extr_s_h (a64, imm0_31); > i32 __builtin_mips_extp (a64, imm0_31); > i32 __builtin_mips_extpdp (a64, imm0_31); > a64 __builtin_mips_shilo (a64, imm_n32_31); > > The range of immediate numbers and the range > of variables are the same for all intrinsics > except __builtin_mips_repl_ph() which immediate > version accepts -512 to 511 (repl.ph), but variable > version can process -32768 to 32767 (replv.ph). > Thus, I created a predicate "reg_imm10_operand" for > the "repl.ph" instruction pattern, but used the predicate > "arith_operand" for other 16 instruction patterns. > For repl.ph, GCC will copy numbers to a register and pick the > variable instructions, if numbers are not in the range > (-512 to 511). > For other 16 instruction patterns, there is no benefit by > moving numbers to a register, because both immediate and > variable versions accept same ranges. Thanks for the in-depth description. This sort of information makes the patch _much_ easier to review. > One small problem is that when users use immediate > numbers out of the range, GCC doesn't give warnings but > just silently masks numbers to the range upon emitting > instructions. Is this behavior ok? Yeah, I think this is exactly how it should be. The masking is part of the *v instructions' semantics, so it should be part of the builtin functions' semantics too. The revised .md file looks much better. You've made good use of macros here. There's still more cut-&-paste than I'd really like, such as with the qbl and qbr variants, but I'm not sure there's much that can be done about that. Any attempt to combine patterns further might also make it more difficult to handle the DEFINE_BUILTIN_* stuff in a regular way. In summary, I think the new .md file is OK, except: > if (which_alternative==0) Spaces around "==". Same problem throughout file. > (define_insn "mips_repl_ph" > [(set (match_operand:V2HI 0 "register_operand" "=d,d") > (unspec:V2HI [(match_operand:SI 1 "reg_imm10_operand" "I,d")] > UNSPEC_REPL_PH))] This isn't right. 'I' allows any 16-bit constant, whereas the predicate only allows 10-bit constants. Thus if you had: (set (reg R1) (const_int 10000)) [REG_EQUIV (const_int 10000)] ... (set (reg R2) (unspec [(reg R1)] UNSPEC_REPL_PH)) and R1 is not allocated a hard register, reload will try to reload the second instruction as though it was: (set (reg R2) (unspec [(const_int 10000)] UNSPEC_REPL_PH)) And it will think that no reloads are needed because 10000 matches the constraints. Or the short version: your constraints can't accept things that the predicates don't. The right thing to do is to introduce a new constraint for 10-bit immediates. There aren't many free constraint letters, so you should probably use a two-letter constraint here. In other words, pick a free constraint letter and say that any constraint beginning with that letter is two characters long. Then pick one two-character sequence for 10-bit immediates. > + /* For MIPS DSP ASE */ > + MIPS_DI_FTYPE_DI_SI, > + MIPS_DI_FTYPE_DI_SI_SI, > + MIPS_DI_FTYPE_DI_V2HI_V2HI, > + MIPS_DI_FTYPE_DI_V4QI_V4QI, > + MIPS_SI_FTYPE_DI_SI, > + MIPS_SI_FTYPE_PTR_SI, > + MIPS_SI_FTYPE_SI, > + MIPS_SI_FTYPE_SI_SI, > + MIPS_SI_FTYPE_V2HI, > + MIPS_SI_FTYPE_V2HI_V2HI, > + MIPS_SI_FTYPE_V4QI, > + MIPS_SI_FTYPE_V4QI_V4QI, > + MIPS_SI_FTYPE_VOID, > + MIPS_V2HI_FTYPE_SI, > + MIPS_V2HI_FTYPE_SI_SI, > + MIPS_V2HI_FTYPE_V2HI, > + MIPS_V2HI_FTYPE_V2HI_SI, > + MIPS_V2HI_FTYPE_V2HI_V2HI, > + MIPS_V2HI_FTYPE_V4QI, > + MIPS_V2HI_FTYPE_V4QI_V2HI, > + MIPS_V4QI_FTYPE_SI, > + MIPS_V4QI_FTYPE_V2HI_V2HI, > + MIPS_V4QI_FTYPE_V4QI_SI, > + MIPS_V4QI_FTYPE_V4QI_V4QI, > + MIPS_VOID_FTYPE_SI_SI, > + MIPS_VOID_FTYPE_V2HI_V2HI, > + MIPS_VOID_FTYPE_V4QI_V4QI, Ick. Not your fault, but this enum really isn't scaling well. Maybe food for a future cleanup. > + /* Similar to MIPS_BUILTIN_DIRECT, but the last argument is an immediate > + from 0 to 63. */ > + MIPS_BUILTIN_DIRECT_IMM0_63, > + > + /* The builtin corresponds directly to an .md pattern. There is no return > + value and the arguments are mapped to operands 0 and above. */ > + MIPS_BUILTIN_DIRECT_NO_TARGET, > + > + /* Similar to MIPS_BUILTIN_DIRECT_NO_TARGET, but the last argument is an > + immediate from 0 to 63. */ > + MIPS_BUILTIN_DIRECT_NO_TARGET_IMM0_63, It would better IMO to treat the FOO_IMM0_63 in the same way as FOO and define a new predicate for 0...63 integers. Then, rather than change the interface to mips_prepare_builtin_arg, you can simply change: value = expand_expr (TREE_VALUE (*arglist), NULL_RTX, VOIDmode, 0); mode = insn_data[icode].operand[op].mode; if (!insn_data[icode].operand[op].predicate (value, mode)) value = copy_to_mode_reg (mode, value); to: value = expand_expr (TREE_VALUE (*arglist), NULL_RTX, VOIDmode, 0); mode = insn_data[icode].operand[op].mode; if (!insn_data[icode].operand[op].predicate (value, mode)) { value = copy_to_mode_reg (mode, value); if (!insn_data[icode].operand[op].predicate (value, mode)) error ("...invalid argument..."); } The error message doesn't mention the supported range, but I don't see that as a problem. People using the function ought to know. In general, I'd prefer that all operand restrictions be encoded in the predicates, not in the expansion code. > *************** const enum reg_class mips_regno_to_class > *** 644,650 **** > COP3_REGS, COP3_REGS, COP3_REGS, COP3_REGS, > COP3_REGS, COP3_REGS, COP3_REGS, COP3_REGS, > COP3_REGS, COP3_REGS, COP3_REGS, COP3_REGS, > ! COP3_REGS, COP3_REGS, COP3_REGS, COP3_REGS > }; > > /* Map register constraint character to register class. */ > --- 691,700 ---- > COP3_REGS, COP3_REGS, COP3_REGS, COP3_REGS, > COP3_REGS, COP3_REGS, COP3_REGS, COP3_REGS, > COP3_REGS, COP3_REGS, COP3_REGS, COP3_REGS, > ! COP3_REGS, COP3_REGS, COP3_REGS, COP3_REGS, > ! ACC_REGS, ACC_REGS, ACC_REGS, ACC_REGS, > ! ACC_REGS, ACC_REGS, DSP_CNTL_REGS, DSP_CNTL_REGS, > ! DSP_CNTL_REGS,DSP_CNTL_REGS, DSP_CNTL_REGS, DSP_CNTL_REGS DSP_CNTL_REGS serves no purpose. All registers in it are fixed so you might as well use ALL_REGS. > *************** mips_subword (rtx op, int high_p) > *** 2610,2615 **** > --- 2660,2667 ---- > return gen_rtx_REG (word_mode, high_p ? REGNO (op) + 1 : REGNO (op)); > if (REGNO (op) == HI_REGNUM) > return gen_rtx_REG (word_mode, high_p ? HI_REGNUM : LO_REGNUM); > + if (DSP_ACC_REG_P (REGNO (op))) > + return gen_rtx_REG (word_mode, high_p ? REGNO (op) : REGNO (op) + 1); You still aren't combining a lot of the MD/ACC code. Instead of adding a new statement here, you should just change the HI_REGNUM code to: if (ACC_REG_P (REGNO (op))) return gen_rtx_REG (word_mode, high_p ? REGNO (op) : REGNO (op) + 1); where ACC_REG_P() returns true for anything in ACC_REGS. > *************** mips_output_move (rtx dest, rtx src) > *** 2741,2746 **** > --- 2801,2817 ---- > { > if (src_code == REG) > { > + if (MD_REG_P (REGNO (src))) > + return "mf%1\t%0"; > + > + if (DSP_ACC_REG_P (REGNO (src))) > + { > + static char retval[] = "mf__\t%0,%q1"; > + retval[2] = reg_names[REGNO (src)][4]; > + retval[3] = reg_names[REGNO (src)][5]; > + return retval; > + } > + No! This is very wrong. You didn't explain this change, but reading between the lines, you seem to think that we don't generate mfhi and mflo at the moment. And of course we do. See mips_legitimize_move and the mfhilo_* patterns for a description of how we generate them and why the normal move patterns can't. Your changes to the move patterns are wrong for the same reason: > *************** beq\t%2,%.,1b\;\ > *** 3295,3309 **** > ;; in FP registers (off by default, use -mdebugh to enable). > > (define_insn "*movsi_internal" > ! [(set (match_operand:SI 0 "nonimmediate_operand" "=d,d,e,d,m,*f,*f,*f,*d,*m,*d,*z,*x,*B*C*D,*B*C*D,*d,*m") > ! (match_operand:SI 1 "move_operand" "d,U,T,m,dJ,*f,*d*J,*m,*f,*f,*z,*d,*J*d,*d,*m,*B*C*D,*B*C*D"))] > "!TARGET_MIPS16 > && (register_operand (operands[0], SImode) > || reg_or_0_operand (operands[1], SImode))" > { return mips_output_move (operands[0], operands[1]); } > ! [(set_attr "type" "arith,const,const,load,store,fmove,xfer,fpload,xfer,fpstore,xfer,xfer,mthil o,xfer,load,xfer,store") > (set_attr "mode" "SI") > ! (set_attr "length" "4,*,*,*,*,4,4,*,4,*,4,4,4,4,*,4,*")]) > > (define_insn "*movsi_mips16" > [(set (match_operand:SI 0 "nonimmediate_operand" "=d,y,d,d,d,d,d,m") > --- 3366,3380 ---- > ;; in FP registers (off by default, use -mdebugh to enable). > > (define_insn "*movsi_internal" > ! [(set (match_operand:SI 0 "nonimmediate_operand" "=d,d,e,d,m,*f,*f,*f,*d,*m,*d,*z,*x*a,*d,*B*C*D,*B*C*D,*d,*m") > ! (match_operand:SI 1 "move_operand" "d,U,T,m,dJ,*f,*d*J,*m,*f,*f,*z,*d,*J*d,*x*a,*d,*m,*B*C*D,*B*C*D"))] > "!TARGET_MIPS16 > && (register_operand (operands[0], SImode) > || reg_or_0_operand (operands[1], SImode))" > { return mips_output_move (operands[0], operands[1]); } > ! [(set_attr "type" "arith,const,const,load,store,fmove,xfer,fpload,xfer,fpstore,xfer,xfer,mthil o,mfhilo,xfer,load,xfer,store") > (set_attr "mode" "SI") > ! (set_attr "length" "4,*,*,*,*,4,4,*,4,*,4,4,4,4,4,*,4,*")]) > > (define_insn "*movsi_mips16" > [(set (match_operand:SI 0 "nonimmediate_operand" "=d,y,d,d,d,d,d,m") GR_REGS <- MD_REGS is not an alternative we can allow. I said in the last review that there was no need for a separate register class for ACC_REGS - MD_REGS. Unfortunately, this move problem shows that I was wrong. The move patterns should allow GR_REGS <- (ACC_REGS - MD_REGS) but not GR_REGS <- MD_REGS. > *************** override_options (void) > *** 4785,4790 **** > --- 4861,4872 ---- > || (regno == MD_REG_FIRST > && size == 2 * UNITS_PER_WORD))); > > + else if (DSP_ACC_REG_P (regno)) > + temp = (INTEGRAL_MODE_P (mode) > + && (size <= UNITS_PER_WORD > + || (((regno - DSP_ACC_REG_FIRST) & 1) == 0 > + && size == 2 * UNITS_PER_WORD))); > + Again, this should be combined with the MD_REGS code. > *************** mips_cannot_change_mode_class (enum mach > *** 7272,7277 **** > --- 7378,7388 ---- > and multi-word modes. */ > if (reg_classes_intersect_p (HI_REG, class)) > return true; > + > + /* ACC_REGS includes HI, LO and 6 DSP accumulator registers. > + For the same reason above, we need to check it. */ > + if (reg_classes_intersect_p (ACC_REGS, class)) > + return true; The new test encompasses the HI_REG one, which is no longer needed. > *************** mips_secondary_reload_class (enum reg_cl > *** 7352,7358 **** > > /* Copying from HI or LO to anywhere other than a general register > requires a general register. */ > ! if (class == HI_REG || class == LO_REG || class == MD_REGS) > { > if (TARGET_MIPS16 && in_p) > { > --- 7463,7470 ---- > > /* Copying from HI or LO to anywhere other than a general register > requires a general register. */ > ! if (class == HI_REG || class == LO_REG || class == MD_REGS || > ! class == ACC_REGS) Please use reg_class_subset_p (class, ACC_REGS) instead of specific == tests. > *************** mips_secondary_reload_class (enum reg_cl > *** 7361,7367 **** > } > return gp_reg_p ? NO_REGS : gr_regs; > } > ! if (MD_REG_P (regno)) > { > if (TARGET_MIPS16 && ! in_p) > { > --- 7473,7479 ---- > } > return gp_reg_p ? NO_REGS : gr_regs; > } > ! if (MD_REG_P (regno) || DSP_ACC_REG_P (regno)) > { > if (TARGET_MIPS16 && ! in_p) > { ACC_REG_P here too. > *************** mips_register_move_cost (enum machine_mo > *** 8840,8846 **** > } > else if (to == FP_REGS) > return 4; > ! else if (to == HI_REG || to == LO_REG || to == MD_REGS) > { > if (TARGET_MIPS16) > return 12; > --- 8954,8961 ---- > } > else if (to == FP_REGS) > return 4; > ! else if (to == HI_REG || to == LO_REG || to == MD_REGS || > ! to == ACC_REGS) > { > if (TARGET_MIPS16) > return 12; reg_class_subset_p again. > *************** mips_register_move_cost (enum machine_mo > *** 8861,8867 **** > else if (to == ST_REGS) > return 8; > } > ! else if (from == HI_REG || from == LO_REG || from == MD_REGS) > { > if (GR_REG_CLASS_P (to)) > { > --- 8976,8983 ---- > else if (to == ST_REGS) > return 8; > } > ! else if (from == HI_REG || from == LO_REG || from == MD_REGS || > ! from == ACC_REGS) > { > if (GR_REG_CLASS_P (to)) > { and here. > + /* Define a MIPS_BUILTIN_DIRECT_IMM* function for instruction > + CODE_FOR_mips_<INSN>. FUNCTION_TYPE and TARGET_FLAGS are > + builtin_description fields. */ > + #define DIRECT_IMM_BUILTIN(INSN, FUNCTION_TYPE, TARGET_FLAGS, MIN, MAX) \ > + { CODE_FOR_mips_ ## INSN, 0, "__builtin_mips_" #INSN, \ > + MIPS_BUILTIN_DIRECT_IMM ## MIN ## _ ##MAX, FUNCTION_TYPE, TARGET_FLAGS } OK, so this code will go after the change I described above, but: please make the comment match the arguments. You don't mention MIN or MAX. > + /* Define a MIPS_BUILTIN_DIRECT_NO_TARGET_IMM* function for instruction > + CODE_FOR_mips_<INSN>. FUNCTION_TYPE and TARGET_FLAGS are > + builtin_description fields. */ > + #define DIRECT_NO_TARGET_IMM_BUILTIN(INSN, FUNCTION_TYPE, TARGET_FLAGS, \ > + MIN, MAX) \ > + { CODE_FOR_mips_ ## INSN, 0, "__builtin_mips_" #INSN, \ > + MIPS_BUILTIN_DIRECT_NO_TARGET_IMM ## MIN ## _ ## MAX, FUNCTION_TYPE,\ > + TARGET_FLAGS } Same here. > + #define BPOSGE_BUILTIN(INSN, BUILTIN_TYPE, FUNCTION_TYPE, TARGET_FLAGS) \ > + { CODE_FOR_mips_ ## INSN, 0, "__builtin_mips_" #INSN, \ > + MIPS_BUILTIN_ ## BUILTIN_TYPE, FUNCTION_TYPE, TARGET_FLAGS } Missing comment. > + Ex: > + move target, 0 > + bposge32 label1 > + j label2 > + label1: > + move target, 1 > + label2: > + ... */ Be a bit more verbose than "Ex". Use "li" rather than "move". > + if (builtin_type == MIPS_BUILTIN_BPOSGE32) > + cmp_value = 32; > + else > + gcc_assert (0); gcc_unreachable () > ! - 6 DSP control register */ "control registers." > + DSP_CNTL_REGS, /* DSP control registers */ Like I say, this class isn't really needed. The biggest problem is that your patch still doesn't include changes to extend.texi or invoke.texi. That documentation is very much required. The patch looks good otherwise. Richard
Attachment:
mipsdsp.diff
Description: Binary data
Attachment:
mips-dsp.md
Description: Binary data
Index Nav: | [Date Index] [Subject Index] [Author Index] [Thread Index] | |
---|---|---|
Message Nav: | [Date Prev] [Date Next] | [Thread Prev] [Thread Next] |