This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH 2/6] Andes nds32: machine description of nds32 porting (1).


On 9/13/13 11:33 PM, Richard Sandiford wrote:
> Chung-Ju Wu <jasonwucj@gmail.com> writes:
>> It has been a while since last v2 patch.
>> I create a new v3 patch to fix some typo and indentation.
> 
> I had a read through out of curiosity, and FWIW, it looks very clean and
> well-commented to me.  See below for a few questions and comments.
> This isn't an official review though.
> 

Thanks for your time reviewing our patch! :)
These comments are so valuable that it does help improve
performance and code size for applications and benchmarks.

We have modified our machine descriptions based on your suggestions.
We also did bootstrap and regression test.

The followings are our response to your questions and comments.

>> +  /* Calculate the bytes of saving callee-saved registers on stack.  */
>> +  cfun->machine->callee_saved_regs_size = 0;
>> +  cfun->machine->callee_saved_regs_first_regno = SP_REGNUM;
>> +  cfun->machine->callee_saved_regs_last_regno  = SP_REGNUM;
>> +  /* Currently, there is no need to check $r28~$r31
>> +     because we will save them in another way.  */
>> +  for (r = 0; r < 28; r++)
>> +    {
>> +      if (NDS32_REQUIRED_CALLEE_SAVED_P (r))
>> +	{
>> +	  /* Each register is 4 bytes.  */
>> +	  cfun->machine->callee_saved_regs_size += 4;
>> +
>> +	  /* Mark the first required callee-saved register
>> +	     (only need to set it once).
>> +	     If first regno == SP_REGNUM, we can tell that
>> +	     it is the first time to be here.  */
>> +	  if (cfun->machine->callee_saved_regs_first_regno == SP_REGNUM)
>> +	    cfun->machine->callee_saved_regs_first_regno = r;
>> +	  /* Mark the last required callee-saved register.  */
>> +	  cfun->machine->callee_saved_regs_last_regno = r;
>> +	}
>> +    }
>> +
>> +  /* Note:
>> +     Since our smw/lmw instructions use Rb and Re
>> +     to store/load registers consecutively,
>> +     we need to check again to see if there is any register
>> +     which is not live in the function but in the range
>> +     between 'callee_saved_regs_first_regno' and 'callee_save_regs_last_regno'.
>> +     If we find that registers, save their register size
>> +     in the 'callee_saved_regs_size' for prologue and epilogue.
>> +
>> +     For example:
>> +     Assume that the registers $r6, $r7, $r8, $r10, and $r11 are live
>> +     in the function, so initially callee_saved_regs_size = 4 * 5 = 20.
>> +     However, although $r9 is not live in the function,
>> +     it is in the range between $r6 and $r11.
>> +     We have to increase callee_saved_regs_size so that
>> +     prologue and epilogue can use it to issue
>> +     smw/lmw instructions and adjust the offset correctly.  */
>> +  for (r =  cfun->machine->callee_saved_regs_first_regno;
>> +       r <= cfun->machine->callee_saved_regs_last_regno;
>> +       r++)
>> +    {
>> +      if (!df_regs_ever_live_p (r)
>> +	  && r > cfun->machine->callee_saved_regs_first_regno
>> +	  && r < cfun->machine->callee_saved_regs_last_regno)
>> +	{
>> +	  /* Found one register which is not live in the function
>> +	     but in the range between first_regno and last_regno.  */
>> +	  cfun->machine->callee_saved_regs_size += 4;
>> +	}
>> +    }
> 
> I wasn't sure here why you were setting callee_saved_regs_size in the
> first loop and then had the second loop to handle the others.  Couldn't
> this just be done by moving:
> 
>> +  /* Adjustment for v3push instructions:
>> +     If we are using v3push (push25/pop25) instructions,
>> +     we need to make sure Rb is $r6 and Re is
>> +     located on $r6, $r8, $r10, or $r14.
>> +     Some results above will be discarded and recomputed.
>> +     Note that it is only available under V3/V3M ISA.  */
>> +  if (TARGET_V3PUSH)
>> +    {
>> +      /* Recompute:
>> +           cfun->machine->fp_size
>> +           cfun->machine->gp_size
>> +           cfun->machine->lp_size
>> +           cfun->machine->callee_saved_regs_first_regno
>> +           cfun->machine->callee_saved_regs_last_regno
>> +           cfun->machine->callee_saved_regs_size */
>> +
>> +      /* For v3push instructions, $fp, $gp, and $lp are always saved.  */
>> +      cfun->machine->fp_size = 4;
>> +      cfun->machine->gp_size = 4;
>> +      cfun->machine->lp_size = 4;
>> +
>> +      /* Remember to set Rb = $r6.  */
>> +      cfun->machine->callee_saved_regs_first_regno = 6;
>> +
>> +      if (cfun->machine->callee_saved_regs_last_regno <= 6)
>> +	{
>> +	  /* Re = $r6 */
>> +	  cfun->machine->callee_saved_regs_last_regno = 6;
>> +	}
>> +      else if (cfun->machine->callee_saved_regs_last_regno <= 8)
>> +	{
>> +	  /* Re = $r8 */
>> +	  cfun->machine->callee_saved_regs_last_regno = 8;
>> +	}
>> +      else if (cfun->machine->callee_saved_regs_last_regno <= 10)
>> +	{
>> +	  /* Re = $r10 */
>> +	  cfun->machine->callee_saved_regs_last_regno = 10;
>> +	}
>> +      else if (cfun->machine->callee_saved_regs_last_regno <= 14)
>> +	{
>> +	  /* Re = $r14 */
>> +	  cfun->machine->callee_saved_regs_last_regno = 14;
>> +	}
>> +      else if (cfun->machine->callee_saved_regs_last_regno == SP_REGNUM)
>> +	{
>> +	  /* If last_regno is SP_REGNUM, which means
>> +	     it is never changed, so set it to Re = $r6.  */
>> +	  cfun->machine->callee_saved_regs_last_regno = 6;
>> +	}
>> +      else
>> +	{
>> +	  /* The program flow should not go here.  */
>> +	  gcc_unreachable ();
>> +	}
>> +
>> +      /* Compute pushed size of callee-saved registers.  */
>> +      cfun->machine->callee_saved_regs_size
>> +	= 4 * (cfun->machine->callee_saved_regs_last_regno
>> +	       - cfun->machine->callee_saved_regs_first_regno
>> +	       + 1);
> 
> ...this last statement out of the "if" block and making it
> conditional on "cfun->machine->callee_saved_regs_last_regno != SP_REGNUM"?
> 

Yes.  This is definitely much better and clear.
Thanks for the suggestion and we modify it in the revised patch.

>> +/* Function to create a parallel rtx pattern
>> +   which presents stack push multiple behavior.
>> +   The overall concept are:
>> +     "unspec_stack_push_multiple",
>> +     "use/clobber stack pointer",
>> +     "use Rb to Re", and "use $fp, $gp, $lp".  */
>> +static rtx
>> +nds32_gen_stack_push_multiple (rtx Rb, rtx Re, rtx En4)
>> +{
>> +  int regno;
>> +  int extra_count;
>> +  int num_use_regs;
>> +  int par_index;
>> +
>> +  rtx reg;
>> +  rtx parallel_insn;
>> +
>> +  /* We need to provide a customized rtx which contains
>> +     necessary information for data analysis,
>> +     so we create a parallel rtx like this:
>> +     (parallel [(unspec [(reg: Rb)
>> +                         (reg: Re)
>> +                         (const_int En4)] UNSPEC_STACK_PUSH_MULTIPLE)
>> +                (use (reg:SI SP_REGNUM))
>> +                (clobber (reg:SI SP_REGNUM))
>> +                (use (reg:SI Rb))
>> +                (use (reg:SI Rb+1))
>> +                ...
>> +                (use (reg:SI Re))
>> +                (use (reg:SI FP_REGNUM))
>> +                (use (reg:SI GP_REGNUM))
>> +                (use (reg:SI LP_REGNUM))]) */
> 
> I know it's a pain to implement, but the pattern should really model the
> memory too, like you do for the call-frame information.  I.e.:
> 
>   (parallel [(set (mem:SI ...)
>                   (reg:SI Rb))
>              (set (mem:SI ...)
>                   (reg:SI Rb+1))
>              ...])
> 
> Same idea for pop.  (It might be that you don't need separate call-frame
> info after that change.)
> 

You are correct.
With that change, I can discard call-frame info implementation.
Thanks!

>> +/* Function to construct isr vectors information array.
>> +   We need to check:
>> +     1. Traverse interrupt/exception/reset for settig vector id.
>> +     2. Only 'nested', 'not_nested', or 'nested_ready' in the attributes.
>> +     3. Only 'save_all' or 'partial_save' in the attributes.  */
>> +static void
>> +nds32_construct_isr_vectors_information (tree func_attrs,
>> +					 const char *func_name)
>> +{
>> +  int nested_p, not_nested_p, nested_ready_p;
>> +  int save_all_p, partial_save_p;
>> +  tree intr, excp, reset;
>> +  int temp_count;
>> +
>> +  nested_p = not_nested_p = nested_ready_p = 0;
>> +  save_all_p = partial_save_p = 0;
>> +  temp_count = 0;
>> +
>> +  /* We must check at MOST one attribute to set save-reg.  */
>> +  if (lookup_attribute ("save_all", func_attrs))
>> +    save_all_p = 1;
>> +  if (lookup_attribute ("partial_save", func_attrs))
>> +    partial_save_p = 1;
>> +
>> +  if ((save_all_p + partial_save_p) > 1)
>> +    error ("multiple save reg attributes to function %qs", func_name);
> 
> I think this kind of error is really supposed to be reported in
> TARGET_MERGE_DECL_ATTRIBUTES, where it can include full location
> information.  Same for the other checks.
> 

Move such checking in TARGET_MERGE_DECL_ATTRIBUTES accordingly.

>> +	  else
>> +	    {
>> +	      /* Issue error if it is not a valid integer value.  */
>> +	      error ("invalid id value for interrupt/exception attribute");
>> +	    }
> 
> ...and I think this kind of argument checking is supposed to go in
> TARGET_INSERT_ATTRIBUTES.
> 

Move such checking in TARGET_INSERT_ATTRIBUTES accordingly.

>> +      /* Create one more instruction to move value
>> +         into the temporary register.  */
>> +      value_move_insn = gen_movsi (tmp_reg, GEN_INT (full_value));
>> +      /* Emit rtx into insn list and receive its transformed insn rtx.  */
>> +      value_move_insn = emit_insn (value_move_insn);
> 
> This can be simplified to:
> 
>   value_move_insn = emit_move_insn (tmp_reg, GEN_INT (full_value));
> 
> which is more usual than calling gen_mov* directly.
> 

Fix it accordingly.

>> +/* Return true if MODE/TYPE need double word alignment.  */
>> +static bool
>> +nds32_needs_double_word_align (enum machine_mode mode, const_tree type)
>> +{
>> +  return (GET_MODE_ALIGNMENT (mode) > PARM_BOUNDARY
>> +	  || (type && TYPE_ALIGN (type) > PARM_BOUNDARY));
>> +}
> 
> For something ABI-related like this, it's usually better not to look at
> "mode" when "type" is nonnull, since the type directly reflects the
> source language whereas "mode" is more of an internal detail.  E.g..
> something like:
> 
>   (type ? TYPE_ALIGN (type) : GET_MODE_ALIGNMENT (mode)) > PARM_BOUNDARY
> 

Fix it accordingly.

>> +/* Function that check if 'INDEX' is valid to be a index rtx for address.
>> +
>> +   OUTER_MODE : Machine mode of outer address rtx.
>> +   OUTER_CODE : rtx code of outer address rtx.
>> +        INDEX : Check if this rtx is valid to be a index for address.
>> +         BASE : The rtx of base register.
>> +       STRICT : If it is true, we are in reload pass or after reload pass.  */
>> +static bool
>> +nds32_legitimate_index_p (enum machine_mode outer_mode,
>> +			  enum rtx_code outer_code ATTRIBUTE_UNUSED,
>> +			  rtx index,
>> +			  rtx base ATTRIBUTE_UNUSED,
>> +			  bool strict)
> 
> Minor, but since this is an internal function, it'd probably be better
> to remove the unused parameters.  Also.
> 

Fix it accordingly.

>> +	  /* Further check if the value is legal for the 'outer_mode'.  */
>> +	  if (!satisfies_constraint_Is15 (index))
>> +	    goto non_legitimate_index;
> ...
>> +non_legitimate_index:
>> +  return false;
> 
> ...better to just use "return false" here.  (I'm guessing this code
> dates back to the old GO_IF_LEGITIMATE_ADDRESS days.)
> 

Fix it accordingly.

>> +/* Function to expand builtin function for
>> +   '[(unspec [(reg)])]'.  */
> 
> Plain unspecs are pretty dangerous, since there's nothing really
> to anchor them to a particular position.  I've not yet looked through
> the .md file to see what the patterns actually look like though.
> 

Thanks for the notice.
Since our intrinsics are all unspec_volatile, I change this comment to:
  /* Function to expand builtin function for
    '[(unspec_volatile [(reg)])]'.  */

>> +  /* Determine the modes of the instruction operands.
>> +     Note that we don't have left-hand-side result,
>> +     so operands[0] IS FOR arg0.  */
>> +  mode0 = insn_data[icode].operand[0].mode;
>> +
>> +  /* Refer to nds32.intrinsic.md,
>> +     we want to ensure operands[0] is register_operand.  */
>> +  if (!((*insn_data[icode].operand[0].predicate) (op0, mode0)))
>> +    op0 = copy_to_mode_reg (mode0, op0);
>> +
>> +  /* Emit new instruction and return original target.  */
>> +  pat = GEN_FCN (icode) (op0);
>> +
>> +  if (!pat)
>> +    return 0;
>> +
>> +  emit_insn (pat);
>> +
>> +  return target;
> 
> Please use the create_*_operand/maybe_expand_insn routines from optabs.h
> for this kind of thing.
> 

Now I use create_*_operand/maybe_expand_insn to expand builtin function.
That approach is more easier to implement than my original design.
Thank you for the suggestion.

>> +/* Permitting tail calls.  */
>> +
>> +static bool
>> +nds32_warn_func_return (tree decl)
>> +{
>> +  /* Naked functions are implemented entirely in assembly, including the
>> +     return sequence, so suppress warnings about this.  */
>> +  return !nds32_naked_function_p (decl);
>> +}
> 
> The comment above the function doesn't seem to match the function.
> 

The TARGET_WARN_FUNC_RETURN is located at "17.10.13 Permitting tail calls"
in GCC Internal 4.9.0 pre-release.  I should have added comment to describe
nds32_warn_func_return() to avoid such confusion.  Like this:

  /* Permitting tail calls.  */

  /* Determine whether we need to enable warning for function return check.  */
  static bool
  nds32_warn_func_return (tree decl)

Now I refine it with above change in the revised patch.  Thanks for the catch.

>> +  /* Instruction-cache sync instruction,
>> +     requesting an arugment as staring address.  */
>> +  rtx isync_insn;
> 
> Typo: "argument as starting address".
> 

Fix it accordingly.

>> +      /* Now check [reg], [symbol_ref], and [const].  */
>> +      if (GET_CODE (x) != REG
>> +	  && GET_CODE (x) != SYMBOL_REF
>> +	  && GET_CODE (x) != CONST)
>> +	goto non_legitimate_address;
>> [...]
>> +non_legitimate_address:
>> +  return false;
> 
> As above, no need for the gotos.  Just return false directly.
> 

Fix it accordingly.

>> +    case PLUS:
>> +      /* virtual_stack_vars_rtx will eventually transfer to SP or FP
>> +         force [virtual_stack_vars + reg or const] to register first,
>> +         make the offset_address combination to
>> +         other addressing mode possible.  */
>> +      if (mode == BLKmode
>> +	  && REG_P (XEXP (x, 0))
>> +	  && (REGNO (XEXP (x, 0)) == VIRTUAL_STACK_VARS_REGNUM))
>> +	goto non_legitimate_address;
> 
> I don't understand this, sorry.  Is it an optimisation, or needed for
> correctness?
> 

This is a legacy code.  Its purpose is to increase the possibility of having
  [STACK_VARS + offset]
  -> [(FP/SP + const) + offset]
  -> [FP/SP + new_const]
In that case, we can avoid using one register to hold (FP/SP + const).

However, unfortunately we are not able to reproduce original bad case
with current gcc trunk.  So I guess it may be caused by previous
faulty nds32.c/nds32.h/nds32.md design.

I think we can remove this code fragment.
Modify it in the revised patch.

>> +static rtx
>> +nds32_legitimize_address (rtx x,
>> +			  rtx oldx ATTRIBUTE_UNUSED,
>> +			  enum machine_mode mode ATTRIBUTE_UNUSED)
>> +{
>> +  /* 'mode' is the machine mode of memory operand
>> +     that uses 'x' as address.  */
>> +
>> +  return x;
>> +}
> 
> There's no need to define a dummy function here.  Just leave the default
> TARGET_LEGITIMIZE_ADDRESS in place.
> 

Fix it accordingly.

>> +  /* If fp$, $gp, $lp, and all callee-save registers are NOT required
> 
> Typo: $fp.  Another instance later.
> 

Fix it accordingly.

>> +  /* If the function is 'naked', we do not have to generate
>> +     epilogue code fragment BUT 'ret' instruction.  */
>> +  if (cfun->machine->naked_p)
>> +    {
>> +      /* Generate return instruction by using unspec_func_return pattern.
>> +         Make sure this instruction is after gen_blockage().
>> +         NOTE that $lp will become 'live'
>> +         after this instruction has been emitted.  */
>> +      emit_insn (gen_unspec_func_return ());
>> +      return;
>> +    }
> 
> FWIW, this is different from AVR (for one), which explicitly doesn't emit a
> return instruction.  One of the uses for AVR is to chain code together
> into an .init-like section.
> 

Yes.
Personally I prefer not to generate 'ret' instruction for a naked function.
Programmers have responsibility to take all the jobs and do everything
they need with c code, attribute, and inline asm.

But some of our existing customers also expect that a naked function
is supposed to imply "no push/pop" only.  Eventually we take their
requirement into account, having 'ret' produced by compiler automatically.

>> +  if (frame_pointer_needed)
>> +    {
>> +      /* adjust $sp = $fp - ($fp size) - ($gp size) - ($lp size)
>> +       *                  - (4 * callee-saved-registers)
>> +       *
>> +       * Note: No need to adjust
>> +       *       cfun->machine->callee_saved_area_padding_bytes,
>> +       *       because we want to adjust stack pointer
>> +       *       to the position for pop instruction.
>> +       */
> 
> There are a few comments that have this style, but GNU layout is the:
> 
>       /* ~~~~~~~~~~~~~~~
>          ~~~~~~~~~~~~~~~.  */
> 
> style you use in most places.
> 

Fix it accordingly.

>> +  /* Caculate the number of 1-bit of (~ival), if there is only one 1-bit,
> 
> Typo: Calculate
> 

Fix it accordingly.

>> +     it means the original ival has only one 0-bit,
>> +     So it is ok to perform 'bclr' operation.  */
>> +
>> +  one_bit_count = __builtin_popcount ((unsigned int) (~ival));
>> +
>> +  /* 'bclr' is a performance extension instruction.  */
>> +  return (TARGET_PERF_EXT && (one_bit_count == 1));
> 
> We shouldn't really call __builtin_popcount directly.  You could just
> use popcount_hwi, or the (x & -x) == x thing.
>

Thanks for the suggestion.  I use popcount_hwi in the revised patch.
 
> Redundant brackets around the ==.  There are a few other cases in the
> patch too, although most of it seems OK.
> 

Sorry I didn't see such requirement in GNU coding standards or
GCC coding convention.  If it is not defined in coding convention,
may I keep these redundant brackets for readability? :)

>> +      switch (mode_test)
>> +	{
>> +	case QImode:
>> +	  /* 333 format.  */
>> +	  if (val >=0 && val < 8 && regno < 8)
>> +	    return ADDRESS_LO_REG_IMM3U;
>> +	  break;
>> +
>> +	case HImode:
>> +	  /* 333 format.  */
>> +	  if (val >=0 && val < 16 && (val % 2 == 0) && regno < 8)
>> +	    return ADDRESS_LO_REG_IMM3U;
> 
> Missing space before "0".  A few other cases in the same function.
> 

Fix it accordingly.

>> +   The trampoline code for nds32 target must contains folloing parts:
> 
> Typo: following.
> 

Fix it accordingly.

> Thanks,
> Richard
> 


A revised patch for nds32.c and nds32.h is attached.
Thanks for all of your review comments! :)

Best regards,
jasonwucj


Attachment: 2-nds32-backend-md-part1.v3.revised.patch
Description: Text document


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]