This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

RE: [PATCH] MIPS: Add support for P6600


Robert Suchanek <Robert.Suchanek@mips.com> writes:
> The below adds support for -march=p6600.  It includes
> a new scheduler plus performance tweaks.
> 
> gcc/ChangeLog:
> 
> 2018-06-01  Matthew Fortune  <matthew.fortune@mips.com>
>             Prachi Godbole  <prachi.godbole@imgtec.com>
> 	* config/mips/mips-cpus.def: Define P6600.
> 	* config/mips/mips-tables.opt: Regenerate.
> 	* config/mips/mips.c (mips_multipass_dfa_lookahead): Add P6600.
> 	* config/mips/mips.h (TUNE_P6600): New define.
> 	(MIPS_ISA_LEVEL_SPEC): Infer mips64r6 from p6600.
> 	* doc/invoke.texi: Add p6600 to supported architectures.

It seems that mips.com has no longer got any architecture specific
documents available for either i6400/i6500 (which did exist until
recently) nor p6600. There isn't anything particularly unusual
about i6400/i6500 support but this patch for p6600 attempts to deal
with some undocumented micro-architecture details that are not
sufficiently described in the patch to maintain over time nor is
there reference material available.

Despite the fact that I am one of the original authors of the
work, I can't accept this upstream without further information.

Some notes on the patch are below:

> diff --git a/gcc/config/mips/mips.c b/gcc/config/mips/mips.c
> index bfe64bb..9c77c13 100644
> --- a/gcc/config/mips/mips.c
> +++ b/gcc/config/mips/mips.c
> @@ -198,6 +198,15 @@ enum mips_address_type {
>    ADDRESS_SYMBOLIC
>  };
> 
> +/* Classifies an unconditional branch of interest for the P6600.  */
> +
> +enum mips_ucbranch_type {

Newline for brace.

> +  /* May not even be a branch.  */
> +  UC_UNDEFINED,
> +  UC_BALC,
> +  UC_OTHER
> +};
> +
>  /* Macros to create an enumeration identifier for a function
> prototype.  */
>  #define MIPS_FTYPE_NAME1(A, B) MIPS_##A##_FTYPE_##B
>  #define MIPS_FTYPE_NAME2(A, B, C) MIPS_##A##_FTYPE_##B##_##C
> @@ -1127,6 +1136,19 @@ static const struct mips_rtx_cost_data
>      COSTS_N_INSNS (36),           /* int_div_di */
>  		    2,            /* branch_cost */
>  		    4             /* memory_latency */
> +  },
> +  { /* P6600 */
> +    COSTS_N_INSNS (4),            /* fp_add */
> +    COSTS_N_INSNS (5),            /* fp_mult_sf */
> +    COSTS_N_INSNS (5),            /* fp_mult_df */
> +    COSTS_N_INSNS (17),           /* fp_div_sf */
> +    COSTS_N_INSNS (17),           /* fp_div_df */
> +    COSTS_N_INSNS (5),            /* int_mult_si */
> +    COSTS_N_INSNS (5),            /* int_mult_di */
> +    COSTS_N_INSNS (8),            /* int_div_si */
> +    COSTS_N_INSNS (8),            /* int_div_di */
> +		    2,            /* branch_cost */
> +		    4             /* memory_latency */
>    }
>  };
> 
> 
> @@ -14592,6 +14614,7 @@ mips_issue_rate (void)
>      case PROCESSOR_LOONGSON_2F:
>      case PROCESSOR_LOONGSON_3A:
>      case PROCESSOR_P5600:
> +    case PROCESSOR_P6600:
>        return 4;
> 
>      case PROCESSOR_XLP:
> @@ -14727,7 +14750,7 @@ mips_multipass_dfa_lookahead (void)
>    if (TUNE_OCTEON)
>      return 2;
> 
> -  if (TUNE_P5600 || TUNE_I6400)
> +  if (TUNE_P5600 || TUNE_P6600 || TUNE_I6400)
>      return 4;
> 
>    return 0;
> @@ -18647,6 +18670,28 @@ mips_orphaned_high_part_p (mips_offset_table
> *htab, rtx_insn *insn)
>    return false;
>  }
> 
> +/* Subroutine of mips_avoid_hazard.  We classify unconditional
> branches
> +   of interest for the P6600 for performance reasons.  We're
> interested
> +   in differentiating BALC from JIC, JIALC and BC.  */
> +
> +static enum mips_ucbranch_type
> +mips_classify_branch_p6600 (rtx_insn *insn)
> +{
> +  if (!(insn
> +      && USEFUL_INSN_P (insn)
> +      && GET_CODE (PATTERN (insn)) != SEQUENCE))
> +    return UC_UNDEFINED;

Let's switch this around to the following with a comment to say
ignore sequences because they represent a filled delay slot
branch (which presumably is not affected by the uArch issue).

  if (insn
      || !USEFUL_INSN_P (insn)
      || GET_CODE (PATTERN (insn)) == SEQUENCE))
    return UC_UNDEFINED;

> +
> +  if (get_attr_jal (insn) == JAL_INDIRECT /* JIC and JIALC.  */
> +      || get_attr_type (insn) == TYPE_JUMP) /* BC as it is a loose
> jump.  */
> +    return UC_OTHER;

I don't recall what a loose jump was supposed to refer to, presumably
'direct jump'. 

>  /* Subroutine of mips_reorg_process_insns.  If there is a hazard
> between
>     INSN and a previous instruction, avoid it by inserting nops after
>     instruction AFTER.
> @@ -18699,14 +18744,36 @@ mips_avoid_hazard (rtx_insn *after, rtx_insn
> *insn, int *hilo_delay,
>  	   && GET_CODE (pattern) != ASM_INPUT
>  	   && asm_noperands (pattern) < 0)
>      nops = 1;
> +  /* The P6600's branch predictor does not handle certain static
> +     sequences of back-to-back jumps well.  Inserting a no-op only
> +     costs space as the dispatch unit will disregard the nop.
> +     Here we handle the cases of back to back unconditional branches
> +     that are inefficent.  */
> +  else if (TUNE_P6600 && TARGET_CB_MAYBE && !optimize_size
> +	   && ((mips_classify_branch_p6600 (after) == UC_BALC
> +		&& mips_classify_branch_p6600 (insn) == UC_OTHER)
> +	       || (mips_classify_branch_p6600 (insn) == UC_BALC
> +		   && (mips_classify_branch_p6600 (after) == UC_OTHER))))

Unnecessary braces on the last clause here.

> +    nops = 1;

This appears to show that a BALC with a compact branch (other than BALC)
is the problem case either before or after the BALC. This is what we
need to see in a public document. Or at least (re)confirmed as the issue.

>    else
>      nops = 0;
> 
>    /* Insert the nops between this instruction and the previous one.
>       Each new nop takes us further from the last hilo hazard.  */
>    *hilo_delay += nops;
> +
> +  /* If we're tuning for the P6600, we come across an annoying GCC
> +     assumption that debug information always follows a call.  Move
> +     past any debug information in that case.  */
> +  rtx_insn *real_after = after;
> +  if (real_after && nops && CALL_P (real_after))
> +    while (TUNE_P6600 && real_after
> +	   && (NOTE_P (NEXT_INSN (real_after))
> +	       || BARRIER_P (NEXT_INSN (real_after))))
> +      real_after = NEXT_INSN (real_after);
> +

This looks OK but it seems unnecessary to limit to the P6600.  It is
just that we have not had hazards on branches before now.

>    while (nops-- > 0)
> -    emit_insn_after (gen_hazard_nop (), after);
> +    emit_insn_after (gen_hazard_nop (), real_after);
> 
>    /* Set up the state for the next instruction.  */
>    *hilo_delay += ninsns;
> @@ -18716,6 +18783,14 @@ mips_avoid_hazard (rtx_insn *after, rtx_insn
> *insn, int *hilo_delay,
>      switch (get_attr_hazard (insn))
>        {
>        case HAZARD_NONE:
> +	/* For the P6600, flag some unconditional branches as having a
> +	   pseudo-forbidden slot.  This will cause additional nop
> insertion
> +	   or SEQUENCE breaking as required.  */
> +	if (TUNE_P6600
> +	    && !optimize_size
> +	    && TARGET_CB_MAYBE
> +	    && mips_classify_branch_p6600 (insn) == UC_OTHER)
> +	  *fs_delay = true;

This appears to be saying that all unconditional branches (except
BALC) will introduce a performance penalty if followed by any other
non-delay/non-forbidden slot instruction.

>  	break;
> 
>        case HAZARD_FORBIDDEN_SLOT:
> @@ -18957,7 +19032,10 @@ mips_reorg_process_insns (void)
>  		     sequence and replace it with the delay slot instruction
>  		     then the jump to clear the forbidden slot hazard.  */
> 
> -		  if (fs_delay)
> +		  if (fs_delay || (TUNE_P6600
> +				   && TARGET_CB_MAYBE
> +				   && mips_classify_branch_p6600 (insn)
> +				      == UC_BALC))

This seems to be accounting for a BALC followed by delay slot branch
but more of a hint than a guarantee as it is going to convert the
delay slot branch to a compact branch which presumably is 'better'
for the P6600.

The implied rules from the patch seem to be:

BALC -> JIC/JIALC/BC               == BAD - insert NOP to fix
JIC/JIALC/BC -> BALC               == BAD - insert NOP to fix
JIC/JIALC/BC -> non-DS-INSN        == BAD - treat as forbidden slot
BALC -> BALC                       == OK
BALC -> Delay slot branch          == BAD - undo delay slot to fix
BALC -> Conditional compact branch == OK

>  		    {
>  		      /* Search onwards from the current position looking
> for
>  			 a SEQUENCE.  We are looking for pipeline hazards
here
> diff --git a/gcc/config/mips/p6600.md b/gcc/config/mips/p6600.md
> new file mode 100644
> index 0000000..17144a1
> --- /dev/null
> +++ b/gcc/config/mips/p6600.md
> @@ -0,0 +1,342 @@
> +;; DFA-based pipeline description for P6600.
> +;;
> +;; Copyright (C) 2007-2015 Free Software Foundation, Inc.

Copyright year should just be 2018 for first submission of a new file.

The scheduler references to fminmax and fclass types that do not yet
exist in trunk.

This patch is dependent on public information becoming available about
the uArch branch prediction rules that are implemented in this patch so
that the logic can be correctly maintained in the future.

Thanks,
Matthew


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]