[PATCH][4.6][ARM] New CPU support for Faraday cores

Ramana Radhakrishnan ramana.radhakrishnan@arm.com
Thu Dec 9 14:49:00 GMT 2010


Hi Mingfeng

Sorry about the late response and thanks for working through the issues. 
I've been off sick and only got back to looking at this today.

Some minor nits in your changelog.


> 2010-12-02  Sanjin Liu  <scliu@faraday-tech.com>
> 	  Mingfeng Wu  <mingfeng@faraday-tech.com>
> 
> 	* config/arm/arm-cores.def: Add Faraday CPU support -
> 	fa526/fa626/fa606te/fa626te/fmp626/fa726te.

',' instead of '/' in the changelog entry. fa526, fa626 etc.

> 	* config/arm/arm-tune.md: Regenerate.
> 	* config/arm/arm.c (arm_fa726te_tune): New tune_params for fa726te

It's enough to say New.
	
> 	(fa726te_sched_adjust_cost): New cost function for fa726te.

Enough to say New.

> 	(arm_issue_rate): Add fa726te.

s/Add/Handle

> 	* config/arm/arm.md (generic_sched): Add Faraday cores to generic_sched
> 	and include machine description files.

Replace sentence with :
Don't use Generic scheduler for Faraday cores. 


> 	* config/arm/bpabi.h (TARGET_FIX_V4BX_SPEC): Add fa526 and fa626.

s/Add/Handle


> >
> >> +(define_query_cpu_unit "fa726te_lsu1_pipe_e,fa726te_lsu1_pipe_w"
> "fa726te")
> >
> > You have a query_cpu_unit which you don't seem to be querying for in
> the backend in any form? Is there any thing else
> > missing in your pipeline description or has this been put in for
> future use ?
> >
> 
> I only use the units defined by query_cpu_unit in the fa726te.md. The
> two units, fa726te_lsu1_pipe_e and
> fa726te_lsu1_pipe_w, are only used for arrange the load instructions.
> Because fa726te only supports one
> ldr/str pipe, I use the query_cpu_unit to define another pseudo pipe
> for better load instruction scheduling.

The reason I asked why you used the define_cpu_unit vs
define_query_cpu_unit was because there was no backend hook that queried
for the cpu unit in question. IIRC define_cpu_unit and
define_query_cpu_unit are more or less identical except for the fact
that you can result in better minimization in one case than the other. 

However in this case the difference in the size of the automaton should
be minimal considering it isn't a very big automaton in question. 

> 
> >> +;; reservation which blocks IS
> >> +(define_reservation "fa726te_blockage" "(fa726te_is0
> +fa726te_is1)")
> >
> > Can you clarify the comment above ? Again the comments about
> sentence case and full stops hold.

> 
> 
> It is used to restrict the instruction issue to one.


Ok so make that explicit in the comment. Something like: 

" ;Reservation to restrict issue to 1.



Now on to your latest patch submission.



Still a few formatting issues from your latest patch.

 
> diff -uNr tmp/gcc-4.6-svn-167325-20101201/gcc/config/arm/fa606te.md gcc-4.6-svn-167325-20101201/gcc/config/arm/fa606te.md
> --- tmp/gcc-4.6-svn-167325-20101201/gcc/config/arm/fa606te.md	1970-01-01 08:00:00.000000000 +0800
> +++ gcc-4.6-svn-167325-20101201/gcc/config/arm/fa606te.md	2010-12-02 09:18:17.671515000 +0800
> 
<...>

> 
> +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
> +;; Branch and Call Instructions
> +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
> +
> +;; Branch instructions are difficult to model accurately.  The FA606TE
> +;; core can predict most branches.  If the branch is predicted
> +;; correctly, and predicted early enough, the branch can be completely
> +;; eliminated from the instruction stream.  Some branches can
> +;; therefore appear to require zero cycles to execute.  We assume that
> +;; all branches are predicted correctly, and that the latency is
> +;; therefore the minimum value.
> +
> +(define_insn_reservation "606te_branch_op" 0
> + (and (eq_attr "tune" "fa606te")
> +      (eq_attr "type" "branch"))
> + "fa606te_core")
> +
> +;; The latency for a call is actually the latency when the result being available.
> +;; i.e. R0 ready for int return value. For most cases, the return value is set by a
					^^^
					2 spaces between '.' and start of next sentence.

> diff -uNr tmp/gcc-4.6-svn-167325-20101201/gcc/config/arm/fa606te.md gcc-4.6-svn-167325-20101201/gcc/config/arm/fa606te.md
> --- tmp/gcc-4.6-svn-167325-20101201/gcc/config/arm/fa606te.md	1970-01-01 08:00:00.000000000 +0800
> +++ gcc-4.6-svn-167325-20101201/gcc/config/arm/fa606te.md	2010-12-02 09:18:17.671515000 +0800
> @@ -0,0 +1,171 @@
> 
> +;; The latency for a call is actually the latency when the result is available.
> +;; i.e. R0 ready for int return value. 
					 ^	

Very small nit. Remove trailing white-space. 

> 
> diff -uNr tmp/gcc-4.6-svn-167325-20101201/gcc/config/arm/fa726te.md gcc-4.6-svn-167325-20101201/gcc/config/arm/fa726te.md
> --- tmp/gcc-4.6-svn-167325-20101201/gcc/config/arm/fa726te.md	1970-01-01 08:00:00.000000000 +0800
> +++ gcc-4.6-svn-167325-20101201/gcc/config/arm/fa726te.md	2010-12-02 14:45:23.731365000 +0800

<snip...>

> 
> +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
> +;; Pipelines
> +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
> +
> +;;   The ALU pipeline has fetch, decode, execute, memory, and
> +;;   write stages.  We only need to model the execute, memory and write
> +;;   stages.
> +
> +;;	E1	E2	E3	E4	E5	WB
> +;;______________________________________________________
> +;;
> +;;      <-------------- LD/ST ----------->
> +;;    shifter + LU      <-- AU --> 
				     ^ Trailing whitespace.

> +;;      <-- AU -->     shifter + LU    CPSR     (Pipe 0)
> +;;______________________________________________________
> +;;
> +;;      <---------- MUL --------->
> +;;    shifter + LU      <-- AU --> 
				     ^ Trailing whitespace.


>From fa726te.md 

> +(define_bypass 1 "726te_alu_shift_op,726te_alu_shift_reg_op,726te_mult_op" 
									     ^
Trailing whitespace.

> +                 "726te_alu_shift_op" "arm_no_early_alu_shift_dep")
> +(define_bypass 1 "726te_alu_shift_op,726te_alu_shift_reg_op,726te_mult_op" 
									     ^
Trailing whitespace.


> +;; The latency for a call is actually the latency when the result is available.
> +;; i.e. R0 is ready for int return value. 

Likewise.


> 
> diff -uNr tmp/gcc-4.6-svn-167325-20101201/gcc/config/arm/fmp626.md gcc-4.6-svn-167325-20101201/gcc/config/arm/fmp626.md
> --- tmp/gcc-4.6-svn-167325-20101201/gcc/config/arm/fmp626.md	1970-01-01 08:00:00.000000000 +0800
> +++ gcc-4.6-svn-167325-20101201/gcc/config/arm/fmp626.md	2010-12-02 09:18:17.687514000 +0800
<snip>

> +;; Pipeline architecture
> +;;	S	E	M	W(Q1)	Q2
> +;;   ___________________________________________
> +;;    shifter alu    
		    ^^^ Multiple trailing whitespaces. 



> diff -uNr tmp/gcc-4.6-svn-167325-20101201/gcc/config/arm/bpabi.h gcc-4.6-svn-167325-20101201/gcc/config/arm/bpabi.h
> --- tmp/gcc-4.6-svn-167325-20101201/gcc/config/arm/bpabi.h	2010-12-01 18:48:48.000000000 +0800
> +++ gcc-4.6-svn-167325-20101201/gcc/config/arm/bpabi.h	2010-12-02 09:18:17.660518000 +0800
> @@ -52,7 +52,7 @@
>  /* The BPABI integer comparison routines return { -1, 0, 1 }.  */
>  #define TARGET_LIB_INT_CMP_BIASED !TARGET_BPABI
>  
> -#define TARGET_FIX_V4BX_SPEC " %{mcpu=arm8|mcpu=arm810|mcpu=strongarm*|march=armv4:--fix-v4bx}"
> +#define TARGET_FIX_V4BX_SPEC " %{mcpu=arm8|mcpu=arm810|mcpu=strongarm*|march=armv4|mcpu=fa526|mcpu=fa626:--fix-v4bx}"
> 

Exceeds the 80 character per line limit.  Can you split this across multiple lines ?

Something like
#define TARGET_FIX_V4BX_SPEC " %{mcpu=arm8|mcpu=arm810|mcpu=strongarm*\
|march=armv4|mcpu=fa526|mcpu=fa626:--fix-v4bx}"

should do the trick.

However this isn't an approval since I can't approve or reject your patch.


cheers
Ramana



On Thu, 2010-12-02 at 16:27 +0800, M.F. Wu wrote: 
> Dear Ramana,
> 
> Thank you for your comments about the patch.
> The patch has been modified as the attached
> file shows.
> 
> The Changlog:
> 
> 2010-12-02  Sanjin Liu  <scliu@faraday-tech.com>
> 	  Mingfeng Wu  <mingfeng@faraday-tech.com>
> 
> 	* config/arm/arm-cores.def: Add Faraday CPU support -
> 	fa526/fa626/fa606te/fa626te/fmp626/fa726te.
> 	* config/arm/arm-tune.md: Regenerate.
> 	* config/arm/arm.c (arm_fa726te_tune): New tune_params for fa726te
> 	(fa726te_sched_adjust_cost): New cost function for fa726te.
> 	(arm_issue_rate): Add fa726te.
> 	* config/arm/arm.md (generic_sched): Add Faraday cores to generic_sched
> 	and include machine description files.
> 	* config/arm/bpabi.h (TARGET_FIX_V4BX_SPEC): Add fa526 and fa626.
> 	* config/arm/t-arm (MD_INCLUDES): Include machine description files for
> 	Faraday cores.
> 	* config/arm/t-arm-elf: Add multilib option for Faraday cores.
> 	* config/arm/t-linux-eabi: Add multilib option for Faraday cores except
> 	fa526 and fa626.
> 	* doc/invoke.texi: Document -mcpu for Faraday cores.
> 	* config/arm/fa526.md: New file.
> 	* config/arm/fa626.md: New file.
> 	* config/arm/fa606te.md: New file.
> 	* config/arm/fa626te.md: New file.
> 	* config/arm/fmp626.md: New file.
> 	* config/arm/fa726te.md: New file.
> 
> 
> 
> 2010/11/30 Ramana Radhakrishnan <ramana.radhakrishnan@arm.com>:
> > Hi Mingfeng,
> >
> > Thanks for making these changes.
> >
> > Please do not make the Changelog a part of the final patch. Please make
> > this a part of your final mail submission and not a part of your patch.
> >
> >
> >> @@ -7913,6 +7921,36 @@
> >
> > <...>
> >
> >> +      /* Use of carry (e.g. 64-bit arithmetic) in ALU: 3-cycle latency */
> >
> > Full stop at the end of comment followed by 2 spaces before end of comment.
> > Can you please audit your patch to check for these issues ?
> >
> > This is true for comments in the machine description parts of your patch as well.
> >
> 
> Fixed.
> 
> >
> >> +      if (get_attr_conds(insn)  == CONDS_USE &&
> >> +          get_attr_type(insn) != TYPE_BRANCH)
> >> +        {
> >> +          *cost = 3;
> >> +          return false;
> >> +        }
> >
> > Space between function name and paranthesis. Thus it should be
> > get_attr_conds (insn) and not get_attr_conds(insn) as above.
> >
> >> +
> >> +      if (GET_CODE (PATTERN (insn)) == COND_EXEC
> >> +          || get_attr_conds(insn)  == CONDS_USE)
> >> +        {
> >> +          *cost = 0;
> >> +          return false;
> >> +        }
> >> +    }
> >
> > Likewise.
> >
> 
> Fixed.
> 
> >
> >> diff -uNr tmp/gcc-4.6-svn-20101116/gcc/config/arm/fa526.md
> >> >> gcc-4.6-svn-20101116/gcc/config/arm/fa526.md
> >> >> > --- tmp/gcc-4.6-svn-20101116/gcc/config/arm/fa526.md  1970-01-01
> >> >> 08:00:00.000000000 +0800
> >> >> > +++ gcc-4.6-svn-20101116/gcc/config/arm/fa526.md      2010-11-23
> >> >> 14:36:17.916371000 +0800
> >>
> >> >> +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
> >> >> > +;; Branch and Call Instructions
> >> >> >
> >> >> +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
> >> >> > +
> >
> >> > And equivalently in all the other pipeline descriptions.
> >> >
> >>
> >> Sorry, I don't understand exactly what you mean...
> >
> >
> >> +;; Branch instructions are difficult to model accurately.  The ARM
> >> +;; core can predict most branches.  If the branch is predicted
> >>
> > In the sentence above -
> > Replace ARM with FA526 and similarly in the other pipeline descriptions.
> >
> > The FA526 core isn't one made by ARM and am not sure if you can use the name
> > in that regard here :)
> >
> 
> OK, you are right. I have modified the "ARM" to our core.
> 
> >> > ;
> >> >
> >> > +const struct tune_params arm_fa726te_tune =
> >> > +{
> >> > +  arm_9e_rtx_costs,
> >> > +  fa726te_sched_adjust_cost,
> >> > +  1
> >> > +};
> >> > +
> >
> > This part of your patch is now out-of-date thanks to Ian Bolton's latest commits in that area with respect
> > to preloads. You might want to consider that in your final submission. I suppose using the defaults
> > and turning off preloads at O3 would be the correct thing to do to get your patch sheperded through.
> >
> 
> OK. I modified the arm_fa726te_tune.
> 
> const struct tune_params arm_fa726te_tune =
> {
>   arm_9e_rtx_costs,
>   fa726te_sched_adjust_cost,
>   1,
>   ARM_PREFETCH_NOT_BENEFICIAL
> };
> 
> 
> >>
> >> --- tmp/gcc-4.6-svn-20101116/gcc/config/arm/fa726te.md        1970-01-01 08:00:00.000000000 +0800
> >> +++ gcc-4.6-svn-20101116/gcc/config/arm/fa726te.md    2010-11-25 17:06:01.877554000 +0800
> >> @@ -0,0 +1,221 @@
> >>
> >> +(define_automaton "fa726te")
> >> +(automata_option "ndfa")
> >> +
> >
> > Why do you have an ndfa option here? Does this give you benefit with benchmarking on the FA726te core since this usually increases compile time
> > as the automaton ends up searching for all possible options ?
> >
> 
> Yes. the ndfa option does benefit our benchmarking, but a little. So I
> remove the ndfa option here.
> 
> >
> >> +;; pretend we have 2 LSUs (the second is ONLY for LDR), which can possibly
> >> +;; improve code quality
> >
> > Full stop at the end of the comment. pretend should start with a capital P and not lower case. (Pretend)
> > 2 spaces between the a full stop or a punctuation character that terminates a sentence and the start of
> > the next sentence. There are a number of places in your patch where one can see such cases.
> >
> 
> Fixed.
> 
> >
> >> +(define_query_cpu_unit "fa726te_lsu1_pipe_e,fa726te_lsu1_pipe_w" "fa726te")
> >
> > You have a query_cpu_unit which you don't seem to be querying for in the backend in any form? Is there any thing else
> > missing in your pipeline description or has this been put in for future use ?
> >
> 
> I only use the units defined by query_cpu_unit in the fa726te.md. The
> two units, fa726te_lsu1_pipe_e and
> fa726te_lsu1_pipe_w, are only used for arrange the load instructions.
> Because fa726te only supports one
> ldr/str pipe, I use the query_cpu_unit to define another pseudo pipe
> for better load instruction scheduling.
> 
> >> +;; reservation which blocks IS
> >> +(define_reservation "fa726te_blockage" "(fa726te_is0+fa726te_is1)")
> >
> > Can you clarify the comment above ? Again the comments about sentence case and full stops hold.
> >
> 
> It is used to restrict the instruction issue to one.
> 
> >
> >
> > cheers
> > Ramana
> >
> >
> >
> >
> >
> >




More information about the Gcc-patches mailing list