[PATCH][4.6][ARM] New CPU support for Faraday cores

M.F. Wu mingfeng6865@gmail.com
Mon Dec 13 11:01:00 GMT 2010


Dear Ramana,

Thank you for your kindly help. I have modified according to
your suggestion. The following is my Changlog.

2010-12-02  Sanjin Liu  <scliu@faraday-tech.com>
	    Mingfeng Wu  <mingfeng@faraday-tech.com>

	* config/arm/arm-cores.def: Add Faraday CPU support -
	fa526,fa626,fa606te,fa626te,fmp626,fa726te.
	* config/arm/arm-tune.md: Regenerate.
	* config/arm/arm.c (arm_fa726te_tune): New.
	(fa726te_sched_adjust_cost): New.
	(arm_issue_rate): Handle fa726te.
	* config/arm/arm.md (generic_sched): Don't use Generic scheduler for
	Faraday cores.
	* config/arm/bpabi.h (TARGET_FIX_V4BX_SPEC): Handle fa526 and fa626.
	* config/arm/t-arm (MD_INCLUDES): Include machine description files for
	Faraday cores.
	* config/arm/t-arm-elf: Add multilib option for Faraday cores.
	* config/arm/t-linux-eabi: Add multilib option for Faraday cores except
	fa526 and fa626.
	* doc/invoke.texi: Document -mcpu for Faraday cores.
	* config/arm/fa526.md: New file.
	* config/arm/fa626.md: New file.
	* config/arm/fa606te.md: New file.
	* config/arm/fa626te.md: New file.
	* config/arm/fmp626.md: New file.
	* config/arm/fa726te.md: New file.



2010/12/9 Ramana Radhakrishnan <ramana.radhakrishnan@arm.com>:
> Hi Mingfeng
>
> Sorry about the late response and thanks for working through the issues.
> I've been off sick and only got back to looking at this today.
>
> Some minor nits in your changelog.
>
>
>> 2010-12-02  Sanjin Liu  <scliu@faraday-tech.com>
>>         Mingfeng Wu  <mingfeng@faraday-tech.com>
>>
>>       * config/arm/arm-cores.def: Add Faraday CPU support -
>>       fa526/fa626/fa606te/fa626te/fmp626/fa726te.
>
> ',' instead of '/' in the changelog entry. fa526, fa626 etc.
>
>>       * config/arm/arm-tune.md: Regenerate.
>>       * config/arm/arm.c (arm_fa726te_tune): New tune_params for fa726te
>
> It's enough to say New.
>

OK.

>>       (fa726te_sched_adjust_cost): New cost function for fa726te.
>
> Enough to say New.
>

OK.

>>       (arm_issue_rate): Add fa726te.
>
> s/Add/Handle
>

OK.

>>       * config/arm/arm.md (generic_sched): Add Faraday cores to generic_sched
>>       and include machine description files.
>
> Replace sentence with :
> Don't use Generic scheduler for Faraday cores.
>

Replaced.

>
>>       * config/arm/bpabi.h (TARGET_FIX_V4BX_SPEC): Add fa526 and fa626.
>
> s/Add/Handle
>

OK.

>
>> >
>> >> +(define_query_cpu_unit "fa726te_lsu1_pipe_e,fa726te_lsu1_pipe_w"
>> "fa726te")
>> >
>> > You have a query_cpu_unit which you don't seem to be querying for in
>> the backend in any form? Is there any thing else
>> > missing in your pipeline description or has this been put in for
>> future use ?
>> >
>>
>> I only use the units defined by query_cpu_unit in the fa726te.md. The
>> two units, fa726te_lsu1_pipe_e and
>> fa726te_lsu1_pipe_w, are only used for arrange the load instructions.
>> Because fa726te only supports one
>> ldr/str pipe, I use the query_cpu_unit to define another pseudo pipe
>> for better load instruction scheduling.
>
> The reason I asked why you used the define_cpu_unit vs
> define_query_cpu_unit was because there was no backend hook that queried
> for the cpu unit in question. IIRC define_cpu_unit and
> define_query_cpu_unit are more or less identical except for the fact
> that you can result in better minimization in one case than the other.
>
> However in this case the difference in the size of the automaton should
> be minimal considering it isn't a very big automaton in question.
>
>>
>> >> +;; reservation which blocks IS
>> >> +(define_reservation "fa726te_blockage" "(fa726te_is0
>> +fa726te_is1)")
>> >
>> > Can you clarify the comment above ? Again the comments about
>> sentence case and full stops hold.
>
>>
>>
>> It is used to restrict the instruction issue to one.
>
>
> Ok so make that explicit in the comment. Something like:
>
> " ;Reservation to restrict issue to 1.
>

Modified as suggested.

>
>
> Now on to your latest patch submission.
>
>
>
> Still a few formatting issues from your latest patch.
>
>
>> diff -uNr tmp/gcc-4.6-svn-167325-20101201/gcc/config/arm/fa606te.md gcc-4.6-svn-167325-20101201/gcc/config/arm/fa606te.md
>> --- tmp/gcc-4.6-svn-167325-20101201/gcc/config/arm/fa606te.md 1970-01-01 08:00:00.000000000 +0800
>> +++ gcc-4.6-svn-167325-20101201/gcc/config/arm/fa606te.md     2010-12-02 09:18:17.671515000 +0800
>>
> <...>
>
>>
>> +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
>> +;; Branch and Call Instructions
>> +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
>> +
>> +;; Branch instructions are difficult to model accurately.  The FA606TE
>> +;; core can predict most branches.  If the branch is predicted
>> +;; correctly, and predicted early enough, the branch can be completely
>> +;; eliminated from the instruction stream.  Some branches can
>> +;; therefore appear to require zero cycles to execute.  We assume that
>> +;; all branches are predicted correctly, and that the latency is
>> +;; therefore the minimum value.
>> +
>> +(define_insn_reservation "606te_branch_op" 0
>> + (and (eq_attr "tune" "fa606te")
>> +      (eq_attr "type" "branch"))
>> + "fa606te_core")
>> +
>> +;; The latency for a call is actually the latency when the result being available.
>> +;; i.e. R0 ready for int return value. For most cases, the return value is set by a
>                                        ^^^
>                                        2 spaces between '.' and start of next sentence.
>

OK.

>> diff -uNr tmp/gcc-4.6-svn-167325-20101201/gcc/config/arm/fa606te.md gcc-4.6-svn-167325-20101201/gcc/config/arm/fa606te.md
>> --- tmp/gcc-4.6-svn-167325-20101201/gcc/config/arm/fa606te.md 1970-01-01 08:00:00.000000000 +0800
>> +++ gcc-4.6-svn-167325-20101201/gcc/config/arm/fa606te.md     2010-12-02 09:18:17.671515000 +0800
>> @@ -0,0 +1,171 @@
>>
>> +;; The latency for a call is actually the latency when the result is available.
>> +;; i.e. R0 ready for int return value.
>                                         ^
>
> Very small nit. Remove trailing white-space.
>

OK.

>>
>> diff -uNr tmp/gcc-4.6-svn-167325-20101201/gcc/config/arm/fa726te.md gcc-4.6-svn-167325-20101201/gcc/config/arm/fa726te.md
>> --- tmp/gcc-4.6-svn-167325-20101201/gcc/config/arm/fa726te.md 1970-01-01 08:00:00.000000000 +0800
>> +++ gcc-4.6-svn-167325-20101201/gcc/config/arm/fa726te.md     2010-12-02 14:45:23.731365000 +0800
>
> <snip...>
>
>>
>> +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
>> +;; Pipelines
>> +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
>> +
>> +;;   The ALU pipeline has fetch, decode, execute, memory, and
>> +;;   write stages.  We only need to model the execute, memory and write
>> +;;   stages.
>> +
>> +;;   E1      E2      E3      E4      E5      WB
>> +;;______________________________________________________
>> +;;
>> +;;      <-------------- LD/ST ----------->
>> +;;    shifter + LU      <-- AU -->
>                                     ^ Trailing whitespace.
>

OK.

>> +;;      <-- AU -->     shifter + LU    CPSR     (Pipe 0)
>> +;;______________________________________________________
>> +;;
>> +;;      <---------- MUL --------->
>> +;;    shifter + LU      <-- AU -->
>                                     ^ Trailing whitespace.
>

OK.

>
> >From fa726te.md
>
>> +(define_bypass 1 "726te_alu_shift_op,726te_alu_shift_reg_op,726te_mult_op"
>                                                                             ^
> Trailing whitespace.
>
>> +                 "726te_alu_shift_op" "arm_no_early_alu_shift_dep")
>> +(define_bypass 1 "726te_alu_shift_op,726te_alu_shift_reg_op,726te_mult_op"
>                                                                             ^
> Trailing whitespace.
>

OK.

>
>> +;; The latency for a call is actually the latency when the result is available.
>> +;; i.e. R0 is ready for int return value.
>
> Likewise.
>

OK.

>
>>
>> diff -uNr tmp/gcc-4.6-svn-167325-20101201/gcc/config/arm/fmp626.md gcc-4.6-svn-167325-20101201/gcc/config/arm/fmp626.md
>> --- tmp/gcc-4.6-svn-167325-20101201/gcc/config/arm/fmp626.md  1970-01-01 08:00:00.000000000 +0800
>> +++ gcc-4.6-svn-167325-20101201/gcc/config/arm/fmp626.md      2010-12-02 09:18:17.687514000 +0800
> <snip>
>
>> +;; Pipeline architecture
>> +;;   S       E       M       W(Q1)   Q2
>> +;;   ___________________________________________
>> +;;    shifter alu
>                    ^^^ Multiple trailing whitespaces.
>

OK.

>
>
>> diff -uNr tmp/gcc-4.6-svn-167325-20101201/gcc/config/arm/bpabi.h gcc-4.6-svn-167325-20101201/gcc/config/arm/bpabi.h
>> --- tmp/gcc-4.6-svn-167325-20101201/gcc/config/arm/bpabi.h    2010-12-01 18:48:48.000000000 +0800
>> +++ gcc-4.6-svn-167325-20101201/gcc/config/arm/bpabi.h        2010-12-02 09:18:17.660518000 +0800
>> @@ -52,7 +52,7 @@
>>  /* The BPABI integer comparison routines return { -1, 0, 1 }.  */
>>  #define TARGET_LIB_INT_CMP_BIASED !TARGET_BPABI
>>
>> -#define TARGET_FIX_V4BX_SPEC " %{mcpu=arm8|mcpu=arm810|mcpu=strongarm*|march=armv4:--fix-v4bx}"
>> +#define TARGET_FIX_V4BX_SPEC " %{mcpu=arm8|mcpu=arm810|mcpu=strongarm*|march=armv4|mcpu=fa526|mcpu=fa626:--fix-v4bx}"
>>
>
> Exceeds the 80 character per line limit.  Can you split this across multiple lines ?
>
> Something like
> #define TARGET_FIX_V4BX_SPEC " %{mcpu=arm8|mcpu=arm810|mcpu=strongarm*\
> |march=armv4|mcpu=fa526|mcpu=fa626:--fix-v4bx}"
>
> should do the trick.
>

OK. Split it into two lines.

> However this isn't an approval since I can't approve or reject your patch.
>
>
> cheers
> Ramana
>
>
>
> On Thu, 2010-12-02 at 16:27 +0800, M.F. Wu wrote:
>> Dear Ramana,
>>
>> Thank you for your comments about the patch.
>> The patch has been modified as the attached
>> file shows.
>>
>> The Changlog:
>>
>> 2010-12-02  Sanjin Liu  <scliu@faraday-tech.com>
>>         Mingfeng Wu  <mingfeng@faraday-tech.com>
>>
>>       * config/arm/arm-cores.def: Add Faraday CPU support -
>>       fa526/fa626/fa606te/fa626te/fmp626/fa726te.
>>       * config/arm/arm-tune.md: Regenerate.
>>       * config/arm/arm.c (arm_fa726te_tune): New tune_params for fa726te
>>       (fa726te_sched_adjust_cost): New cost function for fa726te.
>>       (arm_issue_rate): Add fa726te.
>>       * config/arm/arm.md (generic_sched): Add Faraday cores to generic_sched
>>       and include machine description files.
>>       * config/arm/bpabi.h (TARGET_FIX_V4BX_SPEC): Add fa526 and fa626.
>>       * config/arm/t-arm (MD_INCLUDES): Include machine description files for
>>       Faraday cores.
>>       * config/arm/t-arm-elf: Add multilib option for Faraday cores.
>>       * config/arm/t-linux-eabi: Add multilib option for Faraday cores except
>>       fa526 and fa626.
>>       * doc/invoke.texi: Document -mcpu for Faraday cores.
>>       * config/arm/fa526.md: New file.
>>       * config/arm/fa626.md: New file.
>>       * config/arm/fa606te.md: New file.
>>       * config/arm/fa626te.md: New file.
>>       * config/arm/fmp626.md: New file.
>>       * config/arm/fa726te.md: New file.
>>
>>
>>
>> 2010/11/30 Ramana Radhakrishnan <ramana.radhakrishnan@arm.com>:
>> > Hi Mingfeng,
>> >
>> > Thanks for making these changes.
>> >
>> > Please do not make the Changelog a part of the final patch. Please make
>> > this a part of your final mail submission and not a part of your patch.
>> >
>> >
>> >> @@ -7913,6 +7921,36 @@
>> >
>> > <...>
>> >
>> >> +      /* Use of carry (e.g. 64-bit arithmetic) in ALU: 3-cycle latency */
>> >
>> > Full stop at the end of comment followed by 2 spaces before end of comment.
>> > Can you please audit your patch to check for these issues ?
>> >
>> > This is true for comments in the machine description parts of your patch as well.
>> >
>>
>> Fixed.
>>
>> >
>> >> +      if (get_attr_conds(insn)  == CONDS_USE &&
>> >> +          get_attr_type(insn) != TYPE_BRANCH)
>> >> +        {
>> >> +          *cost = 3;
>> >> +          return false;
>> >> +        }
>> >
>> > Space between function name and paranthesis. Thus it should be
>> > get_attr_conds (insn) and not get_attr_conds(insn) as above.
>> >
>> >> +
>> >> +      if (GET_CODE (PATTERN (insn)) == COND_EXEC
>> >> +          || get_attr_conds(insn)  == CONDS_USE)
>> >> +        {
>> >> +          *cost = 0;
>> >> +          return false;
>> >> +        }
>> >> +    }
>> >
>> > Likewise.
>> >
>>
>> Fixed.
>>
>> >
>> >> diff -uNr tmp/gcc-4.6-svn-20101116/gcc/config/arm/fa526.md
>> >> >> gcc-4.6-svn-20101116/gcc/config/arm/fa526.md
>> >> >> > --- tmp/gcc-4.6-svn-20101116/gcc/config/arm/fa526.md  1970-01-01
>> >> >> 08:00:00.000000000 +0800
>> >> >> > +++ gcc-4.6-svn-20101116/gcc/config/arm/fa526.md      2010-11-23
>> >> >> 14:36:17.916371000 +0800
>> >>
>> >> >> +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
>> >> >> > +;; Branch and Call Instructions
>> >> >> >
>> >> >> +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
>> >> >> > +
>> >
>> >> > And equivalently in all the other pipeline descriptions.
>> >> >
>> >>
>> >> Sorry, I don't understand exactly what you mean...
>> >
>> >
>> >> +;; Branch instructions are difficult to model accurately.  The ARM
>> >> +;; core can predict most branches.  If the branch is predicted
>> >>
>> > In the sentence above -
>> > Replace ARM with FA526 and similarly in the other pipeline descriptions.
>> >
>> > The FA526 core isn't one made by ARM and am not sure if you can use the name
>> > in that regard here :)
>> >
>>
>> OK, you are right. I have modified the "ARM" to our core.
>>
>> >> > ;
>> >> >
>> >> > +const struct tune_params arm_fa726te_tune =
>> >> > +{
>> >> > +  arm_9e_rtx_costs,
>> >> > +  fa726te_sched_adjust_cost,
>> >> > +  1
>> >> > +};
>> >> > +
>> >
>> > This part of your patch is now out-of-date thanks to Ian Bolton's latest commits in that area with respect
>> > to preloads. You might want to consider that in your final submission. I suppose using the defaults
>> > and turning off preloads at O3 would be the correct thing to do to get your patch sheperded through.
>> >
>>
>> OK. I modified the arm_fa726te_tune.
>>
>> const struct tune_params arm_fa726te_tune =
>> {
>>   arm_9e_rtx_costs,
>>   fa726te_sched_adjust_cost,
>>   1,
>>   ARM_PREFETCH_NOT_BENEFICIAL
>> };
>>
>>
>> >>
>> >> --- tmp/gcc-4.6-svn-20101116/gcc/config/arm/fa726te.md        1970-01-01 08:00:00.000000000 +0800
>> >> +++ gcc-4.6-svn-20101116/gcc/config/arm/fa726te.md    2010-11-25 17:06:01.877554000 +0800
>> >> @@ -0,0 +1,221 @@
>> >>
>> >> +(define_automaton "fa726te")
>> >> +(automata_option "ndfa")
>> >> +
>> >
>> > Why do you have an ndfa option here? Does this give you benefit with benchmarking on the FA726te core since this usually increases compile time
>> > as the automaton ends up searching for all possible options ?
>> >
>>
>> Yes. the ndfa option does benefit our benchmarking, but a little. So I
>> remove the ndfa option here.
>>
>> >
>> >> +;; pretend we have 2 LSUs (the second is ONLY for LDR), which can possibly
>> >> +;; improve code quality
>> >
>> > Full stop at the end of the comment. pretend should start with a capital P and not lower case. (Pretend)
>> > 2 spaces between the a full stop or a punctuation character that terminates a sentence and the start of
>> > the next sentence. There are a number of places in your patch where one can see such cases.
>> >
>>
>> Fixed.
>>
>> >
>> >> +(define_query_cpu_unit "fa726te_lsu1_pipe_e,fa726te_lsu1_pipe_w" "fa726te")
>> >
>> > You have a query_cpu_unit which you don't seem to be querying for in the backend in any form? Is there any thing else
>> > missing in your pipeline description or has this been put in for future use ?
>> >
>>
>> I only use the units defined by query_cpu_unit in the fa726te.md. The
>> two units, fa726te_lsu1_pipe_e and
>> fa726te_lsu1_pipe_w, are only used for arrange the load instructions.
>> Because fa726te only supports one
>> ldr/str pipe, I use the query_cpu_unit to define another pseudo pipe
>> for better load instruction scheduling.
>>
>> >> +;; reservation which blocks IS
>> >> +(define_reservation "fa726te_blockage" "(fa726te_is0+fa726te_is1)")
>> >
>> > Can you clarify the comment above ? Again the comments about sentence case and full stops hold.
>> >
>>
>> It is used to restrict the instruction issue to one.
>>
>> >
>> >
>> > cheers
>> > Ramana
>> >
>> >
>> >
>> >
>> >
>> >
>
>
>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: gcc-4.6-svn-167325-faraday-cpu-support.patch
Type: application/octet-stream
Size: 46701 bytes
Desc: not available
URL: <http://gcc.gnu.org/pipermail/gcc-patches/attachments/20101213/a349514b/attachment.obj>


More information about the Gcc-patches mailing list