[ARM] Scheduling VFP/NEON to core transfers (ping, RFC)

Richard Earnshaw rearnsha@arm.com
Tue Jul 17 16:26:00 GMT 2012


On 17/07/12 15:14, Julian Brown wrote:
> Hi,
> 
> Whilst looking at upstreaming some of CodeSourcery/Mentor's old
> patches, I came across the following again, originally by Mark Shinwell
> but last sent upstream by Andrew Stubbs:
> 
> http://gcc.gnu.org/ml/gcc-patches/2011-02/msg01431.html
> 
> I've attached a new version, re-tested but not re-benchmarked (I'm not
> set up to do the latter just at the moment).
> 
> I spent a while being quite unconvinced that this patch was a
> reasonable solution for the problem (of scheduling transfers between the
> NEON or VFP and the core unit) -- either on Cortex-A8, which the patch
> was originally supposed to target, or any other core version. So, this
> is what I came up with to try to convince myself.
> 
> Handling of these transfers is unfortunately dealt with rather
> haphazardly on various cores. I only looked in any detail at Cortex-A8
> and Cortex-A9. FAOD, I think we're principally concerned here with
> instructions like "movdi_vfp", which may be used to transfer either
> "VFP" or "NEON" register values to and from core registers, but also
> with movsf and friends for the "runfast"-mode case.
> 
> * On Cortex-A8, instructions with "f_2_r" (nor "r_2_f") type do not
>   participate in scheduling at all. This was probably the root of the
>   problem the patch was trying to fix originally: this means that
>   there is nothing to stop instructions which use the vfp/neon-to-core
>   result from being placed directly after the transfer. (A scheduler
>   dump shows "nothing" for the unit utilisation for these types of
>   instruction at present.)
> 
> * On Cortex-A9, the "f_2_r" and "r_2_f" types both use the (core) insn
>   reservation "cortex_a9_fps", which produces its result after 2
>   cycles. The A9 TRM on infocentre.arm.com says that VFP-to-core
>   transfers take 3 cycles, and NEON-to-core transfers take 11 cycles.
> 
> * The cortex-r4f.md and cortex-m4-fpu.md scheduler descriptions seem to
>   handle f_2_r and r_2_f sanely.
> 
> So, there are generally multiple DFA descriptions in play at any given
> time, relating to different core units with different sets of registers,
> and transferring values between those cores isn't really being
> described properly to GCC's scheduler in a consistent way at present.
> 
> Fixing this "properly", e.g. fully describing the way the NEON
> unit is decoupled from the core unit on the Cortex-A8, is probably
> impractical. But, the only cases we're concerned with at present are
> transfers of values from one unit to another. I think setting both
> "type" and "neon_type" for transfer instructions is therefore the
> right thing to do -- although given the above, and other restrictions
> (i.e. the mutual exclusion between VFP and NEON units on Cortex-A9),
> it's probably not a complete solution to the problem.
> 
> To expand:
> 
> * "type" should be set to f_2_r for MRC-type instructions because of
>   the dependency on an earlier instruction setting the input register.
> 
> * "neon_type" should be set appropriately for MRC-type instructions
>   because following NEON instructions will have dependencies on the
>   transferred value.
> 
> * ...and vice-versa for MCR-type instructions.
> 
> Incidentally other patches, probably merged from our source base, have
> introduced "neon_type" alongside "type" for a couple of instructions in
> vfp.md already: namely, *movdi_vfp and *movhf_vfp_neon: whatever we
> choose to do, we should be consistent.
> 
> So: OK to apply?
> 

This is OK.

I'll note for the record here, that I think having type, insn and
neon_type attributes all trying to do the same thing (that is, classify
the insn for scheduling purposes) is broken.  The correct fix, long
term, is to simplify these into one 'type' attribute that is then rich
enough to handle all the different classifications.  Subdividing a
classification should be done by splitting an existing class into two
entries in the same attribute, not adding some parallel alternative
attribute for the sub-division -- that way lies chaos.

However, that's rather a lot of work and this patch won't make that
problem any harder.

R.

> Thanks,
> 
> Julian
> 
> 2011-02-22  Andrew Stubbs  <ams@codesourcery.com>
> 	    Mark Shinwell  <shinwell@codesourcery.com>
>             Julian Brown  <julian@codesourcery.com>
> 
>     gcc/
>     * config/arm/vfp.md (*arm_movsi_vfp, *thumb2_movsi_vfp)
>     (*movdi_vfp_cortexa8, *movsf_vfp, *thumb2_movsf_vfp)
>     (*movdf_vfp, *thumb2_movdf_vfp, *movsfcc_vfp)
>     (*thumb2_movsfcc_vfp, *movdfcc_vfp, *thumb2_movdfcc_vfp): Add
>     neon_type.
>     * config/arm/arm.md (neon_type): Update comment.
> 
> 
> neon-type-vfplite-to-core-1.diff
> 
> 
> commit 44d6297b8e73eae42b3d6a0da7ca857f549049c1
> Author: Julian Brown <jbrown@build6-lucid-cs.sje.mentorg.com>
> Date:   Fri Jul 13 10:32:46 2012 -0700
> 
>     Set neon_type for vfp-to-core transfers.
> 
> diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
> index 8888382..e9da56d 100644
> --- a/gcc/config/arm/arm.md
> +++ b/gcc/config/arm/arm.md
> @@ -397,8 +397,6 @@
>  (define_attr "ldsched" "no,yes" (const (symbol_ref "arm_ld_sched")))
>  
>  ;; Classification of NEON instructions for scheduling purposes.
> -;; Do not set this attribute and the "type" attribute together in
> -;; any one instruction pattern.
>  (define_attr "neon_type"
>     "neon_int_1,\
>     neon_int_2,\
> diff --git a/gcc/config/arm/vfp.md b/gcc/config/arm/vfp.md
> index 8369949..4a56d57 100644
> --- a/gcc/config/arm/vfp.md
> +++ b/gcc/config/arm/vfp.md
> @@ -78,6 +78,7 @@
>    "
>    [(set_attr "predicable" "yes")
>     (set_attr "type" "*,*,*,*,load1,store1,r_2_f,f_2_r,fcpys,f_loads,f_stores")
> +   (set_attr "neon_type" "*,*,*,*,*,*,neon_mcr,neon_mrc,neon_vmov,*,*")
>     (set_attr "insn" "mov,mov,mvn,mov,*,*,*,*,*,*,*")
>     (set_attr "pool_range"     "*,*,*,*,4096,*,*,*,*,1020,*")
>     (set_attr "neg_pool_range" "*,*,*,*,4084,*,*,*,*,1008,*")]
> @@ -120,6 +121,7 @@
>    "
>    [(set_attr "predicable" "yes")
>     (set_attr "type" "*,*,*,*,load1,load1,store1,store1,r_2_f,f_2_r,fcpys,f_loads,f_stores")
> +   (set_attr "neon_type" "*,*,*,*,*,*,*,*,neon_mcr,neon_mrc,neon_vmov,*,*")
>     (set_attr "insn" "mov,mov,mvn,mov,*,*,*,*,*,*,*,*,*")
>     (set_attr "pool_range"     "*,*,*,*,1020,4096,*,*,*,*,*,1020,*")
>     (set_attr "neg_pool_range" "*,*,*,*,   0,   0,*,*,*,*,*,1008,*")]
> @@ -212,6 +214,7 @@
>      }
>    "
>    [(set_attr "type" "*,*,*,*,load2,load2,store2,r_2_f,f_2_r,ffarithd,f_loadd,f_stored")
> +   (set_attr "neon_type" "*,*,*,*,*,*,*,neon_mcr_2_mcrr,neon_mrrc,neon_vmov,*,*")
>     (set (attr "length") (cond [(eq_attr "alternative" "1") (const_int 8)
>                                 (eq_attr "alternative" "2") (const_int 12)
>                                 (eq_attr "alternative" "3") (const_int 16)
> @@ -370,6 +373,7 @@
>    [(set_attr "predicable" "yes")
>     (set_attr "type"
>       "r_2_f,f_2_r,fconsts,f_loads,f_stores,load1,store1,fcpys,*")
> +   (set_attr "neon_type" "neon_mcr,neon_mrc,*,*,*,*,*,neon_vmov,*")
>     (set_attr "insn" "*,*,*,*,*,*,*,*,mov")
>     (set_attr "pool_range" "*,*,*,1020,*,4096,*,*,*")
>     (set_attr "neg_pool_range" "*,*,*,1008,*,4080,*,*,*")]
> @@ -407,6 +411,7 @@
>    [(set_attr "predicable" "yes")
>     (set_attr "type"
>       "r_2_f,f_2_r,fconsts,f_loads,f_stores,load1,store1,fcpys,*")
> +   (set_attr "neon_type" "neon_mcr,neon_mrc,*,*,*,*,*,neon_vmov,*")
>     (set_attr "insn" "*,*,*,*,*,*,*,*,mov")
>     (set_attr "pool_range" "*,*,*,1020,*,4092,*,*,*")
>     (set_attr "neg_pool_range" "*,*,*,1008,*,0,*,*,*")]
> @@ -450,6 +455,7 @@
>    "
>    [(set_attr "type"
>       "r_2_f,f_2_r,fconstd,f_loadd,f_stored,load2,store2,ffarithd,*")
> +   (set_attr "neon_type" "neon_mcr_2_mcrr,neon_mrrc,*,*,*,*,*,neon_vmov,*")
>     (set (attr "length") (cond [(eq_attr "alternative" "5,6,8") (const_int 8)
>  			       (eq_attr "alternative" "7")
>  				(if_then_else
> @@ -493,6 +499,7 @@
>    "
>    [(set_attr "type"
>       "r_2_f,f_2_r,fconstd,f_loadd,f_stored,load2,store2,ffarithd,*")
> +   (set_attr "neon_type" "neon_mcr_2_mcrr,neon_mrrc,*,*,*,*,*,neon_vmov,*")
>     (set (attr "length") (cond [(eq_attr "alternative" "5,6,8") (const_int 8)
>  			       (eq_attr "alternative" "7")
>  				(if_then_else
> @@ -527,7 +534,8 @@
>     fmrs%D3\\t%0, %2\;fmrs%d3\\t%0, %1"
>     [(set_attr "conds" "use")
>      (set_attr "length" "4,4,8,4,4,8,4,4,8")
> -    (set_attr "type" "fcpys,fcpys,fcpys,r_2_f,r_2_f,r_2_f,f_2_r,f_2_r,f_2_r")]
> +    (set_attr "type" "fcpys,fcpys,fcpys,r_2_f,r_2_f,r_2_f,f_2_r,f_2_r,f_2_r")
> +    (set_attr "neon_type" "neon_vmov,neon_vmov,neon_vmov,neon_mcr,neon_mcr,neon_mcr,neon_mrc,neon_mrc,neon_mrc")]
>  )
>  
>  (define_insn "*thumb2_movsfcc_vfp"
> @@ -550,7 +558,8 @@
>     ite\\t%D3\;fmrs%D3\\t%0, %2\;fmrs%d3\\t%0, %1"
>     [(set_attr "conds" "use")
>      (set_attr "length" "6,6,10,6,6,10,6,6,10")
> -    (set_attr "type" "fcpys,fcpys,fcpys,r_2_f,r_2_f,r_2_f,f_2_r,f_2_r,f_2_r")]
> +    (set_attr "type" "fcpys,fcpys,fcpys,r_2_f,r_2_f,r_2_f,f_2_r,f_2_r,f_2_r")
> +    (set_attr "neon_type" "neon_vmov,neon_vmov,neon_vmov,neon_mcr,neon_mcr,neon_mcr,neon_mrc,neon_mrc,neon_mrc")]
>  )
>  
>  (define_insn "*movdfcc_vfp"
> @@ -573,7 +582,8 @@
>     fmrrd%D3\\t%Q0, %R0, %P2\;fmrrd%d3\\t%Q0, %R0, %P1"
>     [(set_attr "conds" "use")
>      (set_attr "length" "4,4,8,4,4,8,4,4,8")
> -    (set_attr "type" "ffarithd,ffarithd,ffarithd,r_2_f,r_2_f,r_2_f,f_2_r,f_2_r,f_2_r")]
> +    (set_attr "type" "ffarithd,ffarithd,ffarithd,r_2_f,r_2_f,r_2_f,f_2_r,f_2_r,f_2_r")
> +    (set_attr "neon_type" "neon_vmov,neon_vmov,neon_vmov,neon_mcr_2_mcrr,neon_mcr_2_mcrr,neon_mcr_2_mcrr,neon_mrrc,neon_mrrc,neon_mrrc")]
>  )
>  
>  (define_insn "*thumb2_movdfcc_vfp"
> @@ -596,7 +606,8 @@
>     ite\\t%D3\;fmrrd%D3\\t%Q0, %R0, %P2\;fmrrd%d3\\t%Q0, %R0, %P1"
>     [(set_attr "conds" "use")
>      (set_attr "length" "6,6,10,6,6,10,6,6,10")
> -    (set_attr "type" "ffarithd,ffarithd,ffarithd,r_2_f,r_2_f,r_2_f,f_2_r,f_2_r,f_2_r")]
> +    (set_attr "type" "ffarithd,ffarithd,ffarithd,r_2_f,r_2_f,r_2_f,f_2_r,f_2_r,f_2_r")
> +    (set_attr "neon_type" "neon_vmov,neon_vmov,neon_vmov,neon_mcr_2_mcrr,neon_mcr_2_mcrr,neon_mcr_2_mcrr,neon_mrrc,neon_mrrc,neon_mrrc")]
>  )
>  
>  
> 






More information about the Gcc-patches mailing list