This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

[ARM] Scheduling VFP/NEON to core transfers (ping, RFC)


Hi,

Whilst looking at upstreaming some of CodeSourcery/Mentor's old
patches, I came across the following again, originally by Mark Shinwell
but last sent upstream by Andrew Stubbs:

http://gcc.gnu.org/ml/gcc-patches/2011-02/msg01431.html

I've attached a new version, re-tested but not re-benchmarked (I'm not
set up to do the latter just at the moment).

I spent a while being quite unconvinced that this patch was a
reasonable solution for the problem (of scheduling transfers between the
NEON or VFP and the core unit) -- either on Cortex-A8, which the patch
was originally supposed to target, or any other core version. So, this
is what I came up with to try to convince myself.

Handling of these transfers is unfortunately dealt with rather
haphazardly on various cores. I only looked in any detail at Cortex-A8
and Cortex-A9. FAOD, I think we're principally concerned here with
instructions like "movdi_vfp", which may be used to transfer either
"VFP" or "NEON" register values to and from core registers, but also
with movsf and friends for the "runfast"-mode case.

* On Cortex-A8, instructions with "f_2_r" (nor "r_2_f") type do not
  participate in scheduling at all. This was probably the root of the
  problem the patch was trying to fix originally: this means that
  there is nothing to stop instructions which use the vfp/neon-to-core
  result from being placed directly after the transfer. (A scheduler
  dump shows "nothing" for the unit utilisation for these types of
  instruction at present.)

* On Cortex-A9, the "f_2_r" and "r_2_f" types both use the (core) insn
  reservation "cortex_a9_fps", which produces its result after 2
  cycles. The A9 TRM on infocentre.arm.com says that VFP-to-core
  transfers take 3 cycles, and NEON-to-core transfers take 11 cycles.

* The cortex-r4f.md and cortex-m4-fpu.md scheduler descriptions seem to
  handle f_2_r and r_2_f sanely.

So, there are generally multiple DFA descriptions in play at any given
time, relating to different core units with different sets of registers,
and transferring values between those cores isn't really being
described properly to GCC's scheduler in a consistent way at present.

Fixing this "properly", e.g. fully describing the way the NEON
unit is decoupled from the core unit on the Cortex-A8, is probably
impractical. But, the only cases we're concerned with at present are
transfers of values from one unit to another. I think setting both
"type" and "neon_type" for transfer instructions is therefore the
right thing to do -- although given the above, and other restrictions
(i.e. the mutual exclusion between VFP and NEON units on Cortex-A9),
it's probably not a complete solution to the problem.

To expand:

* "type" should be set to f_2_r for MRC-type instructions because of
  the dependency on an earlier instruction setting the input register.

* "neon_type" should be set appropriately for MRC-type instructions
  because following NEON instructions will have dependencies on the
  transferred value.

* ...and vice-versa for MCR-type instructions.

Incidentally other patches, probably merged from our source base, have
introduced "neon_type" alongside "type" for a couple of instructions in
vfp.md already: namely, *movdi_vfp and *movhf_vfp_neon: whatever we
choose to do, we should be consistent.

So: OK to apply?

Thanks,

Julian

2011-02-22  Andrew Stubbs  <ams@codesourcery.com>
	    Mark Shinwell  <shinwell@codesourcery.com>
            Julian Brown  <julian@codesourcery.com>

    gcc/
    * config/arm/vfp.md (*arm_movsi_vfp, *thumb2_movsi_vfp)
    (*movdi_vfp_cortexa8, *movsf_vfp, *thumb2_movsf_vfp)
    (*movdf_vfp, *thumb2_movdf_vfp, *movsfcc_vfp)
    (*thumb2_movsfcc_vfp, *movdfcc_vfp, *thumb2_movdfcc_vfp): Add
    neon_type.
    * config/arm/arm.md (neon_type): Update comment.
commit 44d6297b8e73eae42b3d6a0da7ca857f549049c1
Author: Julian Brown <jbrown@build6-lucid-cs.sje.mentorg.com>
Date:   Fri Jul 13 10:32:46 2012 -0700

    Set neon_type for vfp-to-core transfers.

diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index 8888382..e9da56d 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -397,8 +397,6 @@
 (define_attr "ldsched" "no,yes" (const (symbol_ref "arm_ld_sched")))
 
 ;; Classification of NEON instructions for scheduling purposes.
-;; Do not set this attribute and the "type" attribute together in
-;; any one instruction pattern.
 (define_attr "neon_type"
    "neon_int_1,\
    neon_int_2,\
diff --git a/gcc/config/arm/vfp.md b/gcc/config/arm/vfp.md
index 8369949..4a56d57 100644
--- a/gcc/config/arm/vfp.md
+++ b/gcc/config/arm/vfp.md
@@ -78,6 +78,7 @@
   "
   [(set_attr "predicable" "yes")
    (set_attr "type" "*,*,*,*,load1,store1,r_2_f,f_2_r,fcpys,f_loads,f_stores")
+   (set_attr "neon_type" "*,*,*,*,*,*,neon_mcr,neon_mrc,neon_vmov,*,*")
    (set_attr "insn" "mov,mov,mvn,mov,*,*,*,*,*,*,*")
    (set_attr "pool_range"     "*,*,*,*,4096,*,*,*,*,1020,*")
    (set_attr "neg_pool_range" "*,*,*,*,4084,*,*,*,*,1008,*")]
@@ -120,6 +121,7 @@
   "
   [(set_attr "predicable" "yes")
    (set_attr "type" "*,*,*,*,load1,load1,store1,store1,r_2_f,f_2_r,fcpys,f_loads,f_stores")
+   (set_attr "neon_type" "*,*,*,*,*,*,*,*,neon_mcr,neon_mrc,neon_vmov,*,*")
    (set_attr "insn" "mov,mov,mvn,mov,*,*,*,*,*,*,*,*,*")
    (set_attr "pool_range"     "*,*,*,*,1020,4096,*,*,*,*,*,1020,*")
    (set_attr "neg_pool_range" "*,*,*,*,   0,   0,*,*,*,*,*,1008,*")]
@@ -212,6 +214,7 @@
     }
   "
   [(set_attr "type" "*,*,*,*,load2,load2,store2,r_2_f,f_2_r,ffarithd,f_loadd,f_stored")
+   (set_attr "neon_type" "*,*,*,*,*,*,*,neon_mcr_2_mcrr,neon_mrrc,neon_vmov,*,*")
    (set (attr "length") (cond [(eq_attr "alternative" "1") (const_int 8)
                                (eq_attr "alternative" "2") (const_int 12)
                                (eq_attr "alternative" "3") (const_int 16)
@@ -370,6 +373,7 @@
   [(set_attr "predicable" "yes")
    (set_attr "type"
      "r_2_f,f_2_r,fconsts,f_loads,f_stores,load1,store1,fcpys,*")
+   (set_attr "neon_type" "neon_mcr,neon_mrc,*,*,*,*,*,neon_vmov,*")
    (set_attr "insn" "*,*,*,*,*,*,*,*,mov")
    (set_attr "pool_range" "*,*,*,1020,*,4096,*,*,*")
    (set_attr "neg_pool_range" "*,*,*,1008,*,4080,*,*,*")]
@@ -407,6 +411,7 @@
   [(set_attr "predicable" "yes")
    (set_attr "type"
      "r_2_f,f_2_r,fconsts,f_loads,f_stores,load1,store1,fcpys,*")
+   (set_attr "neon_type" "neon_mcr,neon_mrc,*,*,*,*,*,neon_vmov,*")
    (set_attr "insn" "*,*,*,*,*,*,*,*,mov")
    (set_attr "pool_range" "*,*,*,1020,*,4092,*,*,*")
    (set_attr "neg_pool_range" "*,*,*,1008,*,0,*,*,*")]
@@ -450,6 +455,7 @@
   "
   [(set_attr "type"
      "r_2_f,f_2_r,fconstd,f_loadd,f_stored,load2,store2,ffarithd,*")
+   (set_attr "neon_type" "neon_mcr_2_mcrr,neon_mrrc,*,*,*,*,*,neon_vmov,*")
    (set (attr "length") (cond [(eq_attr "alternative" "5,6,8") (const_int 8)
 			       (eq_attr "alternative" "7")
 				(if_then_else
@@ -493,6 +499,7 @@
   "
   [(set_attr "type"
      "r_2_f,f_2_r,fconstd,f_loadd,f_stored,load2,store2,ffarithd,*")
+   (set_attr "neon_type" "neon_mcr_2_mcrr,neon_mrrc,*,*,*,*,*,neon_vmov,*")
    (set (attr "length") (cond [(eq_attr "alternative" "5,6,8") (const_int 8)
 			       (eq_attr "alternative" "7")
 				(if_then_else
@@ -527,7 +534,8 @@
    fmrs%D3\\t%0, %2\;fmrs%d3\\t%0, %1"
    [(set_attr "conds" "use")
     (set_attr "length" "4,4,8,4,4,8,4,4,8")
-    (set_attr "type" "fcpys,fcpys,fcpys,r_2_f,r_2_f,r_2_f,f_2_r,f_2_r,f_2_r")]
+    (set_attr "type" "fcpys,fcpys,fcpys,r_2_f,r_2_f,r_2_f,f_2_r,f_2_r,f_2_r")
+    (set_attr "neon_type" "neon_vmov,neon_vmov,neon_vmov,neon_mcr,neon_mcr,neon_mcr,neon_mrc,neon_mrc,neon_mrc")]
 )
 
 (define_insn "*thumb2_movsfcc_vfp"
@@ -550,7 +558,8 @@
    ite\\t%D3\;fmrs%D3\\t%0, %2\;fmrs%d3\\t%0, %1"
    [(set_attr "conds" "use")
     (set_attr "length" "6,6,10,6,6,10,6,6,10")
-    (set_attr "type" "fcpys,fcpys,fcpys,r_2_f,r_2_f,r_2_f,f_2_r,f_2_r,f_2_r")]
+    (set_attr "type" "fcpys,fcpys,fcpys,r_2_f,r_2_f,r_2_f,f_2_r,f_2_r,f_2_r")
+    (set_attr "neon_type" "neon_vmov,neon_vmov,neon_vmov,neon_mcr,neon_mcr,neon_mcr,neon_mrc,neon_mrc,neon_mrc")]
 )
 
 (define_insn "*movdfcc_vfp"
@@ -573,7 +582,8 @@
    fmrrd%D3\\t%Q0, %R0, %P2\;fmrrd%d3\\t%Q0, %R0, %P1"
    [(set_attr "conds" "use")
     (set_attr "length" "4,4,8,4,4,8,4,4,8")
-    (set_attr "type" "ffarithd,ffarithd,ffarithd,r_2_f,r_2_f,r_2_f,f_2_r,f_2_r,f_2_r")]
+    (set_attr "type" "ffarithd,ffarithd,ffarithd,r_2_f,r_2_f,r_2_f,f_2_r,f_2_r,f_2_r")
+    (set_attr "neon_type" "neon_vmov,neon_vmov,neon_vmov,neon_mcr_2_mcrr,neon_mcr_2_mcrr,neon_mcr_2_mcrr,neon_mrrc,neon_mrrc,neon_mrrc")]
 )
 
 (define_insn "*thumb2_movdfcc_vfp"
@@ -596,7 +606,8 @@
    ite\\t%D3\;fmrrd%D3\\t%Q0, %R0, %P2\;fmrrd%d3\\t%Q0, %R0, %P1"
    [(set_attr "conds" "use")
     (set_attr "length" "6,6,10,6,6,10,6,6,10")
-    (set_attr "type" "ffarithd,ffarithd,ffarithd,r_2_f,r_2_f,r_2_f,f_2_r,f_2_r,f_2_r")]
+    (set_attr "type" "ffarithd,ffarithd,ffarithd,r_2_f,r_2_f,r_2_f,f_2_r,f_2_r,f_2_r")
+    (set_attr "neon_type" "neon_vmov,neon_vmov,neon_vmov,neon_mcr_2_mcrr,neon_mcr_2_mcrr,neon_mcr_2_mcrr,neon_mrrc,neon_mrrc,neon_mrrc")]
 )
 
 

Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]