[PATCH][mpost]: automatic multi-target compilation

Joern Rennecke amylaar@spamcop.net
Thu Jul 16 12:45:00 GMT 2009


This patch makes it possible to vectorize loops for a different target than
the main compilation target, and to automatically initiate DMA to copy the
input arrays to the vector target and DMA the results back.
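
For illustration, the kind of loop this is aimed at looks like the following
hand-written example (the real test case is the attached vloop.c, which is
not reproduced here):

void
vadd (int *a, int *b, int *c, int n)
{
  int i;

  for (i = 0; i < n; i++)
    a[i] = b[i] + c[i];
}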

I have tested this with a simple test file vloop.c, which is the
second attachment to this email.
I've configured the compiler on gcc11.fsffrance.org with the options:
--target=arc-elf32 --with-extra-target-list=mxp-elf --with-headers
--with-newlib --with-mpfr=/opt/cfarm/mpfr-2.4.1
The purpose of the build was not to get a fully working toolchain yet,
but just a working cc1.

I've compiled the test file with:
./cc1 -O2 -ftree-vectorize vloop.c -fdump-tree-all  
-ftree-vectorizer-verbose=9 -msimd

I've attached the compiler output vloop.s as the third attachment to this
email.
The assembler output templates for the DMA in / out and the target call
are just assembler comments so far; I gather this part is irrelevant both
for the review of the tree optimizer patches and for trying to make this
work for other target tuple sets, and I don't want to delay the patch review
unnecessarily any further.
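
To give an idea of the intended code shape, the transformation conceptually
produces something like the following hand-written C model.  This is an
illustration only: DMA transfers are stood in for by memcpy and target-side
allocation by malloc, whereas the patch emits target builtins and a call to
__simd_malloc for these steps.

#include <stdlib.h>
#include <string.h>

/* Stand-in for the outlined, vectorized loop body that runs on the
   vector target; the real one is generated by the parallelizer.  */
static void
loop_body_fn (void *params)
{
  (void) params;
}

void
run_loop_on_target (int *a, int *b, int *c, size_t n)
{
  size_t nbytes = n * sizeof (int);
  char *base = malloc (3 * nbytes);	/* parameter area on the target */

  memcpy (base, b, nbytes);		/* DMA in: copy_to_target */
  memcpy (base + nbytes, c, nbytes);	/* DMA in: copy_to_target */
  loop_body_fn (base);			/* call: build_call_on_target */
  memcpy (a, base + 2 * nbytes, nbytes);  /* DMA out: copy_from_target */
  free (base);
}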

There is certainly still a lot of work to be done to make this truly useful
for common application code, but that goes beyond the scope of the
current milepost project, and I think such work is best done incrementally
on a branch.
-------------- next part --------------
2009-07-16  J"orn Rennecke  <joern.rennecke@arc.com>

	* targhooks.c (default_common_data_with_target): New function.
	(default_get_pmode): Likewise.
	* targhooks.h (default_common_data_with_target): Declare.
	(default_get_pmode): Likewise.
	* tree.c (build2_stat): Use targetm.sizetype.
	(build_pointer_type): Use *targetm.ptr_mode.
	(get_get_name_decl): New function.
	* tree.h (enum omp_clause_schedule_kind): New value
	OMP_CLAUSE_SCHEDULE_MASTER.
	(tree sizetype_tab): Now target specific.
	(get_get_name_decl): Declare.
	(lookup_attr_target): Declare.
	* target.h (struct gimple_stmt_iterator_d): Forward declaration.
	(struct gcc_target): New members get_pmode, sizetype_tab,
	common_data_with_target, copy_to_target, copy_from_target,
	build_call_on_target.
	* omp-low.c (expand_parallel_call): If child function has target_arch
	attribute, use targetm.build_call_on_target hook.
	(expand_omp_taskreg): Also check for
	gimple_omp_taskreg_data_arg (entry_stmt) being an INDIRECT_REF.
	(expand_numa_for_static_nochunk): New function.
	(expand_omp_for): Check for OMP_CLAUSE_SCHEDULE_MASTER.
	* toplev.c (lang_dependent_init) [!EXTRA_TARGET]:
	Do an EXTRA_TARGETS_CALL of initialize_sizetypes.
	(lang_dependent_init) [EXTRA_TARGET]: Fix up size_type_node.
	* tree-ssa-loop-ivopts.c (produce_memory_decl_rtl):
	Use *targetm.get_pmode.
	(computation_cost): Use tree_expand_expr.
	(force_expr_to_var_cost): Use targetm.sizetype.
	(rewrite_use_address): Use tree_create_mem_ref.
	* expr.c [!EXTRA_TARGET] (tree_expand_expr): New function.
	* expr.h (tree_expand_expr): Declare.
	* tree-parloops.c (separate_decls_in_region_name): New parameter
	new_target.  Changed all callers.
	(separate_decls_in_region_stmt): Likewise.
	(add_size_for_param_array): New function.
	(struct clsn_data): New members result_seq and loop.
	(create_loads_and_stores_for_name): If array contents have to be
	copied, insert statements to copy to/from the callee target.
	(separate_decls_in_region): Likewise.  Emit statements to allocate
	parameter array area for this purpose.
	Change last parameter from unsigned to loop.  Changed caller.
	(canonicalize_loop_ivs): Use sizetype for the callee target.
	(create_parallel_loop): If target has data memory separate from
	caller, use OMP_CLAUSE_SCHEDULE_MASTER.
	(gen_parallel_loop): Set targetm_pnt to the callee target during
	the canonicalize_loop_ivs call.
	* tree-ssa-address.c (target.h): Include.
	[!EXTRA_TARGET] (tree_mem_ref_addr, tree_create_mem_ref): New functions.
	* function.c (lookup_attr_target): New function, broken out of:
	(allocate_struct_function).
	* tree-affine.c (target.h): Include.
	(add_elt_to_tree): Use targetm.sizetype.
	(aff_combination_to_tree): Likewise.
	* target-def.h (TARGET_GET_PMODE): Define.
	(TARGET_COMMON_DATA_WITH_TARGET, TARGET_COPY_TO_TARGET): Likewise.
	(TARGET_COPY_FROM_TARGET, TARGET_BUILD_CALL_ON_TARGET): Likewise.
	(TARGET_INITIALIZER): Initialize new members.
	* tree-vect-transform.c (vect_decompose_addr_base_for_vector_ref):
	New function.
	(param_array_hash, param_array_eq): Likewise.
	(vect_create_data_ref_ptr): If target has data memory separate from
	caller, create hash table of parameter arrays with information on
	accesses.
	* cfgloop.h (struct tree_range, struct param_array_d): New structs.
	(param_array): New typedef.
	(struct loop): New members param_arrays, vect_vars.
	* tree-flow.h (tree_create_mem_ref): Declare.
	* gimple.h (struct gimple_stmt_iterator_d): New struct tag.
	* Makefile.in (tree-ssa-address.o): Depend on $(TARGET_H).
	(tree-affine.o): Likewise.
	* config/arc/predicates.md (simd_arg_vector): New predicate.
	* config/arc/arc.c (gimple.h, tree-flow.h): Include.
	(enum arc_builtins): Reduce value range.
	New values ARC_SIMD_BUILTIN_CALL, ARC_SIMD_BUILTIN_DMA_IN,
	ARC_SIMD_BUILTIN_DMA_OUT.
	New tag ARC_BUILTIN_END.
	(arc_copy_to_target, arc_copy_from_target): New functions.
	(arc_build_call_on_target): New function.
	(TARGET_COPY_TO_TARGET, TARGET_COPY_FROM_TARGET): Override.
	(TARGET_BUILD_CALL_ON_TARGET): Likewise.
	(arc_builtin_decls): New array.
	(def_mbuiltin): Update arc_builtin_decls.
	(arc_expand_builtin): Handle ARC_SIMD_BUILTIN_CALL.
	(enum simd_insn_args_type): Add void_Ra_Rb_Rc and void_Ra.
	(arc_simd_builtin_desc_list): Add simd_dma_in, simd_dma_out, simd_call.
	(arc_init_simd_builtins): Process void_Ra_Rb_Rc and void_Ra.
	(arc_expand_simd_builtin): Handle void_Ra_Rb_Rc and void_Ra.
	* config/arc/arc.h: (FIRST_PSEUDO_REGISTER): Change to 147.
	(FIXED_REGISTERS): SDM is fixed.
	(CALL_USED_REGISTERS): SDM is call used.
	(REGISTER_NAMES): Add SDM name.
	* config/arc/arc.md (SDM): Define as 146.
	(*movhi_insn): Add v/c alternative.
	* config/arc/t-arc ($(out_object_file)):
	Depend on $(GIMPLE_H) and $(TREE_FLOW_H).
	* config/arc/arc-modes.def: Add CC_BLK.
	* config/arc/simdext.md (UNSPEC_ARC_SIMD_DMA): Define.
	(simd_dma_in, simd_dma_out, simd_call): New patterns.
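
As a reading aid for the omp-low.c and function.c changes below: the
parallelizing pass marks the outlined child function with a "target_arch"
attribute whose string argument names the callee target, roughly as if the
source contained the following declaration (a hypothetical spelling; the
string is matched against targetm_array[i]->name, so the exact name is
configuration-dependent):

static void loop_body_fn (void *data)
  __attribute__ ((target_arch ("mxp")));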

Index: targhooks.c
===================================================================
--- targhooks.c	(revision 148225)
+++ targhooks.c	(working copy)
@@ -874,6 +874,18 @@ default_vectype_for_scalar_type (tree sc
   return vectype;
 }
 
+/* Default implementation of TARGET_COMMON_DATA_WITH_TARGET: a target
+   shares data memory only with itself.  */
+bool
+default_common_data_with_target (struct gcc_target *other)
+{
+  return &this_targetm == other;
+}
+
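+/* Default implementation of TARGET_GET_PMODE: return this target's
+   Pmode.  */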
+enum machine_mode
+default_get_pmode (void)
+{
+  return Pmode;
+}
+
 #include "gt-targhooks.h"
 
 END_TARGET_SPECIFIC
Index: targhooks.h
===================================================================
--- targhooks.h	(revision 148225)
+++ targhooks.h	(working copy)
@@ -119,6 +119,8 @@ extern int /*enum reg_class*/ default_se
 						secondary_reload_info *);
 extern bool default_override_options (bool);
 extern tree default_vectype_for_scalar_type (tree, FILE *);
+extern bool default_common_data_with_target (struct gcc_target *);
+extern enum machine_mode default_get_pmode (void);
 END_TARGET_SPECIFIC
 extern void hook_void_bitmap (bitmap);
 extern bool default_handle_c_option (size_t, const char *, int);
Index: tree.c
===================================================================
--- tree.c	(revision 148225)
+++ tree.c	(working copy)
@@ -3300,7 +3300,8 @@ build2_stat (enum tree_code code, tree t
   if (code == POINTER_PLUS_EXPR && arg0 && arg1 && tt)
     gcc_assert (POINTER_TYPE_P (tt) && POINTER_TYPE_P (TREE_TYPE (arg0))
 		&& INTEGRAL_TYPE_P (TREE_TYPE (arg1))
-		&& useless_type_conversion_p (sizetype, TREE_TYPE (arg1)));
+		&& useless_type_conversion_p (targetm.sizetype,
+					      TREE_TYPE (arg1)));
 
   t = make_node_stat (code PASS_MEM_STAT);
   TREE_TYPE (t) = tt;
@@ -5547,7 +5548,7 @@ build_pointer_type_for_mode (tree to_typ
 tree
 build_pointer_type (tree to_type)
 {
-  return build_pointer_type_for_mode (to_type, ptr_mode, false);
+  return build_pointer_type_for_mode (to_type, *targetm.ptr_mode, false);
 }
 
 /* Same as build_pointer_type_for_mode, but for REFERENCE_TYPE.  */
@@ -8984,6 +8985,28 @@ get_name (tree t)
     }
 }
 
+/* Return the declaration whose name get_name (T) would return.  */
+tree
+get_get_name_decl (tree t)
+{
+  tree stripped_decl;
+
+  stripped_decl = t;
+  STRIP_NOPS (stripped_decl);
+  if (DECL_P (stripped_decl) && DECL_NAME (stripped_decl))
+    return stripped_decl;
+  else
+    {
+      switch (TREE_CODE (stripped_decl))
+	{
+	case ADDR_EXPR:
+	  return get_get_name_decl (TREE_OPERAND (stripped_decl, 0));
+	default:
+	  return NULL_TREE;
+	}
+    }
+}
+
 /* Return true if TYPE has a variable argument list.  */
 
 bool
Index: tree.h
===================================================================
--- tree.h	(revision 148225)
+++ tree.h	(working copy)
@@ -1807,7 +1807,9 @@ enum omp_clause_schedule_kind
   OMP_CLAUSE_SCHEDULE_DYNAMIC,
   OMP_CLAUSE_SCHEDULE_GUIDED,
   OMP_CLAUSE_SCHEDULE_AUTO,
-  OMP_CLAUSE_SCHEDULE_RUNTIME
+  OMP_CLAUSE_SCHEDULE_RUNTIME,
+  /* Used internally for NUMA targets to schedule on the main processor.  */
+  OMP_CLAUSE_SCHEDULE_MASTER
 };
 
 #define OMP_CLAUSE_SCHEDULE_KIND(NODE) \
@@ -4340,7 +4342,9 @@ enum size_type_kind
   SBITSIZETYPE,		/* Signed representation of sizes in bits.  */
   TYPE_KIND_LAST};
 
+START_TARGET_SPECIFIC
 extern GTY(()) tree sizetype_tab[(int) TYPE_KIND_LAST];
+END_TARGET_SPECIFIC
 
 #define sizetype sizetype_tab[(int) SIZETYPE]
 #define bitsizetype sizetype_tab[(int) BITSIZETYPE]
@@ -4717,6 +4721,7 @@ extern tree *call_expr_argp (tree, int);
 extern tree call_expr_arglist (tree);
 extern tree create_artificial_label (void);
 extern const char *get_name (tree);
+extern tree get_get_name_decl (tree);
 extern bool stdarg_p (tree);
 extern bool prototype_p (tree);
 extern int function_args_count (tree);
@@ -4980,6 +4985,7 @@ extern void expand_dummy_function_end (v
 extern unsigned int init_function_for_compilation (void);
 END_TARGET_SPECIFIC
 /* Allocate_struct_function uses targetm->name.  */
+extern int lookup_attr_target (tree);
 extern void allocate_struct_function (tree, bool);
 START_TARGET_SPECIFIC
 extern void push_struct_function (tree fndecl);
Index: target.h
===================================================================
--- target.h	(revision 148225)
+++ target.h	(working copy)
@@ -688,6 +688,8 @@ struct target_option_hooks
   bool (*override) (bool main_target);
 };
 
+struct gimple_stmt_iterator_d;
+
 /* ??? the use of the target vector makes it necessary to cast
    target-specific enums from/to int, since we expose the function
    signatures of target specific hooks that operate e.g. on enum reg_class
@@ -705,6 +707,11 @@ struct gcc_target
   /* Points to the ptr_mode variable for this target.  */
   enum machine_mode *ptr_mode;
 
+  /* Return the Pmode of this target.  */
+  enum machine_mode (*get_pmode) (void);
+
+  /* The sizetype table for this target.  */
+  tree *sizetype_tab;
+
   /* Functions that output assembler for the target.  */
   struct asm_out asm_out;
 
@@ -884,6 +891,21 @@ struct gcc_target
   /* Undo the effects of encode_section_info on the symbol string.  */
   const char * (* strip_name_encoding) (const char *);
 
+  /* Say if the target OTHER shares its data memory with this target.  */
+  bool (*common_data_with_target) (struct gcc_target *other);
+  /* Emit gimple to copy SIZE bytes from SRC on this target to DEST on
+     TARGET.  */
+  void (*copy_to_target) (struct gimple_stmt_iterator_d *,
+			  struct gcc_target *, tree, tree, tree);
+  /* Emit gimple to copy SIZE bytes from SRC on TARGET to DEST on this
+     target.  */
+  void (*copy_from_target) (struct gimple_stmt_iterator_d *,
+			    struct gcc_target *, tree, tree, tree);
+  /* Generate gimple for a call to fn with NARGS arguments ARGS
+     on target OTHER.  */
+  void (*build_call_on_target) (struct gimple_stmt_iterator_d *,
+				struct gcc_target *, int nargs, tree *args);
+
   /* If shift optabs for MODE are known to always truncate the shift count,
      return the mask that they apply.  Return 0 otherwise.  */
   unsigned HOST_WIDE_INT (* shift_truncation_mask) (enum machine_mode mode);
Index: omp-low.c
===================================================================
--- omp-low.c	(revision 148225)
+++ omp-low.c	(working copy)
@@ -2867,6 +2867,7 @@ expand_parallel_call (struct omp_region 
   gimple_stmt_iterator gsi;
   gimple stmt;
   int start_ix;
+  tree child_fn, attr;
 
   clauses = gimple_omp_parallel_clauses (entry_stmt);
 
@@ -2989,7 +2990,21 @@ expand_parallel_call (struct omp_region 
     t1 = null_pointer_node;
   else
     t1 = build_fold_addr_expr (t);
-  t2 = build_fold_addr_expr (gimple_omp_parallel_child_fn (entry_stmt));
+  child_fn = gimple_omp_parallel_child_fn (entry_stmt);
+  t2 = build_fold_addr_expr (child_fn);
+
+  attr = lookup_attribute ("target_arch", DECL_ATTRIBUTES (child_fn));
+  if (attr)
+    {
+      tree args[2];
+      struct gcc_target *tgt = targetm_array[lookup_attr_target (child_fn)];
+
+      args[0] = t2;
+      args[1] = force_gimple_operand_gsi (&gsi, t1, true, NULL_TREE, false,
+					  GSI_CONTINUE_LINKING);
+      targetm.build_call_on_target (&gsi, tgt, 2, args);
+      return;
+    }
 
   if (ws_args)
     {
@@ -3004,12 +3019,7 @@ expand_parallel_call (struct omp_region 
   force_gimple_operand_gsi (&gsi, t, true, NULL_TREE,
 			    false, GSI_CONTINUE_LINKING);
 
-  t = gimple_omp_parallel_data_arg (entry_stmt);
-  if (t == NULL)
-    t = null_pointer_node;
-  else
-    t = build_fold_addr_expr (t);
-  t = build_call_expr (gimple_omp_parallel_child_fn (entry_stmt), 1, t);
+  t = build_call_expr (child_fn, 1, t1);
   force_gimple_operand_gsi (&gsi, t, true, NULL_TREE,
 			    false, GSI_CONTINUE_LINKING);
 
@@ -3344,7 +3354,9 @@ expand_omp_taskreg (struct omp_region *r
 	 a function call that has been inlined, the original PARM_DECL
 	 .OMP_DATA_I may have been converted into a different local
 	 variable.  In which case, we need to keep the assignment.  */
-      if (gimple_omp_taskreg_data_arg (entry_stmt))
+      tree data_arg = gimple_omp_taskreg_data_arg (entry_stmt);
+
+      if (data_arg)
 	{
 	  basic_block entry_succ_bb = single_succ (entry_bb);
 	  gimple_stmt_iterator gsi;
@@ -3367,9 +3379,10 @@ expand_omp_taskreg (struct omp_region *r
 		  /* We're ignore the subcode because we're
 		     effectively doing a STRIP_NOPS.  */
 
-		  if (TREE_CODE (arg) == ADDR_EXPR
-		      && TREE_OPERAND (arg, 0)
-		        == gimple_omp_taskreg_data_arg (entry_stmt))
+		  if ((TREE_CODE (arg) == ADDR_EXPR
+		       && TREE_OPERAND (arg, 0) == data_arg)
+		      || (TREE_CODE (data_arg) == INDIRECT_REF
+			  && TREE_OPERAND (data_arg, 0) == arg))
 		    {
 		      parcopy_stmt = stmt;
 		      break;
@@ -4202,6 +4215,170 @@ expand_omp_for_static_nochunk (struct om
 			   recompute_dominator (CDI_DOMINATORS, fin_bb));
 }
 
+/* Like expand_omp_for_static_nochunk, but don't emit code for iteration
+   space partitioning - that is supposed to be done on the main processor.  */
+static void
+expand_numa_for_static_nochunk (struct omp_region *region,
+				struct omp_for_data *fd)
+{
+  tree n, q, s0, e0, e, t, nthreads, threadid;
+  tree type, itype, vmain, vback;
+  basic_block entry_bb, exit_bb, seq_start_bb, body_bb, cont_bb;
+  basic_block fin_bb;
+  gimple_stmt_iterator gsi;
+  gimple stmt;
+
+  itype = type = TREE_TYPE (fd->loop.v);
+  if (POINTER_TYPE_P (type))
+    itype = lang_hooks.types.type_for_size (TYPE_PRECISION (type), 0);
+
+  entry_bb = region->entry;
+  cont_bb = region->cont;
+  gcc_assert (EDGE_COUNT (entry_bb->succs) == 2);
+  gcc_assert (BRANCH_EDGE (entry_bb)->dest == FALLTHRU_EDGE (cont_bb)->dest);
+  seq_start_bb = split_edge (FALLTHRU_EDGE (entry_bb));
+  body_bb = single_succ (seq_start_bb);
+  gcc_assert (BRANCH_EDGE (cont_bb)->dest == body_bb);
+  gcc_assert (EDGE_COUNT (cont_bb->succs) == 2);
+  fin_bb = FALLTHRU_EDGE (cont_bb)->dest;
+  exit_bb = region->exit;
+
+  /* Iteration space partitioning goes in ENTRY_BB.  */
+  gsi = gsi_last_bb (entry_bb);
+  gcc_assert (gimple_code (gsi_stmt (gsi)) == GIMPLE_OMP_FOR);
+
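+  /* For now, thread count and id are hard-wired to 1 and 0; the #if 0
+     code shows what a partitioned multi-threaded variant would use.  */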
+#if 0
+  t = build_call_expr (built_in_decls[BUILT_IN_OMP_GET_NUM_THREADS], 0);
+#else
+  t = size_one_node;
+#endif
+  t = fold_convert (itype, t);
+  nthreads = force_gimple_operand_gsi (&gsi, t, true, NULL_TREE,
+				       true, GSI_SAME_STMT);
+  
+#if 0
+  t = build_call_expr (built_in_decls[BUILT_IN_OMP_GET_THREAD_NUM], 0);
+#else
+  t = size_zero_node;
+#endif
+  t = fold_convert (itype, t);
+  threadid = force_gimple_operand_gsi (&gsi, t, true, NULL_TREE,
+				       true, GSI_SAME_STMT);
+
+  fd->loop.n1
+    = force_gimple_operand_gsi (&gsi, fold_convert (type, fd->loop.n1),
+				true, NULL_TREE, true, GSI_SAME_STMT);
+  fd->loop.n2
+    = force_gimple_operand_gsi (&gsi, fold_convert (itype, fd->loop.n2),
+				true, NULL_TREE, true, GSI_SAME_STMT);
+  fd->loop.step
+    = force_gimple_operand_gsi (&gsi, fold_convert (itype, fd->loop.step),
+				true, NULL_TREE, true, GSI_SAME_STMT);
+
+  t = build_int_cst (itype, (fd->loop.cond_code == LT_EXPR ? -1 : 1));
+  t = fold_build2 (PLUS_EXPR, itype, fd->loop.step, t);
+  t = fold_build2 (PLUS_EXPR, itype, t, fd->loop.n2);
+  t = fold_build2 (MINUS_EXPR, itype, t, fold_convert (itype, fd->loop.n1));
+  if (TYPE_UNSIGNED (itype) && fd->loop.cond_code == GT_EXPR)
+    t = fold_build2 (TRUNC_DIV_EXPR, itype,
+		     fold_build1 (NEGATE_EXPR, itype, t),
+		     fold_build1 (NEGATE_EXPR, itype, fd->loop.step));
+  else
+    t = fold_build2 (TRUNC_DIV_EXPR, itype, t, fd->loop.step);
+  t = fold_convert (itype, t);
+  n = force_gimple_operand_gsi (&gsi, t, true, NULL_TREE, true, GSI_SAME_STMT);
+
+  t = fold_build2 (TRUNC_DIV_EXPR, itype, n, nthreads);
+  q = force_gimple_operand_gsi (&gsi, t, true, NULL_TREE, true, GSI_SAME_STMT);
+
+  t = fold_build2 (MULT_EXPR, itype, q, nthreads);
+  t = fold_build2 (NE_EXPR, itype, t, n);
+  t = fold_build2 (PLUS_EXPR, itype, q, t);
+  q = force_gimple_operand_gsi (&gsi, t, true, NULL_TREE, true, GSI_SAME_STMT);
+
+  t = build2 (MULT_EXPR, itype, q, threadid);
+  s0 = force_gimple_operand_gsi (&gsi, t, true, NULL_TREE, true, GSI_SAME_STMT);
+
+  t = fold_build2 (PLUS_EXPR, itype, s0, q);
+  t = fold_build2 (MIN_EXPR, itype, t, n);
+  e0 = force_gimple_operand_gsi (&gsi, t, true, NULL_TREE, true, GSI_SAME_STMT);
+
+  t = build2 (GE_EXPR, boolean_type_node, s0, e0);
+  gsi_insert_before (&gsi, gimple_build_cond_empty (t), GSI_SAME_STMT);
+
+  /* Remove the GIMPLE_OMP_FOR statement.  */
+  gsi_remove (&gsi, true);
+
+  /* Setup code for sequential iteration goes in SEQ_START_BB.  */
+  gsi = gsi_start_bb (seq_start_bb);
+
+  t = fold_convert (itype, s0);
+  t = fold_build2 (MULT_EXPR, itype, t, fd->loop.step);
+  if (POINTER_TYPE_P (type))
+    t = fold_build2 (POINTER_PLUS_EXPR, type, fd->loop.n1,
+		     fold_convert (sizetype, t));
+  else
+    t = fold_build2 (PLUS_EXPR, type, t, fd->loop.n1);
+  t = force_gimple_operand_gsi (&gsi, t, false, NULL_TREE,
+				false, GSI_CONTINUE_LINKING);
+  stmt = gimple_build_assign (fd->loop.v, t);
+  gsi_insert_after (&gsi, stmt, GSI_CONTINUE_LINKING);
+ 
+  t = fold_convert (itype, e0);
+  t = fold_build2 (MULT_EXPR, itype, t, fd->loop.step);
+  if (POINTER_TYPE_P (type))
+    t = fold_build2 (POINTER_PLUS_EXPR, type, fd->loop.n1,
+		     fold_convert (sizetype, t));
+  else
+    t = fold_build2 (PLUS_EXPR, type, t, fd->loop.n1);
+  e = force_gimple_operand_gsi (&gsi, t, true, NULL_TREE,
+				false, GSI_CONTINUE_LINKING);
+
+  /* The code controlling the sequential loop replaces the
+     GIMPLE_OMP_CONTINUE.  */
+  gsi = gsi_last_bb (cont_bb);
+  stmt = gsi_stmt (gsi);
+  gcc_assert (gimple_code (stmt) == GIMPLE_OMP_CONTINUE);
+  vmain = gimple_omp_continue_control_use (stmt);
+  vback = gimple_omp_continue_control_def (stmt);
+
+  if (POINTER_TYPE_P (type))
+    t = fold_build2 (POINTER_PLUS_EXPR, type, vmain,
+		     fold_convert (sizetype, fd->loop.step));
+  else
+    t = fold_build2 (PLUS_EXPR, type, vmain, fd->loop.step);
+  t = force_gimple_operand_gsi (&gsi, t, false, NULL_TREE,
+				true, GSI_SAME_STMT);
+  stmt = gimple_build_assign (vback, t);
+  gsi_insert_before (&gsi, stmt, GSI_SAME_STMT);
+
+  t = build2 (fd->loop.cond_code, boolean_type_node, vback, e);
+  gsi_insert_before (&gsi, gimple_build_cond_empty (t), GSI_SAME_STMT);
+
+  /* Remove the GIMPLE_OMP_CONTINUE statement.  */
+  gsi_remove (&gsi, true);
+
+  /* Replace the GIMPLE_OMP_RETURN with a barrier, or nothing.  */
+  gsi = gsi_last_bb (exit_bb);
+  if (!gimple_omp_return_nowait_p (gsi_stmt (gsi)))
+    force_gimple_operand_gsi (&gsi, build_omp_barrier (), false, NULL_TREE,
+			      false, GSI_SAME_STMT);
+  gsi_remove (&gsi, true);
+
+  /* Connect all the blocks.  */
+  find_edge (entry_bb, seq_start_bb)->flags = EDGE_FALSE_VALUE;
+  find_edge (entry_bb, fin_bb)->flags = EDGE_TRUE_VALUE;
+
+  find_edge (cont_bb, body_bb)->flags = EDGE_TRUE_VALUE;
+  find_edge (cont_bb, fin_bb)->flags = EDGE_FALSE_VALUE;
+ 
+  set_immediate_dominator (CDI_DOMINATORS, seq_start_bb, entry_bb);
+  set_immediate_dominator (CDI_DOMINATORS, body_bb,
+			   recompute_dominator (CDI_DOMINATORS, body_bb));
+  set_immediate_dominator (CDI_DOMINATORS, fin_bb,
+			   recompute_dominator (CDI_DOMINATORS, fin_bb));
+}
+
 
 /* A subroutine of expand_omp_for.  Generate code for a parallel
    loop with static schedule and a specified chunk size.  Given
@@ -4533,6 +4710,8 @@ expand_omp_for (struct omp_region *regio
       else
 	expand_omp_for_static_chunk (region, &fd);
     }
+  else if (fd.sched_kind == OMP_CLAUSE_SCHEDULE_MASTER)
+    expand_numa_for_static_nochunk (region, &fd);
   else
     {
       int fn_index, start_ix, next_ix;
Index: toplev.c
===================================================================
--- toplev.c	(revision 148227)
+++ toplev.c	(working copy)
@@ -2117,12 +2117,14 @@ lang_dependent_init_target (void)
 }
 
 EXTRA_TARGETS_DECL (int lang_dependent_init (const char *));
+EXTRA_TARGETS_DECL (int initialize_sizetypes (bool));
 
 /* Language-dependent initialization.  Returns nonzero on success.  */
 int
 lang_dependent_init (const char *name)
 {
   location_t save_loc ATTRIBUTE_UNUSED;
+  bool signed_sizetype ATTRIBUTE_UNUSED;
 
   targetm_pnt = &this_targetm;
 #ifndef EXTRA_TARGET
@@ -2135,11 +2137,18 @@ lang_dependent_init (const char *name)
   if (lang_hooks.init () == 0)
     return 0;
   input_location = save_loc;
+  signed_sizetype = !TYPE_UNSIGNED (sizetype);
+  EXTRA_TARGETS_CALL (initialize_sizetypes (signed_sizetype));
   EXTRA_TARGETS_CALL (lang_dependent_init (name));
   targetm_pnt = &this_targetm;
 
   init_asm_output (name);
-#endif /* !EXTRA_TARGET */
+#else /* EXTRA_TARGET */
+  if (TYPE_MODE (sizetype) != ptr_mode)
+    sizetype
+      = lang_hooks.types.type_for_mode (ptr_mode, TYPE_UNSIGNED (sizetype));
+  set_sizetype (size_type_node);
+#endif /* EXTRA_TARGET */
 
   /* This creates various _DECL nodes, so needs to be called after the
      front end is initialized.  */
Index: tree-ssa-loop-ivopts.c
===================================================================
--- tree-ssa-loop-ivopts.c	(revision 148225)
+++ tree-ssa-loop-ivopts.c	(working copy)
@@ -2582,7 +2582,7 @@ produce_memory_decl_rtl (tree obj, int *
   if (TREE_STATIC (obj) || DECL_EXTERNAL (obj))
     {
       const char *name = IDENTIFIER_POINTER (DECL_ASSEMBLER_NAME (obj));
-      x = gen_rtx_SYMBOL_REF (Pmode, name);
+      x = gen_rtx_SYMBOL_REF ((*targetm.get_pmode) (), name);
       SET_SYMBOL_REF_DECL (x, obj);
       x = gen_rtx_MEM (DECL_MODE (obj), x);
       targetm.encode_section_info (obj, x, true);
@@ -2670,7 +2670,7 @@ computation_cost (tree expr, bool speed)
   crtl->maybe_hot_insn_p = speed;
   walk_tree (&expr, prepare_decl_rtl, &regno, NULL);
   start_sequence ();
-  rslt = expand_expr (expr, NULL_RTX, TYPE_MODE (type), EXPAND_NORMAL);
+  rslt = tree_expand_expr (expr, NULL_RTX, TYPE_MODE (type), EXPAND_NORMAL);
   seq = get_insns ();
   end_sequence ();
   default_rtl_profile ();
@@ -3280,9 +3280,9 @@ force_expr_to_var_cost (tree expr, bool 
 	  symbol_cost[i] = computation_cost (addr, i) + 1;
 
 	  address_cost[i]
-	    = computation_cost (build2 (POINTER_PLUS_EXPR, type,
-					addr,
-					build_int_cst (sizetype, 2000)), i) + 1;
+	    = computation_cost (build2 (POINTER_PLUS_EXPR, type, addr,
+					build_int_cst (targetm.sizetype, 2000)),
+					i) + 1;
 	  if (dump_file && (dump_flags & TDF_DETAILS))
 	    {
 	      fprintf (dump_file, "force_expr_to_var_cost %s costs:\n", i ? "speed" : "size");
@@ -5487,7 +5487,7 @@ rewrite_use_address (struct ivopts_data 
   gcc_assert (ok);
   unshare_aff_combination (&aff);
 
-  ref = create_mem_ref (&bsi, TREE_TYPE (*use->op_p), &aff, data->speed);
+  ref = tree_create_mem_ref (&bsi, TREE_TYPE (*use->op_p), &aff, data->speed);
   copy_ref_info (ref, *use->op_p);
   *use->op_p = ref;
 }
Index: expr.c
===================================================================
--- expr.c	(revision 148225)
+++ expr.c	(working copy)
@@ -9499,6 +9499,26 @@ expand_expr_real_1 (tree exp, rtx target
   return REDUCE_BIT_FIELD (temp);
 }
 #undef REDUCE_BIT_FIELD
+
+#ifndef EXTRA_TARGET
+EXTRA_TARGETS_DECL (rtx expand_expr_real (tree, rtx, enum machine_mode,
+		    enum expand_modifier, rtx *));
+/* Like expand_expr, but dispatch according to targetm, so this is suitable
+   for tree optimizers that don't have target-specific variants.  */
+rtx
+tree_expand_expr (tree exp, rtx target, enum machine_mode mode,
+		  enum expand_modifier modifier)
+{
+  rtx (*expand_expr_array[]) (tree, rtx, enum machine_mode,
+			      enum expand_modifier, rtx *)
+    = { &expand_expr_real, EXTRA_TARGETS_EXPAND_COMMA (&,expand_expr_real) };
+
+  return ((*expand_expr_array[targetm.target_arch])
+	  (exp, target, mode, modifier, NULL));
+}
+
+#endif /* EXTRA_TARGET */
 

 /* Subroutine of above: reduce EXP to the precision of TYPE (in the
    signedness of TYPE), possibly returning the result in TARGET.  */
Index: expr.h
===================================================================
--- expr.h	(revision 148225)
+++ expr.h	(working copy)
@@ -561,6 +561,9 @@ expand_expr (tree exp, rtx target, enum 
   return expand_expr_real (exp, target, mode, modifier, NULL);
 }
 
+extern rtx tree_expand_expr (tree, rtx, enum machine_mode,
+			     enum expand_modifier);
+
 static inline rtx
 expand_normal (tree exp)
 {
Index: tree-parloops.c
===================================================================
--- tree-parloops.c	(revision 148488)
+++ tree-parloops.c	(working copy)
@@ -736,7 +736,7 @@ expr_invariant_in_region_p (edge entry, 
 static tree
 separate_decls_in_region_name (tree name,
 			       htab_t name_copies, htab_t decl_copies,
-			       bool copy_name_p)
+			       bool copy_name_p, int new_target)
 {
   tree copy, var, var_copy;
   unsigned idx, uid, nuid;
@@ -760,7 +760,14 @@ separate_decls_in_region_name (tree name
   dslot = htab_find_slot_with_hash (decl_copies, &ielt, uid, INSERT);
   if (!*dslot)
     {
-      var_copy = create_tmp_var (TREE_TYPE (var), get_name (var));
+      tree type = TREE_TYPE (var);
+
+      if (new_target != targetm.target_arch && POINTER_TYPE_P (type))
+	type
+	  = ((TREE_CODE (type) == POINTER_TYPE
+	      ? build_pointer_type_for_mode : build_reference_type_for_mode)
+	     (TREE_TYPE (type), *targetm_array[new_target]->ptr_mode, false));
+      var_copy = create_tmp_var (type, get_name (var));
       DECL_GIMPLE_REG_P (var_copy) = DECL_GIMPLE_REG_P (var);
       add_referenced_var (var_copy);
       nielt = XNEW (struct int_tree_map);
@@ -810,7 +817,8 @@ separate_decls_in_region_name (tree name
 
 static void
 separate_decls_in_region_stmt (edge entry, edge exit, gimple stmt,
-			       htab_t name_copies, htab_t decl_copies)
+			       htab_t name_copies, htab_t decl_copies,
+			       unsigned new_target)
 {
   use_operand_p use;
   def_operand_p def;
@@ -825,7 +833,7 @@ separate_decls_in_region_stmt (edge entr
     name = DEF_FROM_PTR (def);
     gcc_assert (TREE_CODE (name) == SSA_NAME);
     copy = separate_decls_in_region_name (name, name_copies, decl_copies,
-					  false);
+					  false, new_target);
     gcc_assert (copy == name);
   }
 
@@ -837,7 +845,7 @@ separate_decls_in_region_stmt (edge entr
 
     copy_name_p = expr_invariant_in_region_p (entry, exit, name);
     copy = separate_decls_in_region_name (name, name_copies, decl_copies,
-					  copy_name_p);
+					  copy_name_p, new_target);
     SET_USE (use, copy);
   }
 }
@@ -879,6 +887,41 @@ add_field_for_name (void **slot, void *d
   return 1;
 }
 
+/* Called by the NUMA case of separate_decls_in_region via htab_traverse.
+   Computes the callee target start address and size of a parameter array
+   described by *SLOT and updates the size description of the parameter area;
+   DATA points to the parameter area description SIZES_ADDR[0..3].
+   SIZES_ADDR[0] tallies the per-iteration size, and SIZES_ADDR[1] the
+   iteration-independent constant size.  SIZES_ADDR[2] is the iteration
+   count, and SIZES_ADDR[3] contains the callee target start address of the
+   parameter area.  */
+static int
+add_size_for_param_array (void **slot, void *data)
+{
+  param_array elt = (param_array) *slot;
+  tree *sizes_addr = (tree *) data;
+  tree min, max, offset, size, stride_tree;
+
+  stride_tree = build_int_cst (size_type_node, elt->stride);
+  min = elt->read_offset.min ? elt->read_offset.min : elt->write_offset.min;
+  if (elt->write_offset.min && tree_int_cst_lt (elt->write_offset.min, min))
+    min = elt->write_offset.min;
+  max = elt->read_offset.max ? elt->read_offset.max : elt->write_offset.max;
+  if (elt->write_offset.max && tree_int_cst_lt (max, elt->write_offset.max))
+    max = elt->write_offset.max;
+  offset = size_binop (MULT_EXPR, sizes_addr[2], sizes_addr[0]);
+  offset = size_binop (PLUS_EXPR, offset, sizes_addr[1]);
+  sizes_addr[0] = size_binop (PLUS_EXPR, sizes_addr[0], stride_tree);
+  sizes_addr[1]
+    = size_binop (PLUS_EXPR, sizes_addr[1], size_binop (MINUS_EXPR, max, min));
+  elt->callee_base
+    = fold_build2 (POINTER_PLUS_EXPR, TREE_TYPE (sizes_addr[3]), sizes_addr[3],
+		   size_binop (MINUS_EXPR, offset, min));
+  size = size_binop (MULT_EXPR, sizes_addr[2], stride_tree);
+  elt->size = size_binop (PLUS_EXPR, size, size_binop (MINUS_EXPR, max, min));
+  return 1;
+}
+
 /* Callback for htab_traverse.  A local result is the intermediate result 
    computed by a single 
    thread, or the initial value in case no iteration was executed.
@@ -930,6 +973,8 @@ struct clsn_data
 
   basic_block store_bb;
   basic_block load_bb;
+  gimple_seq result_seq;
+  struct loop *loop;
 };
 
 /* Callback for htab_traverse.  Create an atomic instruction for the
@@ -1102,10 +1147,67 @@ create_loads_and_stores_for_name (void *
   tree type = TREE_TYPE (elt->new_name);
   tree struct_type = TREE_TYPE (TREE_TYPE (clsn_data->load));
   tree load_struct;
+  tree src;
+  struct loop *loop = clsn_data->loop;
 
   gsi = gsi_last_bb (clsn_data->store_bb);
   t = build3 (COMPONENT_REF, type, clsn_data->store, elt->field, NULL_TREE);
-  stmt = gimple_build_assign (t, ssa_name (elt->version));
+  src = ssa_name (elt->version);
+  if (loop->param_arrays)
+    {
+      tree var = SSA_NAME_VAR (src), dst;
+      struct tree_map *m, m_in;
+      param_array a;
+      struct gcc_target *loop_target = targetm_array[loop->target_arch];
+      tree src_param = src;
+
+      m_in.base.from = var;
+      m = (struct tree_map *) htab_find_with_hash (loop->vect_vars, &m_in,
+						   DECL_UID (var));
+      if (m)
+	var = m->to;
+      a = (param_array) htab_find (loop->param_arrays, &var);
+
+      gcc_assert (a);
+      gcc_assert (operand_equal_p (a->caller_base, var, 0));
+      dst = a->callee_base;
+      gcc_assert (TYPE_MODE (TREE_TYPE (src)) == TYPE_MODE (TREE_TYPE (dst)));
+      if (TREE_CODE (TREE_TYPE (src)) == POINTER_TYPE
+	  && TYPE_MODE (TREE_TYPE (src)) != ptr_mode)
+	{
+	  tree param_type;
+
+	  param_type = build_pointer_type (TREE_TYPE (TREE_TYPE (src)));
+	  src_param = fold_convert (param_type, src);
+	  param_type = build_pointer_type (TREE_TYPE (TREE_TYPE (dst)));
+	  dst = fold_convert (param_type, dst);
+	}
+      if (!is_gimple_val (src))
+	{
+	  var = create_tmp_var (TREE_TYPE (src_param),
+				IDENTIFIER_POINTER (DECL_NAME (var)));
+	  var = make_ssa_name (var, NULL);
+	  stmt = gimple_build_assign (var, src_param);
+	  SSA_NAME_DEF_STMT (var) = stmt;
+	  mark_virtual_ops_for_renaming (stmt);
+	  gsi_insert_after (&gsi, stmt, GSI_CONTINUE_LINKING);
+	  src_param = var;
+	}
+      if (a->read_offset.min)
+	(*targetm.copy_to_target) (&gsi, loop_target, dst, src_param, a->size);
+      if (a->write_offset.min)
+	{
+          gimple_stmt_iterator i;
+
+	  if (!clsn_data->result_seq)
+	    clsn_data->result_seq = gimple_seq_alloc ();
+          i = gsi_last (clsn_data->result_seq);
+	  (*targetm.copy_from_target) (&i, loop_target, dst, src_param,
+				       a->size);
+	}
+      src = dst;
+    }
+  stmt = gimple_build_assign (t, src);
   mark_virtual_ops_for_renaming (stmt);
   gsi_insert_after (&gsi, stmt, GSI_NEW_STMT);
 
@@ -1156,9 +1258,10 @@ create_loads_and_stores_for_name (void *
 static void
 separate_decls_in_region (edge entry, edge exit, htab_t reduction_list,
 			  tree *arg_struct, tree *new_arg_struct, 
-			  struct clsn_data *ld_st_data, unsigned new_target)
+			  struct clsn_data *ld_st_data, struct loop *loop)
 
 {
+  int new_target = loop->target_arch;
   basic_block bb1 = split_edge (entry);
   basic_block bb0 = single_pred (bb1);
   htab_t name_copies = htab_create (10, name_to_copy_elt_hash,
@@ -1173,6 +1276,7 @@ separate_decls_in_region (edge entry, ed
   basic_block bb;
   basic_block entry_bb = bb1;
   basic_block exit_bb = exit->dest;
+  tree copy_base_var, copy_base;
 
   entry = single_succ_edge (entry_bb);
   gather_blocks_in_sese_region (entry_bb, exit_bb, &body);
@@ -1183,11 +1287,13 @@ separate_decls_in_region (edge entry, ed
 	{
 	  for (gsi = gsi_start_phis (bb); !gsi_end_p (gsi); gsi_next (&gsi))
 	    separate_decls_in_region_stmt (entry, exit, gsi_stmt (gsi),
-					   name_copies, decl_copies);
+					   name_copies, decl_copies,
+					   new_target);
 
 	  for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
 	    separate_decls_in_region_stmt (entry, exit, gsi_stmt (gsi),
-					   name_copies, decl_copies);
+					   name_copies, decl_copies,
+					   new_target);
 	}
     }
 
@@ -1208,6 +1314,9 @@ separate_decls_in_region (edge entry, ed
 			      type);
       TYPE_NAME (type) = type_name;
 
+      /* ??? For ARCompact / mxp, we should be able to transfer most or all
+	 values directly from ARCompact core to mxp vector register.
+	 OTOH, who is willing to fund the development work?  */
       htab_traverse (name_copies, add_field_for_name, type);
       if (reduction_list && htab_elements (reduction_list) > 0)
 	{
@@ -1216,6 +1325,47 @@ separate_decls_in_region (edge entry, ed
                          type);
 	}
       layout_type (type);
+      bool numa
+	= !(*targetm.common_data_with_target) (targetm_array[new_target]);
+
+      if (numa)
+	{
+	  /* Calculate how much memory we need on the new_target side.  */
+	  tree sizes_addr[4];
+	  tree size;
+	  tree niter;
+	  tree ptype, fn_type, fn;
+	  gimple_stmt_iterator gsi;
+	  gimple stmt;
+
+	  ptype = (build_pointer_type_for_mode
+		    (void_type_node, *targetm_array[new_target]->ptr_mode,
+		     false));
+	  copy_base_var = create_tmp_var (ptype, "copy_base");
+	  add_referenced_var (copy_base_var);
+	  niter = number_of_latch_executions (loop);
+	  sizes_addr[0] = size_zero_node;
+	  sizes_addr[1] = size_in_bytes (type);
+	  sizes_addr[2] = niter;
+	  copy_base = make_ssa_name (copy_base_var, 0);
+	  sizes_addr[3] = copy_base;
+	  htab_traverse (loop->param_arrays, add_size_for_param_array,
+			 sizes_addr);
+	  size = size_binop (PLUS_EXPR,
+			     size_binop (MULT_EXPR, niter, sizes_addr[0]),
+			     sizes_addr[1]);
+	  /* Emit gimple to allocate SIZE bytes, assign to copy_base */
+	  fn_type = build_function_type_list (integer_type_node,
+					      integer_type_node, NULL_TREE);
+	  fn = get_identifier ("__simd_malloc");
+	  fn = build_decl (FUNCTION_DECL, fn, fn_type);
+	  stmt = gimple_build_call (fn, 1, size);
+	  SSA_NAME_DEF_STMT (copy_base) = stmt;
+	  gimple_call_set_lhs (stmt, copy_base);
+	  mark_virtual_ops_for_renaming (stmt);
+	  gsi = gsi_last_bb (bb0);
+	  gsi_insert_after (&gsi, stmt, GSI_NEW_STMT);
+	}
  
       /* Create the loads and stores.  */
       *arg_struct = create_tmp_var (type, ".paral_data_store");
@@ -1231,9 +1381,19 @@ separate_decls_in_region (edge entry, ed
       ld_st_data->load = *new_arg_struct;
       ld_st_data->store_bb = bb0;
       ld_st_data->load_bb = bb1;
+      ld_st_data->result_seq = NULL;
+      ld_st_data->loop = loop;
 
       htab_traverse (name_copies, create_loads_and_stores_for_name,
 		     ld_st_data);
+      if (numa)
+	{
+	  gsi = gsi_last_bb (bb0);
+	  (*targetm.copy_to_target) (&gsi, targetm_array[new_target], copy_base,
+				     build_fold_addr_expr (*arg_struct),
+				     size_in_bytes (type));
+	  *arg_struct = build_fold_indirect_ref (copy_base);
+	}
 
       /* Load the calculation from memory (after the join of the threads).  */
 
@@ -1242,10 +1402,12 @@ separate_decls_in_region (edge entry, ed
 	  htab_traverse (reduction_list, create_stores_for_reduction,
                         ld_st_data); 
 	  clsn_data.load = make_ssa_name (nvar, NULL);
-	  clsn_data.load_bb = exit->dest;
+	  clsn_data.load_bb = exit_bb;
 	  clsn_data.store = ld_st_data->store;
 	  create_final_loads_for_reduction (reduction_list, &clsn_data);
 	}
+      gsi = gsi_after_labels (split_edge (exit));
+      gsi_insert_seq_before (&gsi, ld_st_data->result_seq, GSI_NEW_STMT);
     }
 
   htab_delete (decl_copies);
@@ -1410,7 +1572,9 @@ canonicalize_loop_ivs (struct loop *loop
       remove_phi_node (&psi, false);
 
       atype = TREE_TYPE (res);
-      mtype = POINTER_TYPE_P (atype) ? sizetype : atype;
+      mtype = (POINTER_TYPE_P (atype)
+	       ? targetm_array[loop->target_arch]->sizetype_tab[SIZETYPE]
+	       : atype);
       val = fold_build2 (MULT_EXPR, mtype, unshare_expr (iv.step),
 			 fold_convert (mtype, var_before));
       val = fold_build2 (POINTER_TYPE_P (atype)
@@ -1572,6 +1736,8 @@ create_parallel_loop (struct loop *loop,
   gimple stmt, for_stmt, phi, cond_stmt;
   tree cvar, cvar_init, initvar, cvar_next, cvar_base, type;
   edge exit, nexit, guard, end, e;
+  bool numa
+    = !(*targetm.common_data_with_target) (targetm_array[loop->target_arch]);
 
   /* Prepare the GIMPLE_OMP_PARALLEL statement.  */
   bb = loop_preheader_edge (loop)->src;
@@ -1650,7 +1816,10 @@ create_parallel_loop (struct loop *loop,
   gimple_cond_set_lhs (cond_stmt, cvar_base);
   type = TREE_TYPE (cvar);
   t = build_omp_clause (OMP_CLAUSE_SCHEDULE);
-  OMP_CLAUSE_SCHEDULE_KIND (t) = OMP_CLAUSE_SCHEDULE_STATIC;
+  if (numa)
+    OMP_CLAUSE_SCHEDULE_KIND (t) = OMP_CLAUSE_SCHEDULE_MASTER;
+  else
+    OMP_CLAUSE_SCHEDULE_KIND (t) = OMP_CLAUSE_SCHEDULE_STATIC;
 
   for_stmt = gimple_build_omp_for (NULL, t, 1, NULL);
   gimple_omp_for_set_index (for_stmt, 0, initvar);
@@ -1697,6 +1866,7 @@ gen_parallel_loop (struct loop *loop, ht
   unsigned prob;
   bool arch_change = loop->target_arch != cfun->target_arch;
   bool parallelize_all = arch_change;
+  struct gcc_target *save_target;
 
   /* From
 
@@ -1794,7 +1964,10 @@ gen_parallel_loop (struct loop *loop, ht
   free_original_copy_tables ();
 
   /* Base all the induction variables in LOOP on a single control one.  */
+  save_target = targetm_pnt;
+  targetm_pnt = targetm_array[loop->target_arch];
   canonicalize_loop_ivs (loop, reduction_list, &nit);
+  targetm_pnt = save_target;
 
   /* Ensure that the exit condition is the first statement in the loop.  */
   if (!parallelize_all)
@@ -1813,7 +1986,7 @@ gen_parallel_loop (struct loop *loop, ht
   /* In the old loop, move all variables non-local to the loop to a structure
      and back, and create separate decls for the variables used in loop.  */
   separate_decls_in_region (entry, exit, reduction_list, &arg_struct, 
-			    &new_arg_struct, &clsn_data, loop->target_arch);
+			    &new_arg_struct, &clsn_data, loop);
 
   /* Create the parallel constructs.  */
   parallel_head
Index: tree-ssa-address.c
===================================================================
--- tree-ssa-address.c	(revision 148225)
+++ tree-ssa-address.c	(working copy)
@@ -42,6 +42,7 @@ along with GCC; see the file COPYING3.  
 #include "expr.h"
 #include "ggc.h"
 #include "tree-affine.h"
+#include "target.h"
 #include "multi-target.h"
 
 /* TODO -- handling of symbols (according to Richard Hendersons
@@ -247,6 +248,20 @@ addr_for_mem_ref (struct mem_address *ad
 }
 
 #ifndef EXTRA_TARGET
+EXTRA_TARGETS_DECL (rtx addr_for_mem_ref (struct mem_address *, bool));
+
+/* Like addr_for_mem_ref, but dispatch according to targetm, so this is
+   suitable for tree optimizers that don't have target-specific variants.  */
+
+rtx
+tree_addr_for_mem_ref (struct mem_address *addr, bool really_expand)
+{
+  rtx (*addr_for_mem_ref_array[]) (struct mem_address *, bool)
+    = { &addr_for_mem_ref, EXTRA_TARGETS_EXPAND_COMMA (&,addr_for_mem_ref) };
+
+  return (*addr_for_mem_ref_array[targetm.target_arch]) (addr, really_expand);
+}
+
 /* Returns address of MEM_REF in TYPE.  */
 
 tree
@@ -698,6 +713,23 @@ create_mem_ref (gimple_stmt_iterator *gs
 }
 
 #ifndef EXTRA_TARGET
+EXTRA_TARGETS_DECL (tree create_mem_ref (gimple_stmt_iterator *gsi, tree type,
+					 aff_tree *addr, bool speed));
+
+/* Like create_mem_ref, but dispatch according to targetm, so this is
+   suitable for tree optimizers that don't have target-specific variants.  */
+
+tree
+tree_create_mem_ref (gimple_stmt_iterator *gsi, tree type, aff_tree *addr,
+		     bool speed)
+{
+  tree (*create_mem_ref_array[]) (gimple_stmt_iterator *, tree, aff_tree *,
+				  bool)
+    = { &create_mem_ref, EXTRA_TARGETS_EXPAND_COMMA (&,create_mem_ref) };
+
+  return (*create_mem_ref_array[targetm.target_arch]) (gsi, type, addr, speed);
+}
+
 /* Copies components of the address from OP to ADDR.  */
 
 void
Index: function.c
===================================================================
--- function.c	(revision 148225)
+++ function.c	(working copy)
@@ -4083,6 +4083,27 @@ static void (* const allocate_struct_fun
       EXTRA_TARGETS_EXPAND_COMMA (&,allocate_struct_function_1)
     };
 
+/* If FNDECL has a target_arch attribute, return the index of that target
+   architecture in targetm_array; otherwise, return 0.  */
+int
+lookup_attr_target (tree fndecl)
+{
+  int i = 0;
+#if NUM_TARGETS > 1
+  const char *arch_name = targetm.name;
+  tree attr = NULL_TREE;
+
+  if (fndecl)
+    attr = lookup_attribute ("target_arch", DECL_ATTRIBUTES (fndecl));
+  if (attr)
+    arch_name = TREE_STRING_POINTER (TREE_VALUE (TREE_VALUE (attr)));
+  for (; targetm_array[i]; i++)
+    if (strcmp (targetm_array[i]->name, arch_name) == 0)
+      break;
+#endif
+  return i;
+}
+
 /* Allocate a function structure for FNDECL and set its contents
    to the defaults.  Set cfun to the newly-allocated object.
    Some of the helper functions invoked during initialization assume
@@ -4099,19 +4120,7 @@ static void (* const allocate_struct_fun
 void
 allocate_struct_function (tree fndecl, bool abstract_p)
 {
-  int i = 0;
-#if NUM_TARGETS > 1
-  const char *arch_name = targetm.name;
-  tree attr = NULL_TREE;
-
-  if (fndecl)
-    attr = lookup_attribute ("target_arch", DECL_ATTRIBUTES (fndecl));
-  if (attr)
-    arch_name = TREE_STRING_POINTER (TREE_VALUE (TREE_VALUE (attr)));
-  for (; targetm_array[i]; i++)
-    if (strcmp (targetm_array[i]->name, arch_name) == 0)
-      break;
-#endif
+  int i = lookup_attr_target (fndecl);
   cfun = GGC_CNEW (struct function);
   cfun->target_arch = i;
   targetm_pnt = targetm_array[i];
Index: tree-affine.c
===================================================================
--- tree-affine.c	(revision 148213)
+++ tree-affine.c	(working copy)
@@ -32,6 +32,7 @@ along with GCC; see the file COPYING3.  
 #include "tree-affine.h"
 #include "gimple.h"
 #include "flags.h"
+#include "target.h"
 
 /* Extends CST as appropriate for the affine combinations COMB.  */
 
@@ -352,7 +353,7 @@ add_elt_to_tree (tree expr, tree type, t
   enum tree_code code;
   tree type1 = type;
   if (POINTER_TYPE_P (type))
-    type1 = sizetype;
+    type1 = targetm.sizetype;
 
   scale = double_int_ext_for_comb (scale, comb);
   elt = fold_convert (type1, elt);
@@ -415,7 +416,7 @@ aff_combination_to_tree (aff_tree *comb)
   double_int off, sgn;
   tree type1 = type;
   if (POINTER_TYPE_P (type))
-    type1 = sizetype;
+    type1 = targetm.sizetype;
 
   gcc_assert (comb->n == MAX_AFF_ELTS || comb->rest == NULL_TREE);
 
Index: target-def.h
===================================================================
--- target-def.h	(revision 148225)
+++ target-def.h	(working copy)
@@ -32,6 +32,8 @@
 
 /* TARGET_NAME is defined by the Makefile.  */
 
+#define TARGET_GET_PMODE default_get_pmode
+
 /* Assembler output.  */
 #ifndef TARGET_ASM_OPEN_PAREN
 #define TARGET_ASM_OPEN_PAREN "("
@@ -444,6 +446,22 @@
 #define TARGET_STRIP_NAME_ENCODING default_strip_name_encoding
 #endif
 
+#ifndef TARGET_COMMON_DATA_WITH_TARGET
+#define TARGET_COMMON_DATA_WITH_TARGET default_common_data_with_target
+#endif
+
+#ifndef TARGET_COPY_TO_TARGET
+#define TARGET_COPY_TO_TARGET 0
+#endif
+
+#ifndef TARGET_COPY_FROM_TARGET
+#define TARGET_COPY_FROM_TARGET 0
+#endif
+
+#ifndef TARGET_BUILD_CALL_ON_TARGET
+#define TARGET_BUILD_CALL_ON_TARGET 0
+#endif
+
 #ifndef TARGET_BINDS_LOCAL_P
 #define TARGET_BINDS_LOCAL_P default_binds_local_p
 #endif
@@ -847,6 +865,8 @@
   TARGET_NAME,					\
   TARGET_NUM,					\
   &ptr_mode,					\
+  TARGET_GET_PMODE,				\
+  &sizetype_tab[0],				\
   TARGET_ASM_OUT,				\
   TARGET_SCHED,					\
   TARGET_VECTORIZE,				\
@@ -896,6 +916,10 @@
   TARGET_MANGLE_DECL_ASSEMBLER_NAME,		\
   TARGET_ENCODE_SECTION_INFO,			\
   TARGET_STRIP_NAME_ENCODING,			\
+  TARGET_COMMON_DATA_WITH_TARGET,		\
+  TARGET_COPY_TO_TARGET,			\
+  TARGET_COPY_FROM_TARGET,			\
+  TARGET_BUILD_CALL_ON_TARGET,			\
   TARGET_SHIFT_TRUNCATION_MASK,			\
   TARGET_MIN_DIVISIONS_FOR_RECIP_MUL,		\
   TARGET_MODE_REP_EXTENDED,			\
Index: tree-vect-transform.c
===================================================================
--- tree-vect-transform.c	(revision 148488)
+++ tree-vect-transform.c	(working copy)
@@ -969,6 +969,108 @@ vect_create_addr_base_for_vector_ref (gi
   return vec_stmt;
 }
 
+/* Function vect_decompose_addr_base_for_vector_ref.
+
+   Decompose the address of the first memory location
+   that will be accessed for a data reference.
+
+   Input:
+   STMT: The statement containing the data reference.
+   OFFSET: Optional.  If supplied, it is added to the initial address.
+   LOOP:    Specify the loop nest relative to which the address should be
+            computed.
+            For example, when the dataref is in an inner-loop nested in an
+	    outer-loop that is now being vectorized, LOOP can be either the
+	    outer-loop, or the inner-loop. The first memory location accessed
+	    by the following dataref ('in' points to short):
+
+		for (i=0; i<N; i++)
+		   for (j=0; j<M; j++)
+		     s += in[i+j]
+
+	    is as follows:
+	    if LOOP=i_loop:	&in		(relative to i_loop)
+	    if LOOP=j_loop: 	&in+i*2B	(relative to j_loop)
+
+   Output:
+   1. Return a GENERIC expression whose value is the address derived from
+      the declaration of the array / variable in the memory access.
+   2. Decompose the offset from there to the address of the memory
+      location of the first vector of the data reference into a constant part
+      *coffset and a variable part *voffset.
+
+   FORNOW: We are only handling array accesses with step 1.  */
+
+static tree
+vect_decompose_addr_base_for_vector_ref (gimple stmt, tree offset,
+					 struct loop *loop,
+					 tree *coffset, tree *voffset)
+{
+  stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
+  struct data_reference *dr = STMT_VINFO_DATA_REF (stmt_info);
+  struct loop *containing_loop = (gimple_bb (stmt))->loop_father;
+  tree data_ref_base = unshare_expr (DR_BASE_ADDRESS (dr));
+  tree base_name;
+  tree base_offset = unshare_expr (DR_OFFSET (dr));
+  tree init = unshare_expr (DR_INIT (dr));
+  tree step = TYPE_SIZE_UNIT (TREE_TYPE (DR_REF (dr)));
+  tree tmp_var, tmp_off;
+
+  gcc_assert (loop);
+  if (loop != containing_loop)
+    {
+      loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info);
+      struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
+
+      gcc_assert (nested_in_vect_loop_p (loop, stmt));
+
+      data_ref_base = unshare_expr (STMT_VINFO_DR_BASE_ADDRESS (stmt_info));
+      base_offset = unshare_expr (STMT_VINFO_DR_OFFSET (stmt_info));
+      init = unshare_expr (STMT_VINFO_DR_INIT (stmt_info));
+    }
+
+  /* Create data_ref_base */
+  base_name = build_fold_indirect_ref (data_ref_base);
+
+  init = fold_convert (sizetype, init);
+  if (offset)
+    {
+      gcc_assert (really_constant_p (offset));
+      offset = fold_build2 (MULT_EXPR, sizetype,
+			    fold_convert (sizetype, offset), step);
+      init = size_binop (PLUS_EXPR, init, offset);
+    }
+
+  split_constant_offset (base_offset, &tmp_var, &tmp_off);
+  base_offset = fold_convert (sizetype, tmp_var);
+  init = size_binop (PLUS_EXPR, init, fold_convert (sizetype, tmp_off));
+
+  *coffset = init;
+  *voffset = base_offset;
+
+  /* We rely here on get_name only accepting a variable declaration or its
+     address, not any PLUS_EXPR with some other offset.  */
+  gcc_assert (get_name (base_name));
+
+  return base_name;
+}
+
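+/* Hashtable callbacks for loop->param_arrays: entries are hashed and
+   compared on their base variable declaration.  */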
+static unsigned int
+param_array_hash (const void *p)
+{
+  param_array elem = (param_array) p;
+
+  return htab_hash_pointer (elem->decl);
+}
+
+static int
+param_array_eq (const void *p0, const void *p1)
+{
+  param_array e0 = (param_array) p0;
+  param_array e1 = (param_array) p1;
+
+  return e0->decl == e1->decl;
+}
 
 /* Function vect_create_data_ref_ptr.
 
@@ -1044,6 +1146,8 @@ vect_create_data_ref_ptr (gimple stmt, s
   gimple incr;
   tree step;
   alias_set_type ptr_alias_set = 0;
+  bool numa = !((*targetm_array[cfun->target_arch]->common_data_with_target)
+		 (targetm_array[loop->target_arch]));
   enum machine_mode tptrmode = *targetm_array[loop->target_arch]->ptr_mode;
 
   /* Check the step (evolution) of the load in LOOP, and record
@@ -1115,6 +1219,74 @@ vect_create_data_ref_ptr (gimple stmt, s
       mark_sym_for_renaming (tag);
     }
 
+  if (numa)
+    {
+      tree decl;
+      void **slot;
+      param_array new_a;
+      int stride;
+      bool load_p = at_loop != NULL;
+      tree base, coffset, voffset;
+      struct tree_range *rangep;
+      struct tree_map *m;
+
+      if (!loop->param_arrays)
+	{
+	  loop->param_arrays
+	    = htab_create (10, param_array_hash, param_array_eq, free);
+	  loop->vect_vars
+	    = htab_create (10, tree_map_hash, tree_map_eq, free);
+	}
+
+      /* If we want to handle vectorizing outer loops, we need a more
+	 complex model of the to-be-transferred arrays than an index range
+	 and a single stride.  I.e. we'd have to consider that the entire
+	 access range of the inner loop must be present, and write overlap with
+	 a following simultaneously processed range must be avoided.  */
+      gcc_assert (!nested_in_vect_loop);
+      /* Moreover, if the address is initialized inside the loop (in the
+	 preheader of the inner loop), we'd need to arrange for the DMA
+	 to be somewhere else.  */
+      gcc_assert (!at_loop || at_loop == loop);
+      gcc_assert (!*inv_p);
+      stride = tree_low_cst (TYPE_SIZE_UNIT (vectype), 1);
+      base = vect_decompose_addr_base_for_vector_ref (stmt, offset, loop,
+						      &coffset, &voffset);
+      decl = get_get_name_decl (base);
+      slot = htab_find_slot (loop->param_arrays, &decl, INSERT);
+      new_a = *(param_array *) slot;
+      if (!new_a)
+	{
+	  new_a = XCNEW (struct param_array_d);
+	  *slot = new_a;
+	  new_a->decl = decl;
+	  new_a->caller_base = base;
+	  new_a->stride = stride;
+	  new_a->invar_offset = voffset;
+	  rangep = load_p ? &new_a->read_offset : &new_a->write_offset;
+	  rangep->min = rangep->max = coffset;
+	}
+      else
+	{
+	  gcc_assert (operand_equal_p (new_a->caller_base, base, 0));
+	  gcc_assert (new_a->stride == stride);
+	  gcc_assert (operand_equal_p (new_a->invar_offset, voffset, 0));
+
+	  rangep = load_p ? &new_a->read_offset : &new_a->write_offset;
+	  if (!rangep->min || tree_int_cst_lt (coffset, rangep->min))
+	    rangep->min = coffset;
+	  if (!rangep->max || tree_int_cst_lt (rangep->max, coffset))
+	    rangep->max = coffset;
+	}
+      m = XCNEW (struct tree_map);
+      m->hash = DECL_UID (vect_ptr);
+      m->base.from = vect_ptr;
+      m->to = decl;
+      slot = htab_find_slot_with_hash (loop->vect_vars, m, m->hash, INSERT);
+      gcc_assert (*slot == NULL);
+      *slot = m;
+    }
+
   /** Note: If the dataref is in an inner-loop nested in LOOP, and we are
       vectorizing LOOP (i.e. outer-loop vectorization), we need to create two
       def-use update cycles for the pointer: One relative to the outer-loop
Index: cfgloop.h
===================================================================
--- cfgloop.h	(revision 148225)
+++ cfgloop.h	(working copy)
@@ -100,6 +100,41 @@ enum loop_estimation
   EST_AVAILABLE
 };
 
+struct tree_range GTY (()) { tree min, max; };
+
+typedef struct param_array_d GTY (())
+{
+  /* The declaration of the base variable, as obtained with get_name_decl.
+     Its name is used to compute the hash key.  */
+  tree decl;
+  /* The expression how this variable actually forms the base address for
+     the access, in the calling context.  */
+  tree caller_base;
+  /* The expression to initialize the variable on the callee side.  */
+  tree callee_base;
+  /* All accesses to this array should agree on stride, because otherwise it
+     is not straightforward to slice this array into separate index ranges.  */
+  int stride;
+  /* Likewise, all accesses should agree on the non-constant offset.  */
+  tree invar_offset;
+  /* max_{read,write}_offset includes the size of the access mode, and thus
+    points to the first not-accessed byte.
+    max_write_offset - min_write_offset must not be larger than stride to
+    allow vectorized operation.
+    forward iteration: the index range is divided into monotonically
+    increasing slices such that the inputs of a slice and all preceding slices
+    have been fully read before its output is written; when operating on a
+    slice, the biv is incremented.
+    backward iteration: likewise, with decreasing index ranges and bivs.
+    max_write_offset - min_read_offset must not be larger than stride to
+    allow forward iteration.
+    max_read_offset - min_write_offset must not be larger than stride to
+    allow backward iteration.  */
+  struct tree_range read_offset;
+  struct tree_range write_offset;
+  tree size;
+} *param_array;
+
 /* Structure to hold information for each natural loop.  */
 struct loop GTY ((chain_next ("%h.next")))
 {
@@ -164,6 +199,16 @@ struct loop GTY ((chain_next ("%h.next")
 
   /* Head of the cyclic list of the exits of the loop.  */
   struct loop_exit *exits;
+
+  /* Arrays that are passed from a calling context that stays on another
+     target architecture.
+     We could use a separate array, hash table or similar to map the loop
+     index to the relevant param_array pointer to save compile-time space
+     when this feature is not used (e.g. only a single architecture
+     configured), but that'll require some care to keep the mapping in sync
+     when the loop array is resized.  */
+  htab_t GTY ((param_is (struct param_array_d))) param_arrays;
+  htab_t GTY ((param_is (struct tree_map))) vect_vars;
 };
 
 /* Flags for state of loop structure.  */
Index: tree-flow.h
===================================================================
--- tree-flow.h	(revision 148225)
+++ tree-flow.h	(working copy)
@@ -1176,6 +1176,8 @@ tree create_mem_ref (gimple_stmt_iterato
 rtx addr_for_mem_ref (struct mem_address *, bool);
 tree maybe_fold_tmr (tree);
 END_TARGET_SPECIFIC
+tree tree_create_mem_ref (gimple_stmt_iterator *, tree, 
+			  struct affine_tree_combination *, bool);
 void get_address_description (tree, struct mem_address *);
 
 void init_alias_heapvars (void);
Index: Makefile.in
===================================================================
--- Makefile.in	(revision 149001)
+++ Makefile.in	(working copy)
@@ -2251,7 +2251,7 @@ tree-ssa-loop-unswitch.o : tree-ssa-loop
    coretypes.h $(TREE_DUMP_H) $(TREE_PASS_H) $(BASIC_BLOCK_H) hard-reg-set.h \
     $(TREE_INLINE_H)
 tree-ssa-address.o : tree-ssa-address.c $(TREE_FLOW_H) $(CONFIG_H) \
-   $(SYSTEM_H) $(RTL_H) $(TREE_H) $(TM_P_H) \
+   $(SYSTEM_H) $(RTL_H) $(TREE_H) $(TM_P_H) $(TARGET_H) \
    output.h $(DIAGNOSTIC_H) $(TIMEVAR_H) $(TM_H) coretypes.h $(TREE_DUMP_H) \
    $(TREE_PASS_H) $(FLAGS_H) $(TREE_INLINE_H) $(RECOG_H) insn-config.h \
    $(EXPR_H) gt-tree-ssa-address.h $(GGC_H) tree-affine.h
@@ -2289,7 +2289,8 @@ tree-ssa-loop-ivopts.o : tree-ssa-loop-i
    gt-tree-ssa-loop-ivopts.h
 tree-affine.o : tree-affine.c tree-affine.h $(CONFIG_H) pointer-set.h \
    $(SYSTEM_H) $(RTL_H) $(TREE_H) $(TM_P_H) hard-reg-set.h $(GIMPLE_H) \
-   output.h $(DIAGNOSTIC_H) $(TM_H) coretypes.h $(TREE_DUMP_H) $(FLAGS_H)
+   output.h $(DIAGNOSTIC_H) $(TM_H) coretypes.h $(TREE_DUMP_H) $(FLAGS_H) \
+   $(TARGET_H)
 tree-ssa-loop-manip.o : tree-ssa-loop-manip.c $(TREE_FLOW_H) $(CONFIG_H) \
    $(SYSTEM_H) coretypes.h $(TM_H) $(TREE_H) $(RTL_H) $(TM_P_H) hard-reg-set.h \
    $(BASIC_BLOCK_H) output.h $(DIAGNOSTIC_H) $(TREE_FLOW_H) $(TREE_DUMP_H) \
Index: gimple.h
===================================================================
--- gimple.h	(revision 148225)
+++ gimple.h	(working copy)
@@ -241,7 +241,7 @@ set_bb_seq (basic_block bb, gimple_seq s
 
 /* Iterator object for GIMPLE statement sequences.  */
 
-typedef struct
+typedef struct gimple_stmt_iterator_d
 {
   /* Sequence node holding the current statement.  */
   gimple_seq_node ptr;
Index: config/arc/predicates.md
===================================================================
--- config/arc/predicates.md	(revision 148226)
+++ config/arc/predicates.md	(working copy)
@@ -758,3 +758,18 @@ (define_special_predicate "immediate_usi
     (match_test "INTVAL (op) >= 0")
     (and (match_test "const_double_operand (op, mode)")
 	 (match_test "CONST_DOUBLE_HIGH (op) == 0"))))
+
+(define_predicate "simd_arg_vector"
+  (match_code "parallel")
+{
+  int i = XVECLEN (op, 0) - 1;
+
+  for (; i >= 0; i--)
+    {
+      rtx arg = XVECEXP (op, 0, i);
+
+      if (!REG_P (arg) || REGNO (arg) < 66 || REGNO (arg) >= 66 + 8)
+	return false;
+    }
+  return true;
+})
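
For reference (not part of the patch text): the expander for the new
simd_call builtin in arc.c below builds the operand this predicate accepts
as a PARALLEL of consecutive hard registers starting at register 66, one per
argument, roughly

    (parallel [(reg:SI 66) (reg:SI 67) ...])

Under the register layout in arc.h, register 66 is vr2 -- cf. the
"vmovw vr2,r0,1" in the vloop.s output at the end of this mail -- so up to
eight scalar arguments can be handed over to the called loop function this
way.
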
Index: config/arc/arc.c
===================================================================
--- config/arc/arc.c	(revision 148226)
+++ config/arc/arc.c	(working copy)
@@ -58,6 +58,8 @@ along with GCC; see the file COPYING3.  
 #include "tm-constrs.h"
 #include "reload.h" /* For operands_match_p */
 #include "df.h"
+#include "gimple.h"
+#include "tree-flow.h"
 #include "multi-target.h"
 
 START_TARGET_SPECIFIC
@@ -181,131 +183,136 @@ enum arc_builtins {
   ARC_BUILTIN_TRAP_S     =   20,
   ARC_BUILTIN_UNIMP_S    =   21,
 
+  ARC_SIMD_BUILTIN_CALL,
   /* Sentinel to mark start of simd builtins */
-  ARC_SIMD_BUILTIN_BEGIN      = 1000,
+  ARC_SIMD_BUILTIN_BEGIN      = 100,
 
-  ARC_SIMD_BUILTIN_VADDAW     = 1001,
-  ARC_SIMD_BUILTIN_VADDW      = 1002,
-  ARC_SIMD_BUILTIN_VAVB       = 1003,
-  ARC_SIMD_BUILTIN_VAVRB      = 1004,
-  ARC_SIMD_BUILTIN_VDIFAW     = 1005,
-  ARC_SIMD_BUILTIN_VDIFW      = 1006,
-  ARC_SIMD_BUILTIN_VMAXAW     = 1007,
-  ARC_SIMD_BUILTIN_VMAXW      = 1008,
-  ARC_SIMD_BUILTIN_VMINAW     = 1009,
-  ARC_SIMD_BUILTIN_VMINW      = 1010,
-  ARC_SIMD_BUILTIN_VMULAW     = 1011,
-  ARC_SIMD_BUILTIN_VMULFAW    = 1012,
-  ARC_SIMD_BUILTIN_VMULFW     = 1013,
-  ARC_SIMD_BUILTIN_VMULW      = 1014,
-  ARC_SIMD_BUILTIN_VSUBAW     = 1015,
-  ARC_SIMD_BUILTIN_VSUBW      = 1016,
-  ARC_SIMD_BUILTIN_VSUMMW     = 1017,
-  ARC_SIMD_BUILTIN_VAND       = 1018,
-  ARC_SIMD_BUILTIN_VANDAW     = 1019,
-  ARC_SIMD_BUILTIN_VBIC       = 1020,
-  ARC_SIMD_BUILTIN_VBICAW     = 1021,
-  ARC_SIMD_BUILTIN_VOR        = 1022,
-  ARC_SIMD_BUILTIN_VXOR       = 1023,
-  ARC_SIMD_BUILTIN_VXORAW     = 1024,
-  ARC_SIMD_BUILTIN_VEQW       = 1025,
-  ARC_SIMD_BUILTIN_VLEW       = 1026,
-  ARC_SIMD_BUILTIN_VLTW       = 1027,
-  ARC_SIMD_BUILTIN_VNEW       = 1028,
-  ARC_SIMD_BUILTIN_VMR1AW     = 1029,
-  ARC_SIMD_BUILTIN_VMR1W      = 1030,
-  ARC_SIMD_BUILTIN_VMR2AW     = 1031,
-  ARC_SIMD_BUILTIN_VMR2W      = 1032,
-  ARC_SIMD_BUILTIN_VMR3AW     = 1033,
-  ARC_SIMD_BUILTIN_VMR3W      = 1034,
-  ARC_SIMD_BUILTIN_VMR4AW     = 1035,
-  ARC_SIMD_BUILTIN_VMR4W      = 1036,
-  ARC_SIMD_BUILTIN_VMR5AW     = 1037,
-  ARC_SIMD_BUILTIN_VMR5W      = 1038,
-  ARC_SIMD_BUILTIN_VMR6AW     = 1039,
-  ARC_SIMD_BUILTIN_VMR6W      = 1040,
-  ARC_SIMD_BUILTIN_VMR7AW     = 1041,
-  ARC_SIMD_BUILTIN_VMR7W      = 1042,
-  ARC_SIMD_BUILTIN_VMRB       = 1043,
-  ARC_SIMD_BUILTIN_VH264F     = 1044,
-  ARC_SIMD_BUILTIN_VH264FT    = 1045,
-  ARC_SIMD_BUILTIN_VH264FW    = 1046,
-  ARC_SIMD_BUILTIN_VVC1F      = 1047,
-  ARC_SIMD_BUILTIN_VVC1FT     = 1048,
+  ARC_SIMD_BUILTIN_VADDAW     = 101,
+  ARC_SIMD_BUILTIN_VADDW      = 102,
+  ARC_SIMD_BUILTIN_VAVB       = 103,
+  ARC_SIMD_BUILTIN_VAVRB      = 104,
+  ARC_SIMD_BUILTIN_VDIFAW     = 105,
+  ARC_SIMD_BUILTIN_VDIFW      = 106,
+  ARC_SIMD_BUILTIN_VMAXAW     = 107,
+  ARC_SIMD_BUILTIN_VMAXW      = 108,
+  ARC_SIMD_BUILTIN_VMINAW     = 109,
+  ARC_SIMD_BUILTIN_VMINW      = 110,
+  ARC_SIMD_BUILTIN_VMULAW     = 111,
+  ARC_SIMD_BUILTIN_VMULFAW    = 112,
+  ARC_SIMD_BUILTIN_VMULFW     = 113,
+  ARC_SIMD_BUILTIN_VMULW      = 114,
+  ARC_SIMD_BUILTIN_VSUBAW     = 115,
+  ARC_SIMD_BUILTIN_VSUBW      = 116,
+  ARC_SIMD_BUILTIN_VSUMMW     = 117,
+  ARC_SIMD_BUILTIN_VAND       = 118,
+  ARC_SIMD_BUILTIN_VANDAW     = 119,
+  ARC_SIMD_BUILTIN_VBIC       = 120,
+  ARC_SIMD_BUILTIN_VBICAW     = 121,
+  ARC_SIMD_BUILTIN_VOR        = 122,
+  ARC_SIMD_BUILTIN_VXOR       = 123,
+  ARC_SIMD_BUILTIN_VXORAW     = 124,
+  ARC_SIMD_BUILTIN_VEQW       = 125,
+  ARC_SIMD_BUILTIN_VLEW       = 126,
+  ARC_SIMD_BUILTIN_VLTW       = 127,
+  ARC_SIMD_BUILTIN_VNEW       = 128,
+  ARC_SIMD_BUILTIN_VMR1AW     = 129,
+  ARC_SIMD_BUILTIN_VMR1W      = 130,
+  ARC_SIMD_BUILTIN_VMR2AW     = 131,
+  ARC_SIMD_BUILTIN_VMR2W      = 132,
+  ARC_SIMD_BUILTIN_VMR3AW     = 133,
+  ARC_SIMD_BUILTIN_VMR3W      = 134,
+  ARC_SIMD_BUILTIN_VMR4AW     = 135,
+  ARC_SIMD_BUILTIN_VMR4W      = 136,
+  ARC_SIMD_BUILTIN_VMR5AW     = 137,
+  ARC_SIMD_BUILTIN_VMR5W      = 138,
+  ARC_SIMD_BUILTIN_VMR6AW     = 139,
+  ARC_SIMD_BUILTIN_VMR6W      = 140,
+  ARC_SIMD_BUILTIN_VMR7AW     = 141,
+  ARC_SIMD_BUILTIN_VMR7W      = 142,
+  ARC_SIMD_BUILTIN_VMRB       = 143,
+  ARC_SIMD_BUILTIN_VH264F     = 144,
+  ARC_SIMD_BUILTIN_VH264FT    = 145,
+  ARC_SIMD_BUILTIN_VH264FW    = 146,
+  ARC_SIMD_BUILTIN_VVC1F      = 147,
+  ARC_SIMD_BUILTIN_VVC1FT     = 148,
 
   /* Va, Vb, rlimm instructions */
-  ARC_SIMD_BUILTIN_VBADDW     = 1050,
-  ARC_SIMD_BUILTIN_VBMAXW     = 1051,
-  ARC_SIMD_BUILTIN_VBMINW     = 1052,
-  ARC_SIMD_BUILTIN_VBMULAW    = 1053,
-  ARC_SIMD_BUILTIN_VBMULFW    = 1054,
-  ARC_SIMD_BUILTIN_VBMULW     = 1055,
-  ARC_SIMD_BUILTIN_VBRSUBW    = 1056,
-  ARC_SIMD_BUILTIN_VBSUBW     = 1057,
+  ARC_SIMD_BUILTIN_VBADDW     = 150,
+  ARC_SIMD_BUILTIN_VBMAXW     = 151,
+  ARC_SIMD_BUILTIN_VBMINW     = 152,
+  ARC_SIMD_BUILTIN_VBMULAW    = 153,
+  ARC_SIMD_BUILTIN_VBMULFW    = 154,
+  ARC_SIMD_BUILTIN_VBMULW     = 155,
+  ARC_SIMD_BUILTIN_VBRSUBW    = 156,
+  ARC_SIMD_BUILTIN_VBSUBW     = 157,
 
   /* Va, Vb, Ic instructions */
-  ARC_SIMD_BUILTIN_VASRW      = 1060,
-  ARC_SIMD_BUILTIN_VSR8       = 1061,
-  ARC_SIMD_BUILTIN_VSR8AW     = 1062,
+  ARC_SIMD_BUILTIN_VASRW      = 160,
+  ARC_SIMD_BUILTIN_VSR8       = 161,
+  ARC_SIMD_BUILTIN_VSR8AW     = 162,
 
   /* Va, Vb, u6 instructions */
-  ARC_SIMD_BUILTIN_VASRRWi    = 1065,
-  ARC_SIMD_BUILTIN_VASRSRWi   = 1066,
-  ARC_SIMD_BUILTIN_VASRWi     = 1067,
-  ARC_SIMD_BUILTIN_VASRPWBi   = 1068,
-  ARC_SIMD_BUILTIN_VASRRPWBi  = 1069,
-  ARC_SIMD_BUILTIN_VSR8AWi    = 1070,
-  ARC_SIMD_BUILTIN_VSR8i      = 1071,
+  ARC_SIMD_BUILTIN_VASRRWi    = 165,
+  ARC_SIMD_BUILTIN_VASRSRWi   = 166,
+  ARC_SIMD_BUILTIN_VASRWi     = 167,
+  ARC_SIMD_BUILTIN_VASRPWBi   = 168,
+  ARC_SIMD_BUILTIN_VASRRPWBi  = 169,
+  ARC_SIMD_BUILTIN_VSR8AWi    = 170,
+  ARC_SIMD_BUILTIN_VSR8i      = 171,
 
   /* Va, Vb, u8 (simm) instructions*/
-  ARC_SIMD_BUILTIN_VMVAW      = 1075,
-  ARC_SIMD_BUILTIN_VMVW       = 1076,
-  ARC_SIMD_BUILTIN_VMVZW      = 1077,
-  ARC_SIMD_BUILTIN_VD6TAPF    = 1078,
+  ARC_SIMD_BUILTIN_VMVAW      = 175,
+  ARC_SIMD_BUILTIN_VMVW       = 176,
+  ARC_SIMD_BUILTIN_VMVZW      = 177,
+  ARC_SIMD_BUILTIN_VD6TAPF    = 178,
 
   /* Va, rlimm, u8 (simm) instructions*/
-  ARC_SIMD_BUILTIN_VMOVAW     = 1080,
-  ARC_SIMD_BUILTIN_VMOVW      = 1081,
-  ARC_SIMD_BUILTIN_VMOVZW     = 1082,
+  ARC_SIMD_BUILTIN_VMOVAW     = 180,
+  ARC_SIMD_BUILTIN_VMOVW      = 181,
+  ARC_SIMD_BUILTIN_VMOVZW     = 182,
 
   /* Va, Vb instructions */
-  ARC_SIMD_BUILTIN_VABSAW     = 1085,
-  ARC_SIMD_BUILTIN_VABSW      = 1086,
-  ARC_SIMD_BUILTIN_VADDSUW    = 1087,
-  ARC_SIMD_BUILTIN_VSIGNW     = 1088,
-  ARC_SIMD_BUILTIN_VEXCH1     = 1089,
-  ARC_SIMD_BUILTIN_VEXCH2     = 1090,
-  ARC_SIMD_BUILTIN_VEXCH4     = 1091,
-  ARC_SIMD_BUILTIN_VUPBAW     = 1092,
-  ARC_SIMD_BUILTIN_VUPBW      = 1093,
-  ARC_SIMD_BUILTIN_VUPSBAW    = 1094,
-  ARC_SIMD_BUILTIN_VUPSBW     = 1095,
-
-  ARC_SIMD_BUILTIN_VDIRUN     = 1100,
-  ARC_SIMD_BUILTIN_VDORUN     = 1101,
-  ARC_SIMD_BUILTIN_VDIWR      = 1102,
-  ARC_SIMD_BUILTIN_VDOWR      = 1103,
-
-  ARC_SIMD_BUILTIN_VREC       = 1105,
-  ARC_SIMD_BUILTIN_VRUN       = 1106,
-  ARC_SIMD_BUILTIN_VRECRUN    = 1107,
-  ARC_SIMD_BUILTIN_VENDREC    = 1108,
-
-  ARC_SIMD_BUILTIN_VLD32WH    = 1110,
-  ARC_SIMD_BUILTIN_VLD32WL    = 1111,
-  ARC_SIMD_BUILTIN_VLD64      = 1112,
-  ARC_SIMD_BUILTIN_VLD32      = 1113,
-  ARC_SIMD_BUILTIN_VLD64W     = 1114,
-  ARC_SIMD_BUILTIN_VLD128     = 1115,
-  ARC_SIMD_BUILTIN_VST128     = 1116,
-  ARC_SIMD_BUILTIN_VST64      = 1117,
+  ARC_SIMD_BUILTIN_VABSAW     = 185,
+  ARC_SIMD_BUILTIN_VABSW      = 186,
+  ARC_SIMD_BUILTIN_VADDSUW    = 187,
+  ARC_SIMD_BUILTIN_VSIGNW     = 188,
+  ARC_SIMD_BUILTIN_VEXCH1     = 189,
+  ARC_SIMD_BUILTIN_VEXCH2     = 190,
+  ARC_SIMD_BUILTIN_VEXCH4     = 191,
+  ARC_SIMD_BUILTIN_VUPBAW     = 192,
+  ARC_SIMD_BUILTIN_VUPBW      = 193,
+  ARC_SIMD_BUILTIN_VUPSBAW    = 194,
+  ARC_SIMD_BUILTIN_VUPSBW     = 195,
+
+  ARC_SIMD_BUILTIN_VDIRUN     = 200,
+  ARC_SIMD_BUILTIN_VDORUN     = 201,
+  ARC_SIMD_BUILTIN_VDIWR      = 202,
+  ARC_SIMD_BUILTIN_VDOWR      = 203,
+
+  ARC_SIMD_BUILTIN_VREC       = 205,
+  ARC_SIMD_BUILTIN_VRUN       = 206,
+  ARC_SIMD_BUILTIN_VRECRUN    = 207,
+  ARC_SIMD_BUILTIN_VENDREC    = 208,
+
+  ARC_SIMD_BUILTIN_VLD32WH    = 210,
+  ARC_SIMD_BUILTIN_VLD32WL    = 211,
+  ARC_SIMD_BUILTIN_VLD64      = 212,
+  ARC_SIMD_BUILTIN_VLD32      = 213,
+  ARC_SIMD_BUILTIN_VLD64W     = 214,
+  ARC_SIMD_BUILTIN_VLD128     = 215,
+  ARC_SIMD_BUILTIN_VST128     = 216,
+  ARC_SIMD_BUILTIN_VST64      = 217,
+
+  ARC_SIMD_BUILTIN_VST16_N    = 220,
+  ARC_SIMD_BUILTIN_VST32_N    = 221,
 
-  ARC_SIMD_BUILTIN_VST16_N    = 1120,
-  ARC_SIMD_BUILTIN_VST32_N    = 1121,
+  ARC_SIMD_BUILTIN_VINTI,
 
-  ARC_SIMD_BUILTIN_VINTI      = 1201,
+  ARC_SIMD_BUILTIN_DMA_IN,
+  ARC_SIMD_BUILTIN_DMA_OUT,
 
-  ARC_SIMD_BUILTIN_END
+  ARC_SIMD_BUILTIN_END,
+  ARC_BUILTIN_END = ARC_SIMD_BUILTIN_END
 };
 
 /* A nop is needed between a 4 byte insn that sets the condition codes and
@@ -401,6 +408,13 @@ static bool arc_preserve_reload_p (rtx i
 static rtx arc_delegitimize_address (rtx);
 static bool arc_can_follow_jump (const_rtx follower, const_rtx followee);
 
+static void arc_copy_to_target (gimple_stmt_iterator *, struct gcc_target *,
+				tree, tree, tree);
+static void arc_copy_from_target (gimple_stmt_iterator *, struct gcc_target *,
+				  tree, tree, tree);
+static void arc_build_call_on_target (gimple_stmt_iterator *,
+				      struct gcc_target *, int, tree *);
+
 static rtx frame_insn (rtx);
 
 /* initialize the GCC target structure.  */
@@ -517,6 +531,13 @@ static rtx frame_insn (rtx);
 #undef TARGET_MAX_ANCHOR_OFFSET
 #define TARGET_MAX_ANCHOR_OFFSET (1020)
 
+#undef TARGET_COPY_TO_TARGET
+#define TARGET_COPY_TO_TARGET arc_copy_to_target
+#undef TARGET_COPY_FROM_TARGET
+#define TARGET_COPY_FROM_TARGET arc_copy_from_target
+#undef TARGET_BUILD_CALL_ON_TARGET
+#define TARGET_BUILD_CALL_ON_TARGET arc_build_call_on_target
+
 extern enum reg_class arc_secondary_reload (bool, rtx, enum reg_class,
 					    enum machine_mode,
 					    struct secondary_reload_info *);
@@ -5544,12 +5565,16 @@ arc_cannot_force_const_mem (rtx x)
 }
 
 
+static tree arc_builtin_decls[ARC_BUILTIN_END];
+
 /* Generic function to define a builtin */
 #define def_mbuiltin(MASK, NAME, TYPE, CODE)				\
   do									\
     {									\
        if (MASK)                                                        \
-          add_builtin_function ((NAME), (TYPE), (CODE), BUILT_IN_MD, NULL, NULL_TREE); \
+	arc_builtin_decls[(CODE)] 					\
+	  = add_builtin_function ((NAME), (TYPE), (CODE), BUILT_IN_MD,	\
+				  NULL, NULL_TREE);			\
     }									\
   while (0)
 
@@ -5871,6 +5896,43 @@ arc_expand_builtin (tree exp,
 	emit_insn (gen_unimp_s (const1_rtx));
 	return NULL_RTX;
 
+    case ARC_SIMD_BUILTIN_CALL:
+      {
+	int nargs, i;
+
+	icode = CODE_FOR_simd_call;
+	arg0 = CALL_EXPR_ARG (exp, 0); /* Ra */
+	mode0 = insn_data[icode].operand[0].mode;
+	op0 = expand_expr (arg0, NULL_RTX, mode0, EXPAND_NORMAL);
+	if (mode0 == VOIDmode)
+	  mode0 = GET_MODE (op0);
+
+	if (! (*insn_data[icode].operand[0].predicate) (op0, mode0))
+	  op0 = copy_to_mode_reg (mode0, op0);
+	nargs = call_expr_nargs (exp) - 1;
+	op1 = gen_rtx_PARALLEL (VOIDmode, rtvec_alloc (nargs));
+	/* Pass the arguments in the SIMD argument registers, hard registers
+	   66 .. 66+7 - cf. the simd_arg_vector predicate.  Use a scratch
+	   rtx for each argument so that the function address in op0
+	   survives the loop.  */
+	for (i = 0; i < nargs; i++)
+	  {
+	    rtx reg, argr;
+
+	    arg0 = CALL_EXPR_ARG (exp, 1 + i);
+	    argr = expand_expr (arg0, NULL_RTX, VOIDmode, EXPAND_NORMAL);
+	    mode0 = GET_MODE (argr);
+	    if (mode0 == VOIDmode)
+	      mode0 = SImode;
+	    reg = gen_rtx_REG (mode0, 66 + i);
+	    emit_move_insn (reg, argr);
+	    XVECEXP (op1, 0, i) = reg;
+	  }
+
+	emit_insn (gen_simd_call (op0, op1));
+	return NULL_RTX;
+      }
+
     default:
 	break;
     }
@@ -6905,7 +6961,10 @@ enum simd_insn_args_type {
   void_Va_Ib_u8,
 
   Va_Vb_Ic_u8,
-  void_Va_u3_Ib_u8
+  void_Va_u3_Ib_u8,
+
+  void_Ra_Rb_Rc,
+  void_Ra
 };
 
 struct builtin_description
@@ -6914,8 +6973,6 @@ struct builtin_description
   const enum insn_code     icode;
   const char * const       name;
   const enum arc_builtins  code;
-  const enum rtx_code      comparison;
-  const unsigned int       flag;
 };
 
 static const struct builtin_description arc_simd_builtin_desc_list[] =
@@ -6923,7 +6980,7 @@ static const struct builtin_description 
   /* VVV builtins go first */
 #define SIMD_BUILTIN(type,code, string, builtin) \
   { type,CODE_FOR_##code, "__builtin_arc_" string, \
-    ARC_SIMD_BUILTIN_##builtin, UNKNOWN, 0 },
+    ARC_SIMD_BUILTIN_##builtin },
 
   SIMD_BUILTIN (Va_Vb_Vc,    vaddaw_insn,   "vaddaw",     VADDAW)
   SIMD_BUILTIN (Va_Vb_Vc,     vaddw_insn,    "vaddw",      VADDW)
@@ -7051,6 +7108,10 @@ static const struct builtin_description 
   SIMD_BUILTIN (void_Va_u3_Ib_u8,  vst32_n_insn,  "vst32_n",   VST32_N)
 
   SIMD_BUILTIN (void_u6,  vinti_insn,  "vinti",   VINTI)
+
+  SIMD_BUILTIN (void_Ra_Rb_Rc,  simd_dma_in,  "simd_dma_in",   DMA_IN)
+  SIMD_BUILTIN (void_Ra_Rb_Rc,  simd_dma_out,  "simd_dma_out",   DMA_OUT)
+  SIMD_BUILTIN (void_Ra,  simd_call,  "simd_call",   CALL)
 };
 
 static void
@@ -7105,6 +7166,17 @@ arc_init_simd_builtins (void)
   tree v8hi_ftype_v8hi
     = build_function_type (V8HI_type_node, tree_cons (NULL_TREE, V8HI_type_node,endlink));
 
+  tree void_ftype_ptr_ptr_int
+    = build_function_type (void_type_node,
+			   tree_cons (NULL_TREE, ptr_type_node,
+				      tree_cons (NULL_TREE, ptr_type_node,
+						 tree_cons (NULL_TREE,
+							    integer_type_node,
+							    endlink))));
+  tree void_ftype_fn
+    = build_function_type (void_type_node,
+			   tree_cons (NULL_TREE, ptr_type_node, endlink));
+
   /* These asserts have been introduced to ensure that the order of builtins
      does not get messed up, else the initialization goes wrong */
   gcc_assert (arc_simd_builtin_desc_list [0].args_type == Va_Vb_Vc);
@@ -7167,6 +7239,16 @@ arc_init_simd_builtins (void)
   for (; arc_simd_builtin_desc_list [i].args_type == void_u6; i++)
     def_mbuiltin (TARGET_SIMD_SET, arc_simd_builtin_desc_list [i].name,  void_ftype_int, arc_simd_builtin_desc_list [i].code);
 
+  gcc_assert (arc_simd_builtin_desc_list [i].args_type == void_Ra_Rb_Rc);
+  for (; arc_simd_builtin_desc_list [i].args_type == void_Ra_Rb_Rc; i++)
+    def_mbuiltin (TARGET_SIMD_SET, arc_simd_builtin_desc_list[i].name,
+		  void_ftype_ptr_ptr_int, arc_simd_builtin_desc_list[i].code);
+
+  gcc_assert (arc_simd_builtin_desc_list [i].args_type == void_Ra);
+  for (; arc_simd_builtin_desc_list [i].args_type == void_Ra; i++)
+    def_mbuiltin (TARGET_SIMD_SET, arc_simd_builtin_desc_list[i].name,
+		  void_ftype_fn, arc_simd_builtin_desc_list[i].code);
+
   gcc_assert(i == ARRAY_SIZE (arc_simd_builtin_desc_list));
 }
 
@@ -7618,6 +7700,60 @@ arc_expand_simd_builtin (tree exp,
     emit_insn (pat);
     return NULL_RTX;
 
+  case void_Ra_Rb_Rc:
+    icode = d->icode;
+    arg0 = CALL_EXPR_ARG (exp, 0); /* Ra */
+    arg1 = CALL_EXPR_ARG (exp, 1); /* Rb */
+    arg2 = CALL_EXPR_ARG (exp, 2); /* Rc */
+
+    mode0 =  insn_data[icode].operand[0].mode;
+    mode1 =  insn_data[icode].operand[1].mode;
+    mode2 =  insn_data[icode].operand[2].mode;
+
+    op0 = expand_expr (arg0, NULL_RTX, mode0, EXPAND_NORMAL);
+    if (mode0 == VOIDmode)
+      mode0 = GET_MODE (op0);
+    op1 = expand_expr (arg1, NULL_RTX, mode1, EXPAND_NORMAL);
+    if (mode1 == VOIDmode)
+      mode1 = GET_MODE (op1);
+    op2 = expand_expr (arg2, NULL_RTX, mode2, EXPAND_NORMAL);
+    if (mode2 == VOIDmode)
+      mode2 = GET_MODE (op2);
+
+    if (! (*insn_data[icode].operand[0].predicate) (op0, mode0))
+      op0 = copy_to_mode_reg (mode0, op0);
+    if (! (*insn_data[icode].operand[1].predicate) (op1, mode1))
+      op1 = copy_to_mode_reg (mode1, op1);
+    if (! (*insn_data[icode].operand[2].predicate) (op2, mode2))
+      op2 = copy_to_mode_reg (mode2, op2);
+
+    pat = GEN_FCN (icode) (op0, op1, op2);
+    if (! pat)
+      return 0;
+
+    emit_insn (pat);
+    return NULL_RTX;
+
+  case void_Ra:
+    icode = d->icode;
+    arg0 = CALL_EXPR_ARG (exp, 0); /* Ra */
+
+    mode0 =  insn_data[icode].operand[0].mode;
+
+    op0 = expand_expr (arg0, NULL_RTX, mode0, EXPAND_NORMAL);
+    if (mode0 == VOIDmode)
+      mode0 = GET_MODE (op0);
+
+    if (! (*insn_data[icode].operand[0].predicate) (op0, mode0))
+      op0 = copy_to_mode_reg (mode0, op0);
+
+    pat = GEN_FCN (icode) (op0);
+    if (! pat)
+      return 0;
+
+    emit_insn (pat);
+    return NULL_RTX;
+
   default:
     gcc_unreachable ();
   }
@@ -8886,6 +9022,45 @@ arc_dead_or_set_postreload_p (const_rtx 
   return 1;
 }
 
+static void
+arc_copy_to_target (gimple_stmt_iterator *gsi, struct gcc_target *target,
+		    tree dst, tree src, tree size)
+{
+  tree fn, t;
+
+  gcc_assert (strcmp (target->name, "mxp-elf") == 0);
+  fn = build_fold_addr_expr (arc_builtin_decls[ARC_SIMD_BUILTIN_DMA_IN]);
+  t = build_call_nary (void_type_node, fn, 3, dst, src, size);
+  force_gimple_operand_gsi (gsi, t, true, NULL_TREE, false,
+			    GSI_CONTINUE_LINKING);
+}
+
+static void
+arc_copy_from_target (gimple_stmt_iterator *gsi, struct gcc_target *target,
+		      tree dst, tree src, tree size)
+{
+  tree fn, t;
+
+  gcc_assert (strcmp (target->name, "mxp-elf") == 0);
+  fn = build_fold_addr_expr (arc_builtin_decls[ARC_SIMD_BUILTIN_DMA_OUT]);
+  t = build_call_nary (void_type_node, fn, 3, dst, src, size);
+  force_gimple_operand_gsi (gsi, t, true, NULL_TREE, false,
+			    GSI_CONTINUE_LINKING);
+}
+
+static void
+arc_build_call_on_target (gimple_stmt_iterator *gsi, struct gcc_target *target,
+			  int nargs, tree *args)
+{
+  tree fn, t;
+
+  gcc_assert (strcmp (target->name, "mxp-elf") == 0);
+  fn = build_fold_addr_expr (arc_builtin_decls[ARC_SIMD_BUILTIN_CALL]);
+  t = build_call_array (void_type_node, fn, nargs, args);
+  force_gimple_operand_gsi (gsi, t, true, NULL_TREE, false,
+			    GSI_CONTINUE_LINKING);
+}
+
 #include "gt-arc.h"
 
 END_TARGET_SPECIFIC
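
To make the interworking concrete: on this target the three hooks just wrap
the builtins registered above, so the calls they insert correspond to source
along the following lines.  This is only a sketch, not code the compiler
emits verbatim: __simd_malloc and the SDM offsets are taken from the vloop.s
output at the end of this mail, the element type of a/b/c is a guess from
the test case, and host_side_sketch / f_loopfn are made-up names.

    extern void *__simd_malloc (int);
    extern void f_loopfn (void);	/* the outlined vectorized loop */
    extern short a[256], b[256], c[256];

    void
    host_side_sketch (void)
    {
      char *sdm = (char *) __simd_malloc (1542);	/* SDM scratch area */

      /* DMA the input arrays into the SDM...  */
      __builtin_arc_simd_dma_in (sdm + 518, c, 512);
      __builtin_arc_simd_dma_in (sdm + 1030, b, 512);
      /* ...run the loop function on the vector target...  */
      __builtin_arc_simd_call ((void *) f_loopfn);
      /* ...and DMA the result back to main memory.  */
      __builtin_arc_simd_dma_out (sdm + 6, a, 512);
    }
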
Index: config/arc/arc.h
===================================================================
--- config/arc/arc.h	(revision 148226)
+++ config/arc/arc.h	(working copy)
@@ -467,8 +467,10 @@ if (GET_MODE_CLASS (MODE) == MODE_INT		\
 
 /* r63 is pc, r64-r127 = simd vregs, r128-r143 = simd dma config regs
    r144, r145 = lp_start, lp_end
-   and therefore the pseudo registers start from r146 */
-#define FIRST_PSEUDO_REGISTER 146
+   r146 = SDM (not really a register, but we pretend it is for the dma_in /
+   dma_out patterns)
+   and therefore the pseudo registers start from r147 */
+#define FIRST_PSEUDO_REGISTER 147
 
 /* 1 for registers that have pervasive standard uses
    and are not available for the register allocator.
@@ -529,7 +531,7 @@ if (GET_MODE_CLASS (MODE) == MODE_INT		\
 				\
   0, 0, 0, 0, 0, 0, 0, 0,       \
   0, 0, 0, 0, 0, 0, 0, 0,	\
-  1, 1}
+  1, 1, 1}
 
 /* 1 for registers not available across function calls.
    These must include the FIXED_REGISTERS and also any
@@ -565,7 +567,7 @@ if (GET_MODE_CLASS (MODE) == MODE_INT		\
 				\
   0, 0, 0, 0, 0, 0, 0, 0,       \
   0, 0, 0, 0, 0, 0, 0, 0,	\
-  1, 1}
+  1, 1, 1}
 
 /* Macro to conditionally modify fixed_regs/call_used_regs.  */
 
@@ -1654,7 +1656,7 @@ extern char rname56[], rname57[], rname5
  "vr56", "vr57", "vr58", "vr59",     "vr60",   "vr61",   "vr62",  "vr63",	\
   "dr0",  "dr1",  "dr2",  "dr3",      "dr4",    "dr5",    "dr6",   "dr7",	\
   "dr0",  "dr1",  "dr2",  "dr3",      "dr4",    "dr5",    "dr6",   "dr7",	\
-  "lp_start", "lp_end" \
+  "lp_start", "lp_end", "SDM" \
 }
 
 /* Entry to the insn conditionalizer.  */
Index: config/arc/arc.md
===================================================================
--- config/arc/arc.md	(revision 148226)
+++ config/arc/arc.md	(working copy)
@@ -145,6 +145,7 @@ (define_constants
    (CC_REG 61)
    (LP_START 144)
    (LP_END 145)
+   (SDM 146)
   ]
 )
 
@@ -667,8 +668,8 @@ (define_expand "movhi"
   "if (prepare_move_operands (operands, HImode)) DONE;")
 
 (define_insn "*movhi_insn"
-  [(set (match_operand:HI 0 "move_dest_operand" "=Rcq,Rcq#q,w, w,w,???w,Rcq#q,w,Rcq,S,r,m,???m,VUsc")
-	(match_operand:HI 1 "move_src_operand"   "cL,cP,Rcq#q,cL,I,?Rac,  ?i,?i,T,Rcq,m,c,?Rac,i"))]
+  [(set (match_operand:HI 0 "move_dest_operand" "=Rcq,Rcq#q,w, w,w,???w,Rcq#q,w,Rcq,S,r,m,???m,VUsc,v")
+	(match_operand:HI 1 "move_src_operand"   "cL,cP,Rcq#q,cL,I,?Rac,  ?i,?i,T,Rcq,m,c,?Rac,i,c"))]
   "register_operand (operands[0], HImode)
    || register_operand (operands[1], HImode)
    || (CONSTANT_P (operands[1])
@@ -690,10 +691,11 @@ (define_insn "*movhi_insn"
    ldw%U1%V1 %0,%1
    stw%U0%V0 %1,%0
    stw%U0%V0 %1,%0
-   stw%U0%V0 %S1,%0"
-  [(set_attr "type" "move,move,move,move,move,move,move,move,load,store,load,store,store,store")
-   (set_attr "iscompact" "maybe,maybe,maybe,false,false,false,maybe_limm,false,true,true,false,false,false,false")
-   (set_attr "cond" "canuse,canuse_limm,canuse,canuse,canuse_limm,canuse,canuse,canuse,nocond,nocond,nocond,nocond,nocond,nocond")])
+   stw%U0%V0 %S1,%0
+   vmovw %0,%1,1"
+  [(set_attr "type" "move,move,move,move,move,move,move,move,load,store,load,store,store,store,move")
+   (set_attr "iscompact" "maybe,maybe,maybe,false,false,false,maybe_limm,false,true,true,false,false,false,false,false")
+   (set_attr "cond" "canuse,canuse_limm,canuse,canuse,canuse_limm,canuse,canuse,canuse,nocond,nocond,nocond,nocond,nocond,nocond,nocond")])
 
 (define_expand "movsi"
   [(set (match_operand:SI 0 "move_dest_operand" "")
Index: config/arc/t-arc
===================================================================
--- config/arc/t-arc	(revision 148226)
+++ config/arc/t-arc	(working copy)
@@ -84,6 +84,6 @@ $(T)profil-uclibc.o: $(srcdir)/config/ar
 $(T)libgmon.a: $(T)mcount.o $(T)gmon.o $(T)dcache_linesz.o $(PROFILE_OSDEP)
 	$(AR_CREATE_FOR_TARGET) $@ $^
 
-$(out_object_file): gt-arc.h
+$(out_object_file): gt-arc.h $(GIMPLE_H) $(TREE_FLOW_H)
 
 EXTRA_MULTILIB_PARTS = crtend.o crtbegin.o crtendS.o crtbeginS.o crti.o crtn.o libgmon.a crtg.o crtgend.o
Index: config/arc/arc-modes.def
===================================================================
--- config/arc/arc-modes.def	(revision 148226)
+++ config/arc/arc-modes.def	(working copy)
@@ -28,6 +28,7 @@ CC_MODE (CC_FP_GE);
 CC_MODE (CC_FP_ORD);
 CC_MODE (CC_FP_UNEQ);
 CC_MODE (CC_FPX);
+CC_MODE (CC_BLK); /* BLKmode is not tracked by data flow...  */
 
 /* Vector modes.  */
 VECTOR_MODES (INT, 4);        /*            V4QI V2HI */
Index: config/arc/simdext.md
===================================================================
--- config/arc/simdext.md	(revision 148226)
+++ config/arc/simdext.md	(working copy)
@@ -131,6 +131,8 @@ (define_constants
 
   (UNSPEC_ARC_SIMD_VCAST     1200)
   (UNSPEC_ARC_SIMD_VINTI     1201)
+
+  (UNSPEC_ARC_SIMD_DMA       1202)
    ]
 )
 
@@ -1311,3 +1313,41 @@ (define_insn "vinti_insn"
   [(set_attr "type" "simd_vcontrol")
    (set_attr "length" "4")
    (set_attr "cond" "nocond")])
+
+;; DMA in/out for ARCompact / mxp interworking
+;; These are emitted on the ARCompact side.
+
+;; copy main memory starting at operand 1 to SDM starting at operand 0;
+;; transfer size is operand 2.
+(define_insn "simd_dma_in"
+  [(set (reg:CC_BLK SDM)
+	(unspec [(reg:CC_BLK SDM)
+		 (mem:BLK (match_operand:SI 1 "nonmemory_operand"))
+		 (match_operand 0 "nonmemory_operand")
+		 (match_operand:SI 2 "nonmemory_operand")]
+	 UNSPEC_ARC_SIMD_DMA))]
+  "TARGET_SIMD_SET"
+  "` dma_in %0 %1 %2"
+  [(set_attr "length" "42")])
+
+;; copy SDM starting at operand 0 to main memory starting at operand 1;
+;; transfer size is operand 2.
+(define_insn "simd_dma_out"
+  [(set (mem:BLK (match_operand:SI 1 "nonmemory_operand"))
+	(unspec [(reg:CC_BLK SDM)
+		 (match_operand 0 "nonmemory_operand")
+		 (match_operand:SI 2 "nonmemory_operand")]
+	 UNSPEC_ARC_SIMD_DMA))]
+  "TARGET_SIMD_SET"
+  "` dma_out %0 %1 %2"
+  [(set_attr "length" "42")])
+
+(define_insn "simd_call"
+  [(set (reg:CC_BLK SDM)
+	(unspec [(match_operand 0 "nonmemory_operand")
+		 (match_operand 1 "simd_arg_vector")
+		 (reg:CC_BLK SDM)]
+	 UNSPEC_ARC_SIMD_DMA))]
+  "TARGET_SIMD_SET"
+  "` simd call"
+  [(set_attr "length" "42")])
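
A note on why the fake SDM register and CC_BLK exist (cf. arc-modes.def and
arc.h above): simd_dma_in reads and sets (reg:CC_BLK SDM), simd_call reads
and sets it, and simd_dma_out reads it while storing to main memory, so the
three patterns are tied into the chain

    simd_dma_in -> simd_call -> simd_dma_out

by ordinary data dependences.  That should keep scheduling and dead code
elimination from reordering or deleting the transfers, whose real effect on
the SDM is otherwise invisible at the RTL level.
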
-------------- next part --------------
A non-text attachment was scrubbed...
Name: vloop.c
Type: text/x-csrc
Size: 188 bytes
Desc: not available
URL: <http://gcc.gnu.org/pipermail/gcc-patches/attachments/20090716/73e1f318/attachment.bin>
-------------- next part --------------
	.file	"vloop.c"
	.cpu A5
	.section .text
	.align 4
	.global	f
	.type	f, @function
f:
.LFB0:
	push_s blink
.LCFI0:
	mov r0,1542
.LCFI1:
	bl.d @__simd_malloc;1
	sub_s sp,sp,8
	extw r2,r0
	mov_s r5,r2
	add r5,r5,518
	` dma_in r5 @c 512
	mov_s r5,r0
	add r5,r5,518
	add_s r3,sp,8
	stw r5,[sp,6]
	mov_s r4,r0
	vmovw vr2,r0,1
	add r6,r2,6
	add r5,r0,6
	add r2,r2,1030
	add r0,r0,1030
	` dma_in r2 @b 512
	stw.a r0,[r3,-6]
	stw r5,[sp,4]
	ld.a blink,[sp,8]
.LCFI2:
	` dma_in r4 r3 6
	` simd call
	` dma_out r6 @a 512
.LCFI3:
	j_s.d [blink]
	add_s sp,sp,4
.LFE0:
	.size	f, .-f
	.arch	"mxp-elf"
	.text
	.balign 4
	.type	&f._loopfn.0, @function
&f._loopfn.0:
	viv.1 i1,vr2
	vmvw.3 vr5,vr62
	vld16_2 vr5,[i1,0]
	vld16_3 vr4,[i1,4]
	vld16_2 vr4,[i1,2]
	vmov.3 vr7,512
	vmvw.2 vr4,vr62
.L8:
	vmvw.1 vr4,vr5
	vxsumwi.1 vr3,vr4,8
	vxsumwi.1 vr2,vr5,4
	vmvw.2 vr3,vr4
	vmvw.2 vr2,vr4
	vaddnaw.3 vr3,vr4,vr3
	vaddnaw.3 vr2,vr4,vr2
	viv.1 i2,vr3
	viv.1 i1,vr2
	vld128 vr3,[i2,0]
	vld128 vr2,[i1,0]
	vxsumwi.1 vr6,vr4,4
	vmvw.2 vr6,vr4
	vmov.3 vr8,16
	vaddnaw.3 vr6,vr4,vr6
	vaddnaw.3 vr5,vr5,vr8
	vne.2 vr0,vr5,vr7
	vjp.i1 @.L8
	vaddnaw.255 vr2,vr3,vr2
	viv.1 i2,vr6
	vst128 vr2,[i2,0]
	vjb vr31,pcl
	vnop
	vnop
	vnop
	.size	&f._loopfn.0, .-&f._loopfn.0
	.global	a
	.section .bss
	.align 128
	.type	a, @object
	.size	a, 512
a:
	.zero	512
	.global	b
	.align 128
	.type	b, @object
	.size	b, 512
b:
	.zero	512
	.global	c
	.align 128
	.type	c, @object
	.size	c, 512
c:
	.zero	512
	.section	.debug_frame,"",@progbits
.Lframe0:
	.4byte	@.LECIE0-@.LSCIE0
.LSCIE0:
	.4byte	0xffffffff
	.byte	0x1
	.string	""
	.uleb128 0x1
	.sleb128 -4
	.byte	0x1f
	.byte	0xc
	.uleb128 0x1c
	.uleb128 0x0
	.align 4
.LECIE0:
.LSFDE0:
	.4byte	@.LEFDE0-@.LASFDE0
.LASFDE0:
	.4byte	@.Lframe0
	.4byte	@.LFB0
	.4byte	@.LFE0-@.LFB0
	.byte	0x4
	.4byte	@.LCFI0-@.LFB0
	.byte	0xe
	.uleb128 0x4
	.byte	0x4
	.4byte	@.LCFI1-@.LCFI0
	.byte	0xe
	.uleb128 0xc
	.byte	0x11
	.uleb128 0x1f
	.sleb128 1
	.byte	0x4
	.4byte	@.LCFI3-@.LCFI1
	.byte	0xe
	.uleb128 0x8
	.align 4
.LEFDE0:
	.ident	"GCC: (GNU) 4.4.0"