[PATCH][mpost]: automatic multi-target compilation
Joern Rennecke
amylaar@spamcop.net
Thu Jul 16 12:45:00 GMT 2009
This patch makes it possible to vectorize loops for a target other than the
main compilation target, and to automatically initiate DMA to copy the input
arrays to the vector target and DMA the results back.
I have tested this with a simple test file vloop.c, which is the
second attachment to this email.
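Since vloop.c is an attachment and not reproduced inline, here is a hypothetical stand-in sketching the kind of loop such a patch targets: a simple countable loop whose input arrays would be DMAed to the vector target and whose result array would be DMAed back. The function and array names are illustrative, not taken from the actual test file.

```c
#define N 64

int a[N], b[N], c[N];

/* A loop of the shape that -O2 -ftree-vectorize -msimd would offload:
   the inputs a[] and b[] are copied to the vector target by DMA, and
   the result c[] is copied back afterwards.  */
void
vloop (void)
{
  int i;

  for (i = 0; i < N; i++)
    c[i] = a[i] + b[i];
}
```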
I've configured the compiler on gcc11.fsffrance.org with the options:
--target=arc-elf32 --with-extra-target-list=mxp-elf --with-headers
--with-newlib --with-mpfr=/opt/cfarm/mpfr-2.4.1
The purpose of the build was not to get a fully working toolchain yet, but
just a working cc1.
I've compiled the test file with:
./cc1 -O2 -ftree-vectorize vloop.c -fdump-tree-all
-ftree-vectorizer-verbose=9 -msimd
I've attached the compiler output vloop.s as the third attachment to this
email.
The assembler output templates for the DMA in / out and the target call are
just assembler comments so far; I gather this part is irrelevant both for the
review of the tree optimizer patches and for trying to make this work for
other target tuple sets, and I don't want to delay the patch review
unnecessarily any further.
There is certainly still a lot of work to be done to make this truly useful
for common application code, but that goes beyond the scope of the
current milepost project, and I think such work is best done incrementally
on a branch.
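To clarify the dispatch idiom the patch introduces (see the tree_expand_expr hunk in expr.c further down): a target-independent entry point selects the per-target implementation through a table of function pointers indexed by the current target, so tree optimizers never need to know which expander they call. The following is a self-contained analogy with illustrative names, not GCC code.

```c
typedef int (*expand_fn) (int);

/* Stand-ins for the per-target expand_expr_real implementations.  */
static int expand_on_main (int x)  { return x + 1; }  /* e.g. arc-elf32 */
static int expand_on_extra (int x) { return x * 2; }  /* e.g. mxp-elf   */

/* Analogue of the expand_expr_array built with
   EXTRA_TARGETS_EXPAND_COMMA: one entry per configured target.  */
static expand_fn expand_table[] = { expand_on_main, expand_on_extra };

/* Analogue of targetm.target_arch: index of the current target.  */
static int target_arch = 0;

/* Analogue of tree_expand_expr: dispatch through the table.  */
static int
tree_expand (int x)
{
  return expand_table[target_arch] (x);
}
```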
-------------- next part --------------
2009-07-16  Jörn Rennecke  <joern.rennecke@arc.com>
* targhooks.c (default_common_data_with_target): New function.
(default_get_pmode): Likewise.
* targhooks.h (default_common_data_with_target): Declare.
(default_get_pmode): Likewise.
* tree.c (build2_stat): Use targetm.sizetype.
(build_pointer_type_for_mode): Use *targetm.ptr_mode.
(get_get_name_decl): New function.
* tree.h (enum omp_clause_schedule_kind): New value
OMP_CLAUSE_SCHEDULE_MASTER.
(tree sizetype_tab): Now target specific.
(get_get_name_decl): Declare.
(lookup_attr_target): Declare.
* target.h (struct gimple_stmt_iterator_d): Forward declaration.
(struct gcc_target): New members get_pmode, sizetype_tab,
common_data_with_target, copy_to_target, copy_from_target,
build_call_on_target.
* omp-low.c (expand_parallel_call): If child function has target_arch
attribute, use targetm.build_call_on_target hook.
(expand_omp_taskreg): Also check for
gimple_omp_taskreg_data_arg (entry_stmt) being an INDIRECT_REF.
(expand_numa_for_static_nochunk): New function.
(expand_omp_for): Check for OMP_CLAUSE_SCHEDULE_MASTER.
* toplev.c (lang_dependent_init) [!EXTRA_TARGET]:
Do an EXTRA_TARGETS_CALL of initialize_sizetypes.
(lang_dependent_init) [EXTRA_TARGET]: Fix up size_type_node.
* tree-ssa-loop-ivopts.c (produce_memory_decl_rtl):
Use *targetm.get_pmode.
(computation_cost): Use tree_expand_expr.
(force_expr_to_var_cost): Use targetm.sizetype.
(rewrite_use_address): Use tree_create_mem_ref.
* expr.c [!EXTRA_TARGET] (tree_expand_expr): New function.
* expr.h (tree_expand_expr): Declare.
* tree-parloops.c (separate_decls_in_region_name): New parameter
new_target. Changed all callers.
(separate_decls_in_region_stmt): Likewise.
(add_size_for_param_array): New function.
(struct clsn_data): New members result_seq and loop.
(create_loads_and_stores_for_name): If array contents have to be
copied, insert statements to copy to/from the callee target.
(separate_decls_in_region): Likewise. Emit statements to allocate
parameter array area for this purpose.
Change last parameter from unsigned to loop. Changed caller.
(canonicalize_loop_ivs): Use sizetype for the callee target.
(create_parallel_loop): If target has data memory separate from
caller, use OMP_CLAUSE_SCHEDULE_MASTER.
(gen_parallel_loop): Set targetm_pnt to the callee target during
the canonicalize_loop_ivs call.
* tree-ssa-address.c (target.h): Include.
[!EXTRA_TARGET] (tree_mem_ref_addr, tree_create_mem_ref): New functions.
* function.c (lookup_attr_target): New function, broken out of:
(allocate_struct_function).
* tree-affine.c (target.h): Include.
(add_elt_to_tree): Use targetm.sizetype.
(aff_combination_to_tree): Likewise.
* target-def.h (TARGET_GET_PMODE): Define.
(TARGET_COMMON_DATA_WITH_TARGET, TARGET_COPY_TO_TARGET): Likewise.
(TARGET_COPY_FROM_TARGET, TARGET_BUILD_CALL_ON_TARGET): Likewise.
(TARGET_INITIALIZER): Initialize new members.
* tree-vect-transform.c (vect_decompose_addr_base_for_vector_ref):
New function.
(param_array_hash, param_array_eq): Likewise.
(vect_create_data_ref_ptr): If target has data memory separate from
caller, create hash table of parameter arrays with information on
accesses.
* cfgloop.h (struct tree_range, struct param_array_d): New struct.
(param_array): New typedef.
(struct loop): New members param_arrays, vect_vars.
* tree-flow.h (tree_create_mem_ref): Declare.
* gimple.h (struct gimple_stmt_iterator_d): New struct tag.
* Makefile.in (tree-ssa-address.o): Depend on $(TARGET_H).
(tree-affine.o): Likewise.
* config/arc/predicates.md (simd_arg_vector): New predicate.
* config/arc/arc.c (gimple.h, tree-flow.h): Include.
(enum arc_builtins): Reduce value range.
New values ARC_SIMD_BUILTIN_CALL, ARC_SIMD_BUILTIN_DMA_IN,
ARC_SIMD_BUILTIN_DMA_OUT.
New tag ARC_BUILTIN_END.
(arc_copy_to_target, arc_copy_from_target): New functions.
(arc_build_call_on_target): New function.
(TARGET_COPY_TO_TARGET, TARGET_COPY_FROM_TARGET): Override.
(TARGET_BUILD_CALL_ON_TARGET): Likewise.
(arc_builtin_decls): New array.
(def_mbuiltin): Update arc_builtin_decls.
(arc_expand_builtin): Handle ARC_SIMD_BUILTIN_CALL.
(enum simd_insn_args_type): Add void_Ra_Rb_Rc and void_Ra.
(arc_simd_builtin_desc_list): Add simd_dma_in, simd_dma_out, simd_call.
(arc_init_simd_builtins): Process void_Ra_Rb_Rc and void_Ra.
(arc_expand_simd_builtin): Handle void_Ra_Rb_Rc and void_Ra.
* config/arc/arc.h (FIRST_PSEUDO_REGISTER): Change to 147.
(FIXED_REGISTERS): SDM is fixed.
(CALL_USED_REGISTERS): SDM is call used.
(REGISTER_NAMES): Add SDM name.
* config/arc/arc.md (SDM): Define as 146.
(*movhi_insn): Add v/c alternative.
* config/arc/t-arc ($(out_object_file)):
Depend on $(GIMPLE_H) and $(TREE_FLOW_H).
* config/arc/arc-modes.def: Add CC_BLK.
* config/arc/simdext.md (UNSPEC_ARC_SIMD_DMA): Define.
(simd_dma_in, simd_dma_out, simd_call): New patterns.
Index: targhooks.c
===================================================================
--- targhooks.c (revision 148225)
+++ targhooks.c (working copy)
@@ -874,6 +874,18 @@ default_vectype_for_scalar_type (tree sc
return vectype;
}
+bool
+default_common_data_with_target (struct gcc_target *other)
+{
+ return &this_targetm == other;
+}
+
+enum machine_mode
+default_get_pmode (void)
+{
+ return Pmode;
+}
+
#include "gt-targhooks.h"
END_TARGET_SPECIFIC
Index: targhooks.h
===================================================================
--- targhooks.h (revision 148225)
+++ targhooks.h (working copy)
@@ -119,6 +119,8 @@ extern int /*enum reg_class*/ default_se
secondary_reload_info *);
extern bool default_override_options (bool);
extern tree default_vectype_for_scalar_type (tree, FILE *);
+extern bool default_common_data_with_target (struct gcc_target *);
+extern enum machine_mode default_get_pmode (void);
END_TARGET_SPECIFIC
extern void hook_void_bitmap (bitmap);
extern bool default_handle_c_option (size_t, const char *, int);
Index: tree.c
===================================================================
--- tree.c (revision 148225)
+++ tree.c (working copy)
@@ -3300,7 +3300,8 @@ build2_stat (enum tree_code code, tree t
if (code == POINTER_PLUS_EXPR && arg0 && arg1 && tt)
gcc_assert (POINTER_TYPE_P (tt) && POINTER_TYPE_P (TREE_TYPE (arg0))
&& INTEGRAL_TYPE_P (TREE_TYPE (arg1))
- && useless_type_conversion_p (sizetype, TREE_TYPE (arg1)));
+ && useless_type_conversion_p (targetm.sizetype,
+ TREE_TYPE (arg1)));
t = make_node_stat (code PASS_MEM_STAT);
TREE_TYPE (t) = tt;
@@ -5547,7 +5548,7 @@ build_pointer_type_for_mode (tree to_typ
tree
build_pointer_type (tree to_type)
{
- return build_pointer_type_for_mode (to_type, ptr_mode, false);
+ return build_pointer_type_for_mode (to_type, *targetm.ptr_mode, false);
}
/* Same as build_pointer_type_for_mode, but for REFERENCE_TYPE. */
@@ -8984,6 +8985,28 @@ get_name (tree t)
}
}
+/* Return the declaration belonging to the return value of decl_name. */
+tree
+get_get_name_decl (tree t)
+{
+ tree stripped_decl;
+
+ stripped_decl = t;
+ STRIP_NOPS (stripped_decl);
+ if (DECL_P (stripped_decl) && DECL_NAME (stripped_decl))
+ return stripped_decl;
+ else
+ {
+ switch (TREE_CODE (stripped_decl))
+ {
+ case ADDR_EXPR:
+ return get_get_name_decl (TREE_OPERAND (stripped_decl, 0));
+ default:
+ return NULL_TREE;
+ }
+ }
+}
+
/* Return true if TYPE has a variable argument list. */
bool
Index: tree.h
===================================================================
--- tree.h (revision 148225)
+++ tree.h (working copy)
@@ -1807,7 +1807,9 @@ enum omp_clause_schedule_kind
OMP_CLAUSE_SCHEDULE_DYNAMIC,
OMP_CLAUSE_SCHEDULE_GUIDED,
OMP_CLAUSE_SCHEDULE_AUTO,
- OMP_CLAUSE_SCHEDULE_RUNTIME
+ OMP_CLAUSE_SCHEDULE_RUNTIME,
+ /* Used internally for NUMA targets to schedule on the main processor. */
+ OMP_CLAUSE_SCHEDULE_MASTER
};
#define OMP_CLAUSE_SCHEDULE_KIND(NODE) \
@@ -4340,7 +4342,9 @@ enum size_type_kind
SBITSIZETYPE, /* Signed representation of sizes in bits. */
TYPE_KIND_LAST};
+START_TARGET_SPECIFIC
extern GTY(()) tree sizetype_tab[(int) TYPE_KIND_LAST];
+END_TARGET_SPECIFIC
#define sizetype sizetype_tab[(int) SIZETYPE]
#define bitsizetype sizetype_tab[(int) BITSIZETYPE]
@@ -4717,6 +4721,7 @@ extern tree *call_expr_argp (tree, int);
extern tree call_expr_arglist (tree);
extern tree create_artificial_label (void);
extern const char *get_name (tree);
+extern tree get_get_name_decl (tree);
extern bool stdarg_p (tree);
extern bool prototype_p (tree);
extern int function_args_count (tree);
@@ -4980,6 +4985,7 @@ extern void expand_dummy_function_end (v
extern unsigned int init_function_for_compilation (void);
END_TARGET_SPECIFIC
/* Allocate_struct_function uses targetm->name. */
+extern int lookup_attr_target (tree);
extern void allocate_struct_function (tree, bool);
START_TARGET_SPECIFIC
extern void push_struct_function (tree fndecl);
Index: target.h
===================================================================
--- target.h (revision 148225)
+++ target.h (working copy)
@@ -688,6 +688,8 @@ struct target_option_hooks
bool (*override) (bool main_target);
};
+struct gimple_stmt_iterator_d;
+
/* ??? the use of the target vector makes it necessary to cast
target-specific enums from/to int, since we expose the function
signatures of target specific hooks that operate e.g. on enum reg_class
@@ -705,6 +707,11 @@ struct gcc_target
/* Points to the ptr_mode variable for this target. */
enum machine_mode *ptr_mode;
+ enum machine_mode (*get_pmode) (void);
+
+ /* The sizetype table for this target. */
+ tree *sizetype_tab;
+
/* Functions that output assembler for the target. */
struct asm_out asm_out;
@@ -884,6 +891,21 @@ struct gcc_target
/* Undo the effects of encode_section_info on the symbol string. */
const char * (* strip_name_encoding) (const char *);
+ /* Say if the target OTHER shares its data memory with this target. */
+ bool (*common_data_with_target) (struct gcc_target *other);
+ /* Emit gimple to copy SIZE bytes from SRC on this target to DEST on
+ TARGET. */
+ void (*copy_to_target) (struct gimple_stmt_iterator_d *,
+ struct gcc_target *, tree, tree, tree);
+ /* Emit gimple to copy SIZE bytes from SRC on TARGET to DEST on this
+ target. */
+ void (*copy_from_target) (struct gimple_stmt_iterator_d *,
+ struct gcc_target *, tree, tree, tree);
+ /* Generate gimple for a call to fn with NARGS arguments ARGS
+ on target OTHER. */
+ void (*build_call_on_target) (struct gimple_stmt_iterator_d *,
+ struct gcc_target *, int nargs, tree *args);
+
/* If shift optabs for MODE are known to always truncate the shift count,
return the mask that they apply. Return 0 otherwise. */
unsigned HOST_WIDE_INT (* shift_truncation_mask) (enum machine_mode mode);
Index: omp-low.c
===================================================================
--- omp-low.c (revision 148225)
+++ omp-low.c (working copy)
@@ -2867,6 +2867,7 @@ expand_parallel_call (struct omp_region
gimple_stmt_iterator gsi;
gimple stmt;
int start_ix;
+ tree child_fn, attr;
clauses = gimple_omp_parallel_clauses (entry_stmt);
@@ -2989,7 +2990,21 @@ expand_parallel_call (struct omp_region
t1 = null_pointer_node;
else
t1 = build_fold_addr_expr (t);
- t2 = build_fold_addr_expr (gimple_omp_parallel_child_fn (entry_stmt));
+ child_fn = gimple_omp_parallel_child_fn (entry_stmt);
+ t2 = build_fold_addr_expr (child_fn);
+
+ attr = lookup_attribute ("target_arch", DECL_ATTRIBUTES (child_fn));
+ if (attr)
+ {
+ tree args[2];
+
+ args[0] = t2;
+ args[1] = force_gimple_operand_gsi (&gsi, t1, true, NULL_TREE, false,
+ GSI_CONTINUE_LINKING);
+ struct gcc_target *tgt = targetm_array[lookup_attr_target (child_fn)];
+ targetm.build_call_on_target (&gsi, tgt, 2, args);
+ return;
+ }
if (ws_args)
{
@@ -3004,12 +3019,7 @@ expand_parallel_call (struct omp_region
force_gimple_operand_gsi (&gsi, t, true, NULL_TREE,
false, GSI_CONTINUE_LINKING);
- t = gimple_omp_parallel_data_arg (entry_stmt);
- if (t == NULL)
- t = null_pointer_node;
- else
- t = build_fold_addr_expr (t);
- t = build_call_expr (gimple_omp_parallel_child_fn (entry_stmt), 1, t);
+ t = build_call_expr (child_fn, 1, t1);
force_gimple_operand_gsi (&gsi, t, true, NULL_TREE,
false, GSI_CONTINUE_LINKING);
@@ -3344,7 +3354,9 @@ expand_omp_taskreg (struct omp_region *r
a function call that has been inlined, the original PARM_DECL
.OMP_DATA_I may have been converted into a different local
variable. In which case, we need to keep the assignment. */
- if (gimple_omp_taskreg_data_arg (entry_stmt))
+ tree data_arg = gimple_omp_taskreg_data_arg (entry_stmt);
+
+ if (data_arg)
{
basic_block entry_succ_bb = single_succ (entry_bb);
gimple_stmt_iterator gsi;
@@ -3367,9 +3379,10 @@ expand_omp_taskreg (struct omp_region *r
/* We're ignore the subcode because we're
effectively doing a STRIP_NOPS. */
- if (TREE_CODE (arg) == ADDR_EXPR
- && TREE_OPERAND (arg, 0)
- == gimple_omp_taskreg_data_arg (entry_stmt))
+ if ((TREE_CODE (arg) == ADDR_EXPR
+ && TREE_OPERAND (arg, 0) == data_arg)
+ || (TREE_CODE (data_arg) == INDIRECT_REF
+ && TREE_OPERAND (data_arg, 0) == arg))
{
parcopy_stmt = stmt;
break;
@@ -4202,6 +4215,170 @@ expand_omp_for_static_nochunk (struct om
recompute_dominator (CDI_DOMINATORS, fin_bb));
}
+/* Like expand_omp_for_static_nochunk, but don't emit code for iteration
+ space partitioning - that is supposed to be done on the main processor. */
+static void
+expand_numa_for_static_nochunk (struct omp_region *region,
+ struct omp_for_data *fd)
+{
+ tree n, q, s0, e0, e, t, nthreads, threadid;
+ tree type, itype, vmain, vback;
+ basic_block entry_bb, exit_bb, seq_start_bb, body_bb, cont_bb;
+ basic_block fin_bb;
+ gimple_stmt_iterator gsi;
+ gimple stmt;
+
+ itype = type = TREE_TYPE (fd->loop.v);
+ if (POINTER_TYPE_P (type))
+ itype = lang_hooks.types.type_for_size (TYPE_PRECISION (type), 0);
+
+ entry_bb = region->entry;
+ cont_bb = region->cont;
+ gcc_assert (EDGE_COUNT (entry_bb->succs) == 2);
+ gcc_assert (BRANCH_EDGE (entry_bb)->dest == FALLTHRU_EDGE (cont_bb)->dest);
+ seq_start_bb = split_edge (FALLTHRU_EDGE (entry_bb));
+ body_bb = single_succ (seq_start_bb);
+ gcc_assert (BRANCH_EDGE (cont_bb)->dest == body_bb);
+ gcc_assert (EDGE_COUNT (cont_bb->succs) == 2);
+ fin_bb = FALLTHRU_EDGE (cont_bb)->dest;
+ exit_bb = region->exit;
+
+ /* Iteration space partitioning goes in ENTRY_BB. */
+ gsi = gsi_last_bb (entry_bb);
+ gcc_assert (gimple_code (gsi_stmt (gsi)) == GIMPLE_OMP_FOR);
+
+#if 0
+ t = build_call_expr (built_in_decls[BUILT_IN_OMP_GET_NUM_THREADS], 0);
+#else
+ t = size_one_node;
+#endif
+ t = fold_convert (itype, t);
+ nthreads = force_gimple_operand_gsi (&gsi, t, true, NULL_TREE,
+ true, GSI_SAME_STMT);
+
+#if 0
+ t = build_call_expr (built_in_decls[BUILT_IN_OMP_GET_THREAD_NUM], 0);
+#else
+ t = size_zero_node;
+#endif
+ t = fold_convert (itype, t);
+ threadid = force_gimple_operand_gsi (&gsi, t, true, NULL_TREE,
+ true, GSI_SAME_STMT);
+
+ fd->loop.n1
+ = force_gimple_operand_gsi (&gsi, fold_convert (type, fd->loop.n1),
+ true, NULL_TREE, true, GSI_SAME_STMT);
+ fd->loop.n2
+ = force_gimple_operand_gsi (&gsi, fold_convert (itype, fd->loop.n2),
+ true, NULL_TREE, true, GSI_SAME_STMT);
+ fd->loop.step
+ = force_gimple_operand_gsi (&gsi, fold_convert (itype, fd->loop.step),
+ true, NULL_TREE, true, GSI_SAME_STMT);
+
+ t = build_int_cst (itype, (fd->loop.cond_code == LT_EXPR ? -1 : 1));
+ t = fold_build2 (PLUS_EXPR, itype, fd->loop.step, t);
+ t = fold_build2 (PLUS_EXPR, itype, t, fd->loop.n2);
+ t = fold_build2 (MINUS_EXPR, itype, t, fold_convert (itype, fd->loop.n1));
+ if (TYPE_UNSIGNED (itype) && fd->loop.cond_code == GT_EXPR)
+ t = fold_build2 (TRUNC_DIV_EXPR, itype,
+ fold_build1 (NEGATE_EXPR, itype, t),
+ fold_build1 (NEGATE_EXPR, itype, fd->loop.step));
+ else
+ t = fold_build2 (TRUNC_DIV_EXPR, itype, t, fd->loop.step);
+ t = fold_convert (itype, t);
+ n = force_gimple_operand_gsi (&gsi, t, true, NULL_TREE, true, GSI_SAME_STMT);
+
+ t = fold_build2 (TRUNC_DIV_EXPR, itype, n, nthreads);
+ q = force_gimple_operand_gsi (&gsi, t, true, NULL_TREE, true, GSI_SAME_STMT);
+
+ t = fold_build2 (MULT_EXPR, itype, q, nthreads);
+ t = fold_build2 (NE_EXPR, itype, t, n);
+ t = fold_build2 (PLUS_EXPR, itype, q, t);
+ q = force_gimple_operand_gsi (&gsi, t, true, NULL_TREE, true, GSI_SAME_STMT);
+
+ t = build2 (MULT_EXPR, itype, q, threadid);
+ s0 = force_gimple_operand_gsi (&gsi, t, true, NULL_TREE, true, GSI_SAME_STMT);
+
+ t = fold_build2 (PLUS_EXPR, itype, s0, q);
+ t = fold_build2 (MIN_EXPR, itype, t, n);
+ e0 = force_gimple_operand_gsi (&gsi, t, true, NULL_TREE, true, GSI_SAME_STMT);
+
+ t = build2 (GE_EXPR, boolean_type_node, s0, e0);
+ gsi_insert_before (&gsi, gimple_build_cond_empty (t), GSI_SAME_STMT);
+
+ /* Remove the GIMPLE_OMP_FOR statement. */
+ gsi_remove (&gsi, true);
+
+ /* Setup code for sequential iteration goes in SEQ_START_BB. */
+ gsi = gsi_start_bb (seq_start_bb);
+
+ t = fold_convert (itype, s0);
+ t = fold_build2 (MULT_EXPR, itype, t, fd->loop.step);
+ if (POINTER_TYPE_P (type))
+ t = fold_build2 (POINTER_PLUS_EXPR, type, fd->loop.n1,
+ fold_convert (sizetype, t));
+ else
+ t = fold_build2 (PLUS_EXPR, type, t, fd->loop.n1);
+ t = force_gimple_operand_gsi (&gsi, t, false, NULL_TREE,
+ false, GSI_CONTINUE_LINKING);
+ stmt = gimple_build_assign (fd->loop.v, t);
+ gsi_insert_after (&gsi, stmt, GSI_CONTINUE_LINKING);
+
+ t = fold_convert (itype, e0);
+ t = fold_build2 (MULT_EXPR, itype, t, fd->loop.step);
+ if (POINTER_TYPE_P (type))
+ t = fold_build2 (POINTER_PLUS_EXPR, type, fd->loop.n1,
+ fold_convert (sizetype, t));
+ else
+ t = fold_build2 (PLUS_EXPR, type, t, fd->loop.n1);
+ e = force_gimple_operand_gsi (&gsi, t, true, NULL_TREE,
+ false, GSI_CONTINUE_LINKING);
+
+ /* The code controlling the sequential loop replaces the
+ GIMPLE_OMP_CONTINUE. */
+ gsi = gsi_last_bb (cont_bb);
+ stmt = gsi_stmt (gsi);
+ gcc_assert (gimple_code (stmt) == GIMPLE_OMP_CONTINUE);
+ vmain = gimple_omp_continue_control_use (stmt);
+ vback = gimple_omp_continue_control_def (stmt);
+
+ if (POINTER_TYPE_P (type))
+ t = fold_build2 (POINTER_PLUS_EXPR, type, vmain,
+ fold_convert (sizetype, fd->loop.step));
+ else
+ t = fold_build2 (PLUS_EXPR, type, vmain, fd->loop.step);
+ t = force_gimple_operand_gsi (&gsi, t, false, NULL_TREE,
+ true, GSI_SAME_STMT);
+ stmt = gimple_build_assign (vback, t);
+ gsi_insert_before (&gsi, stmt, GSI_SAME_STMT);
+
+ t = build2 (fd->loop.cond_code, boolean_type_node, vback, e);
+ gsi_insert_before (&gsi, gimple_build_cond_empty (t), GSI_SAME_STMT);
+
+ /* Remove the GIMPLE_OMP_CONTINUE statement. */
+ gsi_remove (&gsi, true);
+
+ /* Replace the GIMPLE_OMP_RETURN with a barrier, or nothing. */
+ gsi = gsi_last_bb (exit_bb);
+ if (!gimple_omp_return_nowait_p (gsi_stmt (gsi)))
+ force_gimple_operand_gsi (&gsi, build_omp_barrier (), false, NULL_TREE,
+ false, GSI_SAME_STMT);
+ gsi_remove (&gsi, true);
+
+ /* Connect all the blocks. */
+ find_edge (entry_bb, seq_start_bb)->flags = EDGE_FALSE_VALUE;
+ find_edge (entry_bb, fin_bb)->flags = EDGE_TRUE_VALUE;
+
+ find_edge (cont_bb, body_bb)->flags = EDGE_TRUE_VALUE;
+ find_edge (cont_bb, fin_bb)->flags = EDGE_FALSE_VALUE;
+
+ set_immediate_dominator (CDI_DOMINATORS, seq_start_bb, entry_bb);
+ set_immediate_dominator (CDI_DOMINATORS, body_bb,
+ recompute_dominator (CDI_DOMINATORS, body_bb));
+ set_immediate_dominator (CDI_DOMINATORS, fin_bb,
+ recompute_dominator (CDI_DOMINATORS, fin_bb));
+}
+
/* A subroutine of expand_omp_for. Generate code for a parallel
loop with static schedule and a specified chunk size. Given
@@ -4533,6 +4710,8 @@ expand_omp_for (struct omp_region *regio
else
expand_omp_for_static_chunk (region, &fd);
}
+ else if (fd.sched_kind == OMP_CLAUSE_SCHEDULE_MASTER)
+ expand_numa_for_static_nochunk (region, &fd);
else
{
int fn_index, start_ix, next_ix;
Index: toplev.c
===================================================================
--- toplev.c (revision 148227)
+++ toplev.c (working copy)
@@ -2117,12 +2117,14 @@ lang_dependent_init_target (void)
}
EXTRA_TARGETS_DECL (int lang_dependent_init (const char *));
+EXTRA_TARGETS_DECL (int initialize_sizetypes (bool));
/* Language-dependent initialization. Returns nonzero on success. */
int
lang_dependent_init (const char *name)
{
location_t save_loc ATTRIBUTE_UNUSED;
+ bool signed_sizetype ATTRIBUTE_UNUSED;
targetm_pnt = &this_targetm;
#ifndef EXTRA_TARGET
@@ -2135,11 +2137,18 @@ lang_dependent_init (const char *name)
if (lang_hooks.init () == 0)
return 0;
input_location = save_loc;
+ signed_sizetype = !TYPE_UNSIGNED (sizetype);
+ EXTRA_TARGETS_CALL (initialize_sizetypes (signed_sizetype));
EXTRA_TARGETS_CALL (lang_dependent_init (name));
targetm_pnt = &this_targetm;
init_asm_output (name);
-#endif /* !EXTRA_TARGET */
+#else /* EXTRA_TARGET */
+ if (TYPE_MODE (sizetype) != ptr_mode)
+ sizetype
+ = lang_hooks.types.type_for_mode (ptr_mode, TYPE_UNSIGNED (sizetype));
+ set_sizetype (size_type_node);
+#endif /* EXTRA_TARGET */
/* This creates various _DECL nodes, so needs to be called after the
front end is initialized. */
Index: tree-ssa-loop-ivopts.c
===================================================================
--- tree-ssa-loop-ivopts.c (revision 148225)
+++ tree-ssa-loop-ivopts.c (working copy)
@@ -2582,7 +2582,7 @@ produce_memory_decl_rtl (tree obj, int *
if (TREE_STATIC (obj) || DECL_EXTERNAL (obj))
{
const char *name = IDENTIFIER_POINTER (DECL_ASSEMBLER_NAME (obj));
- x = gen_rtx_SYMBOL_REF (Pmode, name);
+ x = gen_rtx_SYMBOL_REF ((*targetm.get_pmode) (), name);
SET_SYMBOL_REF_DECL (x, obj);
x = gen_rtx_MEM (DECL_MODE (obj), x);
targetm.encode_section_info (obj, x, true);
@@ -2670,7 +2670,7 @@ computation_cost (tree expr, bool speed)
crtl->maybe_hot_insn_p = speed;
walk_tree (&expr, prepare_decl_rtl, &regno, NULL);
start_sequence ();
- rslt = expand_expr (expr, NULL_RTX, TYPE_MODE (type), EXPAND_NORMAL);
+ rslt = tree_expand_expr (expr, NULL_RTX, TYPE_MODE (type), EXPAND_NORMAL);
seq = get_insns ();
end_sequence ();
default_rtl_profile ();
@@ -3280,9 +3280,9 @@ force_expr_to_var_cost (tree expr, bool
symbol_cost[i] = computation_cost (addr, i) + 1;
address_cost[i]
- = computation_cost (build2 (POINTER_PLUS_EXPR, type,
- addr,
- build_int_cst (sizetype, 2000)), i) + 1;
+ = computation_cost (build2 (POINTER_PLUS_EXPR, type, addr,
+ build_int_cst (targetm.sizetype, 2000)),
+ i) + 1;
if (dump_file && (dump_flags & TDF_DETAILS))
{
fprintf (dump_file, "force_expr_to_var_cost %s costs:\n", i ? "speed" : "size");
@@ -5487,7 +5487,7 @@ rewrite_use_address (struct ivopts_data
gcc_assert (ok);
unshare_aff_combination (&aff);
- ref = create_mem_ref (&bsi, TREE_TYPE (*use->op_p), &aff, data->speed);
+ ref = tree_create_mem_ref (&bsi, TREE_TYPE (*use->op_p), &aff, data->speed);
copy_ref_info (ref, *use->op_p);
*use->op_p = ref;
}
Index: expr.c
===================================================================
--- expr.c (revision 148225)
+++ expr.c (working copy)
@@ -9499,6 +9499,26 @@ expand_expr_real_1 (tree exp, rtx target
return REDUCE_BIT_FIELD (temp);
}
#undef REDUCE_BIT_FIELD
+
+#ifndef EXTRA_TARGET
+EXTRA_TARGETS_DECL (rtx expand_expr_real (tree, rtx, enum machine_mode,
+ enum expand_modifier, rtx *));
+/* Like expand_expr, but dispatch according to targetm, so this is suitable
+ for tree optimizers that don't have target-specific variants. */
+rtx
+tree_expand_expr (tree exp, rtx target, enum machine_mode mode,
+ enum expand_modifier modifier)
+{
+
+ rtx (*expand_expr_array[]) (tree, rtx, enum machine_mode,
+ enum expand_modifier, rtx *)
+ = { &expand_expr_real, EXTRA_TARGETS_EXPAND_COMMA (&,expand_expr_real) };
+
+ return ((*expand_expr_array[targetm.target_arch])
+ (exp, target, mode, modifier, NULL));
+}
+
+#endif /* EXTRA_TARGET */
/* Subroutine of above: reduce EXP to the precision of TYPE (in the
signedness of TYPE), possibly returning the result in TARGET. */
Index: expr.h
===================================================================
--- expr.h (revision 148225)
+++ expr.h (working copy)
@@ -561,6 +561,9 @@ expand_expr (tree exp, rtx target, enum
return expand_expr_real (exp, target, mode, modifier, NULL);
}
+extern rtx tree_expand_expr (tree, rtx, enum machine_mode,
+ enum expand_modifier);
+
static inline rtx
expand_normal (tree exp)
{
Index: tree-parloops.c
===================================================================
--- tree-parloops.c (revision 148488)
+++ tree-parloops.c (working copy)
@@ -736,7 +736,7 @@ expr_invariant_in_region_p (edge entry,
static tree
separate_decls_in_region_name (tree name,
htab_t name_copies, htab_t decl_copies,
- bool copy_name_p)
+ bool copy_name_p, int new_target)
{
tree copy, var, var_copy;
unsigned idx, uid, nuid;
@@ -760,7 +760,14 @@ separate_decls_in_region_name (tree name
dslot = htab_find_slot_with_hash (decl_copies, &ielt, uid, INSERT);
if (!*dslot)
{
- var_copy = create_tmp_var (TREE_TYPE (var), get_name (var));
+ tree type = TREE_TYPE (var);
+
+ if (new_target != targetm.target_arch && POINTER_TYPE_P (type))
+ type
+ = ((TREE_CODE (type) == POINTER_TYPE
+ ? build_pointer_type_for_mode : build_reference_type_for_mode)
+ (TREE_TYPE (type), *targetm_array[new_target]->ptr_mode, false));
+ var_copy = create_tmp_var (type, get_name (var));
DECL_GIMPLE_REG_P (var_copy) = DECL_GIMPLE_REG_P (var);
add_referenced_var (var_copy);
nielt = XNEW (struct int_tree_map);
@@ -810,7 +817,8 @@ separate_decls_in_region_name (tree name
static void
separate_decls_in_region_stmt (edge entry, edge exit, gimple stmt,
- htab_t name_copies, htab_t decl_copies)
+ htab_t name_copies, htab_t decl_copies,
+ unsigned new_target)
{
use_operand_p use;
def_operand_p def;
@@ -825,7 +833,7 @@ separate_decls_in_region_stmt (edge entr
name = DEF_FROM_PTR (def);
gcc_assert (TREE_CODE (name) == SSA_NAME);
copy = separate_decls_in_region_name (name, name_copies, decl_copies,
- false);
+ false, new_target);
gcc_assert (copy == name);
}
@@ -837,7 +845,7 @@ separate_decls_in_region_stmt (edge entr
copy_name_p = expr_invariant_in_region_p (entry, exit, name);
copy = separate_decls_in_region_name (name, name_copies, decl_copies,
- copy_name_p);
+ copy_name_p, new_target);
SET_USE (use, copy);
}
}
@@ -879,6 +887,41 @@ add_field_for_name (void **slot, void *d
return 1;
}
+/* Called by the NUMA case of separate_decls_in_region via htab_traverse.
+ Computes the callee target start address and size of a parameter array
+ described by *SLOT and updates the size description of the parameter area;
+ DATA points to the parameter area description SIZES_ADDR[0..2].
+ SIZES_ADDR[0] tallies the per-iteration size, and SIZES_ADDR[1] the
+ iteration-independent constant size.
+ SIZES_ADDR[2] contains the callee target start address of the parameter
+ area. */
+static int
+add_size_for_param_array (void **slot, void *data)
+{
+ param_array elt = (param_array) *slot;
+ tree *sizes_addr = (tree *) data;
+ tree min, max, offset, size, stride_tree;
+
+ stride_tree = build_int_cst (size_type_node, elt->stride);
+ min = elt->read_offset.min ? elt->read_offset.min : elt->write_offset.min;
+ if (elt->write_offset.min && tree_int_cst_lt (elt->write_offset.min, min))
+ min = elt->write_offset.min;
+ max = elt->read_offset.max ? elt->read_offset.max : elt->write_offset.max;
+ if (elt->write_offset.max && tree_int_cst_lt (max, elt->write_offset.max))
+ max = elt->write_offset.max;
+ offset = size_binop (MULT_EXPR, sizes_addr[2], sizes_addr[0]);
+ offset = size_binop (PLUS_EXPR, offset, sizes_addr[1]);
+ sizes_addr[0] = size_binop (PLUS_EXPR, sizes_addr[0], stride_tree);
+ sizes_addr[1]
+ = size_binop (PLUS_EXPR, sizes_addr[1], size_binop (MINUS_EXPR, max, min));
+ elt->callee_base
+ = fold_build2 (POINTER_PLUS_EXPR, TREE_TYPE (sizes_addr[3]), sizes_addr[3],
+ size_binop (MINUS_EXPR, offset, min));
+ size = size_binop (MULT_EXPR, sizes_addr[2], stride_tree);
+ elt->size = size_binop (PLUS_EXPR, size, size_binop (MINUS_EXPR, max, min));
+ return 1;
+}
+
/* Callback for htab_traverse. A local result is the intermediate result
computed by a single
thread, or the initial value in case no iteration was executed.
@@ -930,6 +973,8 @@ struct clsn_data
basic_block store_bb;
basic_block load_bb;
+ gimple_seq result_seq;
+ struct loop *loop;
};
/* Callback for htab_traverse. Create an atomic instruction for the
@@ -1102,10 +1147,67 @@ create_loads_and_stores_for_name (void *
tree type = TREE_TYPE (elt->new_name);
tree struct_type = TREE_TYPE (TREE_TYPE (clsn_data->load));
tree load_struct;
+ tree src;
+ struct loop *loop = clsn_data->loop;
gsi = gsi_last_bb (clsn_data->store_bb);
t = build3 (COMPONENT_REF, type, clsn_data->store, elt->field, NULL_TREE);
- stmt = gimple_build_assign (t, ssa_name (elt->version));
+ src = ssa_name (elt->version);
+ if (loop->param_arrays)
+ {
+ tree var = SSA_NAME_VAR (src), dst;
+ struct tree_map *m, m_in;
+ param_array a;
+ struct gcc_target *loop_target = targetm_array[loop->target_arch];
+ tree src_param = src;
+
+ m_in.base.from = var;
+ m = (struct tree_map *) htab_find_with_hash (loop->vect_vars, &m_in,
+ DECL_UID (var));
+ if (m)
+ var = m->to;
+ a = (param_array) htab_find (loop->param_arrays, &var);
+
+ gcc_assert (a);
+ gcc_assert (operand_equal_p (a->caller_base, var, 0));
+ dst = a->callee_base;
+ gcc_assert (TYPE_MODE (TREE_TYPE (src)) == TYPE_MODE (TREE_TYPE (dst)));
+ if (TREE_CODE (TREE_TYPE (src)) == POINTER_TYPE
+ && TYPE_MODE (TREE_TYPE (src)) != ptr_mode)
+ {
+ tree param_type;
+
+ param_type = build_pointer_type (TREE_TYPE (TREE_TYPE (src)));
+ src_param = fold_convert (param_type, src);
+ param_type = build_pointer_type (TREE_TYPE (TREE_TYPE (dst)));
+ dst = fold_convert (param_type, dst);
+ }
+ if (!is_gimple_val (src))
+ {
+ var = create_tmp_var (TREE_TYPE (src_param),
+ IDENTIFIER_POINTER (DECL_NAME (var)));
+ var = make_ssa_name (var, NULL);
+ stmt = gimple_build_assign (var, src_param);
+ SSA_NAME_DEF_STMT (var) = stmt;
+ mark_virtual_ops_for_renaming (stmt);
+ gsi_insert_after (&gsi, stmt, GSI_CONTINUE_LINKING);
+ src_param = var;
+ }
+ if (a->read_offset.min)
+ (*targetm.copy_to_target) (&gsi, loop_target, dst, src_param, a->size);
+ if (a->write_offset.min)
+ {
+ gimple_stmt_iterator i;
+
+ if (!clsn_data->result_seq)
+ clsn_data->result_seq = gimple_seq_alloc ();
+ i = gsi_last (clsn_data->result_seq);
+ (*targetm.copy_from_target) (&i, loop_target, dst, src_param,
+ a->size);
+ }
+ src = dst;
+ }
+ stmt = gimple_build_assign (t, src);
mark_virtual_ops_for_renaming (stmt);
gsi_insert_after (&gsi, stmt, GSI_NEW_STMT);
@@ -1156,9 +1258,10 @@ create_loads_and_stores_for_name (void *
static void
separate_decls_in_region (edge entry, edge exit, htab_t reduction_list,
tree *arg_struct, tree *new_arg_struct,
- struct clsn_data *ld_st_data, unsigned new_target)
+ struct clsn_data *ld_st_data, struct loop *loop)
{
+ int new_target = loop->target_arch;
basic_block bb1 = split_edge (entry);
basic_block bb0 = single_pred (bb1);
htab_t name_copies = htab_create (10, name_to_copy_elt_hash,
@@ -1173,6 +1276,7 @@ separate_decls_in_region (edge entry, ed
basic_block bb;
basic_block entry_bb = bb1;
basic_block exit_bb = exit->dest;
+ tree copy_base_var, copy_base;
entry = single_succ_edge (entry_bb);
gather_blocks_in_sese_region (entry_bb, exit_bb, &body);
@@ -1183,11 +1287,13 @@ separate_decls_in_region (edge entry, ed
{
for (gsi = gsi_start_phis (bb); !gsi_end_p (gsi); gsi_next (&gsi))
separate_decls_in_region_stmt (entry, exit, gsi_stmt (gsi),
- name_copies, decl_copies);
+ name_copies, decl_copies,
+ new_target);
for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
separate_decls_in_region_stmt (entry, exit, gsi_stmt (gsi),
- name_copies, decl_copies);
+ name_copies, decl_copies,
+ new_target);
}
}
@@ -1208,6 +1314,9 @@ separate_decls_in_region (edge entry, ed
type);
TYPE_NAME (type) = type_name;
+ /* ??? For ARCompact / mxp, we should be able to transfer most or all
+ values directly from ARCompact core to mxp vector register.
+ OTOH, who is willing to fund the development work? */
htab_traverse (name_copies, add_field_for_name, type);
if (reduction_list && htab_elements (reduction_list) > 0)
{
@@ -1216,6 +1325,47 @@ separate_decls_in_region (edge entry, ed
type);
}
layout_type (type);
+ bool numa
+ = !(*targetm.common_data_with_target) (targetm_array[new_target]);
+
+ if (numa)
+ {
+ /* Calculate how much memory we need on the new_target side. */
+ tree sizes_addr[4];
+ tree size;
+ tree niter;
+ tree ptype, fn_type, fn;
+ gimple_stmt_iterator gsi;
+ gimple stmt;
+
+ ptype = (build_pointer_type_for_mode
+ (void_type_node, *targetm_array[new_target]->ptr_mode,
+ false));
+ copy_base_var = create_tmp_var (ptype, "copy_base");
+ add_referenced_var (copy_base_var);
+ niter = number_of_latch_executions (loop);
+ sizes_addr[0] = size_zero_node;
+ sizes_addr[1] = size_in_bytes (type);
+ sizes_addr[2] = niter;
+ copy_base = make_ssa_name (copy_base_var, 0);
+ sizes_addr[3] = copy_base;
+ htab_traverse (loop->param_arrays, add_size_for_param_array,
+ sizes_addr);
+ size = size_binop (PLUS_EXPR,
+ size_binop (MULT_EXPR, niter, sizes_addr[0]),
+ sizes_addr[1]);
+ /* Emit gimple to allocate SIZE bytes and assign the result to copy_base. */
+ fn_type = build_function_type_list (integer_type_node,
+ integer_type_node, NULL_TREE);
+ fn = get_identifier ("__simd_malloc");
+ fn = build_decl (FUNCTION_DECL, fn, fn_type);
+ stmt = gimple_build_call (fn, 1, size);
+ SSA_NAME_DEF_STMT (copy_base) = stmt;
+ gimple_call_set_lhs (stmt, copy_base);
+ mark_virtual_ops_for_renaming (stmt);
+ gsi = gsi_last_bb (bb0);
+ gsi_insert_after (&gsi, stmt, GSI_NEW_STMT);
+ }
/* Create the loads and stores. */
*arg_struct = create_tmp_var (type, ".paral_data_store");
@@ -1231,9 +1381,19 @@ separate_decls_in_region (edge entry, ed
ld_st_data->load = *new_arg_struct;
ld_st_data->store_bb = bb0;
ld_st_data->load_bb = bb1;
+ ld_st_data->result_seq = NULL;
+ ld_st_data->loop = loop;
htab_traverse (name_copies, create_loads_and_stores_for_name,
ld_st_data);
+ if (numa)
+ {
+ gsi = gsi_last_bb (bb0);
+ (*targetm.copy_to_target) (&gsi, targetm_array[new_target], copy_base,
+ build_fold_addr_expr (*arg_struct),
+ size_in_bytes (type));
+ *arg_struct = build_fold_indirect_ref (copy_base);
+ }
/* Load the calculation from memory (after the join of the threads). */
@@ -1242,10 +1402,12 @@ separate_decls_in_region (edge entry, ed
htab_traverse (reduction_list, create_stores_for_reduction,
ld_st_data);
clsn_data.load = make_ssa_name (nvar, NULL);
- clsn_data.load_bb = exit->dest;
+ clsn_data.load_bb = exit_bb;
clsn_data.store = ld_st_data->store;
create_final_loads_for_reduction (reduction_list, &clsn_data);
}
+ gsi = gsi_after_labels (split_edge (exit));
+ gsi_insert_seq_before (&gsi, ld_st_data->result_seq, GSI_NEW_STMT);
}
htab_delete (decl_copies);
@@ -1410,7 +1572,9 @@ canonicalize_loop_ivs (struct loop *loop
remove_phi_node (&psi, false);
atype = TREE_TYPE (res);
- mtype = POINTER_TYPE_P (atype) ? sizetype : atype;
+ mtype = (POINTER_TYPE_P (atype)
+ ? targetm_array[loop->target_arch]->sizetype_tab[SIZETYPE]
+ : atype);
val = fold_build2 (MULT_EXPR, mtype, unshare_expr (iv.step),
fold_convert (mtype, var_before));
val = fold_build2 (POINTER_TYPE_P (atype)
@@ -1572,6 +1736,8 @@ create_parallel_loop (struct loop *loop,
gimple stmt, for_stmt, phi, cond_stmt;
tree cvar, cvar_init, initvar, cvar_next, cvar_base, type;
edge exit, nexit, guard, end, e;
+ bool numa
+ = !(*targetm.common_data_with_target) (targetm_array[loop->target_arch]);
/* Prepare the GIMPLE_OMP_PARALLEL statement. */
bb = loop_preheader_edge (loop)->src;
@@ -1650,7 +1816,10 @@ create_parallel_loop (struct loop *loop,
gimple_cond_set_lhs (cond_stmt, cvar_base);
type = TREE_TYPE (cvar);
t = build_omp_clause (OMP_CLAUSE_SCHEDULE);
- OMP_CLAUSE_SCHEDULE_KIND (t) = OMP_CLAUSE_SCHEDULE_STATIC;
+ if (numa)
+ OMP_CLAUSE_SCHEDULE_KIND (t) = OMP_CLAUSE_SCHEDULE_MASTER;
+ else
+ OMP_CLAUSE_SCHEDULE_KIND (t) = OMP_CLAUSE_SCHEDULE_STATIC;
for_stmt = gimple_build_omp_for (NULL, t, 1, NULL);
gimple_omp_for_set_index (for_stmt, 0, initvar);
@@ -1697,6 +1866,7 @@ gen_parallel_loop (struct loop *loop, ht
unsigned prob;
bool arch_change = loop->target_arch != cfun->target_arch;
bool parallelize_all = arch_change;
+ struct gcc_target *save_target;
/* From
@@ -1794,7 +1964,10 @@ gen_parallel_loop (struct loop *loop, ht
free_original_copy_tables ();
/* Base all the induction variables in LOOP on a single control one. */
+ save_target = targetm_pnt;
+ targetm_pnt = targetm_array[loop->target_arch];
canonicalize_loop_ivs (loop, reduction_list, &nit);
+ targetm_pnt = save_target;
/* Ensure that the exit condition is the first statement in the loop. */
if (!parallelize_all)
@@ -1813,7 +1986,7 @@ gen_parallel_loop (struct loop *loop, ht
/* In the old loop, move all variables non-local to the loop to a structure
and back, and create separate decls for the variables used in loop. */
separate_decls_in_region (entry, exit, reduction_list, &arg_struct,
- &new_arg_struct, &clsn_data, loop->target_arch);
+ &new_arg_struct, &clsn_data, loop);
/* Create the parallel constructs. */
parallel_head
Index: tree-ssa-address.c
===================================================================
--- tree-ssa-address.c (revision 148225)
+++ tree-ssa-address.c (working copy)
@@ -42,6 +42,7 @@ along with GCC; see the file COPYING3.
#include "expr.h"
#include "ggc.h"
#include "tree-affine.h"
+#include "target.h"
#include "multi-target.h"
/* TODO -- handling of symbols (according to Richard Hendersons
@@ -247,6 +248,20 @@ addr_for_mem_ref (struct mem_address *ad
}
#ifndef EXTRA_TARGET
+EXTRA_TARGETS_DECL (rtx addr_for_mem_ref (struct mem_address *, bool));
+
+/* Like addr_for_mem_ref, but dispatch according to targetm, so this is
+ suitable for tree optimizers that don't have target-specific variants. */
+
+rtx
+tree_addr_for_mem_ref (struct mem_address *addr, bool really_expand)
+{
+ rtx (*addr_for_mem_ref_array[]) (struct mem_address *, bool)
+ = { &addr_for_mem_ref, EXTRA_TARGETS_EXPAND_COMMA (&,addr_for_mem_ref) };
+
+ return (*addr_for_mem_ref_array[targetm.target_arch]) (addr, really_expand);
+}
+
/* Returns address of MEM_REF in TYPE. */
tree
@@ -698,6 +713,23 @@ create_mem_ref (gimple_stmt_iterator *gs
}
#ifndef EXTRA_TARGET
+EXTRA_TARGETS_DECL (tree create_mem_ref (gimple_stmt_iterator *gsi, tree type,
+ aff_tree *addr, bool speed));
+
+/* Like create_mem_ref, but dispatch according to targetm, so this is
+ suitable for tree optimizers that don't have target-specific variants. */
+
+tree
+tree_create_mem_ref (gimple_stmt_iterator *gsi, tree type, aff_tree *addr,
+ bool speed)
+{
+ tree (*create_mem_ref_array[]) (gimple_stmt_iterator *, tree, aff_tree *,
+ bool)
+ = { &create_mem_ref, EXTRA_TARGETS_EXPAND_COMMA (&,create_mem_ref) };
+
+ return (*create_mem_ref_array[targetm.target_arch]) (gsi, type, addr, speed);
+}
+
/* Copies components of the address from OP to ADDR. */
void
Index: function.c
===================================================================
--- function.c (revision 148225)
+++ function.c (working copy)
@@ -4083,6 +4083,27 @@ static void (* const allocate_struct_fun
EXTRA_TARGETS_EXPAND_COMMA (&,allocate_struct_function_1)
};
+/* If FNDECL has a target_arch attribute, return the index of that target
+ architecture in targetm_array; otherwise, return 0. */
+int
+lookup_attr_target (tree fndecl)
+{
+ int i = 0;
+#if NUM_TARGETS > 1
+ const char *arch_name = targetm.name;
+ tree attr = NULL_TREE;
+
+ if (fndecl)
+ attr = lookup_attribute ("target_arch", DECL_ATTRIBUTES (fndecl));
+ if (attr)
+ arch_name = TREE_STRING_POINTER (TREE_VALUE (TREE_VALUE (attr)));
+ for (; targetm_array[i]; i++)
+ if (strcmp (targetm_array[i]->name, arch_name) == 0)
+ break;
+#endif
+ return i;
+}
+
/* Allocate a function structure for FNDECL and set its contents
to the defaults. Set cfun to the newly-allocated object.
Some of the helper functions invoked during initialization assume
@@ -4099,19 +4120,7 @@ static void (* const allocate_struct_fun
void
allocate_struct_function (tree fndecl, bool abstract_p)
{
- int i = 0;
-#if NUM_TARGETS > 1
- const char *arch_name = targetm.name;
- tree attr = NULL_TREE;
-
- if (fndecl)
- attr = lookup_attribute ("target_arch", DECL_ATTRIBUTES (fndecl));
- if (attr)
- arch_name = TREE_STRING_POINTER (TREE_VALUE (TREE_VALUE (attr)));
- for (; targetm_array[i]; i++)
- if (strcmp (targetm_array[i]->name, arch_name) == 0)
- break;
-#endif
+ int i = lookup_attr_target (fndecl);
cfun = GGC_CNEW (struct function);
cfun->target_arch = i;
targetm_pnt = targetm_array[i];
Index: tree-affine.c
===================================================================
--- tree-affine.c (revision 148213)
+++ tree-affine.c (working copy)
@@ -32,6 +32,7 @@ along with GCC; see the file COPYING3.
#include "tree-affine.h"
#include "gimple.h"
#include "flags.h"
+#include "target.h"
/* Extends CST as appropriate for the affine combinations COMB. */
@@ -352,7 +353,7 @@ add_elt_to_tree (tree expr, tree type, t
enum tree_code code;
tree type1 = type;
if (POINTER_TYPE_P (type))
- type1 = sizetype;
+ type1 = targetm.sizetype;
scale = double_int_ext_for_comb (scale, comb);
elt = fold_convert (type1, elt);
@@ -415,7 +416,7 @@ aff_combination_to_tree (aff_tree *comb)
double_int off, sgn;
tree type1 = type;
if (POINTER_TYPE_P (type))
- type1 = sizetype;
+ type1 = targetm.sizetype;
gcc_assert (comb->n == MAX_AFF_ELTS || comb->rest == NULL_TREE);
Index: target-def.h
===================================================================
--- target-def.h (revision 148225)
+++ target-def.h (working copy)
@@ -32,6 +32,8 @@
/* TARGET_NAME is defined by the Makefile. */
+#define TARGET_GET_PMODE default_get_pmode
+
/* Assembler output. */
#ifndef TARGET_ASM_OPEN_PAREN
#define TARGET_ASM_OPEN_PAREN "("
@@ -444,6 +446,22 @@
#define TARGET_STRIP_NAME_ENCODING default_strip_name_encoding
#endif
+#ifndef TARGET_COMMON_DATA_WITH_TARGET
+#define TARGET_COMMON_DATA_WITH_TARGET default_common_data_with_target
+#endif
+
+#ifndef TARGET_COPY_TO_TARGET
+#define TARGET_COPY_TO_TARGET 0
+#endif
+
+#ifndef TARGET_COPY_FROM_TARGET
+#define TARGET_COPY_FROM_TARGET 0
+#endif
+
+#ifndef TARGET_BUILD_CALL_ON_TARGET
+#define TARGET_BUILD_CALL_ON_TARGET 0
+#endif
+
#ifndef TARGET_BINDS_LOCAL_P
#define TARGET_BINDS_LOCAL_P default_binds_local_p
#endif
@@ -847,6 +865,8 @@
TARGET_NAME, \
TARGET_NUM, \
&ptr_mode, \
+ TARGET_GET_PMODE, \
+ &sizetype_tab[0], \
TARGET_ASM_OUT, \
TARGET_SCHED, \
TARGET_VECTORIZE, \
@@ -896,6 +916,10 @@
TARGET_MANGLE_DECL_ASSEMBLER_NAME, \
TARGET_ENCODE_SECTION_INFO, \
TARGET_STRIP_NAME_ENCODING, \
+ TARGET_COMMON_DATA_WITH_TARGET, \
+ TARGET_COPY_TO_TARGET, \
+ TARGET_COPY_FROM_TARGET, \
+ TARGET_BUILD_CALL_ON_TARGET, \
TARGET_SHIFT_TRUNCATION_MASK, \
TARGET_MIN_DIVISIONS_FOR_RECIP_MUL, \
TARGET_MODE_REP_EXTENDED, \
Index: tree-vect-transform.c
===================================================================
--- tree-vect-transform.c (revision 148488)
+++ tree-vect-transform.c (working copy)
@@ -969,6 +969,108 @@ vect_create_addr_base_for_vector_ref (gi
return vec_stmt;
}
+/* Function vect_decompose_addr_base_for_vector_ref.
+
+ Decompose the address of the first memory location
+ that will be accessed for a data reference.
+
+ Input:
+ STMT: The statement containing the data reference.
+ OFFSET: Optional. If supplied, it is to be added to the initial address.
+ LOOP: The loop-nest relative to which the address should be computed.
+ For example, when the dataref is in an inner-loop nested in an
+ outer-loop that is now being vectorized, LOOP can be either the
+ outer-loop, or the inner-loop. The first memory location accessed
+ by the following dataref ('in' points to short):
+
+ for (i=0; i<N; i++)
+ for (j=0; j<M; j++)
+ s += in[i+j]
+
+ is as follows:
+ if LOOP=i_loop: &in (relative to i_loop)
+ if LOOP=j_loop: &in+i*2B (relative to j_loop)
+
+ Output:
+ 1. Return a GENERIC expression whose value is the base address derived
+ from the declaration of the array / variable in the memory access.
+ 2. Decompose the offset from there to the address of the memory
+ location of the first vector of the data reference into a constant part
+ *coffset and a variable part *voffset.
+
+ FORNOW: We are only handling array accesses with step 1. */
+
+static tree
+vect_decompose_addr_base_for_vector_ref (gimple stmt, tree offset,
+ struct loop *loop,
+ tree *coffset, tree *voffset)
+{
+ stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
+ struct data_reference *dr = STMT_VINFO_DATA_REF (stmt_info);
+ struct loop *containing_loop = (gimple_bb (stmt))->loop_father;
+ tree data_ref_base = unshare_expr (DR_BASE_ADDRESS (dr));
+ tree base_name;
+ tree base_offset = unshare_expr (DR_OFFSET (dr));
+ tree init = unshare_expr (DR_INIT (dr));
+ tree step = TYPE_SIZE_UNIT (TREE_TYPE (DR_REF (dr)));
+ tree tmp_var, tmp_off;
+
+ gcc_assert (loop);
+ if (loop != containing_loop)
+ {
+ loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info);
+ struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
+
+ gcc_assert (nested_in_vect_loop_p (loop, stmt));
+
+ data_ref_base = unshare_expr (STMT_VINFO_DR_BASE_ADDRESS (stmt_info));
+ base_offset = unshare_expr (STMT_VINFO_DR_OFFSET (stmt_info));
+ init = unshare_expr (STMT_VINFO_DR_INIT (stmt_info));
+ }
+
+ /* Create data_ref_base */
+ base_name = build_fold_indirect_ref (data_ref_base);
+
+ init = fold_convert (sizetype, init);
+ if (offset)
+ {
+ gcc_assert (really_constant_p (offset));
+ offset = fold_build2 (MULT_EXPR, sizetype,
+ fold_convert (sizetype, offset), step);
+ init = size_binop (PLUS_EXPR, init, offset);
+ }
+
+ split_constant_offset (base_offset, &tmp_var, &tmp_off);
+ base_offset = fold_convert (sizetype, tmp_var);
+ init = size_binop (PLUS_EXPR, init, fold_convert (sizetype, tmp_off));
+
+ *coffset = init;
+ *voffset = base_offset;
+
+ /* We rely here on get_name only accepting a variable declaration or its
+ address, not any PLUS_EXPR with some other offset. */
+ gcc_assert (get_name (base_name));
+
+ return base_name;
+}
+
+static unsigned int
+param_array_hash (const void *p)
+{
+ param_array elem = (param_array) p;
+
+ return htab_hash_pointer (elem->decl);
+}
+
+static int
+param_array_eq (const void *p0, const void *p1)
+{
+ param_array e0 = (param_array) p0;
+ param_array e1 = (param_array) p1;
+
+ return e0->decl == e1->decl;
+}
/* Function vect_create_data_ref_ptr.
@@ -1044,6 +1146,8 @@ vect_create_data_ref_ptr (gimple stmt, s
gimple incr;
tree step;
alias_set_type ptr_alias_set = 0;
+ bool numa = !((*targetm_array[cfun->target_arch]->common_data_with_target)
+ (targetm_array[loop->target_arch]));
enum machine_mode tptrmode = *targetm_array[loop->target_arch]->ptr_mode;
/* Check the step (evolution) of the load in LOOP, and record
@@ -1115,6 +1219,74 @@ vect_create_data_ref_ptr (gimple stmt, s
mark_sym_for_renaming (tag);
}
+ if (numa)
+ {
+ tree decl;
+ void **slot;
+ param_array new_a;
+ int stride;
+ bool load_p = at_loop != NULL;
+ tree base, coffset, voffset;
+ struct tree_range *rangep;
+ struct tree_map *m;
+
+ if (!loop->param_arrays)
+ {
+ loop->param_arrays
+ = htab_create (10, param_array_hash, param_array_eq, free);
+ loop->vect_vars
+ = htab_create (10, tree_map_hash, tree_map_eq, free);
+ }
+
+ /* If we want to handle vectorizing outer loops, we need a more
+ complex model of the to-be-transferred arrays than an index range
+ and a single stride. I.e. we'd have to consider that the entire
+ access range of the inner loop must be present, and write overlap with
+ a following simultaneously processed range must be avoided. */
+ gcc_assert (!nested_in_vect_loop);
+ /* Moreover, if the address is initialized inside the loop (in the
+ preheader of the inner loop), we'd need to arrange for the DMA
+ to be somewhere else. */
+ gcc_assert (!at_loop || at_loop == loop);
+ gcc_assert (!*inv_p);
+ stride = tree_low_cst (TYPE_SIZE_UNIT (vectype), 1);
+ base = vect_decompose_addr_base_for_vector_ref (stmt, offset, loop,
+ &coffset, &voffset);
+ decl = get_get_name_decl (base);
+ slot = htab_find_slot (loop->param_arrays, &decl, INSERT);
+ new_a = *(param_array *) slot;
+ if (!new_a)
+ {
+ new_a = XCNEW (struct param_array_d);
+ *slot = new_a;
+ new_a->decl = decl;
+ new_a->caller_base = base;
+ new_a->stride = stride;
+ new_a->invar_offset = voffset;
+ rangep = load_p ? &new_a->read_offset : &new_a->write_offset;
+ rangep->min = rangep->max = coffset;
+ }
+ else
+ {
+ gcc_assert (operand_equal_p (new_a->caller_base, base, 0));
+ gcc_assert (new_a->stride == stride);
+ gcc_assert (operand_equal_p (new_a->invar_offset, voffset, 0));
+
+ rangep = load_p ? &new_a->read_offset : &new_a->write_offset;
+ if (!rangep->min || tree_int_cst_lt (coffset, rangep->min))
+ rangep->min = coffset;
+ if (!rangep->max || tree_int_cst_lt (rangep->max, coffset))
+ rangep->max = coffset;
+ }
+ m = XCNEW (struct tree_map);
+ m->hash = DECL_UID (vect_ptr);
+ m->base.from = vect_ptr;
+ m->to = decl;
+ slot = htab_find_slot_with_hash (loop->vect_vars, m, m->hash, INSERT);
+ gcc_assert (*slot == NULL);
+ *slot = m;
+ }
+
/** Note: If the dataref is in an inner-loop nested in LOOP, and we are
vectorizing LOOP (i.e. outer-loop vectorization), we need to create two
def-use update cycles for the pointer: One relative to the outer-loop
Index: cfgloop.h
===================================================================
--- cfgloop.h (revision 148225)
+++ cfgloop.h (working copy)
@@ -100,6 +100,41 @@ enum loop_estimation
EST_AVAILABLE
};
+struct tree_range GTY (()) { tree min, max; } ;
+
+typedef struct param_array_d GTY (())
+{
+ /* The declaration of the base variable, as obtained with get_get_name_decl.
+ Its name is used to compute the hash key. */
+ tree decl;
+ /* The expression describing how this variable forms the base address
+ for the access, in the calling context. */
+ tree caller_base;
+ /* The expression to initialize the variable on the callee side. */
+ tree callee_base;
+ /* All accesses to this array should agree on stride, because otherwise it
+ is not straightforward to slice this array into separate index ranges. */
+ int stride;
+ /* Likewise, all accesses should agree on the non-constant offset. */
+ tree invar_offset;
+ /* max_{read,write}_offset includes the size of the access mode, and thus
+ points to the first not-accessed byte.
+ max_write_offset - min_write_offset must not be larger than stride to
+ allow vectorized operation.
+ forward iteration: the index range is divided into monotonically
+ increasing slices such that the inputs of a slice and all preceding slices
+ have been fully read before its output is written; when operating on a
+ slice, the biv is incremented.
+ backward iteration: likewise, with decreasing index ranges and bivs.
+ max_write_offset - min_read_offset must not be larger than stride to
+ allow forward iteration.
+ max_read_offset - min_write_offset must not be larger than stride to
+ allow backward iteration. */
+ struct tree_range read_offset;
+ struct tree_range write_offset;
+ tree size;
+} *param_array;
+
/* Structure to hold information for each natural loop. */
struct loop GTY ((chain_next ("%h.next")))
{
@@ -164,6 +199,16 @@ struct loop GTY ((chain_next ("%h.next")
/* Head of the cyclic list of the exits of the loop. */
struct loop_exit *exits;
+
+ /* Arrays that are passed in from a calling context that is left on
+ another target architecture.
+ We could use a separate array, hash table or similar to map the loop
+ index to the relevant param_array pointer to save compile-time space
+ when this feature is not used (e.g. only a single architecture
+ configured), but that'll require some care to keep the mapping in sync
+ when the loop array is resized. */
+ htab_t GTY ((param_is (struct param_array_d))) param_arrays;
+ htab_t GTY ((param_is (struct tree_map))) vect_vars;
};
/* Flags for state of loop structure. */
Index: tree-flow.h
===================================================================
--- tree-flow.h (revision 148225)
+++ tree-flow.h (working copy)
@@ -1176,6 +1176,8 @@ tree create_mem_ref (gimple_stmt_iterato
rtx addr_for_mem_ref (struct mem_address *, bool);
tree maybe_fold_tmr (tree);
END_TARGET_SPECIFIC
+tree tree_create_mem_ref (gimple_stmt_iterator *, tree,
+ struct affine_tree_combination *, bool);
void get_address_description (tree, struct mem_address *);
void init_alias_heapvars (void);
Index: Makefile.in
===================================================================
--- Makefile.in (revision 149001)
+++ Makefile.in (working copy)
@@ -2251,7 +2251,7 @@ tree-ssa-loop-unswitch.o : tree-ssa-loop
coretypes.h $(TREE_DUMP_H) $(TREE_PASS_H) $(BASIC_BLOCK_H) hard-reg-set.h \
$(TREE_INLINE_H)
tree-ssa-address.o : tree-ssa-address.c $(TREE_FLOW_H) $(CONFIG_H) \
- $(SYSTEM_H) $(RTL_H) $(TREE_H) $(TM_P_H) \
+ $(SYSTEM_H) $(RTL_H) $(TREE_H) $(TM_P_H) $(TARGET_H) \
output.h $(DIAGNOSTIC_H) $(TIMEVAR_H) $(TM_H) coretypes.h $(TREE_DUMP_H) \
$(TREE_PASS_H) $(FLAGS_H) $(TREE_INLINE_H) $(RECOG_H) insn-config.h \
$(EXPR_H) gt-tree-ssa-address.h $(GGC_H) tree-affine.h
@@ -2289,7 +2289,8 @@ tree-ssa-loop-ivopts.o : tree-ssa-loop-i
gt-tree-ssa-loop-ivopts.h
tree-affine.o : tree-affine.c tree-affine.h $(CONFIG_H) pointer-set.h \
$(SYSTEM_H) $(RTL_H) $(TREE_H) $(TM_P_H) hard-reg-set.h $(GIMPLE_H) \
- output.h $(DIAGNOSTIC_H) $(TM_H) coretypes.h $(TREE_DUMP_H) $(FLAGS_H)
+ output.h $(DIAGNOSTIC_H) $(TM_H) coretypes.h $(TREE_DUMP_H) $(FLAGS_H) \
+ $(TARGET_H)
tree-ssa-loop-manip.o : tree-ssa-loop-manip.c $(TREE_FLOW_H) $(CONFIG_H) \
$(SYSTEM_H) coretypes.h $(TM_H) $(TREE_H) $(RTL_H) $(TM_P_H) hard-reg-set.h \
$(BASIC_BLOCK_H) output.h $(DIAGNOSTIC_H) $(TREE_FLOW_H) $(TREE_DUMP_H) \
Index: gimple.h
===================================================================
--- gimple.h (revision 148225)
+++ gimple.h (working copy)
@@ -241,7 +241,7 @@ set_bb_seq (basic_block bb, gimple_seq s
/* Iterator object for GIMPLE statement sequences. */
-typedef struct
+typedef struct gimple_stmt_iterator_d
{
/* Sequence node holding the current statement. */
gimple_seq_node ptr;
Index: config/arc/predicates.md
===================================================================
--- config/arc/predicates.md (revision 148226)
+++ config/arc/predicates.md (working copy)
@@ -758,3 +758,18 @@ (define_special_predicate "immediate_usi
(match_test "INTVAL (op) >= 0")
(and (match_test "const_double_operand (op, mode)")
(match_test "CONST_DOUBLE_HIGH (op) == 0"))))
+
+(define_predicate "simd_arg_vector"
+ (match_code "parallel")
+{
+ int i = XVECLEN (op, 0) - 1;
+
+ for (; i >= 0; i--)
+ {
+ rtx arg = XVECEXP (op, 0, i);
+
+ if (!REG_P (arg) || REGNO (arg) < 66 || REGNO (arg) >= 66 + 8)
+ return false;
+ }
+ return true;
+})
Index: config/arc/arc.c
===================================================================
--- config/arc/arc.c (revision 148226)
+++ config/arc/arc.c (working copy)
@@ -58,6 +58,8 @@ along with GCC; see the file COPYING3.
#include "tm-constrs.h"
#include "reload.h" /* For operands_match_p */
#include "df.h"
+#include "gimple.h"
+#include "tree-flow.h"
#include "multi-target.h"
START_TARGET_SPECIFIC
@@ -181,131 +183,136 @@ enum arc_builtins {
ARC_BUILTIN_TRAP_S = 20,
ARC_BUILTIN_UNIMP_S = 21,
+ ARC_SIMD_BUILTIN_CALL,
/* Sentinel to mark start of simd builtins */
- ARC_SIMD_BUILTIN_BEGIN = 1000,
+ ARC_SIMD_BUILTIN_BEGIN = 100,
- ARC_SIMD_BUILTIN_VADDAW = 1001,
- ARC_SIMD_BUILTIN_VADDW = 1002,
- ARC_SIMD_BUILTIN_VAVB = 1003,
- ARC_SIMD_BUILTIN_VAVRB = 1004,
- ARC_SIMD_BUILTIN_VDIFAW = 1005,
- ARC_SIMD_BUILTIN_VDIFW = 1006,
- ARC_SIMD_BUILTIN_VMAXAW = 1007,
- ARC_SIMD_BUILTIN_VMAXW = 1008,
- ARC_SIMD_BUILTIN_VMINAW = 1009,
- ARC_SIMD_BUILTIN_VMINW = 1010,
- ARC_SIMD_BUILTIN_VMULAW = 1011,
- ARC_SIMD_BUILTIN_VMULFAW = 1012,
- ARC_SIMD_BUILTIN_VMULFW = 1013,
- ARC_SIMD_BUILTIN_VMULW = 1014,
- ARC_SIMD_BUILTIN_VSUBAW = 1015,
- ARC_SIMD_BUILTIN_VSUBW = 1016,
- ARC_SIMD_BUILTIN_VSUMMW = 1017,
- ARC_SIMD_BUILTIN_VAND = 1018,
- ARC_SIMD_BUILTIN_VANDAW = 1019,
- ARC_SIMD_BUILTIN_VBIC = 1020,
- ARC_SIMD_BUILTIN_VBICAW = 1021,
- ARC_SIMD_BUILTIN_VOR = 1022,
- ARC_SIMD_BUILTIN_VXOR = 1023,
- ARC_SIMD_BUILTIN_VXORAW = 1024,
- ARC_SIMD_BUILTIN_VEQW = 1025,
- ARC_SIMD_BUILTIN_VLEW = 1026,
- ARC_SIMD_BUILTIN_VLTW = 1027,
- ARC_SIMD_BUILTIN_VNEW = 1028,
- ARC_SIMD_BUILTIN_VMR1AW = 1029,
- ARC_SIMD_BUILTIN_VMR1W = 1030,
- ARC_SIMD_BUILTIN_VMR2AW = 1031,
- ARC_SIMD_BUILTIN_VMR2W = 1032,
- ARC_SIMD_BUILTIN_VMR3AW = 1033,
- ARC_SIMD_BUILTIN_VMR3W = 1034,
- ARC_SIMD_BUILTIN_VMR4AW = 1035,
- ARC_SIMD_BUILTIN_VMR4W = 1036,
- ARC_SIMD_BUILTIN_VMR5AW = 1037,
- ARC_SIMD_BUILTIN_VMR5W = 1038,
- ARC_SIMD_BUILTIN_VMR6AW = 1039,
- ARC_SIMD_BUILTIN_VMR6W = 1040,
- ARC_SIMD_BUILTIN_VMR7AW = 1041,
- ARC_SIMD_BUILTIN_VMR7W = 1042,
- ARC_SIMD_BUILTIN_VMRB = 1043,
- ARC_SIMD_BUILTIN_VH264F = 1044,
- ARC_SIMD_BUILTIN_VH264FT = 1045,
- ARC_SIMD_BUILTIN_VH264FW = 1046,
- ARC_SIMD_BUILTIN_VVC1F = 1047,
- ARC_SIMD_BUILTIN_VVC1FT = 1048,
+ ARC_SIMD_BUILTIN_VADDAW = 101,
+ ARC_SIMD_BUILTIN_VADDW = 102,
+ ARC_SIMD_BUILTIN_VAVB = 103,
+ ARC_SIMD_BUILTIN_VAVRB = 104,
+ ARC_SIMD_BUILTIN_VDIFAW = 105,
+ ARC_SIMD_BUILTIN_VDIFW = 106,
+ ARC_SIMD_BUILTIN_VMAXAW = 107,
+ ARC_SIMD_BUILTIN_VMAXW = 108,
+ ARC_SIMD_BUILTIN_VMINAW = 109,
+ ARC_SIMD_BUILTIN_VMINW = 110,
+ ARC_SIMD_BUILTIN_VMULAW = 111,
+ ARC_SIMD_BUILTIN_VMULFAW = 112,
+ ARC_SIMD_BUILTIN_VMULFW = 113,
+ ARC_SIMD_BUILTIN_VMULW = 114,
+ ARC_SIMD_BUILTIN_VSUBAW = 115,
+ ARC_SIMD_BUILTIN_VSUBW = 116,
+ ARC_SIMD_BUILTIN_VSUMMW = 117,
+ ARC_SIMD_BUILTIN_VAND = 118,
+ ARC_SIMD_BUILTIN_VANDAW = 119,
+ ARC_SIMD_BUILTIN_VBIC = 120,
+ ARC_SIMD_BUILTIN_VBICAW = 121,
+ ARC_SIMD_BUILTIN_VOR = 122,
+ ARC_SIMD_BUILTIN_VXOR = 123,
+ ARC_SIMD_BUILTIN_VXORAW = 124,
+ ARC_SIMD_BUILTIN_VEQW = 125,
+ ARC_SIMD_BUILTIN_VLEW = 126,
+ ARC_SIMD_BUILTIN_VLTW = 127,
+ ARC_SIMD_BUILTIN_VNEW = 128,
+ ARC_SIMD_BUILTIN_VMR1AW = 129,
+ ARC_SIMD_BUILTIN_VMR1W = 130,
+ ARC_SIMD_BUILTIN_VMR2AW = 131,
+ ARC_SIMD_BUILTIN_VMR2W = 132,
+ ARC_SIMD_BUILTIN_VMR3AW = 133,
+ ARC_SIMD_BUILTIN_VMR3W = 134,
+ ARC_SIMD_BUILTIN_VMR4AW = 135,
+ ARC_SIMD_BUILTIN_VMR4W = 136,
+ ARC_SIMD_BUILTIN_VMR5AW = 137,
+ ARC_SIMD_BUILTIN_VMR5W = 138,
+ ARC_SIMD_BUILTIN_VMR6AW = 139,
+ ARC_SIMD_BUILTIN_VMR6W = 140,
+ ARC_SIMD_BUILTIN_VMR7AW = 141,
+ ARC_SIMD_BUILTIN_VMR7W = 142,
+ ARC_SIMD_BUILTIN_VMRB = 143,
+ ARC_SIMD_BUILTIN_VH264F = 144,
+ ARC_SIMD_BUILTIN_VH264FT = 145,
+ ARC_SIMD_BUILTIN_VH264FW = 146,
+ ARC_SIMD_BUILTIN_VVC1F = 147,
+ ARC_SIMD_BUILTIN_VVC1FT = 148,
/* Va, Vb, rlimm instructions */
- ARC_SIMD_BUILTIN_VBADDW = 1050,
- ARC_SIMD_BUILTIN_VBMAXW = 1051,
- ARC_SIMD_BUILTIN_VBMINW = 1052,
- ARC_SIMD_BUILTIN_VBMULAW = 1053,
- ARC_SIMD_BUILTIN_VBMULFW = 1054,
- ARC_SIMD_BUILTIN_VBMULW = 1055,
- ARC_SIMD_BUILTIN_VBRSUBW = 1056,
- ARC_SIMD_BUILTIN_VBSUBW = 1057,
+ ARC_SIMD_BUILTIN_VBADDW = 150,
+ ARC_SIMD_BUILTIN_VBMAXW = 151,
+ ARC_SIMD_BUILTIN_VBMINW = 152,
+ ARC_SIMD_BUILTIN_VBMULAW = 153,
+ ARC_SIMD_BUILTIN_VBMULFW = 154,
+ ARC_SIMD_BUILTIN_VBMULW = 155,
+ ARC_SIMD_BUILTIN_VBRSUBW = 156,
+ ARC_SIMD_BUILTIN_VBSUBW = 157,
/* Va, Vb, Ic instructions */
- ARC_SIMD_BUILTIN_VASRW = 1060,
- ARC_SIMD_BUILTIN_VSR8 = 1061,
- ARC_SIMD_BUILTIN_VSR8AW = 1062,
+ ARC_SIMD_BUILTIN_VASRW = 160,
+ ARC_SIMD_BUILTIN_VSR8 = 161,
+ ARC_SIMD_BUILTIN_VSR8AW = 162,
/* Va, Vb, u6 instructions */
- ARC_SIMD_BUILTIN_VASRRWi = 1065,
- ARC_SIMD_BUILTIN_VASRSRWi = 1066,
- ARC_SIMD_BUILTIN_VASRWi = 1067,
- ARC_SIMD_BUILTIN_VASRPWBi = 1068,
- ARC_SIMD_BUILTIN_VASRRPWBi = 1069,
- ARC_SIMD_BUILTIN_VSR8AWi = 1070,
- ARC_SIMD_BUILTIN_VSR8i = 1071,
+ ARC_SIMD_BUILTIN_VASRRWi = 165,
+ ARC_SIMD_BUILTIN_VASRSRWi = 166,
+ ARC_SIMD_BUILTIN_VASRWi = 167,
+ ARC_SIMD_BUILTIN_VASRPWBi = 168,
+ ARC_SIMD_BUILTIN_VASRRPWBi = 169,
+ ARC_SIMD_BUILTIN_VSR8AWi = 170,
+ ARC_SIMD_BUILTIN_VSR8i = 171,
/* Va, Vb, u8 (simm) instructions*/
- ARC_SIMD_BUILTIN_VMVAW = 1075,
- ARC_SIMD_BUILTIN_VMVW = 1076,
- ARC_SIMD_BUILTIN_VMVZW = 1077,
- ARC_SIMD_BUILTIN_VD6TAPF = 1078,
+ ARC_SIMD_BUILTIN_VMVAW = 175,
+ ARC_SIMD_BUILTIN_VMVW = 176,
+ ARC_SIMD_BUILTIN_VMVZW = 177,
+ ARC_SIMD_BUILTIN_VD6TAPF = 178,
/* Va, rlimm, u8 (simm) instructions*/
- ARC_SIMD_BUILTIN_VMOVAW = 1080,
- ARC_SIMD_BUILTIN_VMOVW = 1081,
- ARC_SIMD_BUILTIN_VMOVZW = 1082,
+ ARC_SIMD_BUILTIN_VMOVAW = 180,
+ ARC_SIMD_BUILTIN_VMOVW = 181,
+ ARC_SIMD_BUILTIN_VMOVZW = 182,
/* Va, Vb instructions */
- ARC_SIMD_BUILTIN_VABSAW = 1085,
- ARC_SIMD_BUILTIN_VABSW = 1086,
- ARC_SIMD_BUILTIN_VADDSUW = 1087,
- ARC_SIMD_BUILTIN_VSIGNW = 1088,
- ARC_SIMD_BUILTIN_VEXCH1 = 1089,
- ARC_SIMD_BUILTIN_VEXCH2 = 1090,
- ARC_SIMD_BUILTIN_VEXCH4 = 1091,
- ARC_SIMD_BUILTIN_VUPBAW = 1092,
- ARC_SIMD_BUILTIN_VUPBW = 1093,
- ARC_SIMD_BUILTIN_VUPSBAW = 1094,
- ARC_SIMD_BUILTIN_VUPSBW = 1095,
-
- ARC_SIMD_BUILTIN_VDIRUN = 1100,
- ARC_SIMD_BUILTIN_VDORUN = 1101,
- ARC_SIMD_BUILTIN_VDIWR = 1102,
- ARC_SIMD_BUILTIN_VDOWR = 1103,
-
- ARC_SIMD_BUILTIN_VREC = 1105,
- ARC_SIMD_BUILTIN_VRUN = 1106,
- ARC_SIMD_BUILTIN_VRECRUN = 1107,
- ARC_SIMD_BUILTIN_VENDREC = 1108,
-
- ARC_SIMD_BUILTIN_VLD32WH = 1110,
- ARC_SIMD_BUILTIN_VLD32WL = 1111,
- ARC_SIMD_BUILTIN_VLD64 = 1112,
- ARC_SIMD_BUILTIN_VLD32 = 1113,
- ARC_SIMD_BUILTIN_VLD64W = 1114,
- ARC_SIMD_BUILTIN_VLD128 = 1115,
- ARC_SIMD_BUILTIN_VST128 = 1116,
- ARC_SIMD_BUILTIN_VST64 = 1117,
+ ARC_SIMD_BUILTIN_VABSAW = 185,
+ ARC_SIMD_BUILTIN_VABSW = 186,
+ ARC_SIMD_BUILTIN_VADDSUW = 187,
+ ARC_SIMD_BUILTIN_VSIGNW = 188,
+ ARC_SIMD_BUILTIN_VEXCH1 = 189,
+ ARC_SIMD_BUILTIN_VEXCH2 = 190,
+ ARC_SIMD_BUILTIN_VEXCH4 = 191,
+ ARC_SIMD_BUILTIN_VUPBAW = 192,
+ ARC_SIMD_BUILTIN_VUPBW = 193,
+ ARC_SIMD_BUILTIN_VUPSBAW = 194,
+ ARC_SIMD_BUILTIN_VUPSBW = 195,
+
+ ARC_SIMD_BUILTIN_VDIRUN = 200,
+ ARC_SIMD_BUILTIN_VDORUN = 201,
+ ARC_SIMD_BUILTIN_VDIWR = 202,
+ ARC_SIMD_BUILTIN_VDOWR = 203,
+
+ ARC_SIMD_BUILTIN_VREC = 205,
+ ARC_SIMD_BUILTIN_VRUN = 206,
+ ARC_SIMD_BUILTIN_VRECRUN = 207,
+ ARC_SIMD_BUILTIN_VENDREC = 208,
+
+ ARC_SIMD_BUILTIN_VLD32WH = 210,
+ ARC_SIMD_BUILTIN_VLD32WL = 211,
+ ARC_SIMD_BUILTIN_VLD64 = 212,
+ ARC_SIMD_BUILTIN_VLD32 = 213,
+ ARC_SIMD_BUILTIN_VLD64W = 214,
+ ARC_SIMD_BUILTIN_VLD128 = 215,
+ ARC_SIMD_BUILTIN_VST128 = 216,
+ ARC_SIMD_BUILTIN_VST64 = 217,
+
- ARC_SIMD_BUILTIN_VST16_N = 1120,
- ARC_SIMD_BUILTIN_VST32_N = 1121,
+ ARC_SIMD_BUILTIN_VST16_N = 220,
+ ARC_SIMD_BUILTIN_VST32_N = 221,
- ARC_SIMD_BUILTIN_VINTI = 1201,
+ ARC_SIMD_BUILTIN_VINTI,
+ ARC_SIMD_BUILTIN_DMA_IN,
+ ARC_SIMD_BUILTIN_DMA_OUT,
- ARC_SIMD_BUILTIN_END
+ ARC_SIMD_BUILTIN_END,
+ ARC_BUILTIN_END = ARC_SIMD_BUILTIN_END
};
/* A nop is needed between a 4 byte insn that sets the condition codes and
@@ -401,6 +408,13 @@ static bool arc_preserve_reload_p (rtx i
static rtx arc_delegitimize_address (rtx);
static bool arc_can_follow_jump (const_rtx follower, const_rtx followee);
+static void arc_copy_to_target (gimple_stmt_iterator *, struct gcc_target *,
+ tree, tree, tree);
+static void arc_copy_from_target (gimple_stmt_iterator *, struct gcc_target *,
+ tree, tree, tree);
+static void arc_build_call_on_target (gimple_stmt_iterator *,
+ struct gcc_target *, int, tree *);
+
static rtx frame_insn (rtx);
/* initialize the GCC target structure. */
@@ -517,6 +531,13 @@ static rtx frame_insn (rtx);
#undef TARGET_MAX_ANCHOR_OFFSET
#define TARGET_MAX_ANCHOR_OFFSET (1020)
+#undef TARGET_COPY_TO_TARGET
+#define TARGET_COPY_TO_TARGET arc_copy_to_target
+#undef TARGET_COPY_FROM_TARGET
+#define TARGET_COPY_FROM_TARGET arc_copy_from_target
+#undef TARGET_BUILD_CALL_ON_TARGET
+#define TARGET_BUILD_CALL_ON_TARGET arc_build_call_on_target
+
extern enum reg_class arc_secondary_reload (bool, rtx, enum reg_class,
enum machine_mode,
struct secondary_reload_info *);
@@ -5544,12 +5565,16 @@ arc_cannot_force_const_mem (rtx x)
}
+static tree arc_builtin_decls[ARC_BUILTIN_END];
+
/* Generic function to define a builtin */
#define def_mbuiltin(MASK, NAME, TYPE, CODE) \
do \
{ \
if (MASK) \
- add_builtin_function ((NAME), (TYPE), (CODE), BUILT_IN_MD, NULL, NULL_TREE); \
+ arc_builtin_decls[(CODE)] \
+ = add_builtin_function ((NAME), (TYPE), (CODE), BUILT_IN_MD, \
+ NULL, NULL_TREE); \
} \
while (0)
@@ -5871,6 +5896,37 @@ arc_expand_builtin (tree exp,
emit_insn (gen_unimp_s (const1_rtx));
return NULL_RTX;
+ case ARC_SIMD_BUILTIN_CALL:
+ int nargs, i;
+
+ icode = CODE_FOR_simd_call;
+ arg0 = CALL_EXPR_ARG (exp, 0); /* Ra */
+ mode0 = insn_data[icode].operand[0].mode;
+ op0 = expand_expr (arg0, NULL_RTX, mode0, EXPAND_NORMAL);
+ if (mode0 == VOIDmode)
+ mode0 = GET_MODE (op0);
+
+ if (! (*insn_data[icode].operand[0].predicate) (op0, mode0))
+ op0 = copy_to_mode_reg (mode0, op0);
+ nargs = call_expr_nargs (exp) - 1;
+ op1 = gen_rtx_PARALLEL (VOIDmode, rtvec_alloc (nargs));
+ for (i = 0; i < nargs; i++)
+ {
+ rtx reg;
+
+ arg0 = CALL_EXPR_ARG (exp, 1+i);
+ op0 = expand_expr (arg0, NULL_RTX, VOIDmode, EXPAND_NORMAL);
+ mode0 = GET_MODE (op0);
+ if (mode0 == VOIDmode)
+ mode0 = SImode;
+ reg = gen_rtx_REG (mode0, 66+i);
+ emit_move_insn (reg, op0);
+ XVECEXP (op1, 0, i) = reg;
+ }
+
+ emit_insn (gen_simd_call (op0, op1));
+ return NULL_RTX;
+
default:
break;
}
@@ -6905,7 +6961,10 @@ enum simd_insn_args_type {
void_Va_Ib_u8,
Va_Vb_Ic_u8,
- void_Va_u3_Ib_u8
+ void_Va_u3_Ib_u8,
+
+ void_Ra_Rb_Rc,
+ void_Ra
};
struct builtin_description
@@ -6914,8 +6973,6 @@ struct builtin_description
const enum insn_code icode;
const char * const name;
const enum arc_builtins code;
- const enum rtx_code comparison;
- const unsigned int flag;
};
static const struct builtin_description arc_simd_builtin_desc_list[] =
@@ -6923,7 +6980,7 @@ static const struct builtin_description
/* VVV builtins go first */
#define SIMD_BUILTIN(type,code, string, builtin) \
{ type,CODE_FOR_##code, "__builtin_arc_" string, \
- ARC_SIMD_BUILTIN_##builtin, UNKNOWN, 0 },
+ ARC_SIMD_BUILTIN_##builtin, },
SIMD_BUILTIN (Va_Vb_Vc, vaddaw_insn, "vaddaw", VADDAW)
SIMD_BUILTIN (Va_Vb_Vc, vaddw_insn, "vaddw", VADDW)
@@ -7051,6 +7108,10 @@ static const struct builtin_description
SIMD_BUILTIN (void_Va_u3_Ib_u8, vst32_n_insn, "vst32_n", VST32_N)
SIMD_BUILTIN (void_u6, vinti_insn, "vinti", VINTI)
+
+ SIMD_BUILTIN (void_Ra_Rb_Rc, simd_dma_in, "simd_dma_in", DMA_IN)
+ SIMD_BUILTIN (void_Ra_Rb_Rc, simd_dma_out, "simd_dma_out", DMA_OUT)
+ SIMD_BUILTIN (void_Ra, simd_call, "simd_call", CALL)
};
static void
@@ -7105,6 +7166,17 @@ arc_init_simd_builtins (void)
tree v8hi_ftype_v8hi
= build_function_type (V8HI_type_node, tree_cons (NULL_TREE, V8HI_type_node,endlink));
+ tree void_ftype_ptr_ptr_int
+ = build_function_type (void_type_node,
+ tree_cons (NULL_TREE, ptr_type_node,
+ tree_cons (NULL_TREE, ptr_type_node,
+ tree_cons (NULL_TREE,
+ integer_type_node,
+ endlink))));
+ tree void_ftype_fn
+ = build_function_type (void_type_node,
+ tree_cons (NULL_TREE, ptr_type_node, endlink));
+
/* These asserts have been introduced to ensure that the order of builtins
does not get messed up, else the initialization goes wrong */
gcc_assert (arc_simd_builtin_desc_list [0].args_type == Va_Vb_Vc);
@@ -7167,6 +7239,16 @@ arc_init_simd_builtins (void)
for (; arc_simd_builtin_desc_list [i].args_type == void_u6; i++)
def_mbuiltin (TARGET_SIMD_SET, arc_simd_builtin_desc_list [i].name, void_ftype_int, arc_simd_builtin_desc_list [i].code);
+ gcc_assert (arc_simd_builtin_desc_list [i].args_type == void_Ra_Rb_Rc);
+ for (; arc_simd_builtin_desc_list [i].args_type == void_Ra_Rb_Rc; i++)
+ def_mbuiltin (TARGET_SIMD_SET, arc_simd_builtin_desc_list[i].name,
+ void_ftype_ptr_ptr_int, arc_simd_builtin_desc_list[i].code);
+
+ gcc_assert (arc_simd_builtin_desc_list [i].args_type == void_Ra);
+ for (; arc_simd_builtin_desc_list [i].args_type == void_Ra; i++)
+ def_mbuiltin (TARGET_SIMD_SET, arc_simd_builtin_desc_list[i].name,
+ void_ftype_fn, arc_simd_builtin_desc_list[i].code);
+
gcc_assert(i == ARRAY_SIZE (arc_simd_builtin_desc_list));
}
@@ -7618,6 +7700,60 @@ arc_expand_simd_builtin (tree exp,
emit_insn (pat);
return NULL_RTX;
+ case void_Ra_Rb_Rc:
+ icode = d->icode;
+ arg0 = CALL_EXPR_ARG (exp, 0); /* Ra */
+ arg1 = CALL_EXPR_ARG (exp, 1); /* Rb */
+ arg2 = CALL_EXPR_ARG (exp, 2); /* Rc */
+
+ mode0 = insn_data[icode].operand[0].mode;
+ mode1 = insn_data[icode].operand[1].mode;
+ mode2 = insn_data[icode].operand[2].mode;
+
+ op0 = expand_expr (arg0, NULL_RTX, mode0, EXPAND_NORMAL);
+ if (mode0 == VOIDmode)
+ mode0 = GET_MODE (op0);
+ op1 = expand_expr (arg1, NULL_RTX, mode1, EXPAND_NORMAL);
+ if (mode1 == VOIDmode)
+ mode1 = GET_MODE (op1);
+ op2 = expand_expr (arg2, NULL_RTX, mode2, EXPAND_NORMAL);
+ if (mode2 == VOIDmode)
+ mode2 = GET_MODE (op2);
+
+ if (! (*insn_data[icode].operand[0].predicate) (op0, mode0))
+ op0 = copy_to_mode_reg (mode0, op0);
+ if (! (*insn_data[icode].operand[1].predicate) (op1, mode1))
+ op1 = copy_to_mode_reg (mode1, op1);
+ if (! (*insn_data[icode].operand[2].predicate) (op2, mode2))
+ op2 = copy_to_mode_reg (mode2, op2);
+
+ pat = GEN_FCN (icode) (op0, op1, op2);
+ if (! pat)
+ return 0;
+
+ emit_insn (pat);
+ return NULL_RTX;
+
+ case void_Ra:
+ icode = d->icode;
+ arg0 = CALL_EXPR_ARG (exp, 0); /* Ra */
+
+ mode0 = insn_data[icode].operand[0].mode;
+
+ op0 = expand_expr (arg0, NULL_RTX, mode0, EXPAND_NORMAL);
+ if (mode0 == VOIDmode)
+ mode0 = GET_MODE (op0);
+
+ if (! (*insn_data[icode].operand[0].predicate) (op0, mode0))
+ op0 = copy_to_mode_reg (mode0, op0);
+
+ pat = GEN_FCN (icode) (op0);
+ if (! pat)
+ return 0;
+
+ emit_insn (pat);
+ return NULL_RTX;
+
default:
gcc_unreachable ();
}
@@ -8886,6 +9022,45 @@ arc_dead_or_set_postreload_p (const_rtx
return 1;
}
+static void
+arc_copy_to_target (gimple_stmt_iterator *gsi, struct gcc_target *target,
+ tree dst, tree src, tree size)
+{
+ tree fn, t;
+
+ gcc_assert (strcmp (target->name, "mxp-elf") == 0);
+ fn = build_fold_addr_expr (arc_builtin_decls[ARC_SIMD_BUILTIN_DMA_IN]);
+ t = build_call_nary (void_type_node, fn, 3, dst, src, size);
+ force_gimple_operand_gsi (gsi, t, true, NULL_TREE, false,
+ GSI_CONTINUE_LINKING);
+}
+
+static void
+arc_copy_from_target (gimple_stmt_iterator *gsi, struct gcc_target *target,
+ tree dst, tree src, tree size)
+{
+ tree fn, t;
+
+ gcc_assert (strcmp (target->name, "mxp-elf") == 0);
+ fn = build_fold_addr_expr (arc_builtin_decls[ARC_SIMD_BUILTIN_DMA_OUT]);
+ t = build_call_nary (void_type_node, fn, 3, dst, src, size);
+ force_gimple_operand_gsi (gsi, t, true, NULL_TREE, false,
+ GSI_CONTINUE_LINKING);
+}
+
+static void
+arc_build_call_on_target (gimple_stmt_iterator *gsi, struct gcc_target *target,
+ int nargs, tree *args)
+{
+ tree fn, t;
+
+ gcc_assert (strcmp (target->name, "mxp-elf") == 0);
+ fn = build_fold_addr_expr (arc_builtin_decls[ARC_SIMD_BUILTIN_CALL]);
+ t = build_call_array (void_type_node, fn, nargs, args);
+ force_gimple_operand_gsi (gsi, t, true, NULL_TREE, false,
+ GSI_CONTINUE_LINKING);
+}
+
#include "gt-arc.h"
END_TARGET_SPECIFIC
Index: config/arc/arc.h
===================================================================
--- config/arc/arc.h (revision 148226)
+++ config/arc/arc.h (working copy)
@@ -467,8 +467,10 @@ if (GET_MODE_CLASS (MODE) == MODE_INT \
/* r63 is pc, r64-r127 = simd vregs, r128-r143 = simd dma config regs
r144, r145 = lp_start, lp_end
- and therefore the pseudo registers start from r146 */
-#define FIRST_PSEUDO_REGISTER 146
+ r146 = SDM (not really a register, but we pretend it is for dma_in / dma_out
+ patterns)
+ and therefore the pseudo registers start from r147 */
+#define FIRST_PSEUDO_REGISTER 147
/* 1 for registers that have pervasive standard uses
and are not available for the register allocator.
@@ -529,7 +531,7 @@ if (GET_MODE_CLASS (MODE) == MODE_INT \
\
0, 0, 0, 0, 0, 0, 0, 0, \
0, 0, 0, 0, 0, 0, 0, 0, \
- 1, 1}
+ 1, 1, 1}
/* 1 for registers not available across function calls.
These must include the FIXED_REGISTERS and also any
@@ -565,7 +567,7 @@ if (GET_MODE_CLASS (MODE) == MODE_INT \
\
0, 0, 0, 0, 0, 0, 0, 0, \
0, 0, 0, 0, 0, 0, 0, 0, \
- 1, 1}
+ 1, 1, 1}
/* Macro to conditionally modify fixed_regs/call_used_regs. */
@@ -1654,7 +1656,7 @@ extern char rname56[], rname57[], rname5
"vr56", "vr57", "vr58", "vr59", "vr60", "vr61", "vr62", "vr63", \
"dr0", "dr1", "dr2", "dr3", "dr4", "dr5", "dr6", "dr7", \
"dr0", "dr1", "dr2", "dr3", "dr4", "dr5", "dr6", "dr7", \
- "lp_start", "lp_end" \
+ "lp_start", "lp_end", "SDM" \
}
/* Entry to the insn conditionalizer. */
Index: config/arc/arc.md
===================================================================
--- config/arc/arc.md (revision 148226)
+++ config/arc/arc.md (working copy)
@@ -145,6 +145,7 @@ (define_constants
(CC_REG 61)
(LP_START 144)
(LP_END 145)
+ (SDM 146)
]
)
@@ -667,8 +668,8 @@ (define_expand "movhi"
"if (prepare_move_operands (operands, HImode)) DONE;")
(define_insn "*movhi_insn"
- [(set (match_operand:HI 0 "move_dest_operand" "=Rcq,Rcq#q,w, w,w,???w,Rcq#q,w,Rcq,S,r,m,???m,VUsc")
- (match_operand:HI 1 "move_src_operand" "cL,cP,Rcq#q,cL,I,?Rac, ?i,?i,T,Rcq,m,c,?Rac,i"))]
+ [(set (match_operand:HI 0 "move_dest_operand" "=Rcq,Rcq#q,w, w,w,???w,Rcq#q,w,Rcq,S,r,m,???m,VUsc,v")
+ (match_operand:HI 1 "move_src_operand" "cL,cP,Rcq#q,cL,I,?Rac, ?i,?i,T,Rcq,m,c,?Rac,i,c"))]
"register_operand (operands[0], HImode)
|| register_operand (operands[1], HImode)
|| (CONSTANT_P (operands[1])
@@ -690,10 +691,11 @@ (define_insn "*movhi_insn"
ldw%U1%V1 %0,%1
stw%U0%V0 %1,%0
stw%U0%V0 %1,%0
- stw%U0%V0 %S1,%0"
- [(set_attr "type" "move,move,move,move,move,move,move,move,load,store,load,store,store,store")
- (set_attr "iscompact" "maybe,maybe,maybe,false,false,false,maybe_limm,false,true,true,false,false,false,false")
- (set_attr "cond" "canuse,canuse_limm,canuse,canuse,canuse_limm,canuse,canuse,canuse,nocond,nocond,nocond,nocond,nocond,nocond")])
+ stw%U0%V0 %S1,%0
+ vmovw %0,%1,1"
+ [(set_attr "type" "move,move,move,move,move,move,move,move,load,store,load,store,store,store,move")
+ (set_attr "iscompact" "maybe,maybe,maybe,false,false,false,maybe_limm,false,true,true,false,false,false,false,false")
+ (set_attr "cond" "canuse,canuse_limm,canuse,canuse,canuse_limm,canuse,canuse,canuse,nocond,nocond,nocond,nocond,nocond,nocond,nocond")])
(define_expand "movsi"
[(set (match_operand:SI 0 "move_dest_operand" "")
Index: config/arc/t-arc
===================================================================
--- config/arc/t-arc (revision 148226)
+++ config/arc/t-arc (working copy)
@@ -84,6 +84,6 @@ $(T)profil-uclibc.o: $(srcdir)/config/ar
$(T)libgmon.a: $(T)mcount.o $(T)gmon.o $(T)dcache_linesz.o $(PROFILE_OSDEP)
$(AR_CREATE_FOR_TARGET) $@ $^
-$(out_object_file): gt-arc.h
+$(out_object_file): gt-arc.h $(GIMPLE_H) $(TREE_FLOW_H)
EXTRA_MULTILIB_PARTS = crtend.o crtbegin.o crtendS.o crtbeginS.o crti.o crtn.o libgmon.a crtg.o crtgend.o
Index: config/arc/arc-modes.def
===================================================================
--- config/arc/arc-modes.def (revision 148226)
+++ config/arc/arc-modes.def (working copy)
@@ -28,6 +28,7 @@ CC_MODE (CC_FP_GE);
CC_MODE (CC_FP_ORD);
CC_MODE (CC_FP_UNEQ);
CC_MODE (CC_FPX);
+CC_MODE (CC_BLK); /* BLKmode is not tracked by data flow... */
/* Vector modes. */
VECTOR_MODES (INT, 4); /* V4QI V2HI */
Index: config/arc/simdext.md
===================================================================
--- config/arc/simdext.md (revision 148226)
+++ config/arc/simdext.md (working copy)
@@ -131,6 +131,8 @@ (define_constants
(UNSPEC_ARC_SIMD_VCAST 1200)
(UNSPEC_ARC_SIMD_VINTI 1201)
+
+ (UNSPEC_ARC_SIMD_DMA 1202)
]
)
@@ -1311,3 +1313,41 @@ (define_insn "vinti_insn"
[(set_attr "type" "simd_vcontrol")
(set_attr "length" "4")
(set_attr "cond" "nocond")])
+
+;; DMA in/out for ARCompact / mxp interworking
+;; These are emitted on the ARCompact side.
+
+;; copy main memory starting at operand 1 to SDM starting at operand 0;
+;; transfer size is operand 2.
+(define_insn "simd_dma_in"
+ [(set (reg:CC_BLK SDM)
+ (unspec [(reg:CC_BLK SDM)
+ (mem:BLK (match_operand:SI 1 "nonmemory_operand"))
+ (match_operand 0 "nonmemory_operand")
+ (match_operand:SI 2 "nonmemory_operand")]
+ UNSPEC_ARC_SIMD_DMA))]
+ "TARGET_SIMD_SET"
+ "` dma_in %0 %1 %2"
+ [(set_attr "length" "42")])
+
+;; copy SDM starting at operand 0 to main memory starting at operand 1;
+;; transfer size is operand 2.
+(define_insn "simd_dma_out"
+ [(set (mem:BLK (match_operand:SI 1 "nonmemory_operand"))
+ (unspec [(reg:CC_BLK SDM)
+ (match_operand 0 "nonmemory_operand")
+ (match_operand:SI 2 "nonmemory_operand")]
+ UNSPEC_ARC_SIMD_DMA))]
+ "TARGET_SIMD_SET"
+ "` dma_out %0 %1 %2"
+ [(set_attr "length" "42")])
+
+(define_insn "simd_call"
+ [(set (reg:CC_BLK SDM)
+ (unspec [(match_operand 0 "nonmemory_operand")
+ (match_operand 1 "simd_arg_vector")
+ (reg:CC_BLK SDM)]
+ UNSPEC_ARC_SIMD_DMA))]
+ "TARGET_SIMD_SET"
+ "` simd call"
+ [(set_attr "length" "42")])
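For reviewers without an ARC toolchain, the copy-in / call / copy-out sequence the patch generates (visible in the attached vloop.s as the two `dma_in` lines, the `simd call`, and the `dma_out`) can be sketched in plain host-side C. The stubs below are hypothetical stand-ins, not the real `__builtin_arc_simd_dma_in` / `__builtin_arc_simd_dma_out` / `__builtin_arc_simd_call` builtins: DMA becomes `memcpy` into a fake SDM buffer, and the offloaded loop function runs directly. The loop body is assumed to be `a[i] = b[i] + c[i]` (consistent with the 512-byte arrays and the `vaddnaw` in the mxp output; the exact source is in the scrubbed vloop.c attachment), and the names `sdm`, `f_loopfn_0` and the SDM offsets are illustrative only.

```c
#include <string.h>

/* Fake "SIMD data memory" standing in for the vector target's SDM.  */
static short sdm[1536];

/* Host stand-in for dma_in: copy main memory into SDM.  */
static void dma_in (int sdm_off, const void *src, int size)
{
  memcpy (sdm + sdm_off, src, size);
}

/* Host stand-in for dma_out: copy SDM back to main memory.  */
static void dma_out (void *dst, int sdm_off, int size)
{
  memcpy (dst, sdm + sdm_off, size);
}

short a[256], b[256], c[256];

/* Stand-in for the loop function that the vectorizer splits out and
   runs on the vector target ("simd call" in vloop.s).  */
static void f_loopfn_0 (short *va, const short *vb, const short *vc)
{
  int i;
  for (i = 0; i < 256; i++)
    va[i] = vb[i] + vc[i];
}

void f (void)
{
  dma_in (0, b, sizeof b);                 /* inputs: main memory -> SDM */
  dma_in (256, c, sizeof c);
  f_loopfn_0 (sdm + 512, sdm, sdm + 256);  /* the "simd call" */
  dma_out (a, 512, sizeof a);              /* result: SDM -> main memory */
}
```

On the real target the three steps are emitted by the TARGET_COPY_TO_TARGET, TARGET_BUILD_CALL_ON_TARGET and TARGET_COPY_FROM_TARGET hooks added above; the sketch only shows their ordering and data flow.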
-------------- next part --------------
A non-text attachment was scrubbed...
Name: vloop.c
Type: text/x-csrc
Size: 188 bytes
Desc: not available
URL: <http://gcc.gnu.org/pipermail/gcc-patches/attachments/20090716/73e1f318/attachment.bin>
-------------- next part --------------
.file "vloop.c"
.cpu A5
.section .text
.align 4
.global f
.type f, @function
f:
.LFB0:
push_s blink
.LCFI0:
mov r0,1542
.LCFI1:
bl.d @__simd_malloc;1
sub_s sp,sp,8
extw r2,r0
mov_s r5,r2
add r5,r5,518
` dma_in r5 @c 512
mov_s r5,r0
add r5,r5,518
add_s r3,sp,8
stw r5,[sp,6]
mov_s r4,r0
vmovw vr2,r0,1
add r6,r2,6
add r5,r0,6
add r2,r2,1030
add r0,r0,1030
` dma_in r2 @b 512
stw.a r0,[r3,-6]
stw r5,[sp,4]
ld.a blink,[sp,8]
.LCFI2:
` dma_in r4 r3 6
` simd call
` dma_out r6 @a 512
.LCFI3:
j_s.d [blink]
add_s sp,sp,4
.LFE0:
.size f, .-f
.arch "mxp-elf"
.text
.balign 4
.type &f._loopfn.0, @function
&f._loopfn.0:
viv.1 i1,vr2
vmvw.3 vr5,vr62
vld16_2 vr5,[i1,0]
vld16_3 vr4,[i1,4]
vld16_2 vr4,[i1,2]
vmov.3 vr7,512
vmvw.2 vr4,vr62
.L8:
vmvw.1 vr4,vr5
vxsumwi.1 vr3,vr4,8
vxsumwi.1 vr2,vr5,4
vmvw.2 vr3,vr4
vmvw.2 vr2,vr4
vaddnaw.3 vr3,vr4,vr3
vaddnaw.3 vr2,vr4,vr2
viv.1 i2,vr3
viv.1 i1,vr2
vld128 vr3,[i2,0]
vld128 vr2,[i1,0]
vxsumwi.1 vr6,vr4,4
vmvw.2 vr6,vr4
vmov.3 vr8,16
vaddnaw.3 vr6,vr4,vr6
vaddnaw.3 vr5,vr5,vr8
vne.2 vr0,vr5,vr7
vjp.i1 @.L8
vaddnaw.255 vr2,vr3,vr2
viv.1 i2,vr6
vst128 vr2,[i2,0]
vjb vr31,pcl
vnop
vnop
vnop
.size &f._loopfn.0, .-&f._loopfn.0
.global a
.section .bss
.align 128
.type a, @object
.size a, 512
a:
.zero 512
.global b
.align 128
.type b, @object
.size b, 512
b:
.zero 512
.global c
.align 128
.type c, @object
.size c, 512
c:
.zero 512
.section .debug_frame,"",@progbits
.Lframe0:
.4byte @.LECIE0-@.LSCIE0
.LSCIE0:
.4byte 0xffffffff
.byte 0x1
.string ""
.uleb128 0x1
.sleb128 -4
.byte 0x1f
.byte 0xc
.uleb128 0x1c
.uleb128 0x0
.align 4
.LECIE0:
.LSFDE0:
.4byte @.LEFDE0-@.LASFDE0
.LASFDE0:
.4byte @.Lframe0
.4byte @.LFB0
.4byte @.LFE0-@.LFB0
.byte 0x4
.4byte @.LCFI0-@.LFB0
.byte 0xe
.uleb128 0x4
.byte 0x4
.4byte @.LCFI1-@.LCFI0
.byte 0xe
.uleb128 0xc
.byte 0x11
.uleb128 0x1f
.sleb128 1
.byte 0x4
.4byte @.LCFI3-@.LCFI1
.byte 0xe
.uleb128 0x8
.align 4
.LEFDE0:
.ident "GCC: (GNU) 4.4.0"