This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.
[PATCH] Masked load/store vectorization (take 6)
- From: Jakub Jelinek <jakub@redhat.com>
- To: Richard Biener <rguenther@suse.de>
- Cc: Sergey Ostanevich <sergos.gnu@gmail.com>, Richard Henderson <rth@redhat.com>, gcc-patches@gcc.gnu.org
- Date: Fri, 29 Nov 2013 00:09:06 +0100
- Subject: [PATCH] Masked load/store vectorization (take 6)
- Authentication-results: sourceware.org; auth=none
- References: <20131022105614.GK30970@tucnak.zalov.cz> <CAGYS_TLTdBm6d-iL=-Nex4SrPTni6Q6Z+MDuS7fjnCKiUM4=7A@mail.gmail.com> <CAGYS_TLjBfd6bLLh5Tc8Bysa9OXYdxfKKf_PgrqnPLpqJrrv2w@mail.gmail.com> <20131022132658.GM30970@tucnak.zalov.cz> <CAGYS_T+UvX8YBUEpd5=vf8Xm5+kFDfY9pgsruiJ=6y=h_qEdnw@mail.gmail.com> <CAGYS_TJmppbJs-kbuxHnEr_swG+c1hDRg=wBywohFW=jBH8SEQ@mail.gmail.com> <20131023172220.GW30970@tucnak.zalov.cz> <20131024111439.GZ30970@tucnak.zalov.cz> <20131112142930.GT27813@tucnak.zalov.cz> <alpine.LNX.2.00.1311271608170.8615@zhemvz.fhfr.qr>
- Reply-to: Jakub Jelinek <jakub@redhat.com>
On Wed, Nov 27, 2013 at 04:10:16PM +0100, Richard Biener wrote:
> As you pinged this ... can you re-post a patch with changelog that
> includes the followups as we decided?
Ok, here is the updated patch against latest trunk with the follow-ups
incorporated. Bootstrapped/regtested on x86_64-linux and i686-linux,
ok for trunk?
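For reference, here is a minimal example (illustrative only, not one of the new tests verbatim) of the kind of loop this patch makes vectorizable through the new maskload/maskstore expanders; the store to a[i] is guarded by a condition, so plain if-conversion could not speculate it before:

```c
/* The store executes only when b[i] > 0, so speculating it is unsafe;
   after if-conversion into IFN_MASK_STORE the vectorizer can emit a
   masked vector store on AVX/AVX2.  */
void
foo (int *a, const int *b, int n)
{
  int i;
  for (i = 0; i < n; i++)
    if (b[i] > 0)
      a[i] = b[i] + 2;
}
```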
2013-11-28 Jakub Jelinek <jakub@redhat.com>
* tree-vectorizer.h (struct _loop_vec_info): Add scalar_loop field.
(LOOP_VINFO_SCALAR_LOOP): Define.
(slpeel_tree_duplicate_loop_to_edge_cfg): Add scalar_loop argument.
* config/i386/sse.md (maskload<mode>, maskstore<mode>): New expanders.
* tree-data-ref.c (struct data_ref_loc_d): Replace pos field with ref.
(get_references_in_stmt): Don't record operand addresses, but
operands themselves. Handle MASK_LOAD and MASK_STORE.
(find_data_references_in_stmt, graphite_find_data_references_in_stmt):
Adjust for the pos -> ref change.
* internal-fn.def (LOOP_VECTORIZED, MASK_LOAD, MASK_STORE): New
internal fns.
* tree-if-conv.c: Include target.h, expr.h, optabs.h and
tree-ssa-address.h.
(release_bb_predicate): New function.
(free_bb_predicate): Use it.
(reset_bb_predicate): Likewise. Don't deallocate bb->aux
just to immediately allocate it again.
(if_convertible_phi_p): Add any_mask_load_store argument, if true,
handle it like flag_tree_loop_if_convert_stores.
(insert_gimplified_predicates): Likewise. If bb dominates
loop->latch, call reset_bb_predicate.
(ifcvt_can_use_mask_load_store): New function.
(if_convertible_gimple_assign_stmt_p): Add any_mask_load_store
argument, check if some conditional loads or stores can't be
converted into MASK_LOAD or MASK_STORE.
(if_convertible_stmt_p): Add any_mask_load_store argument,
pass it down to if_convertible_gimple_assign_stmt_p.
(predicate_bbs): Don't return bool, only check if the last stmt
of a basic block is GIMPLE_COND and handle that. For basic blocks
that dominate loop->latch assume they don't need to be predicated.
(if_convertible_loop_p_1): Only call predicate_bbs if
flag_tree_loop_if_convert_stores and free_bb_predicate in that case
afterwards, check gimple_code of stmts here. Replace is_predicated
check with dominance check. Add any_mask_load_store argument,
pass it down to if_convertible_stmt_p and if_convertible_phi_p,
call if_convertible_phi_p only after all if_convertible_stmt_p
calls.
(if_convertible_loop_p): Add any_mask_load_store argument,
pass it down to if_convertible_loop_p_1.
(predicate_mem_writes): Emit MASK_LOAD and/or MASK_STORE calls.
(combine_blocks): Add any_mask_load_store argument, pass
it down to insert_gimplified_predicates and call predicate_mem_writes
if it is set. Call predicate_bbs.
(version_loop_for_if_conversion): New function.
(tree_if_conversion): Adjust if_convertible_loop_p and combine_blocks
calls. Return todo flags instead of bool, call
version_loop_for_if_conversion if if-conversion should be just
for the vectorized loops and nothing else.
(main_tree_if_conversion): Adjust caller. Don't call
tree_if_conversion for dont_vectorize loops if if-conversion
isn't explicitly enabled.
* tree-vect-data-refs.c (vect_check_gather): Handle
MASK_LOAD/MASK_STORE.
(vect_analyze_data_refs, vect_supportable_dr_alignment): Likewise.
* gimple.h (gimple_expr_type): Handle MASK_STORE.
* internal-fn.c (expand_LOOP_VECTORIZED, expand_MASK_LOAD,
expand_MASK_STORE): New functions.
* tree-vectorizer.c: Include tree-cfg.h and gimple-fold.h.
(vect_loop_vectorized_call, vect_loop_select): New functions.
(vectorize_loops): Don't try to vectorize loops with
loop->dont_vectorize set. Set LOOP_VINFO_SCALAR_LOOP for if-converted
loops, fold LOOP_VECTORIZED internal call depending on if loop
has been vectorized or not. Use vect_loop_select to attempt to
vectorize an if-converted loop before its non-if-converted
counterpart. If outer loop vectorization is successful in that
case, ensure the loop in the soon-to-be-dead non-if-converted loop
is not vectorized.
* tree-vect-loop-manip.c (slpeel_duplicate_current_defs_from_edges):
New function.
(slpeel_tree_duplicate_loop_to_edge_cfg): Add scalar_loop argument.
If non-NULL, copy basic blocks from scalar_loop instead of loop, but
still to loop's entry or exit edge.
(slpeel_tree_peel_loop_to_edge): Add scalar_loop argument, pass it
down to slpeel_tree_duplicate_loop_to_edge_cfg.
(vect_do_peeling_for_loop_bound, vect_do_peeling_for_loop_alignment):
Adjust callers.
(vect_loop_versioning): If LOOP_VINFO_SCALAR_LOOP, perform loop
versioning from that loop instead of LOOP_VINFO_LOOP, move it to the
right place in the CFG afterwards.
* tree-vect-loop.c (vect_determine_vectorization_factor): Handle
MASK_STORE.
* cfgloop.h (struct loop): Add dont_vectorize field.
* tree-loop-distribution.c (copy_loop_before): Adjust
slpeel_tree_duplicate_loop_to_edge_cfg caller.
* optabs.def (maskload_optab, maskstore_optab): New optabs.
* passes.def: Add a note that pass_vectorize must immediately follow
pass_if_conversion.
* tree-predcom.c (split_data_refs_to_components): Give up if
DR_STMT is a call.
* tree-vect-stmts.c (vect_mark_relevant): Don't crash if lhs
is NULL.
(exist_non_indexing_operands_for_use_p): Handle MASK_LOAD
and MASK_STORE.
(vectorizable_mask_load_store): New function.
(vectorizable_call): Call it for MASK_LOAD or MASK_STORE.
(vect_transform_stmt): Handle MASK_STORE.
* tree-ssa-phiopt.c (cond_if_else_store_replacement): Ignore
DR_STMT where lhs is NULL.
* gcc.dg/vect/vect-cond-11.c: New test.
* gcc.target/i386/vect-cond-1.c: New test.
* gcc.target/i386/avx2-gather-5.c: New test.
* gcc.target/i386/avx2-gather-6.c: New test.
* gcc.dg/vect/vect-mask-loadstore-1.c: New test.
* gcc.dg/vect/vect-mask-load-1.c: New test.
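To make the intended semantics concrete, the new IFN_MASK_LOAD/IFN_MASK_STORE calls behave like the following scalar model (a sketch written for illustration, not code from the patch): a lane is touched only when its mask element is set, and with the AVX maskload pattern the masked-out load lanes come back as zero:

```c
/* Scalar model (illustration only) of the masked store: lanes with a
   clear mask are not written at all, so a store that would otherwise
   trap or race is never speculated.  */
void
mask_store (int *addr, const int *mask, const int *val, int lanes)
{
  int i;
  for (i = 0; i < lanes; i++)
    if (mask[i])
      addr[i] = val[i];
}

/* Scalar model of the masked load; inactive lanes are not read and,
   matching the AVX vmaskmov behavior, read back as zero.  */
void
mask_load (int *dest, const int *mask, const int *addr, int lanes)
{
  int i;
  for (i = 0; i < lanes; i++)
    dest[i] = mask[i] ? addr[i] : 0;
}
```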
--- gcc/tree-vectorizer.h.jj 2013-11-28 09:18:11.771774932 +0100
+++ gcc/tree-vectorizer.h 2013-11-28 14:14:35.827362293 +0100
@@ -344,6 +344,10 @@ typedef struct _loop_vec_info {
fix it up. */
bool operands_swapped;
+ /* If if-conversion versioned this loop before conversion, this is the
+ loop version without if-conversion. */
+ struct loop *scalar_loop;
+
} *loop_vec_info;
/* Access Functions. */
@@ -376,6 +380,7 @@ typedef struct _loop_vec_info {
#define LOOP_VINFO_PEELING_FOR_GAPS(L) (L)->peeling_for_gaps
#define LOOP_VINFO_OPERANDS_SWAPPED(L) (L)->operands_swapped
#define LOOP_VINFO_PEELING_FOR_NITER(L) (L)->peeling_for_niter
+#define LOOP_VINFO_SCALAR_LOOP(L) (L)->scalar_loop
#define LOOP_REQUIRES_VERSIONING_FOR_ALIGNMENT(L) \
(L)->may_misalign_stmts.length () > 0
@@ -934,7 +939,8 @@ extern source_location vect_location;
in tree-vect-loop-manip.c. */
extern void slpeel_make_loop_iterate_ntimes (struct loop *, tree);
extern bool slpeel_can_duplicate_loop_p (const struct loop *, const_edge);
-struct loop *slpeel_tree_duplicate_loop_to_edge_cfg (struct loop *, edge);
+struct loop *slpeel_tree_duplicate_loop_to_edge_cfg (struct loop *,
+ struct loop *, edge);
extern void vect_loop_versioning (loop_vec_info, unsigned int, bool);
extern void vect_do_peeling_for_loop_bound (loop_vec_info, tree, tree,
unsigned int, bool);
--- gcc/config/i386/sse.md.jj 2013-11-23 15:20:47.452606456 +0100
+++ gcc/config/i386/sse.md 2013-11-28 14:13:57.562572366 +0100
@@ -14218,6 +14218,23 @@ (define_insn "<avx_avx2>_maskstore<ssemo
(set_attr "btver2_decode" "vector")
(set_attr "mode" "<sseinsnmode>")])
+(define_expand "maskload<mode>"
+ [(set (match_operand:V48_AVX2 0 "register_operand")
+ (unspec:V48_AVX2
+ [(match_operand:<sseintvecmode> 2 "register_operand")
+ (match_operand:V48_AVX2 1 "memory_operand")]
+ UNSPEC_MASKMOV))]
+ "TARGET_AVX")
+
+(define_expand "maskstore<mode>"
+ [(set (match_operand:V48_AVX2 0 "memory_operand")
+ (unspec:V48_AVX2
+ [(match_operand:<sseintvecmode> 2 "register_operand")
+ (match_operand:V48_AVX2 1 "register_operand")
+ (match_dup 0)]
+ UNSPEC_MASKMOV))]
+ "TARGET_AVX")
+
(define_insn_and_split "avx_<castmode><avxsizesuffix>_<castmode>"
[(set (match_operand:AVX256MODE2P 0 "nonimmediate_operand" "=x,m")
(unspec:AVX256MODE2P
--- gcc/tree-data-ref.c.jj 2013-11-27 18:02:48.050814182 +0100
+++ gcc/tree-data-ref.c 2013-11-28 14:13:57.592572476 +0100
@@ -4320,8 +4320,8 @@ compute_all_dependences (vec<data_refere
typedef struct data_ref_loc_d
{
- /* Position of the memory reference. */
- tree *pos;
+ /* The memory reference. */
+ tree ref;
/* True if the memory reference is read. */
bool is_read;
@@ -4336,7 +4336,7 @@ get_references_in_stmt (gimple stmt, vec
{
bool clobbers_memory = false;
data_ref_loc ref;
- tree *op0, *op1;
+ tree op0, op1;
enum gimple_code stmt_code = gimple_code (stmt);
/* ASM_EXPR and CALL_EXPR may embed arbitrary side effects.
@@ -4346,16 +4346,26 @@ get_references_in_stmt (gimple stmt, vec
&& !(gimple_call_flags (stmt) & ECF_CONST))
{
/* Allow IFN_GOMP_SIMD_LANE in their own loops. */
- if (gimple_call_internal_p (stmt)
- && gimple_call_internal_fn (stmt) == IFN_GOMP_SIMD_LANE)
- {
- struct loop *loop = gimple_bb (stmt)->loop_father;
- tree uid = gimple_call_arg (stmt, 0);
- gcc_assert (TREE_CODE (uid) == SSA_NAME);
- if (loop == NULL
- || loop->simduid != SSA_NAME_VAR (uid))
+ if (gimple_call_internal_p (stmt))
+ switch (gimple_call_internal_fn (stmt))
+ {
+ case IFN_GOMP_SIMD_LANE:
+ {
+ struct loop *loop = gimple_bb (stmt)->loop_father;
+ tree uid = gimple_call_arg (stmt, 0);
+ gcc_assert (TREE_CODE (uid) == SSA_NAME);
+ if (loop == NULL
+ || loop->simduid != SSA_NAME_VAR (uid))
+ clobbers_memory = true;
+ break;
+ }
+ case IFN_MASK_LOAD:
+ case IFN_MASK_STORE:
+ break;
+ default:
clobbers_memory = true;
- }
+ break;
+ }
else
clobbers_memory = true;
}
@@ -4369,15 +4379,15 @@ get_references_in_stmt (gimple stmt, vec
if (stmt_code == GIMPLE_ASSIGN)
{
tree base;
- op0 = gimple_assign_lhs_ptr (stmt);
- op1 = gimple_assign_rhs1_ptr (stmt);
+ op0 = gimple_assign_lhs (stmt);
+ op1 = gimple_assign_rhs1 (stmt);
- if (DECL_P (*op1)
- || (REFERENCE_CLASS_P (*op1)
- && (base = get_base_address (*op1))
+ if (DECL_P (op1)
+ || (REFERENCE_CLASS_P (op1)
+ && (base = get_base_address (op1))
&& TREE_CODE (base) != SSA_NAME))
{
- ref.pos = op1;
+ ref.ref = op1;
ref.is_read = true;
references->safe_push (ref);
}
@@ -4386,16 +4396,35 @@ get_references_in_stmt (gimple stmt, vec
{
unsigned i, n;
- op0 = gimple_call_lhs_ptr (stmt);
+ ref.is_read = false;
+ if (gimple_call_internal_p (stmt))
+ switch (gimple_call_internal_fn (stmt))
+ {
+ case IFN_MASK_LOAD:
+ ref.is_read = true;
+ case IFN_MASK_STORE:
+ ref.ref = build2 (MEM_REF,
+ ref.is_read
+ ? TREE_TYPE (gimple_call_lhs (stmt))
+ : TREE_TYPE (gimple_call_arg (stmt, 3)),
+ gimple_call_arg (stmt, 0),
+ gimple_call_arg (stmt, 1));
+ references->safe_push (ref);
+ return false;
+ default:
+ break;
+ }
+
+ op0 = gimple_call_lhs (stmt);
n = gimple_call_num_args (stmt);
for (i = 0; i < n; i++)
{
- op1 = gimple_call_arg_ptr (stmt, i);
+ op1 = gimple_call_arg (stmt, i);
- if (DECL_P (*op1)
- || (REFERENCE_CLASS_P (*op1) && get_base_address (*op1)))
+ if (DECL_P (op1)
+ || (REFERENCE_CLASS_P (op1) && get_base_address (op1)))
{
- ref.pos = op1;
+ ref.ref = op1;
ref.is_read = true;
references->safe_push (ref);
}
@@ -4404,11 +4433,11 @@ get_references_in_stmt (gimple stmt, vec
else
return clobbers_memory;
- if (*op0
- && (DECL_P (*op0)
- || (REFERENCE_CLASS_P (*op0) && get_base_address (*op0))))
+ if (op0
+ && (DECL_P (op0)
+ || (REFERENCE_CLASS_P (op0) && get_base_address (op0))))
{
- ref.pos = op0;
+ ref.ref = op0;
ref.is_read = false;
references->safe_push (ref);
}
@@ -4435,7 +4464,7 @@ find_data_references_in_stmt (struct loo
FOR_EACH_VEC_ELT (references, i, ref)
{
dr = create_data_ref (nest, loop_containing_stmt (stmt),
- *ref->pos, stmt, ref->is_read);
+ ref->ref, stmt, ref->is_read);
gcc_assert (dr != NULL);
datarefs->safe_push (dr);
}
@@ -4464,7 +4493,7 @@ graphite_find_data_references_in_stmt (l
FOR_EACH_VEC_ELT (references, i, ref)
{
- dr = create_data_ref (nest, loop, *ref->pos, stmt, ref->is_read);
+ dr = create_data_ref (nest, loop, ref->ref, stmt, ref->is_read);
gcc_assert (dr != NULL);
datarefs->safe_push (dr);
}
--- gcc/internal-fn.def.jj 2013-11-26 21:36:14.018329932 +0100
+++ gcc/internal-fn.def 2013-11-28 14:13:57.517569949 +0100
@@ -43,5 +43,8 @@ DEF_INTERNAL_FN (STORE_LANES, ECF_CONST
DEF_INTERNAL_FN (GOMP_SIMD_LANE, ECF_NOVOPS | ECF_LEAF | ECF_NOTHROW)
DEF_INTERNAL_FN (GOMP_SIMD_VF, ECF_CONST | ECF_LEAF | ECF_NOTHROW)
DEF_INTERNAL_FN (GOMP_SIMD_LAST_LANE, ECF_CONST | ECF_LEAF | ECF_NOTHROW)
+DEF_INTERNAL_FN (LOOP_VECTORIZED, ECF_NOVOPS | ECF_LEAF | ECF_NOTHROW)
+DEF_INTERNAL_FN (MASK_LOAD, ECF_PURE | ECF_LEAF)
+DEF_INTERNAL_FN (MASK_STORE, ECF_LEAF)
DEF_INTERNAL_FN (ANNOTATE, ECF_CONST | ECF_LEAF | ECF_NOTHROW)
DEF_INTERNAL_FN (UBSAN_NULL, ECF_LEAF | ECF_NOTHROW)
--- gcc/tree-if-conv.c.jj 2013-11-22 21:03:14.527852266 +0100
+++ gcc/tree-if-conv.c 2013-11-28 14:13:57.668572084 +0100
@@ -110,8 +110,12 @@ along with GCC; see the file COPYING3.
#include "tree-chrec.h"
#include "tree-data-ref.h"
#include "tree-scalar-evolution.h"
+#include "tree-ssa-address.h"
#include "tree-pass.h"
#include "dbgcnt.h"
+#include "target.h"
+#include "expr.h"
+#include "optabs.h"
/* List of basic blocks in if-conversion-suitable order. */
static basic_block *ifc_bbs;
@@ -194,39 +198,48 @@ init_bb_predicate (basic_block bb)
set_bb_predicate (bb, boolean_true_node);
}
-/* Free the predicate of basic block BB. */
+/* Release the SSA_NAMEs associated with the predicate of basic block BB,
+ but don't actually free it. */
static inline void
-free_bb_predicate (basic_block bb)
+release_bb_predicate (basic_block bb)
{
- gimple_seq stmts;
-
- if (!bb_has_predicate (bb))
- return;
-
- /* Release the SSA_NAMEs created for the gimplification of the
- predicate. */
- stmts = bb_predicate_gimplified_stmts (bb);
+ gimple_seq stmts = bb_predicate_gimplified_stmts (bb);
if (stmts)
{
gimple_stmt_iterator i;
for (i = gsi_start (stmts); !gsi_end_p (i); gsi_next (&i))
free_stmt_operands (gsi_stmt (i));
+ set_bb_predicate_gimplified_stmts (bb, NULL);
}
+}
+/* Free the predicate of basic block BB. */
+
+static inline void
+free_bb_predicate (basic_block bb)
+{
+ if (!bb_has_predicate (bb))
+ return;
+
+ release_bb_predicate (bb);
free (bb->aux);
bb->aux = NULL;
}
-/* Free the predicate of BB and reinitialize it with the true
- predicate. */
+/* Reinitialize predicate of BB with the true predicate. */
static inline void
reset_bb_predicate (basic_block bb)
{
- free_bb_predicate (bb);
- init_bb_predicate (bb);
+ if (!bb_has_predicate (bb))
+ init_bb_predicate (bb);
+ else
+ {
+ release_bb_predicate (bb);
+ set_bb_predicate (bb, boolean_true_node);
+ }
}
/* Returns a new SSA_NAME of type TYPE that is assigned the value of
@@ -464,7 +477,8 @@ bb_with_exit_edge_p (struct loop *loop,
- there is a virtual PHI in a BB other than the loop->header. */
static bool
-if_convertible_phi_p (struct loop *loop, basic_block bb, gimple phi)
+if_convertible_phi_p (struct loop *loop, basic_block bb, gimple phi,
+ bool any_mask_load_store)
{
if (dump_file && (dump_flags & TDF_DETAILS))
{
@@ -479,7 +493,7 @@ if_convertible_phi_p (struct loop *loop,
return false;
}
- if (flag_tree_loop_if_convert_stores)
+ if (flag_tree_loop_if_convert_stores || any_mask_load_store)
return true;
/* When the flag_tree_loop_if_convert_stores is not set, check
@@ -695,6 +709,78 @@ ifcvt_could_trap_p (gimple stmt, vec<dat
return gimple_could_trap_p (stmt);
}
+/* Return true if STMT could be converted into a masked load or store
+ (conditional load or store based on a mask computed from bb predicate). */
+
+static bool
+ifcvt_can_use_mask_load_store (gimple stmt)
+{
+ tree lhs, ref;
+ enum machine_mode mode, vmode;
+ optab op;
+ basic_block bb = gimple_bb (stmt);
+ unsigned int vector_sizes;
+
+ if (!(flag_tree_loop_vectorize || bb->loop_father->force_vect)
+ || bb->loop_father->dont_vectorize
+ || !gimple_assign_single_p (stmt)
+ || gimple_has_volatile_ops (stmt))
+ return false;
+
+ /* Check whether this is a load or store. */
+ lhs = gimple_assign_lhs (stmt);
+ if (TREE_CODE (lhs) != SSA_NAME)
+ {
+ if (!is_gimple_val (gimple_assign_rhs1 (stmt)))
+ return false;
+ op = maskstore_optab;
+ ref = lhs;
+ }
+ else if (gimple_assign_load_p (stmt))
+ {
+ op = maskload_optab;
+ ref = gimple_assign_rhs1 (stmt);
+ }
+ else
+ return false;
+
+ /* And whether REF isn't a MEM_REF with non-addressable decl. */
+ if (TREE_CODE (ref) == MEM_REF
+ && TREE_CODE (TREE_OPERAND (ref, 0)) == ADDR_EXPR
+ && DECL_P (TREE_OPERAND (TREE_OPERAND (ref, 0), 0))
+ && !TREE_ADDRESSABLE (TREE_OPERAND (TREE_OPERAND (ref, 0), 0)))
+ return false;
+
+ /* Mask should be integer mode of the same size as the load/store
+ mode. */
+ mode = TYPE_MODE (TREE_TYPE (lhs));
+ if (int_mode_for_mode (mode) == BLKmode)
+ return false;
+
+ /* See if there is any chance the mask load or store might be
+ vectorized. If not, punt. */
+ vmode = targetm.vectorize.preferred_simd_mode (mode);
+ if (!VECTOR_MODE_P (vmode))
+ return false;
+
+ if (optab_handler (op, vmode) != CODE_FOR_nothing)
+ return true;
+
+ vector_sizes = targetm.vectorize.autovectorize_vector_sizes ();
+ while (vector_sizes != 0)
+ {
+ unsigned int cur = 1 << floor_log2 (vector_sizes);
+ vector_sizes &= ~cur;
+ if (cur <= GET_MODE_SIZE (mode))
+ continue;
+ vmode = mode_for_vector (mode, cur / GET_MODE_SIZE (mode));
+ if (VECTOR_MODE_P (vmode)
+ && optab_handler (op, vmode) != CODE_FOR_nothing)
+ return true;
+ }
+ return false;
+}
+
/* Return true when STMT is if-convertible.
GIMPLE_ASSIGN statement is not if-convertible if,
@@ -704,7 +790,8 @@ ifcvt_could_trap_p (gimple stmt, vec<dat
static bool
if_convertible_gimple_assign_stmt_p (gimple stmt,
- vec<data_reference_p> refs)
+ vec<data_reference_p> refs,
+ bool *any_mask_load_store)
{
tree lhs = gimple_assign_lhs (stmt);
basic_block bb;
@@ -730,10 +817,21 @@ if_convertible_gimple_assign_stmt_p (gim
return false;
}
+ /* tree-into-ssa.c uses GF_PLF_1, so avoid it, because
+ in between if_convertible_loop_p and combine_blocks
+ we can perform loop versioning. */
+ gimple_set_plf (stmt, GF_PLF_2, false);
+
if (flag_tree_loop_if_convert_stores)
{
if (ifcvt_could_trap_p (stmt, refs))
{
+ if (ifcvt_can_use_mask_load_store (stmt))
+ {
+ gimple_set_plf (stmt, GF_PLF_2, true);
+ *any_mask_load_store = true;
+ return true;
+ }
if (dump_file && (dump_flags & TDF_DETAILS))
fprintf (dump_file, "tree could trap...\n");
return false;
@@ -743,6 +841,12 @@ if_convertible_gimple_assign_stmt_p (gim
if (gimple_assign_rhs_could_trap_p (stmt))
{
+ if (ifcvt_can_use_mask_load_store (stmt))
+ {
+ gimple_set_plf (stmt, GF_PLF_2, true);
+ *any_mask_load_store = true;
+ return true;
+ }
if (dump_file && (dump_flags & TDF_DETAILS))
fprintf (dump_file, "tree could trap...\n");
return false;
@@ -754,6 +858,12 @@ if_convertible_gimple_assign_stmt_p (gim
&& bb != bb->loop_father->header
&& !bb_with_exit_edge_p (bb->loop_father, bb))
{
+ if (ifcvt_can_use_mask_load_store (stmt))
+ {
+ gimple_set_plf (stmt, GF_PLF_2, true);
+ *any_mask_load_store = true;
+ return true;
+ }
if (dump_file && (dump_flags & TDF_DETAILS))
{
fprintf (dump_file, "LHS is not var\n");
@@ -772,7 +882,8 @@ if_convertible_gimple_assign_stmt_p (gim
- it is a GIMPLE_LABEL or a GIMPLE_COND. */
static bool
-if_convertible_stmt_p (gimple stmt, vec<data_reference_p> refs)
+if_convertible_stmt_p (gimple stmt, vec<data_reference_p> refs,
+ bool *any_mask_load_store)
{
switch (gimple_code (stmt))
{
@@ -782,7 +893,8 @@ if_convertible_stmt_p (gimple stmt, vec<
return true;
case GIMPLE_ASSIGN:
- return if_convertible_gimple_assign_stmt_p (stmt, refs);
+ return if_convertible_gimple_assign_stmt_p (stmt, refs,
+ any_mask_load_store);
case GIMPLE_CALL:
{
@@ -984,7 +1096,7 @@ get_loop_body_in_if_conv_order (const st
S1 will be predicated with "x", and
S2 will be predicated with "!x". */
-static bool
+static void
predicate_bbs (loop_p loop)
{
unsigned int i;
@@ -996,7 +1108,7 @@ predicate_bbs (loop_p loop)
{
basic_block bb = ifc_bbs[i];
tree cond;
- gimple_stmt_iterator itr;
+ gimple stmt;
/* The loop latch is always executed and has no extra conditions
to be processed: skip it. */
@@ -1006,53 +1118,38 @@ predicate_bbs (loop_p loop)
continue;
}
+ /* If dominance tells us this basic block is always executed, force
+ the condition to be true, this might help simplify other
+ conditions. */
+ if (dominated_by_p (CDI_DOMINATORS, loop->latch, bb))
+ reset_bb_predicate (bb);
cond = bb_predicate (bb);
-
- for (itr = gsi_start_bb (bb); !gsi_end_p (itr); gsi_next (&itr))
+ stmt = last_stmt (bb);
+ if (stmt && gimple_code (stmt) == GIMPLE_COND)
{
- gimple stmt = gsi_stmt (itr);
-
- switch (gimple_code (stmt))
- {
- case GIMPLE_LABEL:
- case GIMPLE_ASSIGN:
- case GIMPLE_CALL:
- case GIMPLE_DEBUG:
- break;
-
- case GIMPLE_COND:
- {
- tree c2;
- edge true_edge, false_edge;
- location_t loc = gimple_location (stmt);
- tree c = fold_build2_loc (loc, gimple_cond_code (stmt),
- boolean_type_node,
- gimple_cond_lhs (stmt),
- gimple_cond_rhs (stmt));
-
- /* Add new condition into destination's predicate list. */
- extract_true_false_edges_from_block (gimple_bb (stmt),
- &true_edge, &false_edge);
-
- /* If C is true, then TRUE_EDGE is taken. */
- add_to_dst_predicate_list (loop, true_edge,
- unshare_expr (cond),
- unshare_expr (c));
-
- /* If C is false, then FALSE_EDGE is taken. */
- c2 = build1_loc (loc, TRUTH_NOT_EXPR,
- boolean_type_node, unshare_expr (c));
- add_to_dst_predicate_list (loop, false_edge,
- unshare_expr (cond), c2);
-
- cond = NULL_TREE;
- break;
- }
+ tree c2;
+ edge true_edge, false_edge;
+ location_t loc = gimple_location (stmt);
+ tree c = fold_build2_loc (loc, gimple_cond_code (stmt),
+ boolean_type_node,
+ gimple_cond_lhs (stmt),
+ gimple_cond_rhs (stmt));
+
+ /* Add new condition into destination's predicate list. */
+ extract_true_false_edges_from_block (gimple_bb (stmt),
+ &true_edge, &false_edge);
+
+ /* If C is true, then TRUE_EDGE is taken. */
+ add_to_dst_predicate_list (loop, true_edge, unshare_expr (cond),
+ unshare_expr (c));
+
+ /* If C is false, then FALSE_EDGE is taken. */
+ c2 = build1_loc (loc, TRUTH_NOT_EXPR, boolean_type_node,
+ unshare_expr (c));
+ add_to_dst_predicate_list (loop, false_edge,
+ unshare_expr (cond), c2);
- default:
- /* Not handled yet in if-conversion. */
- return false;
- }
+ cond = NULL_TREE;
}
/* If current bb has only one successor, then consider it as an
@@ -1075,8 +1172,6 @@ predicate_bbs (loop_p loop)
reset_bb_predicate (loop->header);
gcc_assert (bb_predicate_gimplified_stmts (loop->header) == NULL
&& bb_predicate_gimplified_stmts (loop->latch) == NULL);
-
- return true;
}
/* Return true when LOOP is if-convertible. This is a helper function
@@ -1087,7 +1182,7 @@ static bool
if_convertible_loop_p_1 (struct loop *loop,
vec<loop_p> *loop_nest,
vec<data_reference_p> *refs,
- vec<ddr_p> *ddrs)
+ vec<ddr_p> *ddrs, bool *any_mask_load_store)
{
bool res;
unsigned int i;
@@ -1121,9 +1216,24 @@ if_convertible_loop_p_1 (struct loop *lo
exit_bb = bb;
}
- res = predicate_bbs (loop);
- if (!res)
- return false;
+ for (i = 0; i < loop->num_nodes; i++)
+ {
+ basic_block bb = ifc_bbs[i];
+ gimple_stmt_iterator gsi;
+
+ for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
+ switch (gimple_code (gsi_stmt (gsi)))
+ {
+ case GIMPLE_LABEL:
+ case GIMPLE_ASSIGN:
+ case GIMPLE_CALL:
+ case GIMPLE_DEBUG:
+ case GIMPLE_COND:
+ break;
+ default:
+ return false;
+ }
+ }
if (flag_tree_loop_if_convert_stores)
{
@@ -1135,6 +1245,7 @@ if_convertible_loop_p_1 (struct loop *lo
DR_WRITTEN_AT_LEAST_ONCE (dr) = -1;
DR_RW_UNCONDITIONALLY (dr) = -1;
}
+ predicate_bbs (loop);
}
for (i = 0; i < loop->num_nodes; i++)
@@ -1142,17 +1253,31 @@ if_convertible_loop_p_1 (struct loop *lo
basic_block bb = ifc_bbs[i];
gimple_stmt_iterator itr;
- for (itr = gsi_start_phis (bb); !gsi_end_p (itr); gsi_next (&itr))
- if (!if_convertible_phi_p (loop, bb, gsi_stmt (itr)))
- return false;
-
/* Check the if-convertibility of statements in predicated BBs. */
- if (is_predicated (bb))
+ if (!dominated_by_p (CDI_DOMINATORS, loop->latch, bb))
for (itr = gsi_start_bb (bb); !gsi_end_p (itr); gsi_next (&itr))
- if (!if_convertible_stmt_p (gsi_stmt (itr), *refs))
+ if (!if_convertible_stmt_p (gsi_stmt (itr), *refs,
+ any_mask_load_store))
return false;
}
+ if (flag_tree_loop_if_convert_stores)
+ for (i = 0; i < loop->num_nodes; i++)
+ free_bb_predicate (ifc_bbs[i]);
+
+ /* Checking PHIs needs to be done after stmts, as the fact whether there
+ are any masked loads or stores affects the tests. */
+ for (i = 0; i < loop->num_nodes; i++)
+ {
+ basic_block bb = ifc_bbs[i];
+ gimple_stmt_iterator itr;
+
+ for (itr = gsi_start_phis (bb); !gsi_end_p (itr); gsi_next (&itr))
+ if (!if_convertible_phi_p (loop, bb, gsi_stmt (itr),
+ *any_mask_load_store))
+ return false;
+ }
+
if (dump_file)
fprintf (dump_file, "Applying if-conversion\n");
@@ -1168,7 +1293,7 @@ if_convertible_loop_p_1 (struct loop *lo
- if its basic blocks and phi nodes are if convertible. */
static bool
-if_convertible_loop_p (struct loop *loop)
+if_convertible_loop_p (struct loop *loop, bool *any_mask_load_store)
{
edge e;
edge_iterator ei;
@@ -1209,7 +1334,8 @@ if_convertible_loop_p (struct loop *loop
refs.create (5);
ddrs.create (25);
stack_vec<loop_p, 3> loop_nest;
- res = if_convertible_loop_p_1 (loop, &loop_nest, &refs, &ddrs);
+ res = if_convertible_loop_p_1 (loop, &loop_nest, &refs, &ddrs,
+ any_mask_load_store);
if (flag_tree_loop_if_convert_stores)
{
@@ -1395,7 +1521,7 @@ predicate_all_scalar_phis (struct loop *
gimplification of the predicates. */
static void
-insert_gimplified_predicates (loop_p loop)
+insert_gimplified_predicates (loop_p loop, bool any_mask_load_store)
{
unsigned int i;
@@ -1404,7 +1530,8 @@ insert_gimplified_predicates (loop_p loo
basic_block bb = ifc_bbs[i];
gimple_seq stmts;
- if (!is_predicated (bb))
+ if (!is_predicated (bb)
+ || dominated_by_p (CDI_DOMINATORS, loop->latch, bb))
{
/* Do not insert statements for a basic block that is not
predicated. Also make sure that the predicate of the
@@ -1416,7 +1543,8 @@ insert_gimplified_predicates (loop_p loo
stmts = bb_predicate_gimplified_stmts (bb);
if (stmts)
{
- if (flag_tree_loop_if_convert_stores)
+ if (flag_tree_loop_if_convert_stores
+ || any_mask_load_store)
{
/* Insert the predicate of the BB just after the label,
as the if-conversion of memory writes will use this
@@ -1575,9 +1703,49 @@ predicate_mem_writes (loop_p loop)
}
for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
- if ((stmt = gsi_stmt (gsi))
- && gimple_assign_single_p (stmt)
- && gimple_vdef (stmt))
+ if ((stmt = gsi_stmt (gsi)) == NULL
+ || !gimple_assign_single_p (stmt))
+ continue;
+ else if (gimple_plf (stmt, GF_PLF_2))
+ {
+ tree lhs = gimple_assign_lhs (stmt);
+ tree rhs = gimple_assign_rhs1 (stmt);
+ tree ref, addr, ptr, masktype, mask_op0, mask_op1, mask;
+ gimple new_stmt;
+ int bitsize = GET_MODE_BITSIZE (TYPE_MODE (TREE_TYPE (lhs)));
+
+ masktype = build_nonstandard_integer_type (bitsize, 1);
+ mask_op0 = build_int_cst (masktype, swap ? 0 : -1);
+ mask_op1 = build_int_cst (masktype, swap ? -1 : 0);
+ ref = TREE_CODE (lhs) == SSA_NAME ? rhs : lhs;
+ addr = force_gimple_operand_gsi (&gsi, build_fold_addr_expr (ref),
+ true, NULL_TREE, true,
+ GSI_SAME_STMT);
+ cond = force_gimple_operand_gsi_1 (&gsi, unshare_expr (cond),
+ is_gimple_condexpr, NULL_TREE,
+ true, GSI_SAME_STMT);
+ mask = fold_build_cond_expr (masktype, unshare_expr (cond),
+ mask_op0, mask_op1);
+ mask = ifc_temp_var (masktype, mask, &gsi);
+ ptr = build_int_cst (reference_alias_ptr_type (ref), 0);
+ /* Copy points-to info if possible. */
+ if (TREE_CODE (addr) == SSA_NAME && !SSA_NAME_PTR_INFO (addr))
+ copy_ref_info (build2 (MEM_REF, TREE_TYPE (ref), addr, ptr),
+ ref);
+ if (TREE_CODE (lhs) == SSA_NAME)
+ {
+ new_stmt
+ = gimple_build_call_internal (IFN_MASK_LOAD, 3, addr,
+ ptr, mask);
+ gimple_call_set_lhs (new_stmt, lhs);
+ }
+ else
+ new_stmt
+ = gimple_build_call_internal (IFN_MASK_STORE, 4, addr, ptr,
+ mask, rhs);
+ gsi_replace (&gsi, new_stmt, false);
+ }
+ else if (gimple_vdef (stmt))
{
tree lhs = gimple_assign_lhs (stmt);
tree rhs = gimple_assign_rhs1 (stmt);
@@ -1647,7 +1815,7 @@ remove_conditions_and_labels (loop_p loo
blocks. Replace PHI nodes with conditional modify expressions. */
static void
-combine_blocks (struct loop *loop)
+combine_blocks (struct loop *loop, bool any_mask_load_store)
{
basic_block bb, exit_bb, merge_target_bb;
unsigned int orig_loop_num_nodes = loop->num_nodes;
@@ -1655,11 +1823,12 @@ combine_blocks (struct loop *loop)
edge e;
edge_iterator ei;
+ predicate_bbs (loop);
remove_conditions_and_labels (loop);
- insert_gimplified_predicates (loop);
+ insert_gimplified_predicates (loop, any_mask_load_store);
predicate_all_scalar_phis (loop);
- if (flag_tree_loop_if_convert_stores)
+ if (flag_tree_loop_if_convert_stores || any_mask_load_store)
predicate_mem_writes (loop);
/* Merge basic blocks: first remove all the edges in the loop,
@@ -1749,28 +1918,146 @@ combine_blocks (struct loop *loop)
ifc_bbs = NULL;
}
-/* If-convert LOOP when it is legal. For the moment this pass has no
- profitability analysis. Returns true when something changed. */
+/* Version LOOP before if-converting it, the original loop
+ will be then if-converted, the new copy of the loop will not,
+ and the LOOP_VECTORIZED internal call will be guarding which
+ loop to execute. The vectorizer pass will fold this
+ internal call into either true or false. */
static bool
+version_loop_for_if_conversion (struct loop *loop, bool *do_outer)
+{
+ struct loop *outer = loop_outer (loop);
+ basic_block cond_bb;
+ tree cond = make_ssa_name (boolean_type_node, NULL);
+ struct loop *new_loop;
+ gimple g;
+ gimple_stmt_iterator gsi;
+
+ if (do_outer)
+ {
+ *do_outer = false;
+ if (loop->inner == NULL
+ && outer->inner == loop
+ && loop->next == NULL
+ && loop_outer (outer)
+ && outer->num_nodes == 3 + loop->num_nodes
+ && loop_preheader_edge (loop)->src == outer->header
+ && single_exit (loop)
+ && outer->latch
+ && single_exit (loop)->dest == EDGE_PRED (outer->latch, 0)->src)
+ *do_outer = true;
+ }
+
+ g = gimple_build_call_internal (IFN_LOOP_VECTORIZED, 2,
+ build_int_cst (integer_type_node, loop->num),
+ integer_zero_node);
+ gimple_call_set_lhs (g, cond);
+
+ initialize_original_copy_tables ();
+ new_loop = loop_version (loop, cond, &cond_bb,
+ REG_BR_PROB_BASE, REG_BR_PROB_BASE,
+ REG_BR_PROB_BASE, true);
+ free_original_copy_tables ();
+ if (new_loop == NULL)
+ return false;
+ new_loop->dont_vectorize = true;
+ new_loop->force_vect = false;
+ gsi = gsi_last_bb (cond_bb);
+ gimple_call_set_arg (g, 1, build_int_cst (integer_type_node, new_loop->num));
+ gsi_insert_before (&gsi, g, GSI_SAME_STMT);
+ update_ssa (TODO_update_ssa);
+ if (do_outer == NULL)
+ {
+ gcc_assert (single_succ_p (loop->header));
+ gsi = gsi_last_bb (single_succ (loop->header));
+ gimple cond_stmt = gsi_stmt (gsi);
+ gsi_prev (&gsi);
+ g = gsi_stmt (gsi);
+ gcc_assert (gimple_code (cond_stmt) == GIMPLE_COND
+ && is_gimple_call (g)
+ && gimple_call_internal_p (g)
+ && gimple_call_internal_fn (g) == IFN_LOOP_VECTORIZED
+ && gimple_cond_lhs (cond_stmt) == gimple_call_lhs (g));
+ gimple_cond_set_lhs (cond_stmt, boolean_true_node);
+ update_stmt (cond_stmt);
+ gcc_assert (has_zero_uses (gimple_call_lhs (g)));
+ gsi_remove (&gsi, false);
+ gcc_assert (single_succ_p (new_loop->header));
+ gsi = gsi_last_bb (single_succ (new_loop->header));
+ cond_stmt = gsi_stmt (gsi);
+ gsi_prev (&gsi);
+ g = gsi_stmt (gsi);
+ gcc_assert (gimple_code (cond_stmt) == GIMPLE_COND
+ && is_gimple_call (g)
+ && gimple_call_internal_p (g)
+ && gimple_call_internal_fn (g) == IFN_LOOP_VECTORIZED
+ && gimple_cond_lhs (cond_stmt) == gimple_call_lhs (g)
+ && new_loop->inner
+ && new_loop->inner->next
+ && new_loop->inner->next->next == NULL);
+ struct loop *inner = new_loop->inner;
+ basic_block empty_bb = loop_preheader_edge (inner)->src;
+ gcc_assert (empty_block_p (empty_bb)
+ && single_pred_p (empty_bb)
+ && single_succ_p (empty_bb)
+ && single_pred (empty_bb) == single_succ (new_loop->header));
+ if (single_pred_edge (empty_bb)->flags & EDGE_TRUE_VALUE)
+ {
+ gimple_call_set_arg (g, 0, build_int_cst (integer_type_node,
+ inner->num));
+ gimple_call_set_arg (g, 1, build_int_cst (integer_type_node,
+ inner->next->num));
+ inner->next->dont_vectorize = true;
+ }
+ else
+ {
+ gimple_call_set_arg (g, 0, build_int_cst (integer_type_node,
+ inner->next->num));
+ gimple_call_set_arg (g, 1, build_int_cst (integer_type_node,
+ inner->num));
+ inner->dont_vectorize = true;
+ }
+ }
+ return true;
+}
+
+/* If-convert LOOP when it is legal. For the moment this pass has no
+ profitability analysis. Returns non-zero todo flags when something
+ changed. */
+
+static unsigned int
tree_if_conversion (struct loop *loop)
{
- bool changed = false;
+ unsigned int todo = 0;
+ bool version_outer_loop = false;
ifc_bbs = NULL;
+ bool any_mask_load_store = false;
- if (!if_convertible_loop_p (loop)
+ if (!if_convertible_loop_p (loop, &any_mask_load_store)
|| !dbg_cnt (if_conversion_tree))
goto cleanup;
+ if (any_mask_load_store
+ && ((!flag_tree_loop_vectorize && !loop->force_vect)
+ || loop->dont_vectorize))
+ goto cleanup;
+
+ if (any_mask_load_store
+ && !version_loop_for_if_conversion (loop, &version_outer_loop))
+ goto cleanup;
+
/* Now all statements are if-convertible. Combine all the basic
blocks into one huge basic block doing the if-conversion
on-the-fly. */
- combine_blocks (loop);
-
- if (flag_tree_loop_if_convert_stores)
- mark_virtual_operands_for_renaming (cfun);
+ combine_blocks (loop, any_mask_load_store);
- changed = true;
+ todo |= TODO_cleanup_cfg;
+ if (flag_tree_loop_if_convert_stores || any_mask_load_store)
+ {
+ mark_virtual_operands_for_renaming (cfun);
+ todo |= TODO_update_ssa_only_virtuals;
+ }
cleanup:
if (ifc_bbs)
@@ -1784,7 +2071,16 @@ tree_if_conversion (struct loop *loop)
ifc_bbs = NULL;
}
- return changed;
+ if (todo && version_outer_loop)
+ {
+ if (todo & TODO_update_ssa_only_virtuals)
+ {
+ update_ssa (TODO_update_ssa_only_virtuals);
+ todo &= ~TODO_update_ssa_only_virtuals;
+ }
+ version_loop_for_if_conversion (loop_outer (loop), NULL);
+ }
+ return todo;
}
/* Tree if-conversion pass management. */
@@ -1793,7 +2089,6 @@ static unsigned int
main_tree_if_conversion (void)
{
struct loop *loop;
- bool changed = false;
unsigned todo = 0;
if (number_of_loops (cfun) <= 1)
@@ -1802,15 +2097,9 @@ main_tree_if_conversion (void)
FOR_EACH_LOOP (loop, 0)
if (flag_tree_loop_if_convert == 1
|| flag_tree_loop_if_convert_stores == 1
- || flag_tree_loop_vectorize
- || loop->force_vect)
- changed |= tree_if_conversion (loop);
-
- if (changed)
- todo |= TODO_cleanup_cfg;
-
- if (changed && flag_tree_loop_if_convert_stores)
- todo |= TODO_update_ssa_only_virtuals;
+ || ((flag_tree_loop_vectorize || loop->force_vect)
+ && !loop->dont_vectorize))
+ todo |= tree_if_conversion (loop);
#ifdef ENABLE_CHECKING
{
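[For readers unfamiliar with the versioning scheme above: the following is a conceptual C sketch of what version_loop_for_if_conversion produces, not GCC internals. The function names (loop_vectorized_p, ifcvt_version, scalar_version) are hypothetical stand-ins for the IFN_LOOP_VECTORIZED guard and the two loop copies.]

```c
#include <assert.h>
#include <string.h>

#define N 16

/* Stand-in for the IFN_LOOP_VECTORIZED internal call: the vectorizer
   later folds it to true (run the if-converted copy) or false (run
   the untouched scalar copy).  Here we model successful vectorization.  */
static int
loop_vectorized_p (int vec_loop_num, int scalar_loop_num)
{
  (void) vec_loop_num;
  (void) scalar_loop_num;
  return 1;
}

/* The conditional-store loop as the user wrote it.  */
static void
scalar_version (int *a, const int *b)
{
  for (int i = 0; i < N; i++)
    if (b[i] > 0)
      a[i] = b[i];
}

/* The if-converted copy: the branch becomes an unconditional masked
   update that the vectorizer can turn into IFN_MASK_STORE.  Note the
   real MASK_STORE never reads a[i] on masked-off lanes; the ternary
   here only models the value semantics.  */
static void
ifcvt_version (int *a, const int *b)
{
  for (int i = 0; i < N; i++)
    {
      int mask = b[i] > 0;        /* becomes a vector comparison */
      a[i] = mask ? b[i] : a[i];  /* models the masked store */
    }
}

/* The versioned structure: both copies exist, guarded by the call
   that the vectorizer will fold to a constant.  */
void
versioned_loop (int *a, const int *b)
{
  if (loop_vectorized_p (1, 2))
    ifcvt_version (a, b);   /* loop 1: if-converted, vectorization candidate */
  else
    scalar_version (a, b);  /* loop 2: dont_vectorize scalar fallback */
}
```

Either path computes the same result, which is what lets the vectorizer fold the guard to a constant after deciding.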
--- gcc/tree-vect-data-refs.c.jj 2013-11-28 09:18:11.784774865 +0100
+++ gcc/tree-vect-data-refs.c 2013-11-28 14:13:57.617572349 +0100
@@ -2959,6 +2959,24 @@ vect_check_gather (gimple stmt, loop_vec
enum machine_mode pmode;
int punsignedp, pvolatilep;
+ base = DR_REF (dr);
+ /* For masked loads/stores, DR_REF (dr) is an artificial MEM_REF,
+ see if we can use the def stmt of the address. */
+ if (is_gimple_call (stmt)
+ && gimple_call_internal_p (stmt)
+ && (gimple_call_internal_fn (stmt) == IFN_MASK_LOAD
+ || gimple_call_internal_fn (stmt) == IFN_MASK_STORE)
+ && TREE_CODE (base) == MEM_REF
+ && TREE_CODE (TREE_OPERAND (base, 0)) == SSA_NAME
+ && integer_zerop (TREE_OPERAND (base, 1))
+ && !expr_invariant_in_loop_p (loop, TREE_OPERAND (base, 0)))
+ {
+ gimple def_stmt = SSA_NAME_DEF_STMT (TREE_OPERAND (base, 0));
+ if (is_gimple_assign (def_stmt)
+ && gimple_assign_rhs_code (def_stmt) == ADDR_EXPR)
+ base = TREE_OPERAND (gimple_assign_rhs1 (def_stmt), 0);
+ }
+
/* The gather builtins need address of the form
loop_invariant + vector * {1, 2, 4, 8}
or
@@ -2971,7 +2989,7 @@ vect_check_gather (gimple stmt, loop_vec
vectorized. The following code attempts to find such a preexisting
SSA_NAME OFF and put the loop invariants into a tree BASE
that can be gimplified before the loop. */
- base = get_inner_reference (DR_REF (dr), &pbitsize, &pbitpos, &off,
+ base = get_inner_reference (base, &pbitsize, &pbitpos, &off,
&pmode, &punsignedp, &pvolatilep, false);
gcc_assert (base != NULL_TREE && (pbitpos % BITS_PER_UNIT) == 0);
@@ -3468,7 +3486,10 @@ again:
offset = unshare_expr (DR_OFFSET (dr));
init = unshare_expr (DR_INIT (dr));
- if (is_gimple_call (stmt))
+ if (is_gimple_call (stmt)
+ && (!gimple_call_internal_p (stmt)
+ || (gimple_call_internal_fn (stmt) != IFN_MASK_LOAD
+ && gimple_call_internal_fn (stmt) != IFN_MASK_STORE)))
{
if (dump_enabled_p ())
{
@@ -5119,6 +5140,14 @@ vect_supportable_dr_alignment (struct da
if (aligned_access_p (dr) && !check_aligned_accesses)
return dr_aligned;
+ /* For now assume all conditional loads/stores support unaligned
+ access without any special code. */
+ if (is_gimple_call (stmt)
+ && gimple_call_internal_p (stmt)
+ && (gimple_call_internal_fn (stmt) == IFN_MASK_LOAD
+ || gimple_call_internal_fn (stmt) == IFN_MASK_STORE))
+ return dr_unaligned_supported;
+
if (loop_vinfo)
{
vect_loop = LOOP_VINFO_LOOP (loop_vinfo);
--- gcc/gimple.h.jj 2013-11-27 12:10:46.932896086 +0100
+++ gcc/gimple.h 2013-11-28 14:13:57.603572422 +0100
@@ -5670,7 +5670,13 @@ gimple_expr_type (const_gimple stmt)
useless conversion involved. That means returning the
original RHS type as far as we can reconstruct it. */
if (code == GIMPLE_CALL)
- type = gimple_call_return_type (stmt);
+ {
+ if (gimple_call_internal_p (stmt)
+ && gimple_call_internal_fn (stmt) == IFN_MASK_STORE)
+ type = TREE_TYPE (gimple_call_arg (stmt, 3));
+ else
+ type = gimple_call_return_type (stmt);
+ }
else
switch (gimple_assign_rhs_code (stmt))
{
--- gcc/internal-fn.c.jj 2013-11-26 21:36:14.218328913 +0100
+++ gcc/internal-fn.c 2013-11-28 14:13:57.661572121 +0100
@@ -153,6 +153,60 @@ expand_UBSAN_NULL (gimple stmt ATTRIBUTE
gcc_unreachable ();
}
+/* This should get folded in tree-vectorizer.c. */
+
+static void
+expand_LOOP_VECTORIZED (gimple stmt ATTRIBUTE_UNUSED)
+{
+ gcc_unreachable ();
+}
+
+static void
+expand_MASK_LOAD (gimple stmt)
+{
+ struct expand_operand ops[3];
+ tree type, lhs, rhs, maskt;
+ rtx mem, target, mask;
+
+ maskt = gimple_call_arg (stmt, 2);
+ lhs = gimple_call_lhs (stmt);
+ type = TREE_TYPE (lhs);
+ rhs = build2 (MEM_REF, type, gimple_call_arg (stmt, 0),
+ gimple_call_arg (stmt, 1));
+
+ mem = expand_expr (rhs, NULL_RTX, VOIDmode, EXPAND_WRITE);
+ gcc_assert (MEM_P (mem));
+ mask = expand_normal (maskt);
+ target = expand_expr (lhs, NULL_RTX, VOIDmode, EXPAND_WRITE);
+ create_output_operand (&ops[0], target, TYPE_MODE (type));
+ create_fixed_operand (&ops[1], mem);
+ create_input_operand (&ops[2], mask, TYPE_MODE (TREE_TYPE (maskt)));
+ expand_insn (optab_handler (maskload_optab, TYPE_MODE (type)), 3, ops);
+}
+
+static void
+expand_MASK_STORE (gimple stmt)
+{
+ struct expand_operand ops[3];
+ tree type, lhs, rhs, maskt;
+ rtx mem, reg, mask;
+
+ maskt = gimple_call_arg (stmt, 2);
+ rhs = gimple_call_arg (stmt, 3);
+ type = TREE_TYPE (rhs);
+ lhs = build2 (MEM_REF, type, gimple_call_arg (stmt, 0),
+ gimple_call_arg (stmt, 1));
+
+ mem = expand_expr (lhs, NULL_RTX, VOIDmode, EXPAND_WRITE);
+ gcc_assert (MEM_P (mem));
+ mask = expand_normal (maskt);
+ reg = expand_normal (rhs);
+ create_fixed_operand (&ops[0], mem);
+ create_input_operand (&ops[1], reg, TYPE_MODE (type));
+ create_input_operand (&ops[2], mask, TYPE_MODE (TREE_TYPE (maskt)));
+ expand_insn (optab_handler (maskstore_optab, TYPE_MODE (type)), 3, ops);
+}
+
/* Routines to expand each internal function, indexed by function number.
Each routine has the prototype:
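[The expand_MASK_LOAD/expand_MASK_STORE routines above hand the operands to the maskload/maskstore optabs. As a rough reference for the lane-wise semantics the expanders rely on, here is a hedged scalar model, not GCC code; ref_mask_load and ref_mask_store are made-up names. The zeroing of inactive load lanes matches AVX/AVX2 VMASKMOV behavior.]

```c
#include <stddef.h>

/* Scalar model of IFN_MASK_LOAD: active lanes are loaded from MEM,
   inactive lanes of the result are zeroed, and - crucially - the
   masked-off memory is never accessed, so it may be unmapped.  */
static void
ref_mask_load (int *dst, const int *mem, const int *mask, size_t n)
{
  for (size_t i = 0; i < n; i++)
    dst[i] = mask[i] ? mem[i] : 0;
}

/* Scalar model of IFN_MASK_STORE: only active lanes are written;
   inactive lanes of MEM are left untouched rather than rewritten,
   which is what makes the store safe under a partial mask.  */
static void
ref_mask_store (int *mem, const int *src, const int *mask, size_t n)
{
  for (size_t i = 0; i < n; i++)
    if (mask[i])
      mem[i] = src[i];
}
```

This is why if-conversion may only introduce these calls when a masking-capable target is available: the semantics cannot be emulated by a plain load/store without risking faults on masked-off elements.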
--- gcc/tree-vectorizer.c.jj 2013-11-22 21:03:14.525852274 +0100
+++ gcc/tree-vectorizer.c 2013-11-28 15:10:33.364872892 +0100
@@ -75,11 +75,13 @@ along with GCC; see the file COPYING3.
#include "tree-phinodes.h"
#include "ssa-iterators.h"
#include "tree-ssa-loop-manip.h"
+#include "tree-cfg.h"
#include "cfgloop.h"
#include "tree-vectorizer.h"
#include "tree-pass.h"
#include "tree-ssa-propagate.h"
#include "dbgcnt.h"
+#include "gimple-fold.h"
/* Loop or bb location. */
source_location vect_location;
@@ -317,6 +319,68 @@ vect_destroy_datarefs (loop_vec_info loo
}
+/* If LOOP has been versioned during ifcvt, return the internal call
+ guarding it. */
+
+static gimple
+vect_loop_vectorized_call (struct loop *loop)
+{
+ basic_block bb = loop_preheader_edge (loop)->src;
+ gimple g;
+ do
+ {
+ g = last_stmt (bb);
+ if (g)
+ break;
+ if (!single_pred_p (bb))
+ break;
+ bb = single_pred (bb);
+ }
+ while (1);
+ if (g && gimple_code (g) == GIMPLE_COND)
+ {
+ gimple_stmt_iterator gsi = gsi_for_stmt (g);
+ gsi_prev (&gsi);
+ if (!gsi_end_p (gsi))
+ {
+ g = gsi_stmt (gsi);
+ if (is_gimple_call (g)
+ && gimple_call_internal_p (g)
+ && gimple_call_internal_fn (g) == IFN_LOOP_VECTORIZED
+ && (tree_to_shwi (gimple_call_arg (g, 0)) == loop->num
+ || tree_to_shwi (gimple_call_arg (g, 1)) == loop->num))
+ return g;
+ }
+ }
+ return NULL;
+}
+
+/* Helper function of vectorize_loops. If LOOP is a non-if-converted
+ loop that has an if-converted counterpart, return the if-converted
+ counterpart, so that we try vectorizing if-converted loops before
+ inner loops of non-if-converted loops. */
+
+static struct loop *
+vect_loop_select (struct loop *loop)
+{
+ if (!loop->dont_vectorize)
+ return loop;
+
+ gimple g = vect_loop_vectorized_call (loop);
+ if (g == NULL)
+ return loop;
+
+ if (tree_to_shwi (gimple_call_arg (g, 1)) != loop->num)
+ return loop;
+
+ struct loop *ifcvt_loop
+ = get_loop (cfun, tree_to_shwi (gimple_call_arg (g, 0)));
+ if (ifcvt_loop && !ifcvt_loop->dont_vectorize)
+ return ifcvt_loop;
+ return loop;
+}
+
+
/* Function vectorize_loops.
Entry point to loop vectorization phase. */
@@ -327,9 +391,11 @@ vectorize_loops (void)
unsigned int i;
unsigned int num_vectorized_loops = 0;
unsigned int vect_loops_num;
- struct loop *loop;
+ struct loop *loop, *iloop;
hash_table <simduid_to_vf> simduid_to_vf_htab;
hash_table <simd_array_to_simduid> simd_array_to_simduid_htab;
+ bool any_ifcvt_loops = false;
+ unsigned ret = 0;
vect_loops_num = number_of_loops (cfun);
@@ -351,9 +417,12 @@ vectorize_loops (void)
/* If some loop was duplicated, it gets bigger number
than all previously defined loops. This fact allows us to run
only over initial loops skipping newly generated ones. */
- FOR_EACH_LOOP (loop, 0)
- if ((flag_tree_loop_vectorize && optimize_loop_nest_for_speed_p (loop))
- || loop->force_vect)
+ FOR_EACH_LOOP (iloop, 0)
+ if ((loop = vect_loop_select (iloop))->dont_vectorize)
+ any_ifcvt_loops = true;
+ else if ((flag_tree_loop_vectorize
+ && optimize_loop_nest_for_speed_p (loop))
+ || loop->force_vect)
{
loop_vec_info loop_vinfo;
vect_location = find_loop_location (loop);
@@ -363,6 +432,10 @@ vectorize_loops (void)
LOCATION_FILE (vect_location),
LOCATION_LINE (vect_location));
+ /* Make sure we don't try to vectorize this loop
+ more than once. */
+ loop->dont_vectorize = true;
+
loop_vinfo = vect_analyze_loop (loop);
loop->aux = loop_vinfo;
@@ -372,6 +445,45 @@ vectorize_loops (void)
if (!dbg_cnt (vect_loop))
break;
+ gimple loop_vectorized_call = vect_loop_vectorized_call (loop);
+ if (loop_vectorized_call)
+ {
+ tree arg = gimple_call_arg (loop_vectorized_call, 1);
+ basic_block *bbs;
+ unsigned int i;
+ struct loop *scalar_loop = get_loop (cfun, tree_to_shwi (arg));
+ struct loop *inner;
+
+ LOOP_VINFO_SCALAR_LOOP (loop_vinfo) = scalar_loop;
+ gcc_checking_assert (vect_loop_vectorized_call
+ (LOOP_VINFO_SCALAR_LOOP (loop_vinfo))
+ == loop_vectorized_call);
+ bbs = get_loop_body (scalar_loop);
+ for (i = 0; i < scalar_loop->num_nodes; i++)
+ {
+ basic_block bb = bbs[i];
+ gimple_stmt_iterator gsi;
+ for (gsi = gsi_start_phis (bb); !gsi_end_p (gsi);
+ gsi_next (&gsi))
+ {
+ gimple phi = gsi_stmt (gsi);
+ gimple_set_uid (phi, 0);
+ }
+ for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi);
+ gsi_next (&gsi))
+ {
+ gimple stmt = gsi_stmt (gsi);
+ gimple_set_uid (stmt, 0);
+ }
+ }
+ free (bbs);
+ /* If we have successfully vectorized an if-converted outer
+ loop, don't attempt to vectorize the if-converted inner
+ loop of the alternate loop. */
+ for (inner = scalar_loop->inner; inner; inner = inner->next)
+ inner->dont_vectorize = true;
+ }
+
if (LOCATION_LOCUS (vect_location) != UNKNOWN_LOCATION
&& dump_enabled_p ())
dump_printf_loc (MSG_OPTIMIZED_LOCATIONS, vect_location,
@@ -392,7 +504,29 @@ vectorize_loops (void)
*simduid_to_vf_htab.find_slot (simduid_to_vf_data, INSERT)
= simduid_to_vf_data;
}
+
+ if (loop_vectorized_call)
+ {
+ gimple g = loop_vectorized_call;
+ tree lhs = gimple_call_lhs (g);
+ gimple_stmt_iterator gsi = gsi_for_stmt (g);
+ gimplify_and_update_call_from_tree (&gsi, boolean_true_node);
+ gsi_next (&gsi);
+ if (!gsi_end_p (gsi))
+ {
+ g = gsi_stmt (gsi);
+ if (gimple_code (g) == GIMPLE_COND
+ && gimple_cond_lhs (g) == lhs)
+ {
+ gimple_cond_set_lhs (g, boolean_true_node);
+ update_stmt (g);
+ ret |= TODO_cleanup_cfg;
+ }
+ }
+ }
}
+ else
+ loop->dont_vectorize = true;
vect_location = UNKNOWN_LOCATION;
@@ -405,6 +539,34 @@ vectorize_loops (void)
/* ----------- Finalize. ----------- */
+ if (any_ifcvt_loops)
+ for (i = 1; i < vect_loops_num; i++)
+ {
+ loop = get_loop (cfun, i);
+ if (loop && loop->dont_vectorize)
+ {
+ gimple g = vect_loop_vectorized_call (loop);
+ if (g)
+ {
+ tree lhs = gimple_call_lhs (g);
+ gimple_stmt_iterator gsi = gsi_for_stmt (g);
+ gimplify_and_update_call_from_tree (&gsi, boolean_false_node);
+ gsi_next (&gsi);
+ if (!gsi_end_p (gsi))
+ {
+ g = gsi_stmt (gsi);
+ if (gimple_code (g) == GIMPLE_COND
+ && gimple_cond_lhs (g) == lhs)
+ {
+ gimple_cond_set_lhs (g, boolean_false_node);
+ update_stmt (g);
+ ret |= TODO_cleanup_cfg;
+ }
+ }
+ }
+ }
+ }
+
for (i = 1; i < vect_loops_num; i++)
{
loop_vec_info loop_vinfo;
@@ -462,7 +624,7 @@ vectorize_loops (void)
return TODO_cleanup_cfg;
}
- return 0;
+ return ret;
}
--- gcc/tree-vect-loop-manip.c.jj 2013-11-22 21:03:08.418882641 +0100
+++ gcc/tree-vect-loop-manip.c 2013-11-28 14:54:01.621096704 +0100
@@ -703,12 +703,42 @@ slpeel_make_loop_iterate_ntimes (struct
loop->nb_iterations = niters;
}
+/* Helper routine of slpeel_tree_duplicate_loop_to_edge_cfg.
+ For all PHI arguments in FROM->dest and TO->dest coming from those
+ edges, ensure that each TO->dest PHI argument has its current_def
+ set to the current_def of the corresponding FROM argument. */
+
+static void
+slpeel_duplicate_current_defs_from_edges (edge from, edge to)
+{
+ gimple_stmt_iterator gsi_from, gsi_to;
+
+ for (gsi_from = gsi_start_phis (from->dest),
+ gsi_to = gsi_start_phis (to->dest);
+ !gsi_end_p (gsi_from) && !gsi_end_p (gsi_to);
+ gsi_next (&gsi_from), gsi_next (&gsi_to))
+ {
+ gimple from_phi = gsi_stmt (gsi_from);
+ gimple to_phi = gsi_stmt (gsi_to);
+ tree from_arg = PHI_ARG_DEF_FROM_EDGE (from_phi, from);
+ tree to_arg = PHI_ARG_DEF_FROM_EDGE (to_phi, to);
+ if (TREE_CODE (from_arg) == SSA_NAME
+ && TREE_CODE (to_arg) == SSA_NAME
+ && get_current_def (to_arg) == NULL_TREE)
+ set_current_def (to_arg, get_current_def (from_arg));
+ }
+}
+
/* Given LOOP this function generates a new copy of it and puts it
- on E which is either the entry or exit of LOOP. */
+ on E which is either the entry or exit of LOOP. If SCALAR_LOOP is
+ non-NULL, assume LOOP and SCALAR_LOOP are equivalent and copy the
+ basic blocks from SCALAR_LOOP instead of LOOP, but to either the
+ entry or exit of LOOP. */
struct loop *
-slpeel_tree_duplicate_loop_to_edge_cfg (struct loop *loop, edge e)
+slpeel_tree_duplicate_loop_to_edge_cfg (struct loop *loop,
+ struct loop *scalar_loop, edge e)
{
struct loop *new_loop;
basic_block *new_bbs, *bbs;
@@ -722,19 +752,22 @@ slpeel_tree_duplicate_loop_to_edge_cfg (
if (!at_exit && e != loop_preheader_edge (loop))
return NULL;
- bbs = XNEWVEC (basic_block, loop->num_nodes + 1);
- get_loop_body_with_size (loop, bbs, loop->num_nodes);
+ if (scalar_loop == NULL)
+ scalar_loop = loop;
+
+ bbs = XNEWVEC (basic_block, scalar_loop->num_nodes + 1);
+ get_loop_body_with_size (scalar_loop, bbs, scalar_loop->num_nodes);
/* Check whether duplication is possible. */
- if (!can_copy_bbs_p (bbs, loop->num_nodes))
+ if (!can_copy_bbs_p (bbs, scalar_loop->num_nodes))
{
free (bbs);
return NULL;
}
/* Generate new loop structure. */
- new_loop = duplicate_loop (loop, loop_outer (loop));
- duplicate_subloops (loop, new_loop);
+ new_loop = duplicate_loop (scalar_loop, loop_outer (scalar_loop));
+ duplicate_subloops (scalar_loop, new_loop);
exit_dest = exit->dest;
was_imm_dom = (get_immediate_dominator (CDI_DOMINATORS,
@@ -744,35 +777,80 @@ slpeel_tree_duplicate_loop_to_edge_cfg (
/* Also copy the pre-header, this avoids jumping through hoops to
duplicate the loop entry PHI arguments. Create an empty
pre-header unconditionally for this. */
- basic_block preheader = split_edge (loop_preheader_edge (loop));
+ basic_block preheader = split_edge (loop_preheader_edge (scalar_loop));
edge entry_e = single_pred_edge (preheader);
- bbs[loop->num_nodes] = preheader;
- new_bbs = XNEWVEC (basic_block, loop->num_nodes + 1);
+ bbs[scalar_loop->num_nodes] = preheader;
+ new_bbs = XNEWVEC (basic_block, scalar_loop->num_nodes + 1);
- copy_bbs (bbs, loop->num_nodes + 1, new_bbs,
+ exit = single_exit (scalar_loop);
+ copy_bbs (bbs, scalar_loop->num_nodes + 1, new_bbs,
&exit, 1, &new_exit, NULL,
e->src, true);
- basic_block new_preheader = new_bbs[loop->num_nodes];
+ exit = single_exit (loop);
+ basic_block new_preheader = new_bbs[scalar_loop->num_nodes];
- add_phi_args_after_copy (new_bbs, loop->num_nodes + 1, NULL);
+ add_phi_args_after_copy (new_bbs, scalar_loop->num_nodes + 1, NULL);
+
+ if (scalar_loop != loop)
+ {
+ /* If we copied from SCALAR_LOOP rather than LOOP, SSA_NAMEs from
+ SCALAR_LOOP will have current_def set to SSA_NAMEs in the new_loop,
+ but LOOP will not. slpeel_update_phi_nodes_for_guard{1,2} expects
+ the LOOP SSA_NAMEs (on the exit edge and edge from latch to
+ header) to have current_def set, so copy them over. */
+ slpeel_duplicate_current_defs_from_edges (single_exit (scalar_loop),
+ exit);
+ slpeel_duplicate_current_defs_from_edges (EDGE_SUCC (scalar_loop->latch,
+ 0),
+ EDGE_SUCC (loop->latch, 0));
+ }
if (at_exit) /* Add the loop copy at exit. */
{
+ if (scalar_loop != loop)
+ {
+ gimple_stmt_iterator gsi;
+ new_exit = redirect_edge_and_branch (new_exit, exit_dest);
+
+ for (gsi = gsi_start_phis (exit_dest); !gsi_end_p (gsi);
+ gsi_next (&gsi))
+ {
+ gimple phi = gsi_stmt (gsi);
+ tree orig_arg = PHI_ARG_DEF_FROM_EDGE (phi, e);
+ location_t orig_locus
+ = gimple_phi_arg_location_from_edge (phi, e);
+
+ add_phi_arg (phi, orig_arg, new_exit, orig_locus);
+ }
+ }
redirect_edge_and_branch_force (e, new_preheader);
flush_pending_stmts (e);
set_immediate_dominator (CDI_DOMINATORS, new_preheader, e->src);
if (was_imm_dom)
- set_immediate_dominator (CDI_DOMINATORS, exit_dest, new_loop->header);
+ set_immediate_dominator (CDI_DOMINATORS, exit_dest, new_exit->src);
/* And remove the non-necessary forwarder again. Keep the other
one so we have a proper pre-header for the loop at the exit edge. */
- redirect_edge_pred (single_succ_edge (preheader), single_pred (preheader));
+ redirect_edge_pred (single_succ_edge (preheader),
+ single_pred (preheader));
delete_basic_block (preheader);
- set_immediate_dominator (CDI_DOMINATORS, loop->header,
- loop_preheader_edge (loop)->src);
+ set_immediate_dominator (CDI_DOMINATORS, scalar_loop->header,
+ loop_preheader_edge (scalar_loop)->src);
}
else /* Add the copy at entry. */
{
+ if (scalar_loop != loop)
+ {
+ /* Remove the non-necessary forwarder of scalar_loop again. */
+ redirect_edge_pred (single_succ_edge (preheader),
+ single_pred (preheader));
+ delete_basic_block (preheader);
+ set_immediate_dominator (CDI_DOMINATORS, scalar_loop->header,
+ loop_preheader_edge (scalar_loop)->src);
+ preheader = split_edge (loop_preheader_edge (loop));
+ entry_e = single_pred_edge (preheader);
+ }
+
redirect_edge_and_branch_force (entry_e, new_preheader);
flush_pending_stmts (entry_e);
set_immediate_dominator (CDI_DOMINATORS, new_preheader, entry_e->src);
@@ -783,15 +861,39 @@ slpeel_tree_duplicate_loop_to_edge_cfg (
/* And remove the non-necessary forwarder again. Keep the other
one so we have a proper pre-header for the loop at the exit edge. */
- redirect_edge_pred (single_succ_edge (new_preheader), single_pred (new_preheader));
+ redirect_edge_pred (single_succ_edge (new_preheader),
+ single_pred (new_preheader));
delete_basic_block (new_preheader);
set_immediate_dominator (CDI_DOMINATORS, new_loop->header,
loop_preheader_edge (new_loop)->src);
}
- for (unsigned i = 0; i < loop->num_nodes+1; i++)
+ for (unsigned i = 0; i < scalar_loop->num_nodes + 1; i++)
rename_variables_in_bb (new_bbs[i]);
+ if (scalar_loop != loop)
+ {
+ /* Update new_loop->header PHIs, so that on the preheader
+ edge they are the ones from loop rather than scalar_loop. */
+ gimple_stmt_iterator gsi_orig, gsi_new;
+ edge orig_e = loop_preheader_edge (loop);
+ edge new_e = loop_preheader_edge (new_loop);
+
+ for (gsi_orig = gsi_start_phis (loop->header),
+ gsi_new = gsi_start_phis (new_loop->header);
+ !gsi_end_p (gsi_orig) && !gsi_end_p (gsi_new);
+ gsi_next (&gsi_orig), gsi_next (&gsi_new))
+ {
+ gimple orig_phi = gsi_stmt (gsi_orig);
+ gimple new_phi = gsi_stmt (gsi_new);
+ tree orig_arg = PHI_ARG_DEF_FROM_EDGE (orig_phi, orig_e);
+ location_t orig_locus
+ = gimple_phi_arg_location_from_edge (orig_phi, orig_e);
+
+ add_phi_arg (new_phi, orig_arg, new_e, orig_locus);
+ }
+ }
+
free (new_bbs);
free (bbs);
@@ -1002,6 +1104,8 @@ set_prologue_iterations (basic_block bb_
Input:
- LOOP: the loop to be peeled.
+ - SCALAR_LOOP: if non-NULL, the alternate loop from which basic blocks
+ should be copied.
- E: the exit or entry edge of LOOP.
If it is the entry edge, we peel the first iterations of LOOP. In this
case first-loop is LOOP, and second-loop is the newly created loop.
@@ -1043,8 +1147,8 @@ set_prologue_iterations (basic_block bb_
FORNOW the resulting code will not be in loop-closed-ssa form.
*/
-static struct loop*
-slpeel_tree_peel_loop_to_edge (struct loop *loop,
+static struct loop *
+slpeel_tree_peel_loop_to_edge (struct loop *loop, struct loop *scalar_loop,
edge e, tree *first_niters,
tree niters, bool update_first_loop_count,
unsigned int th, bool check_profitability,
@@ -1129,7 +1233,8 @@ slpeel_tree_peel_loop_to_edge (struct lo
orig_exit_bb:
*/
- if (!(new_loop = slpeel_tree_duplicate_loop_to_edge_cfg (loop, e)))
+ if (!(new_loop = slpeel_tree_duplicate_loop_to_edge_cfg (loop, scalar_loop,
+ e)))
{
loop_loc = find_loop_location (loop);
dump_printf_loc (MSG_MISSED_OPTIMIZATION, loop_loc,
@@ -1625,6 +1730,7 @@ vect_do_peeling_for_loop_bound (loop_vec
unsigned int th, bool check_profitability)
{
struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
+ struct loop *scalar_loop = LOOP_VINFO_SCALAR_LOOP (loop_vinfo);
struct loop *new_loop;
edge update_e;
basic_block preheader;
@@ -1641,11 +1747,12 @@ vect_do_peeling_for_loop_bound (loop_vec
loop_num = loop->num;
- new_loop = slpeel_tree_peel_loop_to_edge (loop, single_exit (loop),
- &ratio_mult_vf_name, ni_name, false,
- th, check_profitability,
- cond_expr, cond_expr_stmt_list,
- 0, LOOP_VINFO_VECT_FACTOR (loop_vinfo));
+ new_loop
+ = slpeel_tree_peel_loop_to_edge (loop, scalar_loop, single_exit (loop),
+ &ratio_mult_vf_name, ni_name, false,
+ th, check_profitability,
+ cond_expr, cond_expr_stmt_list,
+ 0, LOOP_VINFO_VECT_FACTOR (loop_vinfo));
gcc_assert (new_loop);
gcc_assert (loop_num == loop->num);
#ifdef ENABLE_CHECKING
@@ -1878,6 +1985,7 @@ vect_do_peeling_for_alignment (loop_vec_
unsigned int th, bool check_profitability)
{
struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
+ struct loop *scalar_loop = LOOP_VINFO_SCALAR_LOOP (loop_vinfo);
tree niters_of_prolog_loop;
tree wide_prolog_niters;
struct loop *new_loop;
@@ -1899,11 +2007,11 @@ vect_do_peeling_for_alignment (loop_vec_
/* Peel the prolog loop and iterate it niters_of_prolog_loop. */
new_loop =
- slpeel_tree_peel_loop_to_edge (loop, loop_preheader_edge (loop),
+ slpeel_tree_peel_loop_to_edge (loop, scalar_loop,
+ loop_preheader_edge (loop),
&niters_of_prolog_loop, ni_name, true,
th, check_profitability, NULL_TREE, NULL,
- bound,
- 0);
+ bound, 0);
gcc_assert (new_loop);
#ifdef ENABLE_CHECKING
@@ -2187,6 +2295,7 @@ vect_loop_versioning (loop_vec_info loop
unsigned int th, bool check_profitability)
{
struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
+ struct loop *scalar_loop = LOOP_VINFO_SCALAR_LOOP (loop_vinfo);
basic_block condition_bb;
gimple_stmt_iterator gsi, cond_exp_gsi;
basic_block merge_bb;
@@ -2222,8 +2331,43 @@ vect_loop_versioning (loop_vec_info loop
gimple_seq_add_seq (&cond_expr_stmt_list, gimplify_stmt_list);
initialize_original_copy_tables ();
- loop_version (loop, cond_expr, &condition_bb,
- prob, prob, REG_BR_PROB_BASE - prob, true);
+ if (scalar_loop)
+ {
+ edge scalar_e;
+ basic_block preheader, scalar_preheader;
+
+ /* We don't want to scale SCALAR_LOOP's frequencies, we need to
+ scale LOOP's frequencies instead. */
+ loop_version (scalar_loop, cond_expr, &condition_bb,
+ prob, REG_BR_PROB_BASE, REG_BR_PROB_BASE - prob, true);
+ scale_loop_frequencies (loop, prob, REG_BR_PROB_BASE);
+ /* CONDITION_BB was created above SCALAR_LOOP's preheader,
+ while we need to move it above LOOP's preheader. */
+ e = loop_preheader_edge (loop);
+ scalar_e = loop_preheader_edge (scalar_loop);
+ gcc_assert (empty_block_p (e->src)
+ && single_pred_p (e->src));
+ gcc_assert (empty_block_p (scalar_e->src)
+ && single_pred_p (scalar_e->src));
+ gcc_assert (single_pred_p (condition_bb));
+ preheader = e->src;
+ scalar_preheader = scalar_e->src;
+ scalar_e = find_edge (condition_bb, scalar_preheader);
+ e = single_pred_edge (preheader);
+ redirect_edge_and_branch_force (single_pred_edge (condition_bb),
+ scalar_preheader);
+ redirect_edge_and_branch_force (scalar_e, preheader);
+ redirect_edge_and_branch_force (e, condition_bb);
+ set_immediate_dominator (CDI_DOMINATORS, condition_bb,
+ single_pred (condition_bb));
+ set_immediate_dominator (CDI_DOMINATORS, scalar_preheader,
+ single_pred (scalar_preheader));
+ set_immediate_dominator (CDI_DOMINATORS, preheader,
+ condition_bb);
+ }
+ else
+ loop_version (loop, cond_expr, &condition_bb,
+ prob, prob, REG_BR_PROB_BASE - prob, true);
if (LOCATION_LOCUS (vect_location) != UNKNOWN_LOCATION
&& dump_enabled_p ())
@@ -2246,24 +2390,29 @@ vect_loop_versioning (loop_vec_info loop
basic block (i.e. it has two predecessors). Just in order to simplify
following transformations in the vectorizer, we fix this situation
here by adding a new (empty) block on the exit-edge of the loop,
- with the proper loop-exit phis to maintain loop-closed-form. */
+ with the proper loop-exit phis to maintain loop-closed-form.
+ If loop versioning wasn't done from loop, but from scalar_loop instead,
+ merge_bb will already have just a single predecessor. */
merge_bb = single_exit (loop)->dest;
- gcc_assert (EDGE_COUNT (merge_bb->preds) == 2);
- new_exit_bb = split_edge (single_exit (loop));
- new_exit_e = single_exit (loop);
- e = EDGE_SUCC (new_exit_bb, 0);
-
- for (gsi = gsi_start_phis (merge_bb); !gsi_end_p (gsi); gsi_next (&gsi))
- {
- tree new_res;
- orig_phi = gsi_stmt (gsi);
- new_res = copy_ssa_name (PHI_RESULT (orig_phi), NULL);
- new_phi = create_phi_node (new_res, new_exit_bb);
- arg = PHI_ARG_DEF_FROM_EDGE (orig_phi, e);
- add_phi_arg (new_phi, arg, new_exit_e,
- gimple_phi_arg_location_from_edge (orig_phi, e));
- adjust_phi_and_debug_stmts (orig_phi, e, PHI_RESULT (new_phi));
+ if (scalar_loop == NULL || EDGE_COUNT (merge_bb->preds) >= 2)
+ {
+ gcc_assert (EDGE_COUNT (merge_bb->preds) >= 2);
+ new_exit_bb = split_edge (single_exit (loop));
+ new_exit_e = single_exit (loop);
+ e = EDGE_SUCC (new_exit_bb, 0);
+
+ for (gsi = gsi_start_phis (merge_bb); !gsi_end_p (gsi); gsi_next (&gsi))
+ {
+ tree new_res;
+ orig_phi = gsi_stmt (gsi);
+ new_res = copy_ssa_name (PHI_RESULT (orig_phi), NULL);
+ new_phi = create_phi_node (new_res, new_exit_bb);
+ arg = PHI_ARG_DEF_FROM_EDGE (orig_phi, e);
+ add_phi_arg (new_phi, arg, new_exit_e,
+ gimple_phi_arg_location_from_edge (orig_phi, e));
+ adjust_phi_and_debug_stmts (orig_phi, e, PHI_RESULT (new_phi));
+ }
}
--- gcc/tree-vect-loop.c.jj 2013-11-28 09:18:11.772774927 +0100
+++ gcc/tree-vect-loop.c 2013-11-28 14:13:57.643572214 +0100
@@ -374,7 +374,11 @@ vect_determine_vectorization_factor (loo
analyze_pattern_stmt = false;
}
- if (gimple_get_lhs (stmt) == NULL_TREE)
+ if (gimple_get_lhs (stmt) == NULL_TREE
+ /* MASK_STORE has no lhs, but is ok. */
+ && (!is_gimple_call (stmt)
+ || !gimple_call_internal_p (stmt)
+ || gimple_call_internal_fn (stmt) != IFN_MASK_STORE))
{
if (is_gimple_call (stmt))
{
@@ -426,7 +430,12 @@ vect_determine_vectorization_factor (loo
else
{
gcc_assert (!STMT_VINFO_DATA_REF (stmt_info));
- scalar_type = TREE_TYPE (gimple_get_lhs (stmt));
+ if (is_gimple_call (stmt)
+ && gimple_call_internal_p (stmt)
+ && gimple_call_internal_fn (stmt) == IFN_MASK_STORE)
+ scalar_type = TREE_TYPE (gimple_call_arg (stmt, 3));
+ else
+ scalar_type = TREE_TYPE (gimple_get_lhs (stmt));
if (dump_enabled_p ())
{
dump_printf_loc (MSG_NOTE, vect_location,
--- gcc/cfgloop.h.jj 2013-11-19 21:56:40.389335752 +0100
+++ gcc/cfgloop.h 2013-11-28 14:13:57.602572427 +0100
@@ -176,6 +176,9 @@ struct GTY ((chain_next ("%h.next"))) lo
/* True if we should try harder to vectorize this loop. */
bool force_vect;
+ /* True if this loop should never be vectorized. */
+ bool dont_vectorize;
+
/* For SIMD loops, this is a unique identifier of the loop, referenced
by IFN_GOMP_SIMD_VF, IFN_GOMP_SIMD_LANE and IFN_GOMP_SIMD_LAST_LANE
builtins. */
--- gcc/tree-loop-distribution.c.jj 2013-11-22 21:03:05.696896177 +0100
+++ gcc/tree-loop-distribution.c 2013-11-28 14:13:57.632572271 +0100
@@ -588,7 +588,7 @@ copy_loop_before (struct loop *loop)
edge preheader = loop_preheader_edge (loop);
initialize_original_copy_tables ();
- res = slpeel_tree_duplicate_loop_to_edge_cfg (loop, preheader);
+ res = slpeel_tree_duplicate_loop_to_edge_cfg (loop, NULL, preheader);
gcc_assert (res != NULL);
free_original_copy_tables ();
delete_update_ssa ();
--- gcc/optabs.def.jj 2013-11-26 21:36:14.066329682 +0100
+++ gcc/optabs.def 2013-11-28 14:13:57.624572312 +0100
@@ -248,6 +248,8 @@ OPTAB_D (sdot_prod_optab, "sdot_prod$I$a
OPTAB_D (ssum_widen_optab, "widen_ssum$I$a3")
OPTAB_D (udot_prod_optab, "udot_prod$I$a")
OPTAB_D (usum_widen_optab, "widen_usum$I$a3")
+OPTAB_D (maskload_optab, "maskload$a")
+OPTAB_D (maskstore_optab, "maskstore$a")
OPTAB_D (vec_extract_optab, "vec_extract$a")
OPTAB_D (vec_init_optab, "vec_init$a")
OPTAB_D (vec_pack_sfix_trunc_optab, "vec_pack_sfix_trunc_$a")
--- gcc/testsuite/gcc.target/i386/avx2-gather-6.c.jj 2013-11-28 14:13:57.633572267 +0100
+++ gcc/testsuite/gcc.target/i386/avx2-gather-6.c 2013-11-28 14:13:57.633572267 +0100
@@ -0,0 +1,7 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -mavx2 -fno-common -fdump-tree-vect-details" } */
+
+#include "avx2-gather-5.c"
+
+/* { dg-final { scan-tree-dump-times "note: vectorized 1 loops in function" 1 "vect" } } */
+/* { dg-final { cleanup-tree-dump "vect" } } */
--- gcc/testsuite/gcc.target/i386/vect-cond-1.c.jj 2013-11-28 14:57:58.182864189 +0100
+++ gcc/testsuite/gcc.target/i386/vect-cond-1.c 2013-11-28 14:57:58.182864189 +0100
@@ -0,0 +1,21 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -ftree-vectorize -mavx2" { target avx2 } } */
+
+int a[1024];
+
+int
+foo (int *p)
+{
+ int i;
+ for (i = 0; i < 1024; i++)
+ {
+ int t;
+ if (a[i] < 30)
+ t = *p;
+ else
+ t = a[i] + 12;
+ a[i] = t;
+ }
+}
+
+/* { dg-final { cleanup-tree-dump "vect" } } */
--- gcc/testsuite/gcc.target/i386/avx2-gather-5.c.jj 2013-11-28 14:13:57.633572267 +0100
+++ gcc/testsuite/gcc.target/i386/avx2-gather-5.c 2013-11-28 14:13:57.633572267 +0100
@@ -0,0 +1,47 @@
+/* { dg-do run } */
+/* { dg-require-effective-target avx2 } */
+/* { dg-options "-O3 -mavx2 -fno-common" } */
+
+#include "avx2-check.h"
+
+#define N 1024
+float vf1[N+16], vf2[N], vf3[N];
+int k[N];
+
+__attribute__((noinline, noclone)) void
+foo (void)
+{
+ int i;
+ for (i = 0; i < N; i++)
+ {
+ float f;
+ if (vf3[i] < 0.0f)
+ f = vf1[k[i]];
+ else
+ f = 7.0f;
+ vf2[i] = f;
+ }
+}
+
+static void
+avx2_test (void)
+{
+ int i;
+ for (i = 0; i < N + 16; i++)
+ {
+ vf1[i] = 5.5f * i;
+ if (i >= N)
+ continue;
+ vf2[i] = 2.0f;
+ vf3[i] = (i & 1) ? i : -i - 1;
+ k[i] = (i & 1) ? ((i & 2) ? -i : N / 2 + i) : (i * 7) % N;
+ asm ("");
+ }
+ foo ();
+ for (i = 0; i < N; i++)
+ if (vf1[i] != 5.5 * i
+ || vf2[i] != ((i & 1) ? 7.0f : 5.5f * ((i * 7) % N))
+ || vf3[i] != ((i & 1) ? i : -i - 1)
+ || k[i] != ((i & 1) ? ((i & 2) ? -i : N / 2 + i) : ((i * 7) % N)))
+ abort ();
+}
--- gcc/testsuite/gcc.dg/vect/vect-cond-11.c.jj 2013-11-28 14:13:57.634572262 +0100
+++ gcc/testsuite/gcc.dg/vect/vect-cond-11.c 2013-11-28 14:13:57.634572262 +0100
@@ -0,0 +1,116 @@
+#include "tree-vect.h"
+
+#define N 1024
+typedef int V __attribute__((vector_size (4)));
+unsigned int a[N * 2] __attribute__((aligned));
+unsigned int b[N * 2] __attribute__((aligned));
+V c[N];
+
+__attribute__((noinline, noclone)) unsigned int
+foo (unsigned int *a, unsigned int *b)
+{
+ int i;
+ unsigned int r = 0;
+ for (i = 0; i < N; i++)
+ {
+ unsigned int x = a[i], y = b[i];
+ if (x < 32)
+ {
+ x = x + 127;
+ y = y * 2;
+ }
+ else
+ {
+ x = x - 16;
+ y = y + 1;
+ }
+ a[i] = x;
+ b[i] = y;
+ r += x;
+ }
+ return r;
+}
+
+__attribute__((noinline, noclone)) unsigned int
+bar (unsigned int *a, unsigned int *b)
+{
+ int i;
+ unsigned int r = 0;
+ for (i = 0; i < N; i++)
+ {
+ unsigned int x = a[i], y = b[i];
+ if (x < 32)
+ {
+ x = x + 127;
+ y = y * 2;
+ }
+ else
+ {
+ x = x - 16;
+ y = y + 1;
+ }
+ a[i] = x;
+ b[i] = y;
+ c[i] = c[i] + 1;
+ r += x;
+ }
+ return r;
+}
+
+void
+baz (unsigned int *a, unsigned int *b,
+ unsigned int (*fn) (unsigned int *, unsigned int *))
+{
+ int i;
+ for (i = -64; i < 0; i++)
+ {
+ a[i] = 19;
+ b[i] = 17;
+ }
+ for (; i < N; i++)
+ {
+ a[i] = i - 512;
+ b[i] = i;
+ }
+ for (; i < N + 64; i++)
+ {
+ a[i] = 27;
+ b[i] = 19;
+ }
+ if (fn (a, b) != -512U - (N - 32) * 16U + 32 * 127U)
+ __builtin_abort ();
+ for (i = -64; i < 0; i++)
+ if (a[i] != 19 || b[i] != 17)
+ __builtin_abort ();
+ for (; i < N; i++)
+ if (a[i] != (i - 512U < 32U ? i - 512U + 127 : i - 512U - 16)
+ || b[i] != (i - 512U < 32U ? i * 2U : i + 1U))
+ __builtin_abort ();
+ for (; i < N + 64; i++)
+ if (a[i] != 27 || b[i] != 19)
+ __builtin_abort ();
+}
+
+int
+main ()
+{
+ int i;
+ check_vect ();
+ baz (a + 512, b + 512, foo);
+ baz (a + 512, b + 512, bar);
+ baz (a + 512 + 1, b + 512 + 1, foo);
+ baz (a + 512 + 1, b + 512 + 1, bar);
+ baz (a + 512 + 31, b + 512 + 31, foo);
+ baz (a + 512 + 31, b + 512 + 31, bar);
+ baz (a + 512 + 1, b + 512, foo);
+ baz (a + 512 + 1, b + 512, bar);
+ baz (a + 512 + 31, b + 512, foo);
+ baz (a + 512 + 31, b + 512, bar);
+ baz (a + 512, b + 512 + 1, foo);
+ baz (a + 512, b + 512 + 1, bar);
+ baz (a + 512, b + 512 + 31, foo);
+ baz (a + 512, b + 512 + 31, bar);
+ return 0;
+}
+
+/* { dg-final { cleanup-tree-dump "vect" } } */
--- gcc/testsuite/gcc.dg/vect/vect-mask-load-1.c.jj 2013-11-28 14:13:57.633572267 +0100
+++ gcc/testsuite/gcc.dg/vect/vect-mask-load-1.c 2013-11-28 14:13:57.633572267 +0100
@@ -0,0 +1,52 @@
+/* { dg-do run } */
+/* { dg-additional-options "-Ofast -fno-common" } */
+/* { dg-additional-options "-Ofast -fno-common -mavx" { target avx_runtime } } */
+
+#include <stdlib.h>
+#include "tree-vect.h"
+
+__attribute__((noinline, noclone)) void
+foo (double *x, double *y)
+{
+ double *p = __builtin_assume_aligned (x, 16);
+ double *q = __builtin_assume_aligned (y, 16);
+ double z, h;
+ int i;
+ for (i = 0; i < 1024; i++)
+ {
+ if (p[i] < 0.0)
+ z = q[i], h = q[i] * 7.0 + 3.0;
+ else
+ z = p[i] + 6.0, h = p[1024 + i];
+ p[i] = z + 2.0 * h;
+ }
+}
+
+double a[2048] __attribute__((aligned (16)));
+double b[1024] __attribute__((aligned (16)));
+
+int
+main ()
+{
+ int i;
+ check_vect ();
+ for (i = 0; i < 1024; i++)
+ {
+ a[i] = (i & 1) ? -i : 2 * i;
+ a[i + 1024] = i;
+ b[i] = 7 * i;
+ asm ("");
+ }
+ foo (a, b);
+ for (i = 0; i < 1024; i++)
+ if (a[i] != ((i & 1)
+ ? 7 * i + 2.0 * (7 * i * 7.0 + 3.0)
+ : 2 * i + 6.0 + 2.0 * i)
+ || b[i] != 7 * i
+ || a[i + 1024] != i)
+ abort ();
+ return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "note: vectorized 1 loops" 1 "vect" { target avx_runtime } } } */
+/* { dg-final { cleanup-tree-dump "vect" } } */
--- gcc/testsuite/gcc.dg/vect/vect-mask-loadstore-1.c.jj 2013-11-28 14:13:57.634572262 +0100
+++ gcc/testsuite/gcc.dg/vect/vect-mask-loadstore-1.c 2013-11-28 14:13:57.634572262 +0100
@@ -0,0 +1,50 @@
+/* { dg-do run } */
+/* { dg-additional-options "-Ofast -fno-common" } */
+/* { dg-additional-options "-Ofast -fno-common -mavx" { target avx_runtime } } */
+
+#include <stdlib.h>
+#include "tree-vect.h"
+
+__attribute__((noinline, noclone)) void
+foo (float *__restrict x, float *__restrict y, float *__restrict z)
+{
+ float *__restrict p = __builtin_assume_aligned (x, 32);
+ float *__restrict q = __builtin_assume_aligned (y, 32);
+ float *__restrict r = __builtin_assume_aligned (z, 32);
+ int i;
+ for (i = 0; i < 1024; i++)
+ {
+ if (p[i] < 0.0f)
+ q[i] = p[i] + 2.0f;
+ else
+ p[i] = r[i] + 3.0f;
+ }
+}
+
+float a[1024] __attribute__((aligned (32)));
+float b[1024] __attribute__((aligned (32)));
+float c[1024] __attribute__((aligned (32)));
+
+int
+main ()
+{
+ int i;
+ check_vect ();
+ for (i = 0; i < 1024; i++)
+ {
+ a[i] = (i & 1) ? -i : i;
+ b[i] = 7 * i;
+ c[i] = a[i] - 3.0f;
+ asm ("");
+ }
+ foo (a, b, c);
+ for (i = 0; i < 1024; i++)
+ if (a[i] != ((i & 1) ? -i : i)
+ || b[i] != ((i & 1) ? a[i] + 2.0f : 7 * i)
+ || c[i] != a[i] - 3.0f)
+ abort ();
+ return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "note: vectorized 1 loops" 1 "vect" { target avx_runtime } } } */
+/* { dg-final { cleanup-tree-dump "vect" } } */
--- gcc/passes.def.jj 2013-11-27 12:15:13.999517045 +0100
+++ gcc/passes.def 2013-11-28 14:13:57.602572427 +0100
@@ -217,6 +217,8 @@ along with GCC; see the file COPYING3.
NEXT_PASS (pass_iv_canon);
NEXT_PASS (pass_parallelize_loops);
NEXT_PASS (pass_if_conversion);
+ /* pass_vectorize must immediately follow pass_if_conversion.
+ Please do not add any other passes in between. */
NEXT_PASS (pass_vectorize);
PUSH_INSERT_PASSES_WITHIN (pass_vectorize)
NEXT_PASS (pass_dce_loop);
--- gcc/tree-predcom.c.jj 2013-11-22 21:03:14.589851957 +0100
+++ gcc/tree-predcom.c 2013-11-28 14:59:15.529464377 +0100
@@ -732,6 +732,9 @@ split_data_refs_to_components (struct lo
just fail. */
goto end;
}
+      /* The predcom pass isn't prepared to handle calls with data references.  */
+ if (is_gimple_call (DR_STMT (dr)))
+ goto end;
dr->aux = (void *) (size_t) i;
comp_father[i] = i;
comp_size[i] = 1;
--- gcc/tree-vect-stmts.c.jj 2013-11-27 12:15:14.038516844 +0100
+++ gcc/tree-vect-stmts.c 2013-11-28 14:57:58.182864189 +0100
@@ -235,7 +235,7 @@ vect_mark_relevant (vec<gimple> *worklis
/* This use is out of pattern use, if LHS has other uses that are
pattern uses, we should mark the stmt itself, and not the pattern
stmt. */
- if (TREE_CODE (lhs) == SSA_NAME)
+ if (lhs && TREE_CODE (lhs) == SSA_NAME)
FOR_EACH_IMM_USE_FAST (use_p, imm_iter, lhs)
{
if (is_gimple_debug (USE_STMT (use_p)))
@@ -393,7 +393,27 @@ exist_non_indexing_operands_for_use_p (t
first case, and whether var corresponds to USE. */
if (!gimple_assign_copy_p (stmt))
- return false;
+ {
+ if (is_gimple_call (stmt)
+ && gimple_call_internal_p (stmt))
+ switch (gimple_call_internal_fn (stmt))
+ {
+ case IFN_MASK_STORE:
+ operand = gimple_call_arg (stmt, 3);
+ if (operand == use)
+ return true;
+ /* FALLTHRU */
+ case IFN_MASK_LOAD:
+ operand = gimple_call_arg (stmt, 2);
+ if (operand == use)
+ return true;
+ break;
+ default:
+ break;
+ }
+ return false;
+ }
+
if (TREE_CODE (gimple_assign_lhs (stmt)) == SSA_NAME)
return false;
operand = gimple_assign_rhs1 (stmt);
@@ -1696,6 +1716,413 @@ vectorizable_function (gimple call, tree
vectype_in);
}
+
+static tree permute_vec_elements (tree, tree, tree, gimple,
+ gimple_stmt_iterator *);
+
+
+/* Function vectorizable_mask_load_store.
+
+ Check if STMT performs a conditional load or store that can be vectorized.
+ If VEC_STMT is also passed, vectorize the STMT: create a vectorized
+ stmt to replace it, put it in VEC_STMT, and insert it at GSI.
+ Return FALSE if not a vectorizable STMT, TRUE otherwise. */
+
+static bool
+vectorizable_mask_load_store (gimple stmt, gimple_stmt_iterator *gsi,
+ gimple *vec_stmt, slp_tree slp_node)
+{
+ tree vec_dest = NULL;
+ stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
+ stmt_vec_info prev_stmt_info;
+ loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info);
+ struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
+ bool nested_in_vect_loop = nested_in_vect_loop_p (loop, stmt);
+ struct data_reference *dr = STMT_VINFO_DATA_REF (stmt_info);
+ tree vectype = STMT_VINFO_VECTYPE (stmt_info);
+ tree elem_type;
+ gimple new_stmt;
+ tree dummy;
+ tree dataref_ptr = NULL_TREE;
+ gimple ptr_incr;
+ int nunits = TYPE_VECTOR_SUBPARTS (vectype);
+ int ncopies;
+ int i, j;
+ bool inv_p;
+ tree gather_base = NULL_TREE, gather_off = NULL_TREE;
+ tree gather_off_vectype = NULL_TREE, gather_decl = NULL_TREE;
+ int gather_scale = 1;
+ enum vect_def_type gather_dt = vect_unknown_def_type;
+ bool is_store;
+ tree mask;
+ gimple def_stmt;
+ tree def;
+ enum vect_def_type dt;
+
+ if (slp_node != NULL)
+ return false;
+
+ ncopies = LOOP_VINFO_VECT_FACTOR (loop_vinfo) / nunits;
+ gcc_assert (ncopies >= 1);
+
+ is_store = gimple_call_internal_fn (stmt) == IFN_MASK_STORE;
+ mask = gimple_call_arg (stmt, 2);
+ if (TYPE_PRECISION (TREE_TYPE (mask))
+ != GET_MODE_BITSIZE (TYPE_MODE (TREE_TYPE (vectype))))
+ return false;
+
+ /* FORNOW. This restriction should be relaxed. */
+ if (nested_in_vect_loop && ncopies > 1)
+ {
+ if (dump_enabled_p ())
+ dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+ "multiple types in nested loop.");
+ return false;
+ }
+
+ if (!STMT_VINFO_RELEVANT_P (stmt_info))
+ return false;
+
+ if (STMT_VINFO_DEF_TYPE (stmt_info) != vect_internal_def)
+ return false;
+
+ if (!STMT_VINFO_DATA_REF (stmt_info))
+ return false;
+
+ elem_type = TREE_TYPE (vectype);
+
+ if (STMT_VINFO_GROUPED_ACCESS (stmt_info))
+ return false;
+
+ if (STMT_VINFO_STRIDE_LOAD_P (stmt_info))
+ return false;
+
+ if (STMT_VINFO_GATHER_P (stmt_info))
+ {
+ gimple def_stmt;
+ tree def;
+ gather_decl = vect_check_gather (stmt, loop_vinfo, &gather_base,
+ &gather_off, &gather_scale);
+ gcc_assert (gather_decl);
+ if (!vect_is_simple_use_1 (gather_off, NULL, loop_vinfo, NULL,
+ &def_stmt, &def, &gather_dt,
+ &gather_off_vectype))
+ {
+ if (dump_enabled_p ())
+ dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+ "gather index use not simple.");
+ return false;
+ }
+ }
+ else if (tree_int_cst_compare (nested_in_vect_loop
+ ? STMT_VINFO_DR_STEP (stmt_info)
+ : DR_STEP (dr), size_zero_node) <= 0)
+ return false;
+ else if (optab_handler (is_store ? maskstore_optab : maskload_optab,
+ TYPE_MODE (vectype)) == CODE_FOR_nothing)
+ return false;
+
+ if (TREE_CODE (mask) != SSA_NAME)
+ return false;
+
+ if (!vect_is_simple_use (mask, stmt, loop_vinfo, NULL,
+ &def_stmt, &def, &dt))
+ return false;
+
+ if (is_store)
+ {
+ tree rhs = gimple_call_arg (stmt, 3);
+ if (!vect_is_simple_use (rhs, stmt, loop_vinfo, NULL,
+ &def_stmt, &def, &dt))
+ return false;
+ }
+
+ if (!vec_stmt) /* transformation not required. */
+ {
+ STMT_VINFO_TYPE (stmt_info) = call_vec_info_type;
+ if (is_store)
+ vect_model_store_cost (stmt_info, ncopies, false, dt,
+ NULL, NULL, NULL);
+ else
+ vect_model_load_cost (stmt_info, ncopies, false, NULL, NULL, NULL);
+ return true;
+ }
+
+ /** Transform. **/
+
+ if (STMT_VINFO_GATHER_P (stmt_info))
+ {
+ tree vec_oprnd0 = NULL_TREE, op;
+ tree arglist = TYPE_ARG_TYPES (TREE_TYPE (gather_decl));
+ tree rettype, srctype, ptrtype, idxtype, masktype, scaletype;
+ tree ptr, vec_mask = NULL_TREE, mask_op, var, scale;
+ tree perm_mask = NULL_TREE, prev_res = NULL_TREE;
+ edge pe = loop_preheader_edge (loop);
+ gimple_seq seq;
+ basic_block new_bb;
+ enum { NARROW, NONE, WIDEN } modifier;
+ int gather_off_nunits = TYPE_VECTOR_SUBPARTS (gather_off_vectype);
+
+ if (nunits == gather_off_nunits)
+ modifier = NONE;
+ else if (nunits == gather_off_nunits / 2)
+ {
+ unsigned char *sel = XALLOCAVEC (unsigned char, gather_off_nunits);
+ modifier = WIDEN;
+
+ for (i = 0; i < gather_off_nunits; ++i)
+ sel[i] = i | nunits;
+
+ perm_mask = vect_gen_perm_mask (gather_off_vectype, sel);
+ gcc_assert (perm_mask != NULL_TREE);
+ }
+ else if (nunits == gather_off_nunits * 2)
+ {
+ unsigned char *sel = XALLOCAVEC (unsigned char, nunits);
+ modifier = NARROW;
+
+ for (i = 0; i < nunits; ++i)
+ sel[i] = i < gather_off_nunits
+ ? i : i + nunits - gather_off_nunits;
+
+ perm_mask = vect_gen_perm_mask (vectype, sel);
+ gcc_assert (perm_mask != NULL_TREE);
+ ncopies *= 2;
+ }
+ else
+ gcc_unreachable ();
+
+ rettype = TREE_TYPE (TREE_TYPE (gather_decl));
+ srctype = TREE_VALUE (arglist); arglist = TREE_CHAIN (arglist);
+ ptrtype = TREE_VALUE (arglist); arglist = TREE_CHAIN (arglist);
+ idxtype = TREE_VALUE (arglist); arglist = TREE_CHAIN (arglist);
+ masktype = TREE_VALUE (arglist); arglist = TREE_CHAIN (arglist);
+ scaletype = TREE_VALUE (arglist);
+ gcc_checking_assert (types_compatible_p (srctype, rettype)
+ && types_compatible_p (srctype, masktype));
+
+ vec_dest = vect_create_destination_var (gimple_call_lhs (stmt), vectype);
+
+ ptr = fold_convert (ptrtype, gather_base);
+ if (!is_gimple_min_invariant (ptr))
+ {
+ ptr = force_gimple_operand (ptr, &seq, true, NULL_TREE);
+ new_bb = gsi_insert_seq_on_edge_immediate (pe, seq);
+ gcc_assert (!new_bb);
+ }
+
+ scale = build_int_cst (scaletype, gather_scale);
+
+ prev_stmt_info = NULL;
+ for (j = 0; j < ncopies; ++j)
+ {
+ if (modifier == WIDEN && (j & 1))
+ op = permute_vec_elements (vec_oprnd0, vec_oprnd0,
+ perm_mask, stmt, gsi);
+ else if (j == 0)
+ op = vec_oprnd0
+ = vect_get_vec_def_for_operand (gather_off, stmt, NULL);
+ else
+ op = vec_oprnd0
+ = vect_get_vec_def_for_stmt_copy (gather_dt, vec_oprnd0);
+
+ if (!useless_type_conversion_p (idxtype, TREE_TYPE (op)))
+ {
+ gcc_assert (TYPE_VECTOR_SUBPARTS (TREE_TYPE (op))
+ == TYPE_VECTOR_SUBPARTS (idxtype));
+ var = vect_get_new_vect_var (idxtype, vect_simple_var, NULL);
+ var = make_ssa_name (var, NULL);
+ op = build1 (VIEW_CONVERT_EXPR, idxtype, op);
+ new_stmt
+ = gimple_build_assign_with_ops (VIEW_CONVERT_EXPR, var,
+ op, NULL_TREE);
+ vect_finish_stmt_generation (stmt, new_stmt, gsi);
+ op = var;
+ }
+
+ if (j == 0)
+ vec_mask = vect_get_vec_def_for_operand (mask, stmt, NULL);
+ else
+ {
+ vect_is_simple_use (vec_mask, NULL, loop_vinfo, NULL, &def_stmt,
+ &def, &dt);
+ vec_mask = vect_get_vec_def_for_stmt_copy (dt, vec_mask);
+ }
+
+ mask_op = vec_mask;
+ if (!useless_type_conversion_p (masktype, TREE_TYPE (vec_mask)))
+ {
+ gcc_assert (TYPE_VECTOR_SUBPARTS (TREE_TYPE (mask_op))
+ == TYPE_VECTOR_SUBPARTS (masktype));
+ var = vect_get_new_vect_var (masktype, vect_simple_var, NULL);
+ var = make_ssa_name (var, NULL);
+ mask_op = build1 (VIEW_CONVERT_EXPR, masktype, mask_op);
+ new_stmt
+ = gimple_build_assign_with_ops (VIEW_CONVERT_EXPR, var,
+ mask_op, NULL_TREE);
+ vect_finish_stmt_generation (stmt, new_stmt, gsi);
+ mask_op = var;
+ }
+
+ new_stmt
+ = gimple_build_call (gather_decl, 5, mask_op, ptr, op, mask_op,
+ scale);
+
+ if (!useless_type_conversion_p (vectype, rettype))
+ {
+ gcc_assert (TYPE_VECTOR_SUBPARTS (vectype)
+ == TYPE_VECTOR_SUBPARTS (rettype));
+ var = vect_get_new_vect_var (rettype, vect_simple_var, NULL);
+ op = make_ssa_name (var, new_stmt);
+ gimple_call_set_lhs (new_stmt, op);
+ vect_finish_stmt_generation (stmt, new_stmt, gsi);
+ var = make_ssa_name (vec_dest, NULL);
+ op = build1 (VIEW_CONVERT_EXPR, vectype, op);
+ new_stmt
+ = gimple_build_assign_with_ops (VIEW_CONVERT_EXPR, var, op,
+ NULL_TREE);
+ }
+ else
+ {
+ var = make_ssa_name (vec_dest, new_stmt);
+ gimple_call_set_lhs (new_stmt, var);
+ }
+
+ vect_finish_stmt_generation (stmt, new_stmt, gsi);
+
+ if (modifier == NARROW)
+ {
+ if ((j & 1) == 0)
+ {
+ prev_res = var;
+ continue;
+ }
+ var = permute_vec_elements (prev_res, var,
+ perm_mask, stmt, gsi);
+ new_stmt = SSA_NAME_DEF_STMT (var);
+ }
+
+ if (prev_stmt_info == NULL)
+ STMT_VINFO_VEC_STMT (stmt_info) = *vec_stmt = new_stmt;
+ else
+ STMT_VINFO_RELATED_STMT (prev_stmt_info) = new_stmt;
+ prev_stmt_info = vinfo_for_stmt (new_stmt);
+ }
+ return true;
+ }
+ else if (is_store)
+ {
+ tree vec_rhs = NULL_TREE, vec_mask = NULL_TREE;
+ prev_stmt_info = NULL;
+ for (i = 0; i < ncopies; i++)
+ {
+ unsigned align, misalign;
+
+ if (i == 0)
+ {
+ tree rhs = gimple_call_arg (stmt, 3);
+ vec_rhs = vect_get_vec_def_for_operand (rhs, stmt, NULL);
+ vec_mask = vect_get_vec_def_for_operand (mask, stmt, NULL);
+	      /* We should have caught mismatched types earlier.  */
+ gcc_assert (useless_type_conversion_p (vectype,
+ TREE_TYPE (vec_rhs)));
+ dataref_ptr = vect_create_data_ref_ptr (stmt, vectype, NULL,
+ NULL_TREE, &dummy, gsi,
+ &ptr_incr, false, &inv_p);
+ gcc_assert (!inv_p);
+ }
+ else
+ {
+ vect_is_simple_use (vec_rhs, NULL, loop_vinfo, NULL, &def_stmt,
+ &def, &dt);
+ vec_rhs = vect_get_vec_def_for_stmt_copy (dt, vec_rhs);
+ vect_is_simple_use (vec_mask, NULL, loop_vinfo, NULL, &def_stmt,
+ &def, &dt);
+ vec_mask = vect_get_vec_def_for_stmt_copy (dt, vec_mask);
+ dataref_ptr = bump_vector_ptr (dataref_ptr, ptr_incr, gsi, stmt,
+ TYPE_SIZE_UNIT (vectype));
+ }
+
+ align = TYPE_ALIGN_UNIT (vectype);
+ if (aligned_access_p (dr))
+ misalign = 0;
+ else if (DR_MISALIGNMENT (dr) == -1)
+ {
+ align = TYPE_ALIGN_UNIT (elem_type);
+ misalign = 0;
+ }
+ else
+ misalign = DR_MISALIGNMENT (dr);
+ set_ptr_info_alignment (get_ptr_info (dataref_ptr), align,
+ misalign);
+ new_stmt
+ = gimple_build_call_internal (IFN_MASK_STORE, 4, dataref_ptr,
+ gimple_call_arg (stmt, 1),
+ vec_mask, vec_rhs);
+ vect_finish_stmt_generation (stmt, new_stmt, gsi);
+ if (i == 0)
+ STMT_VINFO_VEC_STMT (stmt_info) = *vec_stmt = new_stmt;
+ else
+ STMT_VINFO_RELATED_STMT (prev_stmt_info) = new_stmt;
+ prev_stmt_info = vinfo_for_stmt (new_stmt);
+ }
+ }
+ else
+ {
+ tree vec_mask = NULL_TREE;
+ prev_stmt_info = NULL;
+ vec_dest = vect_create_destination_var (gimple_call_lhs (stmt), vectype);
+ for (i = 0; i < ncopies; i++)
+ {
+ unsigned align, misalign;
+
+ if (i == 0)
+ {
+ vec_mask = vect_get_vec_def_for_operand (mask, stmt, NULL);
+ dataref_ptr = vect_create_data_ref_ptr (stmt, vectype, NULL,
+ NULL_TREE, &dummy, gsi,
+ &ptr_incr, false, &inv_p);
+ gcc_assert (!inv_p);
+ }
+ else
+ {
+ vect_is_simple_use (vec_mask, NULL, loop_vinfo, NULL, &def_stmt,
+ &def, &dt);
+ vec_mask = vect_get_vec_def_for_stmt_copy (dt, vec_mask);
+ dataref_ptr = bump_vector_ptr (dataref_ptr, ptr_incr, gsi, stmt,
+ TYPE_SIZE_UNIT (vectype));
+ }
+
+ align = TYPE_ALIGN_UNIT (vectype);
+ if (aligned_access_p (dr))
+ misalign = 0;
+ else if (DR_MISALIGNMENT (dr) == -1)
+ {
+ align = TYPE_ALIGN_UNIT (elem_type);
+ misalign = 0;
+ }
+ else
+ misalign = DR_MISALIGNMENT (dr);
+ set_ptr_info_alignment (get_ptr_info (dataref_ptr), align,
+ misalign);
+ new_stmt
+ = gimple_build_call_internal (IFN_MASK_LOAD, 3, dataref_ptr,
+ gimple_call_arg (stmt, 1),
+ vec_mask);
+ gimple_call_set_lhs (new_stmt, make_ssa_name (vec_dest, NULL));
+ vect_finish_stmt_generation (stmt, new_stmt, gsi);
+ if (i == 0)
+ STMT_VINFO_VEC_STMT (stmt_info) = *vec_stmt = new_stmt;
+ else
+ STMT_VINFO_RELATED_STMT (prev_stmt_info) = new_stmt;
+ prev_stmt_info = vinfo_for_stmt (new_stmt);
+ }
+ }
+
+ return true;
+}
+
+
/* Function vectorizable_call.
Check if STMT performs a function call that can be vectorized.
@@ -1738,6 +2165,12 @@ vectorizable_call (gimple stmt, gimple_s
if (!is_gimple_call (stmt))
return false;
+ if (gimple_call_internal_p (stmt)
+ && (gimple_call_internal_fn (stmt) == IFN_MASK_LOAD
+ || gimple_call_internal_fn (stmt) == IFN_MASK_STORE))
+ return vectorizable_mask_load_store (stmt, gsi, vec_stmt,
+ slp_node);
+
if (gimple_call_lhs (stmt) == NULL_TREE
|| TREE_CODE (gimple_call_lhs (stmt)) != SSA_NAME)
return false;
@@ -4051,10 +4484,6 @@ vectorizable_shift (gimple stmt, gimple_
}
-static tree permute_vec_elements (tree, tree, tree, gimple,
- gimple_stmt_iterator *);
-
-
/* Function vectorizable_operation.
Check if STMT performs a binary, unary or ternary operation that can
@@ -6567,6 +6996,10 @@ vect_transform_stmt (gimple stmt, gimple
case call_vec_info_type:
done = vectorizable_call (stmt, gsi, &vec_stmt, slp_node);
stmt = gsi_stmt (*gsi);
+ if (is_gimple_call (stmt)
+ && gimple_call_internal_p (stmt)
+ && gimple_call_internal_fn (stmt) == IFN_MASK_STORE)
+ is_store = true;
break;
case call_simd_clone_vec_info_type:
--- gcc/tree-ssa-phiopt.c.jj 2013-11-22 21:03:14.569852057 +0100
+++ gcc/tree-ssa-phiopt.c 2013-11-28 15:01:39.825688128 +0100
@@ -1706,7 +1706,7 @@ cond_if_else_store_replacement (basic_bl
== chrec_dont_know)
|| !then_datarefs.length ()
|| (find_data_references_in_bb (NULL, else_bb, &else_datarefs)
- == chrec_dont_know)
+ == chrec_dont_know)
|| !else_datarefs.length ())
{
free_data_refs (then_datarefs);
@@ -1723,6 +1723,8 @@ cond_if_else_store_replacement (basic_bl
then_store = DR_STMT (then_dr);
then_lhs = gimple_get_lhs (then_store);
+ if (then_lhs == NULL_TREE)
+ continue;
found = false;
FOR_EACH_VEC_ELT (else_datarefs, j, else_dr)
@@ -1732,6 +1734,8 @@ cond_if_else_store_replacement (basic_bl
else_store = DR_STMT (else_dr);
else_lhs = gimple_get_lhs (else_store);
+ if (else_lhs == NULL_TREE)
+ continue;
if (operand_equal_p (then_lhs, else_lhs, 0))
{
Jakub