This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: revised and updated new-if-converter patchâ [PATCH] fix PR46029: reimplement if conversion of loads and stores
- From: Richard Biener <richard dot guenther at gmail dot com>
- To: Jeff Law <law at redhat dot com>
- Cc: Abe <abe_skolnik at yahoo dot com>, Alan Lawrence <alan dot lawrence at arm dot com>, Sebastian Pop <sebpop at gmail dot com>, "gcc-patches at gcc dot gnu dot org" <gcc-patches at gcc dot gnu dot org>
- Date: Tue, 4 Aug 2015 13:09:48 +0200
- Subject: Re: revised and updated new-if-converter patchâ [PATCH] fix PR46029: reimplement if conversion of loads and stores
- Authentication-results: sourceware.org; auth=none
- References: <55A95E1A dot 9000301 at yahoo dot com> <55C04814 dot 5030606 at redhat dot com>
On Tue, Aug 4, 2015 at 7:05 AM, Jeff Law <law@redhat.com> wrote:
> On 07/17/2015 01:57 PM, Abe wrote:
>>
>> Dear all,
>>
>> Relative to the previous submission of this same patch, the below
>> corrects some minor spacing and/or indentation issues,
>> misc. other formatting fixes, and makes the disabled vectorization tests
>> be disabled via "xfail" rather than by adding spaces to
>> deliberately cause the relevant scanned-for text to not be found by
>> DejaGNU so as to prevent the DejaGNU line from being interpreted.
>>
>> The below is also based on a Git checkout that was rebased to the latest
>> upstream check-in from today,
>> so it should merge cleanly with trunk as of today.
>>
>> Regards,
>>
>> Abe
>>
>>
>>
>>
>>
>>
>>
>>
>> 2015-06-12 Sebastian Pop <s.pop@samsung.com>
>> Abe Skolnik <a.skolnik@samsung.com>
>>
>> PR tree-optimization/46029
>> * tree-data-ref.c (struct data_ref_loc_d): Moved...
>> (get_references_in_stmt): Exported.
>> * tree-data-ref.h (struct data_ref_loc_d): ... here.
>> (get_references_in_stmt): Declared.
>>
>> * tree-if-conv.c (struct ifc_dr): Removed.
>> (IFC_DR): Removed.
>> (DR_WRITTEN_AT_LEAST_ONCE): Removed.
>> (DR_RW_UNCONDITIONALLY): Removed.
>> (memrefs_read_or_written_unconditionally): Removed.
>> (write_memrefs_written_at_least_once): Removed.
>> (ifcvt_could_trap_p): Does not take refs parameter anymore.
>> (ifcvt_memrefs_wont_trap): Removed.
>> (has_non_addressable_refs): New.
>> (if_convertible_gimple_assign_stmt_p): Call has_non_addressable_refs.
>> Removed use of refs.
>> (if_convertible_stmt_p): Removed use of refs.
>> (if_convertible_gimple_assign_stmt_p): Same.
>> (if_convertible_loop_p_1): Removed use of refs. Remove
>> initialization
>> of dr->aux, DR_WRITTEN_AT_LEAST_ONCE, and DR_RW_UNCONDITIONALLY.
>> (insert_address_of): New.
>> (create_scratchpad): New.
>> (create_indirect_cond_expr): New.
>> (predicate_mem_writes): Call create_indirect_cond_expr. Take an
>> extra
>> parameter for scratch_pad.
>> (combine_blocks): Same.
>> (tree_if_conversion): Same.
>>
>> testsuite/
>> * g++.dg/tree-ssa/ifc-pr46029.C: New.
>> * gcc.dg/tree-ssa/ifc-5.c: Make it exactly like the FFmpeg kernel.
>> * gcc.dg/tree-ssa/ifc-8.c: New.
>> * gcc.dg/tree-ssa/ifc-9.c: New.
>> * gcc.dg/tree-ssa/ifc-10.c: New.
>> * gcc.dg/tree-ssa/ifc-11.c: New.
>> * gcc.dg/tree-ssa/ifc-12.c: New.
>> * gcc.dg/vect/if-cvt-stores-vect-ifcvt-18.c: Disabled.
>
>
>
>> diff --git a/gcc/common.opt b/gcc/common.opt
>> index 6b2ccbc..49f6b9f 100644
>> --- a/gcc/common.opt
>> +++ b/gcc/common.opt
>> @@ -1413,7 +1413,7 @@ Common Report Var(flag_tree_loop_if_convert)
>> Init(-1) Optimization
>> Convert conditional jumps in innermost loops to branchless equivalents
>>
>> ftree-loop-if-convert-stores
>> -Common Report Var(flag_tree_loop_if_convert_stores) Optimization
>> +Common Report Var(flag_tree_loop_if_convert_stores) Init(-1) Optimization
>> Also if-convert conditional jumps containing memory writes
>>
>> ; -finhibit-size-directive inhibits output of .size for ELF.
>
> I don't see this change mentioned anywhere in the ChangeLog. That seems to
> be a relatively common situation. I called out some of those issues, but
> didn't try to catch them all. Please make sure all changes are reflected in
> the ChangeLog.
>
>
>
>
>> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/cunroll-10.c
>> b/gcc/testsuite/gcc.dg/tree-ssa/cunroll-10.c
>> index f392fbe..775fcd5 100644
>> --- a/gcc/testsuite/gcc.dg/tree-ssa/cunroll-10.c
>> +++ b/gcc/testsuite/gcc.dg/tree-ssa/cunroll-10.c
>
> This change isn't mentioned in the ChangeLog.
>
>
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ifc-5.c
>>
>> b/gcc/testsuite/gcc.dg/tree-ssa/ifc-5.c
>> index 875d2d3..fc69ca2 100644
>> --- a/gcc/testsuite/gcc.dg/tree-ssa/ifc-5.c
>> +++ b/gcc/testsuite/gcc.dg/tree-ssa/ifc-5.c
>> @@ -1,5 +1,5 @@
>> /* { dg-do compile } */
>> -/* { dg-options "-c -O2 -ftree-vectorize -fdump-tree-ifcvt-stats" {
>> target *-*-* } } */
>> +/* { dg-options "-c -O2 -ftree-vectorize -ftree-loop-if-convert-stores
>> -fdump-tree-ifcvt-stats" { target *-*-* } } */
>
> ISTM this really should be two tests, one with this code as-is, another that
> exactly matches the ffmpeg kernel.
>
>
>
>
>> diff --git a/gcc/testsuite/gcc.dg/vect/vect-mask-loadstore-1.c
>> b/gcc/testsuite/gcc.dg/vect/vect-mask-loadstore-1.c
>> index 11e9533..cbd3378 100644
>> --- a/gcc/testsuite/gcc.dg/vect/vect-mask-loadstore-1.c
>> +++ b/gcc/testsuite/gcc.dg/vect/vect-mask-loadstore-1.c
>
> I don't see this mentioned in the ChangeLog. It also doesn't look like you
> actually disabled the test. Obviously this will need to be addressed before
> your patch could go in.
>
>> diff --git a/gcc/testsuite/gcc.target/i386/avx2-gather-6.c
>> b/gcc/testsuite/gcc.target/i386/avx2-gather-6.c
>> index 180b490..aedc66a 100644
>> --- a/gcc/testsuite/gcc.target/i386/avx2-gather-6.c
>> +++ b/gcc/testsuite/gcc.target/i386/avx2-gather-6.c
>
> Not mentioned in the ChangeLog. xfail needs to be fixed.
>
> Similarly for the others were you added xfails.
>
>
>> diff --git a/gcc/tree-data-ref.c b/gcc/tree-data-ref.c
>> index e6ff4ef..2a1e27d 100644
>> --- a/gcc/tree-data-ref.c
>> +++ b/gcc/tree-data-ref.c
>> @@ -4363,22 +4363,11 @@ compute_all_dependences (vec<data_reference_p>
>> datarefs,
>> return true;
>> }
>>
>> -/* Describes a location of a memory reference. */
>> -
>> -typedef struct data_ref_loc_d
>> -{
>> - /* The memory reference. */
>> - tree ref;
>> -
>> - /* True if the memory reference is read. */
>> - bool is_read;
>> -} data_ref_loc;
>
> Moving data_ref_loc_d could potentially be split out and merged in
> immediately assuming we've agreed that you're likely going to need
> data_ref_loc from within tree-data-ref.c and tree-if-conv.c.
>
> Similarly for exporting get_references_in_stmt.
>
> Picking out these things that are necessary prerequisites, but not part of
> the core changes you need to make will make the overall review process
> simpler.
>
>
>
>> +ifcvt_could_trap_p (gimple stmt)
>
> Presumably you're able to consider something with a vuse which does not
> otherwise trap as non-trapping because of the use of the scratchpad?
>
> Perhaps you should clarify the comment to indicate what the "refined for the
> needs of the if-conversion" actually means. Remember, someone other than
> you might be reading this code in the future.
>
>
>> @@ -1459,8 +1309,7 @@ is_cond_scalar_reduction (gimple phi, gimple
>> *reduc, tree arg_0, tree arg_1,
>> if (PHI_ARG_DEF_FROM_EDGE (header_phi, latch_e) != PHI_RESULT (phi))
>> return false;
>>
>> - if (gimple_code (stmt) != GIMPLE_ASSIGN
>> - || gimple_has_volatile_ops (stmt))
>> + if (gimple_code (stmt) != GIMPLE_ASSIGN || gimple_has_volatile_ops
>> (stmt))
>> return false;
>
> This looks like a gratutious change. It probably doesn't matter either way.
> If you really want to make this change, please do so independent of your
> patch.
>
>
>>
>> if (!flow_bb_inside_loop_p (loop, gimple_bb (stmt)))
>> @@ -1882,8 +1731,7 @@ insert_gimplified_predicates (loop_p loop, bool
>> any_mask_load_store)
>> stmts = bb_predicate_gimplified_stmts (bb);
>> if (stmts)
>> {
>> - if (flag_tree_loop_if_convert_stores
>> - || any_mask_load_store)
>> + if (flag_tree_loop_if_convert_stores || any_mask_load_store)
>
> Similarly. Pure whitespace/formatting changes should be handled
> independently.
>
>> @@ -1925,6 +1773,80 @@ mask_exists (int size, vec<int> vec)
>> return -1;
>> }
>>
>> +/* Inserts at position GSI a statement "ADDRESS_OF_AI = &AI;" and
>> + returns the ADDRESS_OF_AI. */
>> +
>> +static tree
>> +insert_address_of (tree ai, gimple_stmt_iterator *gsi)
>
> Presumably we've verified ai is addressable prior to calling this routine
> :-)
+static tree
+insert_address_of (tree ai, gimple_stmt_iterator *gsi)
+{
+ tree addr_expr = build_fold_addr_expr (ai);
+ tree address_of_ai = create_tmp_reg (TREE_TYPE (addr_expr), "_ifc_");
+ gimple addr_stmt = gimple_build_assign (address_of_ai, addr_expr);
+
+ address_of_ai = make_ssa_name (address_of_ai, addr_stmt);
+ gimple_assign_set_lhs (addr_stmt, address_of_ai);
+ SSA_NAME_DEF_STMT (address_of_ai) = addr_stmt;
please simply use
gimple addr_stmt = gimple_build_assign (make_temp_ssa_name
(TREE_TYPE (addr_expr), NULL, "ifc"), addr_expr);
address_of_ai = gimple_assign_lhs (addr_stmt);
maybe you can simplify other code as well in this way.
As Jeff said you better made sure 'ai' is TREE_ADDRESSABLE (I suppose
this is always
the scratchpad). Otherwise you seriously confuse the alias machinery.
So do, at the
beginning of this function
gcc_assert (TREE_ADDRESSABLE (get_base_address (ai)));
>
>> +
>> +/* Insert at the beginning of the first basic block of the current
>> + function the allocation on the stack of N_BYTES of memory and
>> + return a pointer to this scratchpad memory. */
>> +
>> +static tree
>> +create_scratchpad (unsigned int n_bytes)
>> +{
>> + basic_block bb = single_succ ( ENTRY_BLOCK_PTR_FOR_FN(cfun) );
>
> Whitespace is wrong. Should be
> = single_succ (ENTRY_BLOCK_PTR_FOR_FN (cfun));
+static tree
+create_scratchpad (unsigned int n_bytes)
+{
+ basic_block bb = single_succ ( ENTRY_BLOCK_PTR_FOR_FN(cfun) );
+ gimple_stmt_iterator gsi = gsi_after_labels (bb);
+ tree x = build_int_cst (integer_type_node, n_bytes - 1);
+ tree elt_type = char_type_node;
+ tree array_type = build_array_type (elt_type, build_index_type (x));
+ tree base = create_tmp_reg (array_type, "scratch_pad");
+
+ return insert_address_of (base, &gsi);
also use create_tmp_var and add
TREE_ADDRESSABLE (base) = 1;
note that the vectorizer might want to align the scratchpad and aligning
stack locals can be expensive. So I wonder if using a global is better?
(yeah, you get false dependencies in the CPU with threads again, so
you could even use TLS vars...). Not sure if we really need to worry.
>
>> +/* Returns a memory reference to the pointer defined by the
>> + conditional expression: pointer = cond ? &A[i] : scratch_pad; and
>> + inserts this code at GSI. */
>> +
>> +static tree
>> +create_indirect_cond_expr (tree ai, tree cond, tree *scratch_pad,
>> + gimple_stmt_iterator *gsi, bool swap)
>> +{
>> + tree cond_expr;
>> + tree pointer;
>> + gimple pointer_stmt;
>> +
>> + /* address_of_ai = &A[i]; */
>> + tree address_of_ai = insert_address_of (ai, gsi);
>> +
>> + /* Allocate the scratch pad only once per function. */
>> + if (!*scratch_pad)
>> + *scratch_pad = create_scratchpad (64);
>
> Obviously the hardcoded 64 needs to change.
You can iterate over vector modes looking for the biggest one, also
consider word_mode.
>> +
>> + /* pointer = cond ? address_of_ai : scratch_pad; */
>> + pointer = create_tmp_reg (TREE_TYPE (address_of_ai), "_ifc_");
See above - make_temp_ssa_name
>> + cond_expr = build3 (COND_EXPR, TREE_TYPE (address_of_ai),
>> + unshare_expr (cond),
>> + swap ? *scratch_pad : address_of_ai,
>> + swap ? address_of_ai : *scratch_pad);
Don't use build3 but directly build the gimple assign with split ops.
> Note that the types of scratch_pad and address_of_ai might be different. I
> know that's allowed at the generic level, but I'm not sure that's allowed at
> the gimple level. Do you need to do a pointer conversion to stay safe WRT
> the gimple typesystem here?
No, all pointer types are compatible.
>
> One of the things that's unclear to me is how this all interacts with the
> alias oracle.
What matters with type-based aliasing is the actual dereference. Of course
points-to info is another story here - you better should make sure to make
that available for the new pointer you create, otherwise you'll create runtime
alias checks for no good reason in vectorization.
> Richi? Comments about that?
>
>
> Overall I like what I see and I think you're heading in the right direction.
> There's some things we ought to break out and commit now which will keep the
> review work easier in the long term.
So I thought about the way you if-convert and the way to vectorize
that. We end up
with, say
void foo (float *p)
{
...
char scratchpad[];
...
for (i...;;)
{
void *p = cond ? &p[i] : &scratchpad;
*p = ...;
}
for example. Now to vectorize this with scatters (_only_ supported
with AVX512!)
you need to be able to reduce the address to a single base address plus a
vector of scaled indexes. Obviously the scratchpad and p do not have the same
base and for SFmode scatters you have only 4 bytes to represent the index.
This means you won't be able to use scatters here at all!
Also with AVX512 you have masked loads/stores at your disposal which are
much better suited here (and already supported by if-conversion!) as they
will be likely very much faster than gather/scatter operations.
So ... what target are you targeting that has gather/scatter support for
arbitrary unrelated objects and _not_ masked load/store support.
That said - it looks like your way of doing if-conversion isn't a good one for
vectorization.
Richard.
> JEff