This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: fix PR46029: reimplement if conversion of loads and stores
- From: Alan Lawrence <alan dot lawrence at arm dot com>
- To: Abe Skolnik <a dot skolnik at samsung dot com>
- Cc: "gcc-patches at gcc dot gnu dot org" <gcc-patches at gcc dot gnu dot org>, "richard dot guenther at gmail dot com" <richard dot guenther at gmail dot com>, "sebpop at gmail dot com" <sebpop at gmail dot com>
- Date: Mon, 22 Jun 2015 17:27:03 +0100
- Subject: Re: fix PR46029: reimplement if conversion of loads and stores
- Authentication-results: sourceware.org; auth=none
- References: <20150612205047 dot GA27819 at cc00 dot spa dot sarc dot sas>
Abe Skolnik wrote:
Hi everybody!
In the current implementation of if conversion, loads and stores are
if-converted in a thread-unsafe way:
* loads were always executed, even when they should have not been.
Some source code could be rendered invalid due to null pointers
that were OK in the original program because they were never
dereferenced.
* writes were if-converted via load/maybe-modify/store, which
renders some code multithreading-unsafe.
This patch reimplements if-conversion of loads and stores in a safe
way using a scratchpad allocated by the compiler on the stack:
* loads are done through an indirection, reading either the correct
data from the correct source [if the condition is true] or reading
from the scratchpad and later ignoring this read result [if the
condition is false].
* writes are also done through an indirection, writing either to the
correct destination [if the condition is true] or to the
scratchpad [if the condition is false].
Vectorization of "if-cvt-stores-vect-ifcvt-18.c" disabled because the
old if-conversion resulted in unsafe code that could fail under
multithreading even though the as-written code _was_ thread-safe.
Passed regression testing and bootstrap on amd64-linux.
Is this OK to commit to trunk?
Regards,
Abe
Thanks for getting back to this!
My main thought concerns the direction we are travelling here. A major reason
why we do if-conversion is to enable vectorization. Is this is targetted at
gathering/scattering loads? Following vectorization, different elements of the
vector being loaded/stored may have to go to/from the scratchpad or to/from main
memory.
Or, are we aiming at the case where the predicate or address are invariant? That
seems unlikely - loop unswitching would be better for the predicate; loading
from an address, we'd just peel and hoist; storing, this'd result in the address
holding the last value written, at exit from the loop, a curious idiom. Where
the predicate/address is invariant across the vector? (!)
Or, at we aiming at non-vectorized code?
Beyond that question...
Does the description for -ftree-loop-if-convert-stores in doc/invoke.texi
describe what the flag now does? (It doesn't mention loads; the code doesn't
look like we use scratchpads at all without -ftree-loop-if-convert-stores, or am
I missing something?)
In tree-if-conv.c:
@@ -883,7 +733,7 @@ if_convertible_gimple_assign_stmt_p (gimple stmt,
if (flag_tree_loop_if_convert_stores)
{
- if (ifcvt_could_trap_p (stmt, refs))
+ if (ifcvt_could_trap_p (stmt))
{
if (ifcvt_can_use_mask_load_store (stmt))
{
and
+
+ if (has_non_addressable_refs (stmt))
+ {
+ if (dump_file && (dump_flags & TDF_DETAILS))
+ fprintf (dump_file, "has non-addressable memory references\n");
+ return false;
+ }
+
if it doesn't trap, but has_non_addressable_refs, can't we use
ifcvt_can_use_mask_load_store there too?
And/or, I think I may be misunderstanding here, but if an access could trap, but
is addressable, can't we use the scratchpad technique to get round the trapping
problem?
(Look at it another way - this patch makes strictly more things return true from
ifcvt_could_trap_p, which always exits immediately from
if_convertible_gimple_assign_stmt_p...?)
Re. creation of scratchpads:
(1) Should the '64' byte size be the result of scanning the function, for
the largest data size to which we store? (ideally, conditionally store!)
(2) Allocating only once per function: if we had one scratchpad per loop, it
could/would live inside the test of "gimple_build_call_internal
(IFN_LOOP_VECTORIZED, ...". Otherwise, if we if-convert one or more loops in the
function, but then fail to vectorize them, we'll leave the scratchpad around for
later phases to clean up. Is that OK?
Also some style nits:
@@ -1342,7 +1190,7 @@ if_convertible_loop_p_1 (struct loop *loop,
/* Check the if-convertibility of statements in predicated BBs. */
if (!dominated_by_p (CDI_DOMINATORS, loop->latch, bb))
for (itr = gsi_start_bb (bb); !gsi_end_p (itr); gsi_next (&itr))
- if (!if_convertible_stmt_p (gsi_stmt (itr), *refs,
+ if (!if_convertible_stmt_p (gsi_stmt (itr),
any_mask_load_store))
return false;
}
bet that fits on one line now.
+ * Returns a memory reference to the pointer defined by the
+ conditional expression: pointer = cond ? &A[i] : scratch_pad; and
+ inserts this code at GSI. */
+
+static tree
+create_indirect_cond_expr (tree ai, tree cond, tree *scratch_pad,
+ gimple_stmt_iterator *gsi, bool swap)
in comment, should A[i] just be AI, as I see nothing in
create_indirect_cond_expr that requires ai to be an array dereference?
@@ -2063,12 +1998,14 @@ mask_exists (int size, vec<int> vec)
| end_bb_1
|
| bb_2
+ | cond = some_computation;
| if (cond) goto bb_3 else goto bb_4
| end_bb_2
|
| bb_3
| cond = some_computation;
- | A[i] = cond ? expr : A[i];
+ | p = cond ? &A[i] : scratch_pad;
+ | *p = expr;
| goto bb_4
| end_bb_3
|
I think you want to remove the 'cond = some_computation' from bb_3 rather than
copy it.
In tree-data-ref.h, data structures and macro defs came (mingled together)
before prototypes for functions defined elsewhere, with function declarations
beneath.
Thanks, Alan