This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: fix PR46029: reimplement if conversion of loads and stores


Abe Skolnik wrote:
Hi everybody!

In the current implementation of if conversion, loads and stores are
if-converted in a thread-unsafe way:

  * loads were always executed, even when they should have not been.
    Some source code could be rendered invalid due to null pointers
    that were OK in the original program because they were never
    dereferenced.

  * writes were if-converted via load/maybe-modify/store, which
    renders some code multithreading-unsafe.

This patch reimplements if-conversion of loads and stores in a safe
way using a scratchpad allocated by the compiler on the stack:

  * loads are done through an indirection, reading either the correct
    data from the correct source [if the condition is true] or reading
    from the scratchpad and later ignoring this read result [if the
    condition is false].

  * writes are also done through an indirection, writing either to the
    correct destination [if the condition is true] or to the
    scratchpad [if the condition is false].

Vectorization of "if-cvt-stores-vect-ifcvt-18.c" disabled because the
old if-conversion resulted in unsafe code that could fail under
multithreading even though the as-written code _was_ thread-safe.

Passed regression testing and bootstrap on amd64-linux.
Is this OK to commit to trunk?

Regards,

Abe


Thanks for getting back to this!

My main thought concerns the direction we are travelling here. A major reason why we do if-conversion is to enable vectorization. Is this is targetted at gathering/scattering loads? Following vectorization, different elements of the vector being loaded/stored may have to go to/from the scratchpad or to/from main memory.

Or, are we aiming at the case where the predicate or address are invariant? That seems unlikely - loop unswitching would be better for the predicate; loading from an address, we'd just peel and hoist; storing, this'd result in the address holding the last value written, at exit from the loop, a curious idiom. Where the predicate/address is invariant across the vector? (!)

Or, at we aiming at non-vectorized code?

Beyond that question...

Does the description for -ftree-loop-if-convert-stores in doc/invoke.texi describe what the flag now does? (It doesn't mention loads; the code doesn't look like we use scratchpads at all without -ftree-loop-if-convert-stores, or am I missing something?)

In tree-if-conv.c:
@@ -883,7 +733,7 @@ if_convertible_gimple_assign_stmt_p (gimple stmt,

   if (flag_tree_loop_if_convert_stores)
     {
-      if (ifcvt_could_trap_p (stmt, refs))
+      if (ifcvt_could_trap_p (stmt))
        {
          if (ifcvt_can_use_mask_load_store (stmt))
            {

and

+
+      if (has_non_addressable_refs (stmt))
+       {
+         if (dump_file && (dump_flags & TDF_DETAILS))
+           fprintf (dump_file, "has non-addressable memory references\n");
+         return false;
+       }
+

if it doesn't trap, but has_non_addressable_refs, can't we use ifcvt_can_use_mask_load_store there too?

And/or, I think I may be misunderstanding here, but if an access could trap, but is addressable, can't we use the scratchpad technique to get round the trapping problem?

(Look at it another way - this patch makes strictly more things return true from ifcvt_could_trap_p, which always exits immediately from if_convertible_gimple_assign_stmt_p...?)


Re. creation of scratchpads:
(1) Should the '64' byte size be the result of scanning the function, for the largest data size to which we store? (ideally, conditionally store!) (2) Allocating only once per function: if we had one scratchpad per loop, it could/would live inside the test of "gimple_build_call_internal (IFN_LOOP_VECTORIZED, ...". Otherwise, if we if-convert one or more loops in the function, but then fail to vectorize them, we'll leave the scratchpad around for later phases to clean up. Is that OK?


Also some style nits:

@@ -1342,7 +1190,7 @@ if_convertible_loop_p_1 (struct loop *loop,
       /* Check the if-convertibility of statements in predicated BBs.  */
       if (!dominated_by_p (CDI_DOMINATORS, loop->latch, bb))
        for (itr = gsi_start_bb (bb); !gsi_end_p (itr); gsi_next (&itr))
-         if (!if_convertible_stmt_p (gsi_stmt (itr), *refs,
+         if (!if_convertible_stmt_p (gsi_stmt (itr),
                                      any_mask_load_store))
            return false;
     }

bet that fits on one line now.

+ * Returns a memory reference to the pointer defined by the
+    conditional expression: pointer = cond ? &A[i] : scratch_pad; and
+   inserts this code at GSI.  */
+
+static tree
+create_indirect_cond_expr (tree ai, tree cond, tree *scratch_pad,
+                          gimple_stmt_iterator *gsi, bool swap)

in comment, should A[i] just be AI, as I see nothing in create_indirect_cond_expr that requires ai to be an array dereference?

@@ -2063,12 +1998,14 @@ mask_exists (int size, vec<int> vec)
    | end_bb_1
    |
    | bb_2
+   |   cond = some_computation;
    |   if (cond) goto bb_3 else goto bb_4
    | end_bb_2
    |
    | bb_3
    |   cond = some_computation;
-   |   A[i] = cond ? expr : A[i];
+   |   p = cond ? &A[i] : scratch_pad;
+   |   *p = expr;
    |   goto bb_4
    | end_bb_3
    |

I think you want to remove the 'cond = some_computation' from bb_3 rather than copy it.

In tree-data-ref.h, data structures and macro defs came (mingled together) before prototypes for functions defined elsewhere, with function declarations beneath.


Thanks, Alan


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]