[PATCH]middle-end Add an RPO pass after successful vectorization

Tamar Christina tamar.christina@arm.com
Tue Nov 2 13:50:08 GMT 2021


Hi All,

Following my current SVE predicate optimization series a problem has presented
itself in that the way vector masks are generated for masked operations relies
on CSE to share masks efficiently.

The issue however is that masking is done using the & operand and & is
associative and so reassoc decides to reassociate the masked operations.

This makes CSE then unable to CSE an unmasked and a masked operation leading to
duplicate operations being performed.

To counter this we want to add an RPO pass over the vectorized loop body when
vectorization succeeds.  This makes it then no longer reliant on the RTL level
CSE.

I have not added a testcase for this as it requires the changes in my patch
series, however the entire series relies on this patch to work so all the
tests there cover it.

Bootstrapped Regtested on aarch64-none-linux-gnu,
x86_64-linux-gnu and no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

	* tree-vectorizer.c (vectorize_loops): Do local CSE through RPVN upon
	successful vectorization.

--- inline copy of patch -- 
diff --git a/gcc/tree-vectorizer.c b/gcc/tree-vectorizer.c
index 4712dc6e7f907637774482a71036a0bd381c2bd2..1e370d60fb19b03c3b6bce45c660af4b6d32dc51 100644
--- a/gcc/tree-vectorizer.c
+++ b/gcc/tree-vectorizer.c
@@ -81,7 +81,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "gimple-pretty-print.h"
 #include "opt-problem.h"
 #include "internal-fn.h"
-
+#include "tree-ssa-sccvn.h"
 
 /* Loop or bb location, with hotness information.  */
 dump_user_location_t vect_location;
@@ -1323,6 +1323,27 @@ vectorize_loops (void)
 	 ???  Also while we try hard to update loop-closed SSA form we fail
 	 to properly do this in some corner-cases (see PR56286).  */
       rewrite_into_loop_closed_ssa (NULL, TODO_update_ssa_only_virtuals);
+
+      for (i = 1; i < number_of_loops (cfun); i++)
+	{
+	  loop = get_loop (cfun, i);
+	  if (!loop || !single_exit (loop))
+	    continue;
+
+	  bitmap exit_bbs;
+	  /* Perform local CSE, this esp. helps because we emit code for
+	     predicates that need to be shared for optimal predicate usage.
+	     However reassoc will re-order them and prevent CSE from working
+	     as it should.  CSE only the loop body, not the entry.  */
+	  exit_bbs = BITMAP_ALLOC (NULL);
+	  bitmap_set_bit (exit_bbs, single_exit (loop)->dest->index);
+	  bitmap_set_bit (exit_bbs, loop->latch->index);
+
+	  do_rpo_vn (cfun, loop_preheader_edge (loop), exit_bbs);
+
+	  BITMAP_FREE (exit_bbs);
+	}
+
       return TODO_cleanup_cfg;
     }
 


-- 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: rb15007.patch
Type: text/x-diff
Size: 1475 bytes
Desc: not available
URL: <https://gcc.gnu.org/pipermail/gcc-patches/attachments/20211102/f2207793/attachment-0001.bin>


More information about the Gcc-patches mailing list