This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



Re: [RFC][PR target/39726 P4 regression] match.pd pattern to do type narrowing


On 02/02/15 11:04, Jeff Law wrote:
On 02/02/15 09:59, Joseph Myers wrote:
On Sat, 31 Jan 2015, Jeff Law wrote:

The nice thing about wrapping the result inside a convert is the types for the inner operations will propagate from the type of the inner operands, which is exactly what we want.  We then remove the hack assigning type and instead the original type will be used for the outermost convert.

Those inner operands still need converting to unsigned for arithmetic.
Yes.
So it's actually painful to try to get those inner operands converted to unsigned.  So at least for this iteration, it's probably best to punt on signed arithmetic and focus on the logicals and unsigned arithmetic.


I haven't looked at those routines in a long time, but reviewing them
seems wise both in the immediate term WRT this bug and ensuring we're
doing the right thing for the various corner cases.
So shorten_binary_op is the closest to what the code in match.pd tries to do right now.  They use slightly different means to know when it's safe to narrow a binary operation.

shorten_binary_op is passed a result type and assumes that no bits outside that type are needed.

My match.pd code looks for an explicit mask to know when the bits outside the binary operation's type are not needed.  I'd think it ought to be possible to extend the match.pd code to handle other mechanisms where we know some set of high bits aren't needed.  Hard to justify that extension in stage4 though.

The latest match.pd code requires the types of op0 and op1 to have the same precision and signedness.  shorten_binary_op is a bit looser than that, but that's most likely a historical quirk since it predates GIMPLE.


I'm now using two match.pd patterns, one for logicals and one for unsigned arithmetic, which also simplifies things a bit.  Finally, the match.pd code does not try to handle signed inner arithmetic operands, which helps too.

It would certainly be interesting to instrument shorten_binary_op in stage1, catch the cases where it triggers, and then look at how those cases can be handled in match.pd.

Anyway, here are the two patterns now.  They bootstrap and don't cause any code generation changes for x86_64, but they do fix PR 39726 and considerably improve a variety of related testcases on the m68k.



Jeff
diff --git a/gcc/match.pd b/gcc/match.pd
index 81c4ee6..d55fccd 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -1018,3 +1018,31 @@ along with GCC; see the file COPYING3.  If not see
    (logs (pows @0 @1))
    (mult @1 (logs @0)))))
 
+/* Given a bit-wise operation performed in mode P1 on operands
+   in some narrower type P2 that feeds an outer masking operation,
+   see if the mask turns off all the bits outside P2.  If so,
+   perform all the operations in P2 and just convert the final
+   result from P2 back to P1.  */
+(for inner_op (bit_and bit_ior bit_xor)
+  (simplify
+    (bit_and (inner_op (convert @0) (convert @1)) INTEGER_CST@3)
+    (if ((TREE_INT_CST_LOW (@3) & ~GET_MODE_MASK (TYPE_MODE (TREE_TYPE (@0)))) == 0
+	 && TYPE_PRECISION (TREE_TYPE (@0)) == TYPE_PRECISION (TREE_TYPE (@1))
+	 && TYPE_UNSIGNED (TREE_TYPE (@0)) == TYPE_UNSIGNED (TREE_TYPE (@1))
+	 && TYPE_PRECISION (type) > TYPE_PRECISION (TREE_TYPE (@0)))
+      (convert (bit_and (inner_op @0 @1) (convert @3))))))
+
+/* Similarly, but for unsigned arithmetic operations.
+
+   It would be nice to handle signed arithmetic, but that runs the
+   risk of introducing undefined behaviour where none existed before.  */
+(for inner_op (minus plus mult)
+  (simplify
+    (bit_and (inner_op (convert @0) (convert @1)) INTEGER_CST@3)
+    (if ((TREE_INT_CST_LOW (@3) & ~GET_MODE_MASK (TYPE_MODE (TREE_TYPE (@0)))) == 0
+	 && TYPE_PRECISION (TREE_TYPE (@0)) == TYPE_PRECISION (TREE_TYPE (@1))
+	 && TYPE_UNSIGNED (TREE_TYPE (@0)) == TYPE_UNSIGNED (TREE_TYPE (@1))
+	 /* Restricted to unsigned inner arithmetic for now.  */
+	 && TYPE_UNSIGNED (TREE_TYPE (@0))
+	 && TYPE_PRECISION (TREE_TYPE (@3)) > TYPE_PRECISION (TREE_TYPE (@0)))
+      (convert (bit_and (inner_op @0 @1) (convert @3))))))
