This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [patch tree-ssa-forwprop]: Add type raising in shift-operations


On 11/21/13 08:15, Kai Tietz wrote:
Hi,

this is the required forward-propagation part for the type-demotion pass as separate patch.
This patch is sharing some adjustments in testsuite due new optimization of type-casts on shift-operations.

This patch tries to generate the "normal" form (TYPE) (X shift-op Y) out of the "denormal" form "((TYPE) X) shift-op Y".

ChangeLog

2013-11-21  Kai Tietz  <ktietz@redhat.com>

	* tree-ssa-forwprop.c (simplify_shift):  New function.
	(ssa_forward_propagate_and_combine): Use it.

ChangeLog testsuite

	* gcc.dg/tree-ssa-ts-shift-2.c: New test.


Shared testsuite part between type-demotion and forward-propagation patches.

Changelog gcc/testsuite:

	* gcc.dg/vect/vect-over-widen-1-big-array.c: Likewise.
	* gcc.dg/vect/vect-over-widen-1.c: Likewise.
	* gcc.dg/vect/vect-over-widen-3-big-array.c: Likewise.
	* gcc.dg/vect/vect-over-widen-3.c: Likewise.
	* gcc.dg/vect/vect-over-widen-4-big-array.c: Likewise.
	* gcc.dg/vect/vect-over-widen-4.c: Likewise.

Bootstrapped for x86_64-unknown-linux-gnu.  Ok for apply?

Index: gcc-org/gcc/testsuite/gcc.dg/tree-ssa/ts-shift-2.c
===================================================================
--- /dev/null
+++ gcc-org/gcc/testsuite/gcc.dg/tree-ssa/ts-shift-2.c
@@ -0,0 +1,33 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+
+unsigned char f1 (char x)
+{
+  unsigned char x1 = (unsigned char) x;
+  return (x1 << 5);
+}
+
+unsigned short f2 (unsigned int x)
+{
+  unsigned short x1 = (unsigned short) x;
+  return (x1 << 15);
+}
+
+unsigned int f3 (unsigned short x)
+{
+  unsigned long long x1 = (unsigned long long) x;
+  return (unsigned int) (x1 >> 15);
+}
+
+unsigned long long f4 (unsigned short x)
+{
+  unsigned int x1 = (unsigned int) x;
+  return (unsigned long long) (x1 >> 7);
+}
+
+/* { dg-final { scan-tree-dump-times "= \\\(long long unsigned int\\\)" 1 "optimized" } } */
+/* { dg-final { scan-tree-dump-times "= \\\(short unsigned int\\\)" 1 "optimized" } } */
+/* { dg-final { scan-tree-dump-times "= \\\(unsigned int\\\)" 1 "optimized" } } */
+/* { dg-final { scan-tree-dump-times "= \\\(unsigned char\\\)" 1 "optimized" } } */
So at least for f1 and f2, these regexps match regardless of whether or not the code was transformed.

Without your patch (f1)
  x1_2 = (unsigned char) x_1(D);
  _3 = x1_2 << 5;
  return _3;

With your patch (f1):
  _4 = x_1(D) << 5;
  _2 = (unsigned char) _4;
  return _2;

So all that's happened is we changed which object got casted -- we have the same number casts of an object to (unsigned char).

ISTM you might be better off scanning the details of the forwprop dump. For example, in f1 we get:

   Replaced _3 = x1_2 << 5;
     x1_2 = (unsigned char) x_1(D);
   by    by _3 = (unsigned char) _5;
     _5 = x_1(D) << 5;


We basically see the same thing happen in f2.

For f1 & f2 I don't see that we've done anything other than change the casts. They don't show any benefit from this patch.

f3 & f4 on the other hand do show improvements.


WHen I looked over a bunch of .i files, the results were mixed. Sometimes better, sometimes worse without any clear bias.

So here's an example that gets worse:


*************** rshift_double (long unsigned int l1, lon
*** 651,661 ****
    _29 = (long int) _28;
    *hv_13(D) = _29;
    _31 = l1_9(D) >> _27;
!   _32 = (unsigned int) count_4(D);
!   _33 = 63 - _32;
!   _34 = (int) _33;
!   _35 = h1.18_26 << _34;
!   _36 = _35 << 1;
    _37 = _36 | _31;
    *lv_12(D) = _37;
  ;;    succ:       11
--- 652,663 ----
    _29 = (long int) _28;
    *hv_13(D) = _29;
    _31 = l1_9(D) >> _27;
!   _33 = (unsigned int) count_4(D);
!   _34 = 63 - _33;
!   _35 = (int) _34;
!   _71 = h1_10(D) << _35;
!   _32 = _71 << 1;
!   _36 = (long unsigned int) _32;
    _37 = _36 | _31;
    *lv_12(D) = _37;


+/* { dg-final { cleanup-tree-dump "optimized" } } */
+
Index: gcc-org/gcc/tree-ssa-forwprop.c
===================================================================
--- gcc-org.orig/gcc/tree-ssa-forwprop.c
+++ gcc-org/gcc/tree-ssa-forwprop.c
@@ -551,6 +551,107 @@ forward_propagate_into_gimple_cond (gimp
    return 0;
  }

+/* Try to simplify shift-statement ((TYPE1) X) CODE Y for
+   integral-kind types.
+   Returns none-zero if the stmt was changed.  */
Simplify it to what? I know from reading the patch explanation, but it'd be better to include that in a comment. Even better would be to show a little sequence of psuedo-gimple and how it gets changed.


+
+static int
+simplify_shift (enum tree_code code, gimple_stmt_iterator *gsi_p)
+{
+  gimple stmt = gsi_stmt (*gsi_p);
+  gimple def, newop;
+  tree op1, opx, op2, t_opx, tem;
+  int ret;
+
+  if (!INTEGRAL_TYPE_P (TREE_TYPE (gimple_assign_lhs (stmt))))
+    return 0;
+
+  op1 = gimple_assign_rhs1 (stmt);
+  if (TREE_CODE (op1) != SSA_NAME
+      || !(def = SSA_NAME_DEF_STMT (op1))
+      || !is_gimple_assign (def)
+      || !gimple_assign_cast_p (def)
+      || SSA_NAME_OCCURS_IN_ABNORMAL_PHI (op1)
+      || !has_single_use (op1))
+    return 0;
+
+  opx = gimple_assign_rhs1 (def);
+  if (TREE_CODE (opx) != SSA_NAME)
+    return 0;
+
+  t_opx = TREE_TYPE (opx);
+
+  if (!INTEGRAL_TYPE_P (t_opx))
+    return 0;
+
+  op2 = gimple_assign_rhs2 (stmt);
+
+  if (code == LSHIFT_EXPR)
+    {
+      if (TYPE_PRECISION (TREE_TYPE (op1))
+	  > TYPE_PRECISION (t_opx))
+        return 0;
+    }
+  else if (code == RSHIFT_EXPR)
+    {
+      /* If type of OPX and OP1 are compatible, we don't need to check OP2
+         for validity as we don't change range of operation.
+         Otherwise we need to check that right-hand operand of shift-right
+	 doesn't exceed type-precision of inner operand.  */
There's some kind of whitespace problem here.

+
+  /* Do transformation ((T1) X) shift-code N -> (T1) (X shift-code N).  */
+  if (dump_file)
+    {
+      fprintf (dump_file, "  Replaced ");
+      print_gimple_stmt (dump_file, stmt, 0, 0);
+      fprintf (dump_file, "    ");
+      print_gimple_stmt (dump_file, def, 0, 0);
+      fprintf (dump_file, "  by ");
+    }
+
+  tem = make_ssa_name (t_opx, NULL);
+  newop = gimple_build_assign_with_ops (code, tem, opx, op2);
+  gimple_set_location (newop, gimple_location (stmt));
+  gsi_insert_before (gsi_p, newop, GSI_SAME_STMT);
+  gimple_assign_set_rhs_with_ops_1 (gsi_p, NOP_EXPR, tem, NULL_TREE, NULL_TREE);
+
+  if (gimple_location (def) != UNKNOWN_LOCATION)
+    gimple_set_location (stmt, gimple_location (def));
+
+  stmt = gsi_stmt (*gsi_p);
+  update_stmt (stmt);
+
+  if (TREE_CODE (op1) == SSA_NAME)
+    ret = remove_prop_source_from_use (op1) ? 2 : 1;
+  else
+    ret = 1;
+  if (dump_file)
+    {
+      fprintf (dump_file, "   by ");
+      print_gimple_stmt (dump_file, stmt, 0, 0);
+      fprintf (dump_file, "    ");
+      print_gimple_stmt (dump_file, newop, 0, 0);
As someone has already noted, the dump files are sub-optimal with the "by by" output:

  Replaced _3 = x1_2 << 5;
    x1_2 = (unsigned char) x_1(D);
  by    by _3 = (unsigned char) _5;
    _5 = x_1(D) << 5;

  /* Propagate from the ssa name definition statements of COND_EXPR
     in the rhs of statement STMT into the conditional if that simplifies it.
@@ -3432,6 +3533,15 @@ ssa_forward_propagate_and_combine (void)
  		     || code == NEGATE_EXPR)
  		    && TREE_CODE (rhs1) == SSA_NAME)
  		  changed = simplify_not_neg_expr (&gsi);
+		else if (code == LSHIFT_EXPR
+		         || code == RSHIFT_EXPR)
+		  {
+		    int did_something;
+		    did_something = simplify_shift (code, &gsi);
+		    if (did_something == 2)
+		      cfg_changed = true;
+		    changed = did_something != 0;
+		  }
  		else if (code == COND_EXPR
  			 || code == VEC_COND_EXPR)
  		  {
Does this apply to rotates as well?



Index: gcc-trunk/gcc/testsuite/gcc.dg/vect/vect-over-widen-1-big-array.c
===================================================================
--- gcc-trunk.orig/gcc/testsuite/gcc.dg/vect/vect-over-widen-1-big-array.c
+++ gcc-trunk/gcc/testsuite/gcc.dg/vect/vect-over-widen-1-big-array.c
@@ -58,9 +58,8 @@ int main (void)
    return 0;
  }

-/* { dg-final { scan-tree-dump-times "vect_recog_widen_shift_pattern: detected" 2 "vect" { target vect_widen_shift } } } */
-/* { dg-final { scan-tree-dump-times "vect_recog_over_widening_pattern: detected" 2 "vect" { target vect_widen_shift } } } */
-/* { dg-final { scan-tree-dump-times "vect_recog_over_widening_pattern: detected" 4 "vect" { target { ! vect_widen_shift } } } } */
+/* { dg-final { scan-tree-dump-times "vect_recog_widen_shift_pattern: detected" 4 "vect" { target vect_widen_shift } } } */
+/* { dg-final { scan-tree-dump-times "vect_recog_over_widening_pattern: detected" 0 "vect" } } */
  /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
  /* { dg-final { cleanup-tree-dump "vect" } } */
I'm guessing this was one of the cases you mentioned to me privately via email. Are we still vectorizing as well, just differently without the need for as many conversions?

Simliarly for the other vect testcases. At the least you need to indicate we're not actually regressing on those tests -- when you change an epected output in a way that makes it look like the test might be compromised, you need to explain yourself more thoroughly. My obvious concern is we're not vectorizing as much and the testsuite changes are merely a way of hiding that fact.

I think there's some questions that need to be answered before this can go forward.

jeff



Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]