This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: Revision 107218 changed addressing mode generation


On Mon, 30 Mar 2009, Richard Guenther wrote:

> On Sun, 29 Mar 2009, Richard Guenther wrote:
> 
> > On Sun, Mar 29, 2009 at 10:46 PM, Bernd Schmidt <bernds_cb1@t-online.de> wrote:
> > > Richard Guenther wrote:
> > >>
> > >> I think CSE added/removed by either canoncial form is less important
> > >> than things like SCEV analysis being unaffected and IVOPTs still
> > >> creating proper induction variables on the tree level.
> > >>
> > >> Of course a canonical form should minimize the number of operations,
> > >> so folding i * 4 + j * 4 to (i + j) * 4 should still be done. ?But
> > >> folding i * 4 + j * 2 to (i * 2 + j) * 2 and not i * 4 + 4 to (i + 1) * 4
> > >> looks strange.
> > >>
> > >> I have been resistant to change this canonicalization simply because
> > >> the current one seems to work well with the tree loop optimizers and
> > >> date-dependence analysis nowadays, thus also my hint at doing
> > >> such change during stage1 instead of stage4 (after all this "problem"
> > >> now exists since 4.1 ...?)
> > >>
> > >> Implementing the change should be straight-forward (we should
> > >> make sure the tree reassociation pass agrees with it).
> > >
> > > So, how do we proceed? ?IMO the canonical form should be that addition of a
> > > constant is always the outermost operation. ?That seems to be what the
> > > EXPAND_SUM machinery expects, and what extract_muldiv wants to do. It's also
> > > IMO most likely to be helpful when generating addressing modes.
> > >
> > > The patch below should do that, but unfortunately:
> > > +FAIL: gcc.dg/vect/vect-103.c scan-tree-dump-times vect "dependence distance
> > > modulo vf == 0" 1
> > 
> > *((int *)p + x + i) = *((int *)p + x + i + 8);
> > 
> > > +FAIL: gcc.dg/vect/no-vfa-vect-102.c scan-tree-dump-times vect "possible
> > > dependence between data-refs" 1
> > 
> > *((int *)p + x + i + 1) = *((int *)p + x + i);
> > 
> > these are exactly cases that I was concerned about.
> 
> Ok, the main issue is that with the canonicalization
> 
>  *((int *)p + (x + i + 8) * 4)
> 
> we can re-construct an array access from
> 
>   D.2535_23 = &p_5->a[0];
>   D.2536_24 = (long unsigned int) x_17(D);
>   D.2537_25 = (long unsigned int) i_2;
>   D.2538_26 = D.2536_24 + D.2537_25;
>   D.2541_27 = D.2538_26 + 8;
>   D.2542_28 = D.2541_27 * 4;
>   D.2543_29 = D.2535_23 + D.2542_28;
>   D.2544_30 = *D.2543_29;
> 
> during forwprop (p_5->a[D.2541_27]) but with the canonicalization
> 
>  *((int *)p + (x + i) * 4 + 32)
> 
> this fails because in
> 
>   D.2535_23 = &p_5->a[0];
>   D.2536_24 = (long unsigned int) x_17(D);
>   D.2537_25 = (long unsigned int) i_2;
>   D.2538_26 = D.2536_24 + D.2537_25;
>   D.2539_27 = D.2538_26 * 4;
>   D.2541_28 = D.2539_27 + 32;
>   D.2542_29 = D.2535_23 + D.2541_28;
>   D.2543_30 = *D.2542_29;
> 
> the offset addition to the base pointer is not multiplied by the
> array element size.
> 
> Thus later data dependence analysis gets confused.
> 
> To fix this in forwprop (and CCP for the invariant address case) we
> would need to apply the re-association there, producing new
> intermediate stmts, etc. - not necessarily a good idea and a non-trivial
> task anyway.
> 
> The problem with data-dependence analysis is that for the different
> canonicalization we have two DRs with completely different bases:
> 
> for the non-array form:
> 
>         base_address: (int *) p_5 + (long unsigned int) pretmp.24_57 * 4
>         offset from base address: 0
>         constant offset from base address: 32
>         step: 4
>         aligned to: 128
>         base_object: *((int *) p_5 + (long unsigned int) pretmp.24_57 * 4)
> 
> and for the other access which is in proper array form:
> 
>         base_address: p_5
>         offset from base address: (<unnamed-signed:64>) ((long unsigned 
> int) pretmp.24_57 * 4)
>         constant offset from base address: 0
>         step: 4
>         aligned to: 4
>         base_object: p_5->a[0]
> 
> where at least the base_address of the non-array form looks bogus (the
> offset part should be in offset really).
> 
> Thus, I'm trying to have a look at data-dependence analysis.
> 
> But - given the above - wouldn't it be easier to fix the MULT_EXPR
> case in expand_expr_real_1 to do the un-canonicalization of
> (X + CST1) * CST2 to X * CST2 + CST3?

The following patch addresses this particular problem (no longer
producing array accesses) in forwprop - fixes at other places failed.

Bootstrapped and tested on x86_64-unknown-linux-gnu - I'll give it
some SPEC testing tonight, can you check if that works for you?
If so we can put this into trunk and watch for more fallout.

Thanks,
Richard.


2009-03-30  Richard Guenther  <rguenther@suse.de>

	* tree.h (div_if_zero_remainder): Declare.
	* fold-const.c (div_if_zero_remainder): Export.
	(fold_plusminus_mult_expr): Do not canonicalize with constants.
	* tree-ssa-forwprop.c
	(forward_propagate_addr_into_variable_array_index): Handle
	constant array index addition outside of the variable index.

	* gcc.dg/tree-ssa/forwprop-12.c: New testcase.

Index: gcc/tree.h
===================================================================
*** gcc/tree.h	(revision 145282)
--- gcc/tree.h	(working copy)
*************** extern tree build_fold_indirect_ref (tre
*** 4834,4839 ****
--- 4834,4840 ----
  extern tree fold_indirect_ref (tree);
  extern tree constant_boolean_node (int, tree);
  extern tree build_low_bits_mask (tree, unsigned);
+ extern tree div_if_zero_remainder (enum tree_code, const_tree, const_tree);
  
  extern bool tree_swap_operands_p (const_tree, const_tree, bool);
  extern enum tree_code swap_tree_comparison (enum tree_code);
Index: gcc/fold-const.c
===================================================================
*** gcc/fold-const.c	(revision 145282)
--- gcc/fold-const.c	(working copy)
*************** div_and_round_double (enum tree_code cod
*** 874,880 ****
     of type CODE and returns the quotient.
     Otherwise returns NULL_TREE.  */
  
! static tree
  div_if_zero_remainder (enum tree_code code, const_tree arg1, const_tree arg2)
  {
    unsigned HOST_WIDE_INT int1l, int2l;
--- 874,880 ----
     of type CODE and returns the quotient.
     Otherwise returns NULL_TREE.  */
  
! tree
  div_if_zero_remainder (enum tree_code code, const_tree arg1, const_tree arg2)
  {
    unsigned HOST_WIDE_INT int1l, int2l;
*************** fold_plusminus_mult_expr (enum tree_code
*** 7379,7384 ****
--- 7379,7387 ----
    tree arg00, arg01, arg10, arg11;
    tree alt0 = NULL_TREE, alt1 = NULL_TREE, same;
  
+   if (TREE_CODE (arg0) == INTEGER_CST || TREE_CODE (arg1) == INTEGER_CST)
+     return NULL_TREE;
+ 
    /* (A * C) +- (B * C) -> (A+-B) * C.
       (A * C) +- A -> A * (C+-1).
       We are most concerned about the case where C is a constant,
*************** fold_plusminus_mult_expr (enum tree_code
*** 7390,7400 ****
        arg00 = TREE_OPERAND (arg0, 0);
        arg01 = TREE_OPERAND (arg0, 1);
      }
-   else if (TREE_CODE (arg0) == INTEGER_CST)
-     {
-       arg00 = build_one_cst (type);
-       arg01 = arg0;
-     }
    else
      {
        /* We cannot generate constant 1 for fract.  */
--- 7393,7398 ----
*************** fold_plusminus_mult_expr (enum tree_code
*** 7403,7428 ****
        arg00 = arg0;
        arg01 = build_one_cst (type);
      }
    if (TREE_CODE (arg1) == MULT_EXPR)
      {
        arg10 = TREE_OPERAND (arg1, 0);
        arg11 = TREE_OPERAND (arg1, 1);
      }
-   else if (TREE_CODE (arg1) == INTEGER_CST)
-     {
-       arg10 = build_one_cst (type);
-       /* As we canonicalize A - 2 to A + -2 get rid of that sign for
- 	 the purpose of this canonicalization.  */
-       if (TREE_INT_CST_HIGH (arg1) == -1
- 	  && negate_expr_p (arg1)
- 	  && code == PLUS_EXPR)
- 	{
- 	  arg11 = negate_expr (arg1);
- 	  code = MINUS_EXPR;
- 	}
-       else
- 	arg11 = arg1;
-     }
    else
      {
        /* We cannot generate constant 1 for fract.  */
--- 7401,7412 ----
        arg00 = arg0;
        arg01 = build_one_cst (type);
      }
+ 
    if (TREE_CODE (arg1) == MULT_EXPR)
      {
        arg10 = TREE_OPERAND (arg1, 0);
        arg11 = TREE_OPERAND (arg1, 1);
      }
    else
      {
        /* We cannot generate constant 1 for fract.  */
Index: gcc/tree-ssa-forwprop.c
===================================================================
*** gcc/tree-ssa-forwprop.c	(revision 145282)
--- gcc/tree-ssa-forwprop.c	(working copy)
*************** forward_propagate_addr_into_variable_arr
*** 618,625 ****
  						  tree def_rhs,
  						  gimple_stmt_iterator *use_stmt_gsi)
  {
!   tree index;
    gimple offset_def, use_stmt = gsi_stmt (*use_stmt_gsi);
  
    /* Get the offset's defining statement.  */
    offset_def = SSA_NAME_DEF_STMT (offset);
--- 618,630 ----
  						  tree def_rhs,
  						  gimple_stmt_iterator *use_stmt_gsi)
  {
!   tree index, tunit;
    gimple offset_def, use_stmt = gsi_stmt (*use_stmt_gsi);
+   tree tmp;
+ 
+   tunit = TYPE_SIZE_UNIT (TREE_TYPE (TREE_TYPE (def_rhs)));
+   if (!host_integerp (tunit, 1))
+     return false;
  
    /* Get the offset's defining statement.  */
    offset_def = SSA_NAME_DEF_STMT (offset);
*************** forward_propagate_addr_into_variable_arr
*** 629,635 ****
       along in case the element size is one. In that case, however, we do not
       allow multiplications because they can be computing index to a higher
       level dimension (PR 37861). */
!   if (integer_onep (TYPE_SIZE_UNIT (TREE_TYPE (TREE_TYPE (def_rhs)))))
      {
        if (is_gimple_assign (offset_def)
  	  && gimple_assign_rhs_code (offset_def) == MULT_EXPR)
--- 634,640 ----
       along in case the element size is one. In that case, however, we do not
       allow multiplications because they can be computing index to a higher
       level dimension (PR 37861). */
!   if (integer_onep (tunit))
      {
        if (is_gimple_assign (offset_def)
  	  && gimple_assign_rhs_code (offset_def) == MULT_EXPR)
*************** forward_propagate_addr_into_variable_arr
*** 648,665 ****
  	 multiplication of an object by the size of the array elements. 
  	 This implicitly verifies that the size of the array elements
  	 is constant.  */
!      offset = gimple_assign_rhs1 (offset_def);
!      if (gimple_assign_rhs_code (offset_def) != MULT_EXPR
! 	 || TREE_CODE (gimple_assign_rhs2 (offset_def)) != INTEGER_CST
! 	 || !simple_cst_equal (gimple_assign_rhs2 (offset_def),
! 			       TYPE_SIZE_UNIT (TREE_TYPE (TREE_TYPE (def_rhs)))))
  	return false;
- 
-       /* The first operand to the MULT_EXPR is the desired index.  */
-       index = offset;
      }
  
    /* Replace the pointer addition with array indexing.  */
    gimple_assign_set_rhs_from_tree (use_stmt_gsi, unshare_expr (def_rhs));
    use_stmt = gsi_stmt (*use_stmt_gsi);
    TREE_OPERAND (TREE_OPERAND (gimple_assign_rhs1 (use_stmt), 0), 1)
--- 653,693 ----
  	 multiplication of an object by the size of the array elements. 
  	 This implicitly verifies that the size of the array elements
  	 is constant.  */
!      if (gimple_assign_rhs_code (offset_def) == MULT_EXPR
! 	 && TREE_CODE (gimple_assign_rhs2 (offset_def)) == INTEGER_CST
! 	 && tree_int_cst_equal (gimple_assign_rhs2 (offset_def), tunit))
!        {
! 	 /* The first operand to the MULT_EXPR is the desired index.  */
! 	 index = gimple_assign_rhs1 (offset_def);
!        }
!      /* If we have idx * tunit + CST * tunit re-associate that.  */
!      else if ((gimple_assign_rhs_code (offset_def) == PLUS_EXPR
! 	       || gimple_assign_rhs_code (offset_def) == MINUS_EXPR)
! 	      && TREE_CODE (gimple_assign_rhs1 (offset_def)) == SSA_NAME
! 	      && TREE_CODE (gimple_assign_rhs2 (offset_def)) == INTEGER_CST
! 	      && (tmp = div_if_zero_remainder (EXACT_DIV_EXPR,
! 					       gimple_assign_rhs2 (offset_def),
! 					       tunit)) != NULL_TREE)
!        {
! 	 gimple offset_def2 = SSA_NAME_DEF_STMT (gimple_assign_rhs1 (offset_def));
! 	 if (gimple_assign_rhs_code (offset_def2) == MULT_EXPR
! 	     && TREE_CODE (gimple_assign_rhs2 (offset_def2)) == INTEGER_CST
! 	     && tree_int_cst_equal (gimple_assign_rhs2 (offset_def2), tunit))
! 	   {
! 	     index = fold_build2 (gimple_assign_rhs_code (offset_def),
! 				  TREE_TYPE (offset),
! 				  gimple_assign_rhs1 (offset_def2), tmp);
! 	   }
! 	 else
! 	   return false;
!        }
!      else
  	return false;
      }
  
    /* Replace the pointer addition with array indexing.  */
+   index = force_gimple_operand_gsi (use_stmt_gsi, index, true, NULL_TREE,
+ 				    true, GSI_SAME_STMT);
    gimple_assign_set_rhs_from_tree (use_stmt_gsi, unshare_expr (def_rhs));
    use_stmt = gsi_stmt (*use_stmt_gsi);
    TREE_OPERAND (TREE_OPERAND (gimple_assign_rhs1 (use_stmt), 0), 1)
Index: gcc/testsuite/gcc.dg/tree-ssa/forwprop-12.c
===================================================================
*** gcc/testsuite/gcc.dg/tree-ssa/forwprop-12.c	(revision 0)
--- gcc/testsuite/gcc.dg/tree-ssa/forwprop-12.c	(revision 0)
***************
*** 0 ****
--- 1,22 ----
+ /* { dg-do compile } */
+ /* { dg-options "-O -fdump-tree-forwprop1" } */
+ 
+ struct X { int a[256]; };
+ 
+ int foo(struct X *p, __SIZE_TYPE__ i)
+ {
+   int *q = &p->a[0];
+   int *q2 = (int *)((void *)q + i*4 + 32);
+   return *q2;
+ }
+ 
+ int bar(struct X *p, int i)
+ {
+   return *((int *)p + i + 8);
+ }
+ 
+ /* We should have propagated the base array address through the
+    address arithmetic into the memory access as an array access.  */
+ 
+ /* { dg-final { scan-tree-dump-times "->a\\\[D\\\." 2 "forwprop1" } } */
+ /* { dg-final { cleanup-tree-dump "forwprop1" } } */

Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]