This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: Revision 107218 changed addressing mode generation
On Mon, 30 Mar 2009, Richard Guenther wrote:
> On Sun, 29 Mar 2009, Richard Guenther wrote:
>
> > On Sun, Mar 29, 2009 at 10:46 PM, Bernd Schmidt <bernds_cb1@t-online.de> wrote:
> > > Richard Guenther wrote:
> > >>
> > >> I think CSE added/removed by either canoncial form is less important
> > >> than things like SCEV analysis being unaffected and IVOPTs still
> > >> creating proper induction variables on the tree level.
> > >>
> > >> Of course a canonical form should minimize the number of operations,
> > >> so folding i * 4 + j * 4 to (i + j) * 4 should still be done. ?But
> > >> folding i * 4 + j * 2 to (i * 2 + j) * 2 and not i * 4 + 4 to (i + 1) * 4
> > >> looks strange.
> > >>
> > >> I have been resistant to change this canonicalization simply because
> > >> the current one seems to work well with the tree loop optimizers and
> > >> date-dependence analysis nowadays, thus also my hint at doing
> > >> such change during stage1 instead of stage4 (after all this "problem"
> > >> now exists since 4.1 ...?)
> > >>
> > >> Implementing the change should be straight-forward (we should
> > >> make sure the tree reassociation pass agrees with it).
> > >
> > > So, how do we proceed? ?IMO the canonical form should be that addition of a
> > > constant is always the outermost operation. ?That seems to be what the
> > > EXPAND_SUM machinery expects, and what extract_muldiv wants to do. It's also
> > > IMO most likely to be helpful when generating addressing modes.
> > >
> > > The patch below should do that, but unfortunately:
> > > +FAIL: gcc.dg/vect/vect-103.c scan-tree-dump-times vect "dependence distance
> > > modulo vf == 0" 1
> >
> > *((int *)p + x + i) = *((int *)p + x + i + 8);
> >
> > > +FAIL: gcc.dg/vect/no-vfa-vect-102.c scan-tree-dump-times vect "possible
> > > dependence between data-refs" 1
> >
> > *((int *)p + x + i + 1) = *((int *)p + x + i);
> >
> > these are exactly cases that I was concerned about.
>
> Ok, the main issue is that with the canonicalization
>
> *((int *)p + (x + i + 8) * 4)
>
> we can re-construct an array access from
>
> D.2535_23 = &p_5->a[0];
> D.2536_24 = (long unsigned int) x_17(D);
> D.2537_25 = (long unsigned int) i_2;
> D.2538_26 = D.2536_24 + D.2537_25;
> D.2541_27 = D.2538_26 + 8;
> D.2542_28 = D.2541_27 * 4;
> D.2543_29 = D.2535_23 + D.2542_28;
> D.2544_30 = *D.2543_29;
>
> during forwprop (p_5->a[D.2541_27]) but with the canonicalization
>
> *((int *)p + (x + i) * 4 + 32)
>
> this fails because in
>
> D.2535_23 = &p_5->a[0];
> D.2536_24 = (long unsigned int) x_17(D);
> D.2537_25 = (long unsigned int) i_2;
> D.2538_26 = D.2536_24 + D.2537_25;
> D.2539_27 = D.2538_26 * 4;
> D.2541_28 = D.2539_27 + 32;
> D.2542_29 = D.2535_23 + D.2541_28;
> D.2543_30 = *D.2542_29;
>
> the offset addition to the base pointer is not multiplied by the
> array element size.
>
> Thus later data dependence analysis gets confused.
>
> To fix this in forwprop (and CCP for the invariant address case) we
> would need to apply the re-association there, producing new
> intermediate stmts, etc. - not necessarily a good idea and a non-trivial
> task anyway.
>
> The problem with data-dependence analysis is that for the different
> canonicalization we have two DRs with completely different bases:
>
> for the non-array form:
>
> base_address: (int *) p_5 + (long unsigned int) pretmp.24_57 * 4
> offset from base address: 0
> constant offset from base address: 32
> step: 4
> aligned to: 128
> base_object: *((int *) p_5 + (long unsigned int) pretmp.24_57 * 4)
>
> and for the other access which is in proper array form:
>
> base_address: p_5
> offset from base address: (<unnamed-signed:64>) ((long unsigned
> int) pretmp.24_57 * 4)
> constant offset from base address: 0
> step: 4
> aligned to: 4
> base_object: p_5->a[0]
>
> where at least the base_address of the non-array form looks bogus (the
> offset part should be in offset really).
>
> Thus, I'm trying to have a look at data-dependence analysis.
>
> But - given the above - wouldn't it be easier to fix the MULT_EXPR
> case in expand_expr_real_1 to do the un-canonicalization of
> (X + CST1) * CST2 to X * CST2 + CST3?
The following patch addresses this particular problem (no longer
producing array accesses) in forwprop - fixes at other places failed.
Bootstrapped and tested on x86_64-unknown-linux-gnu - I'll give it
some SPEC testing tonight, can you check if that works for you?
If so we can put this into trunk and watch for more fallout.
Thanks,
Richard.
2009-03-30 Richard Guenther <rguenther@suse.de>
* tree.h (div_if_zero_remainder): Declare.
* fold-const.c (div_if_zero_remainder): Export.
(fold_plusminus_mult_expr): Do not canonicalize with constants.
* tree-ssa-forwprop.c
(forward_propagate_addr_into_variable_array_index): Handle
constant array index addition outside of the variable index.
* gcc.dg/tree-ssa/forwprop-12.c: New testcase.
Index: gcc/tree.h
===================================================================
*** gcc/tree.h (revision 145282)
--- gcc/tree.h (working copy)
*************** extern tree build_fold_indirect_ref (tre
*** 4834,4839 ****
--- 4834,4840 ----
extern tree fold_indirect_ref (tree);
extern tree constant_boolean_node (int, tree);
extern tree build_low_bits_mask (tree, unsigned);
+ extern tree div_if_zero_remainder (enum tree_code, const_tree, const_tree);
extern bool tree_swap_operands_p (const_tree, const_tree, bool);
extern enum tree_code swap_tree_comparison (enum tree_code);
Index: gcc/fold-const.c
===================================================================
*** gcc/fold-const.c (revision 145282)
--- gcc/fold-const.c (working copy)
*************** div_and_round_double (enum tree_code cod
*** 874,880 ****
of type CODE and returns the quotient.
Otherwise returns NULL_TREE. */
! static tree
div_if_zero_remainder (enum tree_code code, const_tree arg1, const_tree arg2)
{
unsigned HOST_WIDE_INT int1l, int2l;
--- 874,880 ----
of type CODE and returns the quotient.
Otherwise returns NULL_TREE. */
! tree
div_if_zero_remainder (enum tree_code code, const_tree arg1, const_tree arg2)
{
unsigned HOST_WIDE_INT int1l, int2l;
*************** fold_plusminus_mult_expr (enum tree_code
*** 7379,7384 ****
--- 7379,7387 ----
tree arg00, arg01, arg10, arg11;
tree alt0 = NULL_TREE, alt1 = NULL_TREE, same;
+ if (TREE_CODE (arg0) == INTEGER_CST || TREE_CODE (arg1) == INTEGER_CST)
+ return NULL_TREE;
+
/* (A * C) +- (B * C) -> (A+-B) * C.
(A * C) +- A -> A * (C+-1).
We are most concerned about the case where C is a constant,
*************** fold_plusminus_mult_expr (enum tree_code
*** 7390,7400 ****
arg00 = TREE_OPERAND (arg0, 0);
arg01 = TREE_OPERAND (arg0, 1);
}
- else if (TREE_CODE (arg0) == INTEGER_CST)
- {
- arg00 = build_one_cst (type);
- arg01 = arg0;
- }
else
{
/* We cannot generate constant 1 for fract. */
--- 7393,7398 ----
*************** fold_plusminus_mult_expr (enum tree_code
*** 7403,7428 ****
arg00 = arg0;
arg01 = build_one_cst (type);
}
if (TREE_CODE (arg1) == MULT_EXPR)
{
arg10 = TREE_OPERAND (arg1, 0);
arg11 = TREE_OPERAND (arg1, 1);
}
- else if (TREE_CODE (arg1) == INTEGER_CST)
- {
- arg10 = build_one_cst (type);
- /* As we canonicalize A - 2 to A + -2 get rid of that sign for
- the purpose of this canonicalization. */
- if (TREE_INT_CST_HIGH (arg1) == -1
- && negate_expr_p (arg1)
- && code == PLUS_EXPR)
- {
- arg11 = negate_expr (arg1);
- code = MINUS_EXPR;
- }
- else
- arg11 = arg1;
- }
else
{
/* We cannot generate constant 1 for fract. */
--- 7401,7412 ----
arg00 = arg0;
arg01 = build_one_cst (type);
}
+
if (TREE_CODE (arg1) == MULT_EXPR)
{
arg10 = TREE_OPERAND (arg1, 0);
arg11 = TREE_OPERAND (arg1, 1);
}
else
{
/* We cannot generate constant 1 for fract. */
Index: gcc/tree-ssa-forwprop.c
===================================================================
*** gcc/tree-ssa-forwprop.c (revision 145282)
--- gcc/tree-ssa-forwprop.c (working copy)
*************** forward_propagate_addr_into_variable_arr
*** 618,625 ****
tree def_rhs,
gimple_stmt_iterator *use_stmt_gsi)
{
! tree index;
gimple offset_def, use_stmt = gsi_stmt (*use_stmt_gsi);
/* Get the offset's defining statement. */
offset_def = SSA_NAME_DEF_STMT (offset);
--- 618,630 ----
tree def_rhs,
gimple_stmt_iterator *use_stmt_gsi)
{
! tree index, tunit;
gimple offset_def, use_stmt = gsi_stmt (*use_stmt_gsi);
+ tree tmp;
+
+ tunit = TYPE_SIZE_UNIT (TREE_TYPE (TREE_TYPE (def_rhs)));
+ if (!host_integerp (tunit, 1))
+ return false;
/* Get the offset's defining statement. */
offset_def = SSA_NAME_DEF_STMT (offset);
*************** forward_propagate_addr_into_variable_arr
*** 629,635 ****
along in case the element size is one. In that case, however, we do not
allow multiplications because they can be computing index to a higher
level dimension (PR 37861). */
! if (integer_onep (TYPE_SIZE_UNIT (TREE_TYPE (TREE_TYPE (def_rhs)))))
{
if (is_gimple_assign (offset_def)
&& gimple_assign_rhs_code (offset_def) == MULT_EXPR)
--- 634,640 ----
along in case the element size is one. In that case, however, we do not
allow multiplications because they can be computing index to a higher
level dimension (PR 37861). */
! if (integer_onep (tunit))
{
if (is_gimple_assign (offset_def)
&& gimple_assign_rhs_code (offset_def) == MULT_EXPR)
*************** forward_propagate_addr_into_variable_arr
*** 648,665 ****
multiplication of an object by the size of the array elements.
This implicitly verifies that the size of the array elements
is constant. */
! offset = gimple_assign_rhs1 (offset_def);
! if (gimple_assign_rhs_code (offset_def) != MULT_EXPR
! || TREE_CODE (gimple_assign_rhs2 (offset_def)) != INTEGER_CST
! || !simple_cst_equal (gimple_assign_rhs2 (offset_def),
! TYPE_SIZE_UNIT (TREE_TYPE (TREE_TYPE (def_rhs)))))
return false;
-
- /* The first operand to the MULT_EXPR is the desired index. */
- index = offset;
}
/* Replace the pointer addition with array indexing. */
gimple_assign_set_rhs_from_tree (use_stmt_gsi, unshare_expr (def_rhs));
use_stmt = gsi_stmt (*use_stmt_gsi);
TREE_OPERAND (TREE_OPERAND (gimple_assign_rhs1 (use_stmt), 0), 1)
--- 653,693 ----
multiplication of an object by the size of the array elements.
This implicitly verifies that the size of the array elements
is constant. */
! if (gimple_assign_rhs_code (offset_def) == MULT_EXPR
! && TREE_CODE (gimple_assign_rhs2 (offset_def)) == INTEGER_CST
! && tree_int_cst_equal (gimple_assign_rhs2 (offset_def), tunit))
! {
! /* The first operand to the MULT_EXPR is the desired index. */
! index = gimple_assign_rhs1 (offset_def);
! }
! /* If we have idx * tunit + CST * tunit re-associate that. */
! else if ((gimple_assign_rhs_code (offset_def) == PLUS_EXPR
! || gimple_assign_rhs_code (offset_def) == MINUS_EXPR)
! && TREE_CODE (gimple_assign_rhs1 (offset_def)) == SSA_NAME
! && TREE_CODE (gimple_assign_rhs2 (offset_def)) == INTEGER_CST
! && (tmp = div_if_zero_remainder (EXACT_DIV_EXPR,
! gimple_assign_rhs2 (offset_def),
! tunit)) != NULL_TREE)
! {
! gimple offset_def2 = SSA_NAME_DEF_STMT (gimple_assign_rhs1 (offset_def));
! if (gimple_assign_rhs_code (offset_def2) == MULT_EXPR
! && TREE_CODE (gimple_assign_rhs2 (offset_def2)) == INTEGER_CST
! && tree_int_cst_equal (gimple_assign_rhs2 (offset_def2), tunit))
! {
! index = fold_build2 (gimple_assign_rhs_code (offset_def),
! TREE_TYPE (offset),
! gimple_assign_rhs1 (offset_def2), tmp);
! }
! else
! return false;
! }
! else
return false;
}
/* Replace the pointer addition with array indexing. */
+ index = force_gimple_operand_gsi (use_stmt_gsi, index, true, NULL_TREE,
+ true, GSI_SAME_STMT);
gimple_assign_set_rhs_from_tree (use_stmt_gsi, unshare_expr (def_rhs));
use_stmt = gsi_stmt (*use_stmt_gsi);
TREE_OPERAND (TREE_OPERAND (gimple_assign_rhs1 (use_stmt), 0), 1)
Index: gcc/testsuite/gcc.dg/tree-ssa/forwprop-12.c
===================================================================
*** gcc/testsuite/gcc.dg/tree-ssa/forwprop-12.c (revision 0)
--- gcc/testsuite/gcc.dg/tree-ssa/forwprop-12.c (revision 0)
***************
*** 0 ****
--- 1,22 ----
+ /* { dg-do compile } */
+ /* { dg-options "-O -fdump-tree-forwprop1" } */
+
+ struct X { int a[256]; };
+
+ int foo(struct X *p, __SIZE_TYPE__ i)
+ {
+ int *q = &p->a[0];
+ int *q2 = (int *)((void *)q + i*4 + 32);
+ return *q2;
+ }
+
+ int bar(struct X *p, int i)
+ {
+ return *((int *)p + i + 8);
+ }
+
+ /* We should have propagated the base array address through the
+ address arithmetic into the memory access as an array access. */
+
+ /* { dg-final { scan-tree-dump-times "->a\\\[D\\\." 2 "forwprop1" } } */
+ /* { dg-final { cleanup-tree-dump "forwprop1" } } */