
Re: [PATCH] Fix ICE in expand_cse_reciprocals (PR tree-optimization/42078)


On Nov 20, 2009, Richard Guenther <richard.guenther@gmail.com> wrote:

> I don't see why gimple_replace_lhs couldn't get the value on its own.

It can, but that's not necessarily the best representation for the
value.

Consider the case at hand.  We have, say,

  y_2 = builtin_sqrt(x_1)
  # DEBUG y => y_2

Now we want to effectively drop this y_2 and introduce another
expression that evaluates to the reciprocal, i.e.:

  reciprocal_of_y_2 = builtin_rsqrt(x_1)
  
but if we take the expression originally stored in y_2, which is the
best gimple_replace_lhs could do, we'll end up with:

  # DEBUG y => builtin_sqrt(x_1)

which is both non-gimple and suboptimal, given that there's now a much
simpler way to compute the value of y, namely:

  # DEBUG y => 1.0 / reciprocal_of_y_2
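
Just to make the relationship explicit, here's a throwaway check in plain
C (not GCC code; 1.0 / sqrt stands in for rsqrt, which C doesn't have) of
the identity the proposed bind relies on:

  #include <assert.h>
  #include <math.h>

  int
  main (void)
  {
    double x = 2.0;                 /* x_1 */
    double y = sqrt (x);            /* old: y_2 = builtin_sqrt (x_1) */
    double recip = 1.0 / sqrt (x);  /* new: reciprocal_of_y_2 = builtin_rsqrt (x_1) */
    /* the proposed bind: # DEBUG y => 1.0 / reciprocal_of_y_2 */
    assert (fabs (y - 1.0 / recip) < 1e-12);
    return 0;
  }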

It could be argued that gimple_replace_lhs or insert_debug_temps or
some such could be given enough brains to figure this out, but they
would then have to be told what the new lhs is going to be set to (say
newval) and work out how to get from newval to the current rhs (say
oldval).  That's unnecessary complexity, given that the caller already
knows exactly how one relates to the other, and has to know it anyway.

This new interface provides us with a way to tell the debug temps
machinery how the released DEF relates to other DEFs that remain or
that will be introduced.

Think how useful this could be, for example, for IV elimination.

Given:

  sum = 0;
  for (i = 0; i < n; i++)
    sum += a[i];
  
i.e. (in a representation that exposes the address computation in the
array indexing):

 <entry>:
  sum_3 = 0;
  # DEBUG sum => sum_3
  i_1 = 0;
  # DEBUG i => i_1
  goto <test>;

 <body>:
  T_10 = i_6 * <sizeof(*a)>;
  T_9 = a + T_10;
  T_8 = *T_9;
  sum_5 = sum_4 + T_8;
  # DEBUG sum => sum_5
  i_7 = i_6 + 1;
  # DEBUG i => i_7

 <test>:
  sum_4 = PHI <sum_3(<entry>), sum_5(<body>)>;
  i_6 = PHI <i_1(<entry>), i_7(<body>)>;
  # DEBUG sum => sum_4
  # DEBUG i => i_6
  if (i_6 < n_2(D))
    goto <body>;

Now, if we were to perform strength reduction:

 <entry>:
  sum_3 = 0;
  # DEBUG sum => sum_3
  ia_11 = a;
  T_12 = n_2(D) * sizeof(*a);
  ea_13 = ia_11 + T_12;
  # DEBUG i => 0
  goto <test>;

 <body>:
  T_8 = *ia_14;
  sum_5 = sum_4 + T_8;
  # DEBUG sum => sum_5
  ia_15 = ia_14 + sizeof(*a);
  # DEBUG i => NULL

 <test>:
  sum_4 = PHI <sum_3(<entry>), sum_5(<body>)>;
  ia_14 = PHI <ia_11(<entry>), ia_15(<body>)>;
  # DEBUG sum => sum_4
  # DEBUG i => NULL
  if (ia_14 < ea_13)
    goto <body>;

Oops, i is now completely gone.  Although i_7 was propagated into the
debug bind stmt in the loop body, when the PHI node was deleted there
was nothing we could do to preserve it.

We need some way to tell how to compute IVs that are removed based on
other IVs that remain, and this machinery would enable us to do just
that.  E.g., since we know that:

  ia = a + i * sizeof(*a);

we might very well compute i as:

  i = (ia - a) / sizeof(*a);

and use this expression *instead* of dumbly failing to propagate the
increments into debug stmts.  Then we'd have:

  # DEBUG i => (ia_15 - ia_11) / sizeof(*a)

within the loop body.
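
FWIW, the recovery formula is easy to sanity-check in plain C
(illustrative only; the subtraction is spelled out in bytes to match the
gimple above):

  #include <assert.h>
  #include <stddef.h>

  int
  main (void)
  {
    int a[8] = { 0 };
    /* ia plays the role of ia_14/ia_15, the pointer IV that replaced i.  */
    for (int *ia = a; ia < a + 8; ia++)
      {
        /* ia = a + i * sizeof (*a) in byte terms, hence
           i = (ia - a) / sizeof (*a), as in the proposed debug bind.  */
        ptrdiff_t i = ((char *) ia - (char *) a) / (ptrdiff_t) sizeof (*a);
        assert (i == ia - a);
      }
    return 0;
  }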

Handling the confluence in the loop exit test would require implementing
one of the ??? comments in insert_debug_temps, namely inserting debug
temps at the ends of incoming edges, yielding something like:

 <entry>:
  sum_3 = 0;
  # DEBUG sum => sum_3
  ia_11 = a;
  T_12 = n_2(D) * sizeof(*a);
  ea_13 = ia_11 + T_12;
  # DEBUG D#1 => 0
  # DEBUG i => D#1
  goto <test>;

 <body>:
  T_8 = *ia_14;
  sum_5 = sum_4 + T_8;
  # DEBUG sum => sum_5
  ia_15 = ia_14 + sizeof(*a);
  # DEBUG D#1 => (ia_15 - ia_11) / sizeof(*a)
  # DEBUG i => D#1

 <test>:
  sum_4 = PHI <sum_3(<entry>), sum_5(<body>)>;
  ia_14 = PHI <ia_11(<entry>), ia_15(<body>)>;
  # DEBUG sum => sum_4
  # DEBUG i => D#1
  if (ia_14 < ea_13)
    goto <body>;

I don't see how insert_debug_temps could make this kind of inference.


Further consider rewrite_bittest() in tree-ssa-loop-im.c.  Given:

  _3 = A_1 >> B_2;
  _4 = _3 & 1;
  # DEBUG foo => _4
  if (_4 [=!]= 0)
    ...

it optimizes to the following, in which _5 can be hoisted out of the
loop:

  # DEBUG D#1 => (A_1 >> B_2)
  _5 = 1 << B_2;
  _6 = A_1 & _5;
  # DEBUG foo => D#1 & 1
  if (_6 [=!]= 0)
    ...
  
Now, if we could somehow communicate to the debug machinery that _4 can
be computed from the new defs, we'd end up with fewer and simpler debug
stmts and temps:

  _5 = 1 << B_2;
  _6 = A_1 & _5;
  # DEBUG foo => _6 >> B_2
  if (_6 [=!]= 0)
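
Here's a brute-force sanity check in plain C (illustrative only, not part
of any pass) that the proposed debug value agrees with the original one:

  #include <assert.h>

  int
  main (void)
  {
    unsigned A_1, B_2;

    for (A_1 = 0; A_1 < 256; A_1++)
      for (B_2 = 0; B_2 < 8; B_2++)
        {
          unsigned _3 = A_1 >> B_2, _4 = _3 & 1;    /* original form */
          unsigned _5 = 1u << B_2, _6 = A_1 & _5;   /* rewritten form */
          assert (_4 == _6 >> B_2);                 /* # DEBUG foo => _6 >> B_2 */
          assert ((_4 != 0) == (_6 != 0));          /* the [=!]= 0 test */
        }
    return 0;
  }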

I suspect other changes that went into the initial debug temps patch
could benefit similarly:

	* tree-ssa-reassoc.c (rewrite_expr_tree, linearize_expr): Likewise.
	* tree-ssa-sink.c (statement_sink_location): Likewise.
	* tree-ssa-forwprop.c (forward_propagate_addr_expr): Likewise.

-- 
Alexandre Oliva, freedom fighter    http://FSFLA.org/~lxoliva/
You must be the change you wish to see in the world. -- Gandhi
Be Free! -- http://FSFLA.org/   FSF Latin America board member
Free Software Evangelist      Red Hat Brazil Compiler Engineer

