This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



Re: [RFC] Redundant zero store elimination (PR31150)


On Fri, Sep 07, 2007 at 06:28:09PM +0200, Zdenek Dvorak wrote:
> > In this case the optimization decided to delete redundant:
> >   Store 
> > # HEAP.515D.1988_400 = VDEF <HEAP.515D.1988_402> { HEAP.515D.1988 }
> > (*D.1848_489)[2].xD.866.dataD.857 = 0B
> >   made redundant by 
> > # HEAP.515D.1988_417 = VDEF <HEAP.515D.1988_905> { HEAP.515D.1988 }
> > (*D.1848_489)[2].xD.866.dataD.857 = 0B
> > 
> > so I removed it with bsi_remove (&bsi, true); and TODO_update_ssa is in
> > the pass' todo_flags_finish.  But unfortunately that bsi_remove
> > removed the VDEF as well and did not update the places where
> > HEAP.515D.1988_417 is used.
> > I tried to release_defs (stmt); after the bsi_remove, but that only
> > led to a different ICE.  Any ideas?
> 
> you need to replace the VDEF definitions of the removed statement with
> the corresponding VUSE operands to fix up the ssa chains:
> 
> FOR_EACH_SSA_VDEF_OPERAND (def, vv, stmt, iter)
>   {
>     gcc_assert (VUSE_VECT_NUM_ELEM (*vv) == 1);
>     usevar = VUSE_ELEMENT_VAR (*vv, 0);
> 
>     replace_uses_by (def, usevar);
>   }
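
To illustrate the fix-up suggested above outside of GCC, here is a
minimal sketch on a toy model.  The struct and function names are
hypothetical and are not GCC's data structures or API; the only
assumption is that each store consumes one memory version (its VUSE)
and defines a new one (its VDEF), as in the dump quoted earlier.

```c
#include <stddef.h>

/* Toy model, NOT GCC's representation: each store reads one memory
   version and produces a new one.  */
struct toy_stmt {
    int vuse;     /* memory version consumed by this statement */
    int vdef;     /* memory version defined by it; 0 if none */
    int removed;  /* nonzero once the statement has been deleted */
};

/* Delete stmts[victim] and redirect every remaining use of its vdef
   to the version it consumed, so that no live statement refers to a
   version whose defining statement is gone.  */
static void toy_remove_store(struct toy_stmt *stmts, size_t n, size_t victim)
{
    int dead = stmts[victim].vdef;
    int replacement = stmts[victim].vuse;

    stmts[victim].removed = 1;
    for (size_t i = 0; i < n; i++)
        if (!stmts[i].removed && stmts[i].vuse == dead)
            stmts[i].vuse = replacement;
}
```

Removing the statement without the loop would leave later statements
pointing at a version nobody defines, which is roughly the situation
that led to the ICE.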

Thanks for the suggestion, the following patch actually bootstrapped
and passed regression testing on x86_64-linux.

Attached is also the list of unique loci of the stmts that have been
either removed by the optimization or changed into memcpy to avoid
clearing of tail padding.  As it hit in quite a few places, I wonder
if this simplified pass wouldn't be useful even for 4.3, before
a full pass using tree-ssa-propagate.c is written.  My guess is that
this simplistic pass will handle more than 50% of all the cases that
such a pass could eliminate.  I will work on the propagating version
soon, but am not sure I can write and fully test it by Monday.

On ia64-linux, this optimization helps tremendously on the pr28003.C
testcase: the _Z41__static_initialization_and_destruction_0ii function
shrank there by almost 50% (from over 6KB to 3.2KB).  On other arches
RTL DSE manages to remove some of the redundant stores on this
testcase, but on ia64, as there is no addressing mode with
base + offset, RTL DSE loses badly on it.
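
For readers without the dumps at hand, the kind of store the pass
removes looks roughly like this at the source level.  This is a
hypothetical example (the struct and function names are made up), not
code taken from pr28003.C:

```c
#include <stddef.h>

struct item {
    void *data;
    int len;
};

/* Both stores to it->data write a null pointer, and nothing writes to
   that field in between, so the second store cannot change memory.  */
static void clear_twice(struct item *it)
{
    it->data = NULL;
    it->len = 0;
    it->data = NULL;   /* redundant: it->data is already NULL */
}
```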

BTW, in cfgrtl.c:1022, the code looks erroneous:

          b->probability = prob;
          b->count = e->count * prob / REG_BR_PROB_BASE;
          e->probability -= e->probability;
          e->count -= b->count;
          if (e->probability < 0)
            e->probability = 0;
          if (e->count < 0)
            e->count = 0;

e->probability -= e->probability;
if (e->probability < 0)
  e->probability = 0;

seems to be a very fancy way of writing e->probability = 0;
Wasn't that supposed to be
e->probability -= b->probability;
?
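
A small standalone sketch of the difference, using plain ints as
hypothetical stand-ins for e->probability and b->probability (the
function names are made up for illustration):

```c
/* The code as currently written: subtracting a value from itself
   always gives 0, so the clamp below can never trigger.  */
static int prob_as_written(int e_prob, int b_prob)
{
    (void) b_prob;       /* not used by the current code */
    e_prob -= e_prob;
    if (e_prob < 0)
        e_prob = 0;
    return e_prob;
}

/* What I suspect was intended: take the new block's share out of the
   edge, clamping at zero.  */
static int prob_as_suspected(int e_prob, int b_prob)
{
    e_prob -= b_prob;
    if (e_prob < 0)
        e_prob = 0;
    return e_prob;
}
```

With e_prob = 7000 and b_prob = 3000 the first returns 0 and the
second returns 4000; only in the second does the < 0 check do
anything useful.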

	Jakub

Attachment: J5
Description: Text document

Attachment: 5
Description: Text document

