This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

RE: Optimization for static local variables


> On Wed, Jun 14, 2017 at 1:14 PM, Prachi Godbole
> <Prachi.Godbole@imgtec.com> wrote:
> > I'm developing a solution to optimize away intermediate stores (loads) for
> static local variables which are assigned to before referenced on every path
> through a function.
> >
> > Currently GCC eliminates all loads/stores in a straight line code but not in
> control structures. AFAIU, passes like sccvn, lim (store motion in particular)
> sink etc. try to propagate the assignment exprs through SSA vars instead of
> through memory thereby eliminating almost all loads and some stores but
> still not all.
> >
> > For example:
> >
> > void some_function()
> > {
> >     static some_type sum;
> >     ...
> >     for( i = 0 ; i < n ; i++ )
> >          {
> >              sum = matrix [i][n] ;
> >
> >              for( j = 0 ; j < i ; j++ )
> >                  {
> >                      sum -= matrix [i][j] * matrix [j][n] ;
> >                  }
> >
> >             matrix [i][n] = sum ;
> >         }
> >     ...
> > }
> >
> > Resulting SSA:
> >
> > ...
> >   <bb 14> [2.25%]:
> >   _10 = matrix[0][n_13(D)];
> >   sum = _10;
> >   goto <bb 10>; [100.00%]
> > ...
> >   <bb 4> [12.75%]:
> >   _1 = matrix[i_16][n_13(D)];
> >   if (i_16 > 0)
> >     goto <bb 6>; [100.00%]
> >   else
> >     goto <bb 9>; [0.00%]
> > ...
> >   <bb 6> [85.00%]:
> >   # j_24 = PHI <j_18(6), 0(4)>
> >   # sum_lsm.4_27 = PHI <_6(6), _1(4)>
> >   _3 = matrix[i_16][j_24];
> >   _4 = matrix[j_24][n_13(D)];
> >   _5 = _3 * _4;
> >   _6 = sum_lsm.4_27 - _5;
> >   j_18 = j_24 + 1;
> >   if (i_16 > j_18)
> >     goto <bb 6>; [85.00%]
> >   else
> >     goto <bb 7>; [15.00%]
> > ...
> >   <bb 7> [12.75%]:
> >   # _40 = PHI <_6(6)>
> >   goto <bb 9>; [100.00%]
> >
> >   <bb 9> [12.75%]:
> >   # sum_lsm.4_19 = PHI <_40(7), _1(4)>
> >
> >   <bb 10> [15.00%]:
> >   # i_20 = PHI <i_16(9), 0(14)>
> >   # sum_lsm.4_8 = PHI <sum_lsm.4_19(9), _10(14)>
> >   # sum_lsm.5_22 = PHI <1(9), 0(14)>
> >   matrix[i_20][n_13(D)] = sum_lsm.4_8;
> >   i_16 = i_20 + 1;
> >   if (n_13(D) > i_16)
> >     goto <bb 4>; [85.00%]
> >   else
> >     goto <bb 11>; [15.00%]
> >
> >   <bb 11> [2.25%]:
> >   # sum_lsm.4_39 = PHI <sum_lsm.4_8(10)>
> >   # sum_lsm.5_38 = PHI <sum_lsm.5_22(10)>
> >   if (sum_lsm.5_38 != 0)
> >     goto <bb 12>; [0.00%]
> >   else
> >     goto <bb 13>; [100.00%]
> >
> >   <bb 12> [0.00%]:
> >   sum = sum_lsm.4_39;
> > ...
> >
> > Although all intermediate loads are eliminated here, store to sum before
> and after the 'j' loop (see bb 14 and bb 12) still remain which can be
> eliminated.
> >
> > My solution would reuse a lot of data structures and routines from the sink
> pass. So my question is, can I add this optimization into the sink pass itself or
> would it require a new pass and if yes, what would be the ideal place for it.
> > Any other pointers are more than welcome.
> 
> So the idea is that for any store to such variable that is dominated by a kill the
> function exit doesn't count as a use, right?
> (unless the variable escapes the scope of the function, for example via a
> nested function -- which might be the case that is very difficult to detect).
> 
Yes that is the general idea. I had thought of initially being conservative and not optimizing at all when the variable could potentially escape the scope.

> I'm not sure what part of the sink pass is suitable for this but as sinking walks
> post dominators backwards which also DSE does it would fit DSE as well, no?
> You'd pre-compute blocks containing kills (or do two backwards passes) and
> then special case GIMPLE_RETURN in dse_classify_store where previously
> ref_maybe_used_by_stmt_p covers those.
> 
I was looking for a one time solution rather than doing it as and when the opportunities present themselves.
But doing it in DSE is more advantageous, I can see that. So I'll go ahead with DSE.

> nested fn case:
> 
> void foo (void)
> {
>   static int i = 0;
>   int __attribute__((noinline)) bar () { return i; }
>   int j = bar ();
>   i = 1;
> }
> 
> I think we don't mark 'i' TREE_ADDRESSABLE (escaped) here so we simply
> treat 'i' as a global variable accessible from other scopes.
> 
> Richard.
> 
> > Thanks,
> > Prachi

Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]