Eliminate write-only variables

Sandra Loosemore sandra@codesourcery.com
Sat May 31 18:21:00 GMT 2014


On 05/20/2014 04:56 AM, Richard Biener wrote:
> On Tue, May 20, 2014 at 6:29 AM, Jan Hubicka <hubicka@ucw.cz> wrote:
>>> On 05/18/2014 08:45 PM, Sandra Loosemore wrote:
>>>> On 05/18/2014 02:59 PM, Jan Hubicka wrote:
>>>>> For cases like local-statics-7 your approach can be "saved" by adding
>>>>> simple IPA analysis
>>>>> to look for static vars that are used only by one function and keeping
>>>>> your DSE code active
>>>>> for them, so we can still get rid of this special case assignments
>>>>> during late compilation.
>>>>> I am however not quite convinced it is worth the effort - do you have
>>>>> some real world
>>>>> cases where it helps?
>>>>
>>>> Um, the well-known benchmark.  ;-)
>>>
>>> I looked at the generated code for this benchmark and see that your
>>> patch is indeed not getting rid of all the pointless static
>>> variables, while ours is, so this is quite disappointing.  I'm
>>> thinking now that we will still need to retain our patch at least
>>> locally, although we'd really like to get it on trunk if possible.
>>
>> Yours patch can really be improved by adding simple IPA analysis to work
>> out what variables are refernced by one function only instead of using
>> not-longer-that-relevant local static info.
>> As last IPA pass for each static variable with no address taken, look at all
>> references and see if they all come from one function or functions inlined to
>> it.
>>>
>>> Here is another testcase vaguely based on the same kind of
>>> signal-processing algorithm as the benchmark, but coded without
>>> reference to it.  I have arm-none-eabi builds around for both 4.9.0
>>> with our remove_local_statics patch applied, and trunk with your
>>> patch.  With -O2, our optimization produces 23 instructions and gets
>>> rid of all 3 statics, yours produces 33 and only gets rid of 1 of
>>> them.
>>>
>>> Of course it's lame to use static variables in this way, but OTOH
>>> it's lame if GCC can't recognize them and optimize them away, too.
>>>
>>> -Sandra
>>>
>>
>>> void
>>> fir (int *coeffs, int coeff_length, int *input, int input_length, int *output)
>>> {
>>>    static int *coeffp;
>>>    static int *inputp;
>>>    static int *outputp;
>>
>> Here inputp read is rmeoved only at 101.dceloop1 pass. That is really late.
>> coeffp is removed late, too.
>>
>>>    int i, c, acc;
>>>
>>>    for (i = 0; i < input_length; i++)
>>>      {
>>>        coeffp = coeffs;
>>>        inputp = input + i;
>>>        outputp = output + i;
>>>        acc = 0;
>>>        for (c = 0; c < coeff_length; c++)
>>>        {
>>>          acc += *coeffp * *inputp;
>>>          coeffp++;
>>>          inputp--;
>>>        }
>>
>> It seems like AA can not work out the fact that inputp is unchanged until that
>> late.  Richi, any ideas?
>
> Well, AA is perfectly fine - it's just that this boils down to a partial
> redundancy problem.  The stores can be removed by DCE even with
>
> Index: gcc/tree-ssa-dce.c
> ===================================================================
> --- gcc/tree-ssa-dce.c  (revision 210635)
> +++ gcc/tree-ssa-dce.c  (working copy)
> @@ -278,10 +278,20 @@ mark_stmt_if_obviously_necessary (gimple
>         break;
>
>       case GIMPLE_ASSIGN:
> -      if (TREE_CODE (gimple_assign_lhs (stmt)) == SSA_NAME
> -         && TREE_CLOBBER_P (gimple_assign_rhs1 (stmt)))
> -       return;
> -      break;
> +      {
> +       tree lhs = gimple_assign_lhs (stmt);
> +       if (TREE_CODE (lhs) == SSA_NAME
> +           && TREE_CLOBBER_P (gimple_assign_rhs1 (stmt)))
> +         return;
> +       if (TREE_CODE (lhs) == VAR_DECL
> +           && !TREE_ADDRESSABLE (lhs)
> +           && TREE_STATIC (lhs)
> +           && !TREE_PUBLIC (lhs) && !DECL_EXTERNAL (lhs)
> +           /* ???  Make sure lhs is only refered to from cfun->decl.  */
> +           && decl_function_context (lhs) == cfun->decl)
> +         return;
> +       break;
> +      }
>
>       default:
>         break;
>
> but I don't have a good way of checking the ??? prerequesite
> (even without taking its address the statics may be refered to
> by a) inline copies, b) ipa-split function parts, c) nested functions).
> I'm sure IPA references are not kept up-to-date.
>
> The last reads go away with PRE at the expense of two
> additional induction variables.
>
> If we know that a static variable does not have its address taken
> and is only refered to from a single function we can in theory
> simply rewrite it into SSA form during early opts (before inlining
> its body), for example by SRA.  That isn't even restricted to
> function-local statics (for statics used in multiple functions but
> not having address-taken we can do the same with doing
> function entry / exit load / store).  I think that would be a much
> better IPA enabled optimization than playing games with
> looking at individual stmts.

FAOD, I'm not currently planning to work on this any more myself.  I 
understand Bernd's patch pretty well and could make some minor changes 
to clean it up if that's what it takes to get it approved, but it sounds 
like what's wanted is a completely different approach, again, and I 
don't have any confidence that I could implement something that wouldn't 
just get stuck in another round of "why don't you try X instead" or 
"maybe it would be better to do this in Y".  :-(

Richard, FWIW I think both of the objections you raised to Bernd's patch 
last year have been resolved now.

https://gcc.gnu.org/ml/gcc-patches/2013-06/msg00322.html

We know that an IPA-only implementation (Jan's patch) isn't sufficient 
to catch the important cases, and the Bernd previously answered the 
question about inlining here:

https://gcc.gnu.org/ml/gcc-patches/2013-06/msg00327.html

-Sandra



More information about the Gcc-patches mailing list