This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.
Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]
Other format:	[Raw text]
Re: conditional lim

From: Evgeniya Maenkova <evgeniya dot maenkova at gmail dot com>
To: Richard Biener <richard dot guenther at gmail dot com>
Cc: GCC Patches <gcc-patches at gcc dot gnu dot org>
Date: Wed, 27 May 2015 17:07:39 +0400
Subject: Re: conditional lim
Authentication-results: sourceware.org; auth=none
References: <CANVW7MPfRvpNjABKuqup9GxbUk7A=BaRqwsPcbP8DRpGoGMWdw at mail dot gmail dot com> <CAFiYyc0jxE958ROvpAEZb53xVOgn=wjm3XSLn-92SiZ3+3t0OA at mail dot gmail dot com> <CANVW7MN_LYqO-p4Vh2LQ6vSEc6RRUMhUo=ydfjr17kGW=XeVLg at mail dot gmail dot com> <CAFiYyc3sR2Z0GVZ+kiQ99GUWyYmuSTNnCdmcPLnFO9aOAEh2NA at mail dot gmail dot com>
On Wed, May 27, 2015 at 2:11 PM, Richard Biener
<richard.guenther@gmail.com> wrote:
> On Tue, May 26, 2015 at 3:10 PM, Evgeniya Maenkova
> <evgeniya.maenkova@gmail.com> wrote:
>> Hi, Richard
>>
>> Thanks for review starting.
>>
>> Do you see any major issues with this patch (i.e. algorithms and ideas
>> that should be completely replaced, effectively causing the re-write
>> of most code)?
>>
>> To decide if there are major issues in the patch, perhaps, you need
>> additional clarifications from me? Could you point at the places where
>> additional explanations could save you most effort?
>>
>> Your answers to these questions are looking the first priority ones.
>> You wrote about several issues in the code, which are looking as easy
>> (or almost easy ;) to fix(inline functions, unswitch-loops flag,
>> comments, etc). But, I think you agree, letâs first decide about the
>> major issues (I mean, whether we continue with this patch or starting
>> new one, this will save a lot of time for both of us).
>
> I didn't get an overall idea on how the patch works, that is, how it integrates
> with the existing algorithm.  If you can elaborate on that a bit that would
> be helpful.
>
Hi,
Sure, I'll write you some notes in several days.

> I think the code-generation part needs some work (whether by following
> my idea with re-using copy_bbs or whether by basically re-implementing
> it is up to debate).  How does your code handle
>
>   for ()
>     {
>        if (cond1)
>         {
>            if (cond2)
>              invariant;
>            if (cond3)
>              invariant;
>         }
>     }
>
> ?  Optimally we'd have before the loop exactly the same if () structure
> (thus if (cond1) is shared).

If both invariants are going out of the same loop (i mean tgt_level),
then if structure will be the same.
for1()
  for ()
    {
      if (cond1)
        {
          if (cond2)
            invariant1;
          if (cond3)
            invariant2;
        }
    }

will be transformed to
for1()
  if (cond1)
    {
      if (cond2)
        invariant1;
      if (cond3)
        invariant2;
     }
  }
  for ()
    {
      if (cond1)
        {
          if (cond2);
          if (cond3);
        }
    }
(I don't cleanup empty if's in lim code).


If these invarians are moved in different loops then

for1
  for2()
    for()
      {
        if (cond1)
          {
            if (cond2)
              invariant1;
            if (cond3)
              invariant2;
           }
       }

will be transformed to:
for1
  {
    if (cond1)
      if (cond2)
        invariant1;
    for2()
      {
         if (cond1)
           if (cond3)
             invariant2;
         for()
           {
              if (cond1)
               {
                 if (cond2);
                 if (cond3);
               }
           }
        }
   }

Of course, there could be some bugs, but the idea was as mentioned above.

This transformation was looking logical to me. What do you think?

Thanks,

Evgeniya

>
> Richard.



>
>
>> Thanks,
>>
>> Evgeniya
>>
>> On Tue, May 26, 2015 at 2:31 PM, Richard Biener
>> <richard.guenther@gmail.com> wrote:
>>> On Fri, May 8, 2015 at 11:07 PM, Evgeniya Maenkova
>>> <evgeniya.maenkova@gmail.com> wrote:
>>>> Hi,
>>>>
>>>> Could you please review my patch for predicated lim?
>>>>
>>>> Let me note some details about it:
>>>>
>>>>
>>>>
>>>> 1)      Phi statements are still moved only if they have 1 or 2
>>>> arguments. However, phi statements could be move under conditions (as
>>>> itâs done for the other statements).  Probably, phi statement motion
>>>> with 3 + arguments could be implemented in the next patch after
>>>> predicated lim.
>>>>
>>>> 2)      Patch has limitations/features like (it was ok to me to
>>>> implement it such way, maybe Iâm not correct. ):
>>>>
>>>> a)      Loop1
>>>>
>>>>     {
>>>>
>>>>               If (a)
>>>>
>>>>                  Loop2
>>>>
>>>>                      {
>>>>
>>>>                        Stmt - Invariant for Loop1
>>>>
>>>>                      }
>>>>
>>>>              }
>>>>
>>>>        In this case Stmt will be moved only out of Loop2, because of if (a).
>>>>
>>>> b)      Or
>>>>
>>>> Loop1
>>>>
>>>>     {
>>>>
>>>>          â
>>>>
>>>>          If (cond1)
>>>>
>>>>               If (cond2)
>>>>
>>>>                   If (cond3)
>>>>
>>>>                       Stmt;
>>>>
>>>>        }
>>>>
>>>> Stmt will be moved out only if cond1 is always executed in Loop1.
>>>>
>>>> c)       It took me a long time to write all of these code, so there
>>>> might be other peculiarities which I forgot to mention. :)
>>>>
>>>>                        Letâs discuss these ones as you will review my patch.
>>>>
>>>> 3)      Patch consists of 9 files:
>>>>
>>>> a)      gcc/testsuite/gcc.dg/tree-ssa/loop-7.c,
>>>> gcc/testsuite/gcc.dg/tree-ssa/recip-3.c â changed tests:
>>>>
>>>> -          gcc/testsuite/gcc.dg/tree-ssa/loop-7.c  changed as
>>>> predicated lim moves 2 more statements out of the loop;
>>>>
>>>> -          gcc/testsuite/gcc.dg/tree-ssa/recip-3.c â with conditional
>>>> lim recip optimization in this test doesnât work (the corresponding
>>>> value is below threshold as I could see in the code for recip, 1<3).
>>>> So to have recip working in this test I changed test a little bit.
>>>>
>>>> b)      gcc/tree-ssa-loop-im.c â the patched lim per se
>>>>
>>>> c)       gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-13.c,
>>>> gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-14.c,
>>>>
>>>> gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-15.c,
>>>> gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-16.c,
>>>>
>>>> gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-17.c,
>>>> gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-18.c
>>>>
>>>> the tests for conditional lim.
>>>>
>>>> 4)      Patch testing:
>>>>
>>>> a)      make âk check (no difference in results for me for the clean
>>>> build and the patched one,
>>>>
>>>> -          Revision: 222849,
>>>>
>>>> -          uname -a
>>>>
>>>>                Linux Istanbul 3.16.0-23-generic #31-Ubuntu SMP Tue Oct
>>>> 21 18:00:35 UTC 2014 i686 i686 i686 GNU/Linux
>>>>
>>>> b)      Bootstrap.
>>>>
>>>>  It goes well now, however to fix it I have made a temporary hack in
>>>> the lim code. And with this fix patch definitely shouldnât be
>>>> committed.
>>>>
>>>> I did so, as I would like to discuss this issue first.
>>>>
>>>> The problem is: I got stage2-stage3 comparison failure on the single
>>>> file (tree-vect-data-refs.o). After some investigation I understood
>>>> that tree-vect-data-refs.o differs being compiled with and without
>>>> â-gâ option (yes, more exactly on stage 2 this is  â-g âO2 âgtoggleâ,
>>>> and for stage 3 this is â-g âO2â. But to simplify things I can
>>>> reproduce this difference on the same build (even not bootstrapped),
>>>> with option â âgâ and without it).
>>>>
>>>> Of course, the file compiled with âg option will contain debug
>>>> information and will differ from the corresponding file without debug
>>>> information. I mean there is the difference reported by
>>>> contrib/compare-debug (I mean we talk about stripped files).
>>>>
>>>> Then I compared assemblers and lim dump files and made a conclusion
>>>> the difference is:
>>>>
>>>> There is statement _484=something, which is conditionally moved out of
>>>> loop X. In non debug cases no usages of _484 are left inside the loop
>>>> X. In debug case, there is the single use in debug statement. So for
>>>> debug case my patch generates phi statement for _484 to make it
>>>> available inside the loop X. For non debug case, no such phi statement
>>>> generated as there is no uses of _484.
>>>>
>>>> There is one more statement with lhs=_493, with exactly this pattern
>>>> of use. But this is not important for our discussion.
>>>>
>>>>
>>>>
>>>> So, in my opinion itâs ok to generate additional phi node for debug
>>>> case. But Iâm not a compiler expert and maybe there is some
>>>> requirement that debug and non-debug versions should differ only by
>>>> debug statements, I donât know.
>>>
>>> As Andi said code generation needs to be the same.
>>>
>>>> My temporary hack fix is just removing of such debug statements (no
>>>> debug statements, no problems J).
>>>
>>> That's fine and generally what is done in other passes.
>>>
>>>> The alternatives I see are:
>>>>
>>>> -          Move debug statements outside the loop (not good in my opinion);
>>>>
>>>> -          Somehow reset values in debug statements (not good in my
>>>> opinion, if I could provide correct information for them);
>>>
>>> You could re-set them via gimple_debug_bind_reset_value which
>>> will tell the user that at this point the value is optimized out
>>> (that's slightly
>>> better than removing the debug stmts).
>>>
>>>> -          Generate phi for debug statements and fix bootstrap (say,
>>>> letâs have some list of files, which we have special check for. I mean
>>>> for my case, the difference is not in stage2 and stage3, itâs in debug
>>>> and non-debug). I like this variant. However, as I donât see such list
>>>> of special files in the whole gcc, I think I might be missing
>>>> something.
>>>
>>> The policy ;)
>>>
>>>> What do you think?
>>>
>>> Now about the patch.  Generally there seem to be spurious whitespace-only
>>> or formatting differences - try to avoid these.
>>>
>>> All new functiuons need a comment explaining them and their function
>>> parameters.
>>>
>>> +static cond_data_map *get_cond_data_map (struct loop * level)
>>> +{
>>>
>>> per GCC coding conventions this should be formatted like
>>>
>>> static cond_data_map *
>>> get_cond_data_map (struct loop * level)
>>> {
>>>
>>> +#define SET_ALWAYS_EXECUTED_IN(BB, VAL)\
>>> +                 BB->aux = (void *) (XCNEW (struct move_cond));\
>>> +                 BB_MOVE_COND (BB)->type=ALWAYS_EXECUTED_IN;\
>>> +                 BB_MOVE_COND (BB)->loop=VAL;
>>>
>>> this kind of side-effects in macros are bad - better turn them into
>>> inline functions.
>>>
>>> -  if (flag_unswitch_loops
>>> -      && gimple_code (stmt) == GIMPLE_COND)
>>> -    {
>>> -      /* If we perform unswitching, force the operands of the invariant
>>> -        condition to be moved out of the loop.  */
>>> -      return MOVE_POSSIBLE;
>>> -    }
>>> +  if (gimple_code (stmt) == GIMPLE_COND)
>>> +    return MOVE_POSSIBLE;
>>>
>>> any reason you removed the guard on flag_unswitch_loops?  We may
>>> want to add a new flag, certainly this transform might not be suitable
>>> for -O[12s] for example.  Like if not all stmts inside a conditional can
>>> be moved the transform will increase code-size.
>>>
>>> You are doing a (lot of?) refactoring which makes review of the patch
>>> harder.  For example
>>>
>>> +bool calculate_tgt_level (gimple stmt, struct loop *outermost,
>>> +                        bool cond_executed, basic_block bb)
>>> +{
>>> +  struct lim_aux_data *lim_dat
>>> ...
>>> +
>>> +  if (lim_data->cost >= LIM_EXPENSIVE)
>>> +    set_profitable_level (stmt);
>>> +  return true;
>>> +}
>>>
>>> ...
>>>
>>> -
>>> -       lim_data = init_lim_data (stmt);
>>> -       lim_data->always_executed_in = outermost;
>>> -
>>> -       if (!determine_max_movement (stmt, false))
>>> -         {
>>> -           lim_data->max_loop = NULL;
>>> -           continue;
>>> -         }
>>> -
>>> -       if (dump_file && (dump_flags & TDF_DETAILS))
>>> -         {
>>> -           print_gimple_stmt (dump_file, stmt, 2, 0);
>>> -           fprintf (dump_file, "  invariant up to level %d, cost %d.\n\n",
>>> -                    loop_depth (lim_data->max_loop),
>>> -                    lim_data->cost);
>>> -         }
>>> -
>>> -       if (lim_data->cost >= LIM_EXPENSIVE)
>>> -         set_profitable_level (stmt);
>>> +       if (!calculate_tgt_level (stmt, outermost, cond_executed, bb))
>>> +         continue;
>>>
>>> where it is not obvious why moving set_profitable_level should be
>>> in a function named calculate_tgt_level.  In fact determine_max_movement
>>> was supposed to be the abstraction already.
>>>
>>> Without yet getting an idea of the overall algorithm (I'm really missing such
>>> a thing...) it seems to me that the current handling of doing invariant motion
>>> of PHIs should handle the analysis phase just fine (you might be adding
>>> more advanced cases in this patch, but I'd rather do that as a followup for
>>> initial support).
>>>
>>> The code-gen part looks somewhat ad-hoc to me.  In principle if we have
>>> a list of PHIs to hoist we can materialize the needed part of the original
>>> basic-block structure on the pre-header edge together with the controlling
>>> conditionals.  The loop pre-header would be the always-executed BB
>>> and the needed part would always start with the loop header and be a
>>> single-entry multiple-exit region.  So ideally we could use sth like
>>> gimple_duplicate_sese_region with the BB copying worker copy_bbs
>>> refactored/replaced with something only copying controlling conditionals
>>> and statements/PHIs that a callback identifies (tree-loop-distribution.c
>>> would also like sth like that - currently it copies whole loops and removes
>>> stmts it didn't intend to copy...).
>>>
>>> So basically invariantness_dom_walker would record a vector of PHIs
>>> to hoist (or we'd construct that at move_computations time).
>>> If there is any control-flow to hoist then we'd do an alternative
>>> move_computations by "copying" the needed CFG and covered stmts.
>>> In theory doing it the loop-distribution way (simply copy the whole
>>> loop body and then remove anything not needed) would also be possible
>>> though that wouldn't be the very best idea obviously ;)
>>>
>>> Thanks,
>>> Richard.
>>>
>>>>
>>>>
>>>> Thanks,
>>>>
>>>>
>>>>
>>>> Evgeniya
>>
>>
>>
>> --
>> Thanks,
>>
>> Evgeniya
>>
>> perfstories.wordpress.com



-- 
Thanks,

Evgeniya

perfstories.wordpress.com
References:
- conditional lim
  - From: Evgeniya Maenkova
- Re: conditional lim
  - From: Richard Biener
- Re: conditional lim
  - From: Evgeniya Maenkova
- Re: conditional lim
  - From: Richard Biener
Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]