This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.
Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]
Other format:	[Raw text]
Re: conditional lim

From: Richard Biener <richard dot guenther at gmail dot com>
To: Evgeniya Maenkova <evgeniya dot maenkova at gmail dot com>
Cc: GCC Patches <gcc-patches at gcc dot gnu dot org>
Date: Wed, 27 May 2015 12:11:59 +0200
Subject: Re: conditional lim
Authentication-results: sourceware.org; auth=none
References: <CANVW7MPfRvpNjABKuqup9GxbUk7A=BaRqwsPcbP8DRpGoGMWdw at mail dot gmail dot com> <CAFiYyc0jxE958ROvpAEZb53xVOgn=wjm3XSLn-92SiZ3+3t0OA at mail dot gmail dot com> <CANVW7MN_LYqO-p4Vh2LQ6vSEc6RRUMhUo=ydfjr17kGW=XeVLg at mail dot gmail dot com>
On Tue, May 26, 2015 at 3:10 PM, Evgeniya Maenkova
<evgeniya.maenkova@gmail.com> wrote:
> Hi, Richard
>
> Thanks for review starting.
>
> Do you see any major issues with this patch (i.e. algorithms and ideas
> that should be completely replaced, effectively causing the re-write
> of most code)?
>
> To decide if there are major issues in the patch, perhaps, you need
> additional clarifications from me? Could you point at the places where
> additional explanations could save you most effort?
>
> Your answers to these questions are looking the first priority ones.
> You wrote about several issues in the code, which are looking as easy
> (or almost easy ;) to fix(inline functions, unswitch-loops flag,
> comments, etc). But, I think you agree, letâs first decide about the
> major issues (I mean, whether we continue with this patch or starting
> new one, this will save a lot of time for both of us).

I didn't get an overall idea on how the patch works, that is, how it integrates
with the existing algorithm.  If you can elaborate on that a bit that would
be helpful.

I think the code-generation part needs some work (whether by following
my idea with re-using copy_bbs or whether by basically re-implementing
it is up to debate).  How does your code handle

  for ()
    {
       if (cond1)
        {
           if (cond2)
             invariant;
           if (cond3)
             invariant;
        }
    }

?  Optimally we'd have before the loop exactly the same if () structure
(thus if (cond1) is shared).

Richard.


> Thanks,
>
> Evgeniya
>
> On Tue, May 26, 2015 at 2:31 PM, Richard Biener
> <richard.guenther@gmail.com> wrote:
>> On Fri, May 8, 2015 at 11:07 PM, Evgeniya Maenkova
>> <evgeniya.maenkova@gmail.com> wrote:
>>> Hi,
>>>
>>> Could you please review my patch for predicated lim?
>>>
>>> Let me note some details about it:
>>>
>>>
>>>
>>> 1)      Phi statements are still moved only if they have 1 or 2
>>> arguments. However, phi statements could be move under conditions (as
>>> itâs done for the other statements).  Probably, phi statement motion
>>> with 3 + arguments could be implemented in the next patch after
>>> predicated lim.
>>>
>>> 2)      Patch has limitations/features like (it was ok to me to
>>> implement it such way, maybe Iâm not correct. ):
>>>
>>> a)      Loop1
>>>
>>>     {
>>>
>>>               If (a)
>>>
>>>                  Loop2
>>>
>>>                      {
>>>
>>>                        Stmt - Invariant for Loop1
>>>
>>>                      }
>>>
>>>              }
>>>
>>>        In this case Stmt will be moved only out of Loop2, because of if (a).
>>>
>>> b)      Or
>>>
>>> Loop1
>>>
>>>     {
>>>
>>>          â
>>>
>>>          If (cond1)
>>>
>>>               If (cond2)
>>>
>>>                   If (cond3)
>>>
>>>                       Stmt;
>>>
>>>        }
>>>
>>> Stmt will be moved out only if cond1 is always executed in Loop1.
>>>
>>> c)       It took me a long time to write all of these code, so there
>>> might be other peculiarities which I forgot to mention. :)
>>>
>>>                        Letâs discuss these ones as you will review my patch.
>>>
>>> 3)      Patch consists of 9 files:
>>>
>>> a)      gcc/testsuite/gcc.dg/tree-ssa/loop-7.c,
>>> gcc/testsuite/gcc.dg/tree-ssa/recip-3.c â changed tests:
>>>
>>> -          gcc/testsuite/gcc.dg/tree-ssa/loop-7.c  changed as
>>> predicated lim moves 2 more statements out of the loop;
>>>
>>> -          gcc/testsuite/gcc.dg/tree-ssa/recip-3.c â with conditional
>>> lim recip optimization in this test doesnât work (the corresponding
>>> value is below threshold as I could see in the code for recip, 1<3).
>>> So to have recip working in this test I changed test a little bit.
>>>
>>> b)      gcc/tree-ssa-loop-im.c â the patched lim per se
>>>
>>> c)       gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-13.c,
>>> gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-14.c,
>>>
>>> gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-15.c,
>>> gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-16.c,
>>>
>>> gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-17.c,
>>> gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-18.c
>>>
>>> the tests for conditional lim.
>>>
>>> 4)      Patch testing:
>>>
>>> a)      make âk check (no difference in results for me for the clean
>>> build and the patched one,
>>>
>>> -          Revision: 222849,
>>>
>>> -          uname -a
>>>
>>>                Linux Istanbul 3.16.0-23-generic #31-Ubuntu SMP Tue Oct
>>> 21 18:00:35 UTC 2014 i686 i686 i686 GNU/Linux
>>>
>>> b)      Bootstrap.
>>>
>>>  It goes well now, however to fix it I have made a temporary hack in
>>> the lim code. And with this fix patch definitely shouldnât be
>>> committed.
>>>
>>> I did so, as I would like to discuss this issue first.
>>>
>>> The problem is: I got stage2-stage3 comparison failure on the single
>>> file (tree-vect-data-refs.o). After some investigation I understood
>>> that tree-vect-data-refs.o differs being compiled with and without
>>> â-gâ option (yes, more exactly on stage 2 this is  â-g âO2 âgtoggleâ,
>>> and for stage 3 this is â-g âO2â. But to simplify things I can
>>> reproduce this difference on the same build (even not bootstrapped),
>>> with option â âgâ and without it).
>>>
>>> Of course, the file compiled with âg option will contain debug
>>> information and will differ from the corresponding file without debug
>>> information. I mean there is the difference reported by
>>> contrib/compare-debug (I mean we talk about stripped files).
>>>
>>> Then I compared assemblers and lim dump files and made a conclusion
>>> the difference is:
>>>
>>> There is statement _484=something, which is conditionally moved out of
>>> loop X. In non debug cases no usages of _484 are left inside the loop
>>> X. In debug case, there is the single use in debug statement. So for
>>> debug case my patch generates phi statement for _484 to make it
>>> available inside the loop X. For non debug case, no such phi statement
>>> generated as there is no uses of _484.
>>>
>>> There is one more statement with lhs=_493, with exactly this pattern
>>> of use. But this is not important for our discussion.
>>>
>>>
>>>
>>> So, in my opinion itâs ok to generate additional phi node for debug
>>> case. But Iâm not a compiler expert and maybe there is some
>>> requirement that debug and non-debug versions should differ only by
>>> debug statements, I donât know.
>>
>> As Andi said code generation needs to be the same.
>>
>>> My temporary hack fix is just removing of such debug statements (no
>>> debug statements, no problems J).
>>
>> That's fine and generally what is done in other passes.
>>
>>> The alternatives I see are:
>>>
>>> -          Move debug statements outside the loop (not good in my opinion);
>>>
>>> -          Somehow reset values in debug statements (not good in my
>>> opinion, if I could provide correct information for them);
>>
>> You could re-set them via gimple_debug_bind_reset_value which
>> will tell the user that at this point the value is optimized out
>> (that's slightly
>> better than removing the debug stmts).
>>
>>> -          Generate phi for debug statements and fix bootstrap (say,
>>> letâs have some list of files, which we have special check for. I mean
>>> for my case, the difference is not in stage2 and stage3, itâs in debug
>>> and non-debug). I like this variant. However, as I donât see such list
>>> of special files in the whole gcc, I think I might be missing
>>> something.
>>
>> The policy ;)
>>
>>> What do you think?
>>
>> Now about the patch.  Generally there seem to be spurious whitespace-only
>> or formatting differences - try to avoid these.
>>
>> All new functiuons need a comment explaining them and their function
>> parameters.
>>
>> +static cond_data_map *get_cond_data_map (struct loop * level)
>> +{
>>
>> per GCC coding conventions this should be formatted like
>>
>> static cond_data_map *
>> get_cond_data_map (struct loop * level)
>> {
>>
>> +#define SET_ALWAYS_EXECUTED_IN(BB, VAL)\
>> +                 BB->aux = (void *) (XCNEW (struct move_cond));\
>> +                 BB_MOVE_COND (BB)->type=ALWAYS_EXECUTED_IN;\
>> +                 BB_MOVE_COND (BB)->loop=VAL;
>>
>> this kind of side-effects in macros are bad - better turn them into
>> inline functions.
>>
>> -  if (flag_unswitch_loops
>> -      && gimple_code (stmt) == GIMPLE_COND)
>> -    {
>> -      /* If we perform unswitching, force the operands of the invariant
>> -        condition to be moved out of the loop.  */
>> -      return MOVE_POSSIBLE;
>> -    }
>> +  if (gimple_code (stmt) == GIMPLE_COND)
>> +    return MOVE_POSSIBLE;
>>
>> any reason you removed the guard on flag_unswitch_loops?  We may
>> want to add a new flag, certainly this transform might not be suitable
>> for -O[12s] for example.  Like if not all stmts inside a conditional can
>> be moved the transform will increase code-size.
>>
>> You are doing a (lot of?) refactoring which makes review of the patch
>> harder.  For example
>>
>> +bool calculate_tgt_level (gimple stmt, struct loop *outermost,
>> +                        bool cond_executed, basic_block bb)
>> +{
>> +  struct lim_aux_data *lim_dat
>> ...
>> +
>> +  if (lim_data->cost >= LIM_EXPENSIVE)
>> +    set_profitable_level (stmt);
>> +  return true;
>> +}
>>
>> ...
>>
>> -
>> -       lim_data = init_lim_data (stmt);
>> -       lim_data->always_executed_in = outermost;
>> -
>> -       if (!determine_max_movement (stmt, false))
>> -         {
>> -           lim_data->max_loop = NULL;
>> -           continue;
>> -         }
>> -
>> -       if (dump_file && (dump_flags & TDF_DETAILS))
>> -         {
>> -           print_gimple_stmt (dump_file, stmt, 2, 0);
>> -           fprintf (dump_file, "  invariant up to level %d, cost %d.\n\n",
>> -                    loop_depth (lim_data->max_loop),
>> -                    lim_data->cost);
>> -         }
>> -
>> -       if (lim_data->cost >= LIM_EXPENSIVE)
>> -         set_profitable_level (stmt);
>> +       if (!calculate_tgt_level (stmt, outermost, cond_executed, bb))
>> +         continue;
>>
>> where it is not obvious why moving set_profitable_level should be
>> in a function named calculate_tgt_level.  In fact determine_max_movement
>> was supposed to be the abstraction already.
>>
>> Without yet getting an idea of the overall algorithm (I'm really missing such
>> a thing...) it seems to me that the current handling of doing invariant motion
>> of PHIs should handle the analysis phase just fine (you might be adding
>> more advanced cases in this patch, but I'd rather do that as a followup for
>> initial support).
>>
>> The code-gen part looks somewhat ad-hoc to me.  In principle if we have
>> a list of PHIs to hoist we can materialize the needed part of the original
>> basic-block structure on the pre-header edge together with the controlling
>> conditionals.  The loop pre-header would be the always-executed BB
>> and the needed part would always start with the loop header and be a
>> single-entry multiple-exit region.  So ideally we could use sth like
>> gimple_duplicate_sese_region with the BB copying worker copy_bbs
>> refactored/replaced with something only copying controlling conditionals
>> and statements/PHIs that a callback identifies (tree-loop-distribution.c
>> would also like sth like that - currently it copies whole loops and removes
>> stmts it didn't intend to copy...).
>>
>> So basically invariantness_dom_walker would record a vector of PHIs
>> to hoist (or we'd construct that at move_computations time).
>> If there is any control-flow to hoist then we'd do an alternative
>> move_computations by "copying" the needed CFG and covered stmts.
>> In theory doing it the loop-distribution way (simply copy the whole
>> loop body and then remove anything not needed) would also be possible
>> though that wouldn't be the very best idea obviously ;)
>>
>> Thanks,
>> Richard.
>>
>>>
>>>
>>> Thanks,
>>>
>>>
>>>
>>> Evgeniya
>
>
>
> --
> Thanks,
>
> Evgeniya
>
> perfstories.wordpress.com
Follow-Ups:
- Re: conditional lim
  - From: Evgeniya Maenkova
References:
- conditional lim
  - From: Evgeniya Maenkova
- Re: conditional lim
  - From: Richard Biener
- Re: conditional lim
  - From: Evgeniya Maenkova
Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]