This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: conditional lim
- From: Evgeniya Maenkova <evgeniya dot maenkova at gmail dot com>
- To: Richard Biener <richard dot guenther at gmail dot com>
- Cc: GCC Patches <gcc-patches at gcc dot gnu dot org>
- Date: Wed, 27 May 2015 17:07:39 +0400
- Subject: Re: conditional lim
- Authentication-results: sourceware.org; auth=none
- References: <CANVW7MPfRvpNjABKuqup9GxbUk7A=BaRqwsPcbP8DRpGoGMWdw at mail dot gmail dot com> <CAFiYyc0jxE958ROvpAEZb53xVOgn=wjm3XSLn-92SiZ3+3t0OA at mail dot gmail dot com> <CANVW7MN_LYqO-p4Vh2LQ6vSEc6RRUMhUo=ydfjr17kGW=XeVLg at mail dot gmail dot com> <CAFiYyc3sR2Z0GVZ+kiQ99GUWyYmuSTNnCdmcPLnFO9aOAEh2NA at mail dot gmail dot com>
On Wed, May 27, 2015 at 2:11 PM, Richard Biener
<richard.guenther@gmail.com> wrote:
> On Tue, May 26, 2015 at 3:10 PM, Evgeniya Maenkova
> <evgeniya.maenkova@gmail.com> wrote:
>> Hi, Richard
>>
>> Thanks for review starting.
>>
>> Do you see any major issues with this patch (i.e. algorithms and ideas
>> that should be completely replaced, effectively causing the re-write
>> of most code)?
>>
>> To decide if there are major issues in the patch, perhaps, you need
>> additional clarifications from me? Could you point at the places where
>> additional explanations could save you most effort?
>>
>> Your answers to these questions are looking the first priority ones.
>> You wrote about several issues in the code, which are looking as easy
>> (or almost easy ;) to fix(inline functions, unswitch-loops flag,
>> comments, etc). But, I think you agree, letâs first decide about the
>> major issues (I mean, whether we continue with this patch or starting
>> new one, this will save a lot of time for both of us).
>
> I didn't get an overall idea on how the patch works, that is, how it integrates
> with the existing algorithm. If you can elaborate on that a bit that would
> be helpful.
>
Hi,
Sure, I'll write you some notes in several days.
> I think the code-generation part needs some work (whether by following
> my idea with re-using copy_bbs or whether by basically re-implementing
> it is up to debate). How does your code handle
>
> for ()
> {
> if (cond1)
> {
> if (cond2)
> invariant;
> if (cond3)
> invariant;
> }
> }
>
> ? Optimally we'd have before the loop exactly the same if () structure
> (thus if (cond1) is shared).
If both invariants are going out of the same loop (i mean tgt_level),
then if structure will be the same.
for1()
for ()
{
if (cond1)
{
if (cond2)
invariant1;
if (cond3)
invariant2;
}
}
will be transformed to
for1()
if (cond1)
{
if (cond2)
invariant1;
if (cond3)
invariant2;
}
}
for ()
{
if (cond1)
{
if (cond2);
if (cond3);
}
}
(I don't cleanup empty if's in lim code).
If these invarians are moved in different loops then
for1
for2()
for()
{
if (cond1)
{
if (cond2)
invariant1;
if (cond3)
invariant2;
}
}
will be transformed to:
for1
{
if (cond1)
if (cond2)
invariant1;
for2()
{
if (cond1)
if (cond3)
invariant2;
for()
{
if (cond1)
{
if (cond2);
if (cond3);
}
}
}
}
Of course, there could be some bugs, but the idea was as mentioned above.
This transformation was looking logical to me. What do you think?
Thanks,
Evgeniya
>
> Richard.
>
>
>> Thanks,
>>
>> Evgeniya
>>
>> On Tue, May 26, 2015 at 2:31 PM, Richard Biener
>> <richard.guenther@gmail.com> wrote:
>>> On Fri, May 8, 2015 at 11:07 PM, Evgeniya Maenkova
>>> <evgeniya.maenkova@gmail.com> wrote:
>>>> Hi,
>>>>
>>>> Could you please review my patch for predicated lim?
>>>>
>>>> Let me note some details about it:
>>>>
>>>>
>>>>
>>>> 1) Phi statements are still moved only if they have 1 or 2
>>>> arguments. However, phi statements could be move under conditions (as
>>>> itâs done for the other statements). Probably, phi statement motion
>>>> with 3 + arguments could be implemented in the next patch after
>>>> predicated lim.
>>>>
>>>> 2) Patch has limitations/features like (it was ok to me to
>>>> implement it such way, maybe Iâm not correct. ):
>>>>
>>>> a) Loop1
>>>>
>>>> {
>>>>
>>>> If (a)
>>>>
>>>> Loop2
>>>>
>>>> {
>>>>
>>>> Stmt - Invariant for Loop1
>>>>
>>>> }
>>>>
>>>> }
>>>>
>>>> In this case Stmt will be moved only out of Loop2, because of if (a).
>>>>
>>>> b) Or
>>>>
>>>> Loop1
>>>>
>>>> {
>>>>
>>>> â
>>>>
>>>> If (cond1)
>>>>
>>>> If (cond2)
>>>>
>>>> If (cond3)
>>>>
>>>> Stmt;
>>>>
>>>> }
>>>>
>>>> Stmt will be moved out only if cond1 is always executed in Loop1.
>>>>
>>>> c) It took me a long time to write all of these code, so there
>>>> might be other peculiarities which I forgot to mention. :)
>>>>
>>>> Letâs discuss these ones as you will review my patch.
>>>>
>>>> 3) Patch consists of 9 files:
>>>>
>>>> a) gcc/testsuite/gcc.dg/tree-ssa/loop-7.c,
>>>> gcc/testsuite/gcc.dg/tree-ssa/recip-3.c â changed tests:
>>>>
>>>> - gcc/testsuite/gcc.dg/tree-ssa/loop-7.c changed as
>>>> predicated lim moves 2 more statements out of the loop;
>>>>
>>>> - gcc/testsuite/gcc.dg/tree-ssa/recip-3.c â with conditional
>>>> lim recip optimization in this test doesnât work (the corresponding
>>>> value is below threshold as I could see in the code for recip, 1<3).
>>>> So to have recip working in this test I changed test a little bit.
>>>>
>>>> b) gcc/tree-ssa-loop-im.c â the patched lim per se
>>>>
>>>> c) gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-13.c,
>>>> gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-14.c,
>>>>
>>>> gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-15.c,
>>>> gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-16.c,
>>>>
>>>> gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-17.c,
>>>> gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-18.c
>>>>
>>>> the tests for conditional lim.
>>>>
>>>> 4) Patch testing:
>>>>
>>>> a) make âk check (no difference in results for me for the clean
>>>> build and the patched one,
>>>>
>>>> - Revision: 222849,
>>>>
>>>> - uname -a
>>>>
>>>> Linux Istanbul 3.16.0-23-generic #31-Ubuntu SMP Tue Oct
>>>> 21 18:00:35 UTC 2014 i686 i686 i686 GNU/Linux
>>>>
>>>> b) Bootstrap.
>>>>
>>>> It goes well now, however to fix it I have made a temporary hack in
>>>> the lim code. And with this fix patch definitely shouldnât be
>>>> committed.
>>>>
>>>> I did so, as I would like to discuss this issue first.
>>>>
>>>> The problem is: I got stage2-stage3 comparison failure on the single
>>>> file (tree-vect-data-refs.o). After some investigation I understood
>>>> that tree-vect-data-refs.o differs being compiled with and without
>>>> â-gâ option (yes, more exactly on stage 2 this is â-g âO2 âgtoggleâ,
>>>> and for stage 3 this is â-g âO2â. But to simplify things I can
>>>> reproduce this difference on the same build (even not bootstrapped),
>>>> with option â âgâ and without it).
>>>>
>>>> Of course, the file compiled with âg option will contain debug
>>>> information and will differ from the corresponding file without debug
>>>> information. I mean there is the difference reported by
>>>> contrib/compare-debug (I mean we talk about stripped files).
>>>>
>>>> Then I compared assemblers and lim dump files and made a conclusion
>>>> the difference is:
>>>>
>>>> There is statement _484=something, which is conditionally moved out of
>>>> loop X. In non debug cases no usages of _484 are left inside the loop
>>>> X. In debug case, there is the single use in debug statement. So for
>>>> debug case my patch generates phi statement for _484 to make it
>>>> available inside the loop X. For non debug case, no such phi statement
>>>> generated as there is no uses of _484.
>>>>
>>>> There is one more statement with lhs=_493, with exactly this pattern
>>>> of use. But this is not important for our discussion.
>>>>
>>>>
>>>>
>>>> So, in my opinion itâs ok to generate additional phi node for debug
>>>> case. But Iâm not a compiler expert and maybe there is some
>>>> requirement that debug and non-debug versions should differ only by
>>>> debug statements, I donât know.
>>>
>>> As Andi said code generation needs to be the same.
>>>
>>>> My temporary hack fix is just removing of such debug statements (no
>>>> debug statements, no problems J).
>>>
>>> That's fine and generally what is done in other passes.
>>>
>>>> The alternatives I see are:
>>>>
>>>> - Move debug statements outside the loop (not good in my opinion);
>>>>
>>>> - Somehow reset values in debug statements (not good in my
>>>> opinion, if I could provide correct information for them);
>>>
>>> You could re-set them via gimple_debug_bind_reset_value which
>>> will tell the user that at this point the value is optimized out
>>> (that's slightly
>>> better than removing the debug stmts).
>>>
>>>> - Generate phi for debug statements and fix bootstrap (say,
>>>> letâs have some list of files, which we have special check for. I mean
>>>> for my case, the difference is not in stage2 and stage3, itâs in debug
>>>> and non-debug). I like this variant. However, as I donât see such list
>>>> of special files in the whole gcc, I think I might be missing
>>>> something.
>>>
>>> The policy ;)
>>>
>>>> What do you think?
>>>
>>> Now about the patch. Generally there seem to be spurious whitespace-only
>>> or formatting differences - try to avoid these.
>>>
>>> All new functiuons need a comment explaining them and their function
>>> parameters.
>>>
>>> +static cond_data_map *get_cond_data_map (struct loop * level)
>>> +{
>>>
>>> per GCC coding conventions this should be formatted like
>>>
>>> static cond_data_map *
>>> get_cond_data_map (struct loop * level)
>>> {
>>>
>>> +#define SET_ALWAYS_EXECUTED_IN(BB, VAL)\
>>> + BB->aux = (void *) (XCNEW (struct move_cond));\
>>> + BB_MOVE_COND (BB)->type=ALWAYS_EXECUTED_IN;\
>>> + BB_MOVE_COND (BB)->loop=VAL;
>>>
>>> this kind of side-effects in macros are bad - better turn them into
>>> inline functions.
>>>
>>> - if (flag_unswitch_loops
>>> - && gimple_code (stmt) == GIMPLE_COND)
>>> - {
>>> - /* If we perform unswitching, force the operands of the invariant
>>> - condition to be moved out of the loop. */
>>> - return MOVE_POSSIBLE;
>>> - }
>>> + if (gimple_code (stmt) == GIMPLE_COND)
>>> + return MOVE_POSSIBLE;
>>>
>>> any reason you removed the guard on flag_unswitch_loops? We may
>>> want to add a new flag, certainly this transform might not be suitable
>>> for -O[12s] for example. Like if not all stmts inside a conditional can
>>> be moved the transform will increase code-size.
>>>
>>> You are doing a (lot of?) refactoring which makes review of the patch
>>> harder. For example
>>>
>>> +bool calculate_tgt_level (gimple stmt, struct loop *outermost,
>>> + bool cond_executed, basic_block bb)
>>> +{
>>> + struct lim_aux_data *lim_dat
>>> ...
>>> +
>>> + if (lim_data->cost >= LIM_EXPENSIVE)
>>> + set_profitable_level (stmt);
>>> + return true;
>>> +}
>>>
>>> ...
>>>
>>> -
>>> - lim_data = init_lim_data (stmt);
>>> - lim_data->always_executed_in = outermost;
>>> -
>>> - if (!determine_max_movement (stmt, false))
>>> - {
>>> - lim_data->max_loop = NULL;
>>> - continue;
>>> - }
>>> -
>>> - if (dump_file && (dump_flags & TDF_DETAILS))
>>> - {
>>> - print_gimple_stmt (dump_file, stmt, 2, 0);
>>> - fprintf (dump_file, " invariant up to level %d, cost %d.\n\n",
>>> - loop_depth (lim_data->max_loop),
>>> - lim_data->cost);
>>> - }
>>> -
>>> - if (lim_data->cost >= LIM_EXPENSIVE)
>>> - set_profitable_level (stmt);
>>> + if (!calculate_tgt_level (stmt, outermost, cond_executed, bb))
>>> + continue;
>>>
>>> where it is not obvious why moving set_profitable_level should be
>>> in a function named calculate_tgt_level. In fact determine_max_movement
>>> was supposed to be the abstraction already.
>>>
>>> Without yet getting an idea of the overall algorithm (I'm really missing such
>>> a thing...) it seems to me that the current handling of doing invariant motion
>>> of PHIs should handle the analysis phase just fine (you might be adding
>>> more advanced cases in this patch, but I'd rather do that as a followup for
>>> initial support).
>>>
>>> The code-gen part looks somewhat ad-hoc to me. In principle if we have
>>> a list of PHIs to hoist we can materialize the needed part of the original
>>> basic-block structure on the pre-header edge together with the controlling
>>> conditionals. The loop pre-header would be the always-executed BB
>>> and the needed part would always start with the loop header and be a
>>> single-entry multiple-exit region. So ideally we could use sth like
>>> gimple_duplicate_sese_region with the BB copying worker copy_bbs
>>> refactored/replaced with something only copying controlling conditionals
>>> and statements/PHIs that a callback identifies (tree-loop-distribution.c
>>> would also like sth like that - currently it copies whole loops and removes
>>> stmts it didn't intend to copy...).
>>>
>>> So basically invariantness_dom_walker would record a vector of PHIs
>>> to hoist (or we'd construct that at move_computations time).
>>> If there is any control-flow to hoist then we'd do an alternative
>>> move_computations by "copying" the needed CFG and covered stmts.
>>> In theory doing it the loop-distribution way (simply copy the whole
>>> loop body and then remove anything not needed) would also be possible
>>> though that wouldn't be the very best idea obviously ;)
>>>
>>> Thanks,
>>> Richard.
>>>
>>>>
>>>>
>>>> Thanks,
>>>>
>>>>
>>>>
>>>> Evgeniya
>>
>>
>>
>> --
>> Thanks,
>>
>> Evgeniya
>>
>> perfstories.wordpress.com
--
Thanks,
Evgeniya
perfstories.wordpress.com