This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.
Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]
Other format:	[Raw text]
Re: conditional lim

From: Evgeniya Maenkova <evgeniya dot maenkova at gmail dot com>
To: Richard Biener <richard dot guenther at gmail dot com>
Cc: GCC Patches <gcc-patches at gcc dot gnu dot org>
Date: Mon, 29 Jun 2015 18:21:22 +0400
Subject: Re: conditional lim
Authentication-results: sourceware.org; auth=none
References: <CANVW7MPfRvpNjABKuqup9GxbUk7A=BaRqwsPcbP8DRpGoGMWdw at mail dot gmail dot com> <CANVW7MP0Uv3LSSd=JfD5hTnMv75SpQ1WMrV_eWMa9owZ3T+yYg at mail dot gmail dot com> <CAFiYyc3Do3Mn+eAgQGpbZCtzUy4y=FvbbtuPo2xkPwi=y=U2hg at mail dot gmail dot com> <CANVW7MNHfPnXeU0bvQRQxjfX0Cq3Fm2ZbQBBowzt36WYg8Zhng at mail dot gmail dot com> <CAFiYyc2w_f5WJrnyAgDf=Y8haqjDvF_d4RgqoHq_jwf5oopDdg at mail dot gmail dot com>
On Mon, Jun 29, 2015 at 5:10 PM, Richard Biener
<richard.guenther@gmail.com> wrote:
> On Tue, Jun 9, 2015 at 10:11 PM, Evgeniya Maenkova
> <evgeniya.maenkova@gmail.com> wrote:
>> On Tue, Jun 9, 2015 at 3:46 PM, Richard Biener
>> <richard.guenther@gmail.com> wrote:
>>> On Fri, May 29, 2015 at 3:14 PM, Evgeniya Maenkova
>>> <evgeniya.maenkova@gmail.com> wrote:
>>>> Hi Richard,
>>>>
>>>> Here is some explanation. I hope you let me know if I need to clarify something.
>>>>
>>>> Also, you asked me about concrete example, to make sure you donât miss
>>>> my answer here is the link:
>>>> https://gcc.gnu.org/ml/gcc-patches/2015-05/msg02417.html.
>>>>
>>>> Also, I doubt whether itâs convenient for you to create a build with
>>>> my patch or not. May be to clarify things you could send me some
>>>> examples/concrete cases, then Iâll compile them with
>>>> âfdump-tree-loopinit-details and âfdump-tree-lim-details and send you
>>>> these dumps. May be these dumps will be useful. (Iâll only disable
>>>> cleanup_cfg TODO after lim to let you know the exact picture after
>>>> lim).
>>>>
>>>> What do you think?
>>>>
>>>> 1.       invariantness _dom_walker â
>>>>
>>>> 1.1   for each GIMPLE_COND in given bb calls handle_cond_stmt to call
>>>> for true and false edges handle_branch_edge, which calls SET_TARGET_OF
>>>> for all bb âpredicatedâ by given GIMPLE_COND.
>>>>
>>>> SET_TARGET_OF sets in basic_blocks aux 2 facts:
>>>>
>>>> a)      this is true or false edge;
>>>>
>>>> b)      link to cond stmt;
>>>>
>>>> Handle_branch_edge works this way:
>>>>
>>>> If (cond1)
>>>>
>>>>   {
>>>>
>>>>      bb1;
>>>>
>>>>      if (cond2}
>>>>
>>>>        {
>>>>
>>>>            bb2;
>>>>
>>>>         }
>>>>
>>>>    Being called for cond1, it sets cond1 as condition for both bb1 and
>>>> bb2 (the whole branch for cond1, ie also for bb containing cond2),
>>>> then this method will be called (as there is dominance order) for
>>>> cond2 to correct things (ie to set cond2 as condition for bb2).
>>>
>>> Hmm, why not track the current condition as state during the DOM walk
>>> and thus avoid processing more than one basic-block in handle_branch_edge?
>>> Thus get rid of handle_branch_edge and instead do everything in handle_cond_stmt
>>> plus the dom-walkers BB visitor?
>>>
>> I need to look more carefully how to implement it, but I think I
>> understand what you mean and this optimization of course looks
>> reasonable to me. Will do.
>>
>>> I see you don't handle BBs with multiple predecessors - that's ok, but
>>> are you sure you don't run into correctness issues when not marking such
>>> BBs as predicated?  This misses handling of, say
>>>
>>>  if (a || b)
>>>    bb;
>>>
>>> which is a pity (but can be fixed later if desired).
>>>
>> I had some test (in gcc testsuite or bootstrap build) which worked
>> incorrectly because of multiple predecessors. As far as I remember the
>> situation was (further, will make some notes about such tests to
>> clarify this better), I mean with previous version of my code which
>> handled bb with 2 predecessors:
>> if (a)
>>   tmpvar=something;
>> while()
>>   if (a || b)
>>       basic_block {do something with tmpvar;} // I mean basic block
>> predicated by bb with a and bb with b
>>
>> So, if a is false, I mean we didn't do tmpvar=something (outside
>> loop), BUT we are in basick_block  (we went by bb with b), we got
>> Segmentation falt in basic_block {do something with tmpvar;}.
>>
>> I think we can reproduce all the details of this test if I remove not
>> handling bb with 2 predecessors.
>>
>> So I wouldn't move bb with 2 predecessors (this is not always executed
>> bb in any loop, not conditionally, they will not be moved at all).
>>
>> This is my more detail explanation on this point. Perhaps, I didn't
>> understand your question about correctness. Could you repeat it in
>> other words (based on this new clarification).
>>
>> So I think according to current code it will not be moved. What
>> incorrectness do you mean?
>
> If the block isn't marked as predicated the question is whether it is
> handled correctly or assumed to be unconditionally executed.
>
>>> I note that collecting predicates has similarities to what if-conversion
>>> does in tree-ifcvt.c (even if its implementation is completely different,
>>> of course).
>>>
>>
>> Ok, I'll look at this. But could you please clarify your point?
>> (Should I just take this into account with low priority and look at
>> this later or you want some refactoring?)
>
> I just noted similar code exists elsewhere - it may be possible to
> factor it out but I didn't investigate.  And no, doing that isn't a prerequesite
> for this patch.
>
>>>> 1.2   As 1.1 goes we identify whether some bb is predicated by some
>>>> condition or not.
>>>>
>>>> bb->aux->type will be [TRUE/FALSE]_TARGET_OF and
>>>> bb->aux->cond_stmt=cond stmt (the nearest condition).
>>>>
>>>> If bb is always executed bb->aux->type = ALWAYS_EXECUTED_IN,
>>>> bb->loop->loop (this info was available in the clean build).
>>>>
>>>> 1.3   As this walker is called in dominance order, information about
>>>> condition is available when invariantness_dom_walker is called for
>>>> given bb.  So we can make some computations based on bb->aux
>>>> structure. This is done in check_conditionally_executed. The main goal
>>>> of this method is to check that the root condition is always executed
>>>> in the loop. I did so to avoid situation like this
>>>>
>>>> Loop:
>>>>
>>>>    Jmp somewhere;
>>>>
>>>>   If (cond1)
>>>>
>>>>       If (cond2)
>>>>
>>>>          Etc
>>>>
>>>> By âroot conditionâ I mean cond1 in this case (the first cond in
>>>> dominance order).
>>>
>>> Ok, but this can't happen (an uncontional jump to somehwere).  It
>>> will always be a conditional jump (a loop exit) and thus should
>>> be handled already.
>>>
>>> Did you have a testcase that was mishandled?
>>
>> No, I have not such testcase. This is because I;m newbie at gcc and
>> compilers. If you tell me this fact, let's just remove this check?
>> Correct?
>
> Yes.
>
>>>
>>>> If  âroot conditionâ is not always executed we disable (free) all the
>>>> info in bb->aux, ie all the blocks becomes neither conditionally nor
>>>> always executed according to bb->aux.
>>>>
>>>> There is some optimization to donât go each time trough the conditions
>>>> chain (cond2->cond1), let me skip such details for now.
>>>>
>>>>
>>>>
>>>> 1.4   Then we calculate tgt_level  (which is used in move_computations later)
>>>>
>>>> The patch is the same for phi and regular stmt (calculate_tgt_level)
>>>>
>>>> 1)      If stmt is conditionally executed we compute max movement
>>>> (determine_max_movement) starting from
>>>> get_lim_data(condition)->max_loop.
>>>>
>>>> 2)      If stmt is not cond executed as start level  for
>>>> determine_max_movement computations we choose ALWAYS_EXECUTED_IN.
>>>>
>>>> To clarify why, see the comment
>>>>
>>>> /* The reason why we don't set start_level for MOVE_POSSIBLE
>>>>
>>>>           statements to more outer level is: this statement could depend on
>>>>
>>>>           conditional statements and even probably is not defined without this
>>>>
>>>>           condition. So either we should analyze these ones and move
>>>>
>>>>           under condition somehow or keep more strong start_level . */
>>>>
>>>> As you noted in your review there was some refactoring. Of course, I
>>>> had no goal to refactor existing code, I intended to remove some
>>>> duplication which I caused by my changes.  I hope we discuss in
>>>> details later if you donât like some refactoring.
>>>
>>> Refactoring makes a big patch harder to review because it ends up
>>> containing no-op changes.  It also makes it appear bigger than it
>>> really is ;)
>>>
>>> A reasonable solution is to factor out the refactoring to a separate patch.
>>>
>> I see what you mean.
>>
>>>> 2.       store_motion  - for some stmts set flags to ignore predicated
>>>> execution (I mean to move statements without conditions).
>>>
>>> That's for the existing "predicated" support as far as I can see.
>>>
>>>>
>>>>
>>>> 3.       move_computations. The patch is doing something in the
>>>> following 3 calls inside of this method.
>>>>
>>>>
>>>>
>>>> 3.1   (more details below) walker.walk â move computations(create if()
>>>> structure) and store information to create phi nodes later (which
>>>> statements require phi nodes, as they conditionally executed and there
>>>> are still their uses on other levels, what are the names for such phi
>>>> nodes, as we replace uses in here, in walker.walk.)
>>>
>>> What kind of uses are that?  By definition all stmts executed
>>> under condition A that are used in code not under condition A have
>>> a PHI node associated.  This is why the existing conditional movement
>>> handling moves PHI nodes (and all dependent computations).
>>>
>>> Are you thinking of
>>>
>>>   for (;;)
>>>     if (a) x = 1;
>>>
>>>   .. use of x
>>>
>>> ?
>> No, I think in this case phi node of x will be used in initial
>> (==loopinit) code (in 'use of x') instead of some version of x (which
>> I move).
>>
>>>If you move the PHI node for x then you don't need to insert any other
>>> PHI nodes (and you _do_ have to move the PHI node anyway).
>>
>>>
>>>> 3.2   (more details below) create_phi_nodes â create phi nodes for
>>>> statements which require that (see 3.1)
>>>
>>> Only PHI nodes need PHI node creation and only its dependences
>>> might need predication.
>>>
>>> That is, if you move a conditionally executed statement but not
>>> any PHI node it feeds then the movement is useless.
>>>
>>> But maybe I am missing something about the PHI business.
>>>
>> Ok. I have different understanding on this (and am afraid of this of
>> course :) ).
>>
>> See the example:
>> void bar (long);
>> void foo (int cond1, int cond2, int cond3, int m, int n)
>> {
>>   unsigned x;
>>   unsigned inv1;
>>
>>   for (x = 0; x < 1000; ++x)
>>     {
>>       if (cond1)
>>         {
>>           int tmp_inv1 = m*n;
>>           non_inv1 = tmp_inv1 + x;
>>         }
>>     }
>>   bar (non_inv1);
>> }
>>
>> This is working example, will attach loopinit and lim1 dumps.
>> There is no phi node for tmp_inv1 in loopinitat all. However, I think
>> we should move tmp_inv1. To use tmp_inv1 in non_inv1 we need phi node
>> (otherwise we get verification failure as bb with conditionally moved
>> tmp_inv1 doesn;t dominate bb with non_inv1). Another example if
>> non_inv1 is invariant but has not enough cost or something else. In
>> short, if it's still used in initial or others loops. Say, non_inv1 is
>> invariant and moved in another loop. Or something else.
>>
>> So, when I read
>> 'Only PHI nodes need PHI node creation'
>> and
>> 'That is, if you move a conditionally executed statement but not any
>> PHI node it feeds then the movement is useless.'
>> I think about this example and don't understand. Could you clarify?
>
> Ok, so this case is already handled by LIM by simply moving tmp_inv1 = m*n
> (and making it unconditionally execute before the loop).  In fact whether
> we can move it does not depend on whether 'cond' is invariant or not.
>
> If we couldn't execute m*n unconditionally on the loop preheader edge
> then we'd have to be able to also move the guarding condition.  Note
> that cost considerations shouldn't come into play here.
>
> So I was only thinking of the case where we also want to move the
> condition (which is wanted if we want to move a PHI node).
>

Don't understand "So I was only thinking of the case where we also
want to move the condition (which is wanted if we want to move a PHI
node)."

What about the case when we want to move condition but don't have phi
node to move.

Let's consider a little bit modified example:

void bar (long);
void foo (int cond1, int cond2, int cond3, int m, int n)
  {
     unsigned x;
     unsigned inv1;

     for (x = 0; x < 1000; ++x)
       {
         if (n)
           {
             int tmp_inv1 = m/n;
             non_inv1 = tmp_inv1 + x;
            }
       }
     bar (non_inv1);
 }
We couldn't  move tmp_inv1 uncoditionally. Right (I have no clean
build at hands right now to check. If I'm wrong let's replace m/n to
something that couldn't be unconditionally moved :))? There is no phi
node for tmp_inv1.
Could you clarify what conditional lim should do in your opinion here?

Is it too little gain to move statements that only used in the same bb
or branch (I mean before phi node creation, even if stmt had phi
node)?

In fact, we could have here 2 thousands of divisions:
if (n)
  {
    int tmp_inv1 = m/n;
    tmp_inv2 = something_which_couldn't_be moved_unconditionally (tmp_inv1);
    ...
    tmp_invN = something_which_couldn't_be moved_unconditionally (tmp_invN-1);
    non_inv1 = tmp_invN + x;
  }


>>>>
>>>> 3.3   replace_uses_in_conditions
>>>>
>>>> After computations movements we can have several copies of the cond
>>>> stmt. In 3.1 we do replacement of uses based on stmtâs tgt_level. For,
>>>> cond stmt it doesnât work. As cond stmt, of course, have something in
>>>> lim_aux_data->tgt_level, but, in fact, they are moved only if
>>>> corresponding invariants are moved. So, in fact, the same cond (copies
>>>> of it, of course) could go to the different levels. So to fix these
>>>> uses, we call replace_uses_in_conditions.
>>>
>>> Hmm, but stmts that a condition depends on always have at least the
>>> target level of the condition statement.  Thus their computation is
>>> available in all other "same" conditon statements that might appear
>>> in more inner loop levels, no?  So isn't this just about performing the
>>> movement in a proper ordering?
>>
>> Let me give your more details, then may be you repeat question in other words.
>> for ()
>> {
>>   if (a)
>>     {
>>        inv1 = some_computations;
>>        if (inv1 < inv2)
>>             {
>>               inv3;
>>               inv4;
>>             }
>>        }
>> }
>>
>> So, the order for movement is: inv1; inv3; inv4. When we move inv1 we
>> have tgt_level for inv3 and inv4 computed but we have not created
>> copies of 'if (inv1 < inv2)' created. In theory, we can have several
>> copies:(1) for tgt_level of inv3 (2) for tgt_level of inv2 (3) in
>> initial cycle. if (1) is in tgt level of inv1 we need no replacement,
>> in other cases we need replacements. Of course, we know this(all the
>> info about tgt_levels) when we move inv1, but then we need to create
>> all needed copies (store it somehow, etc) of 'if (inv1 < inv2') and
>> make replacement in these copies. I don't see now why this is much
>> better. This doesn't mean that current solution is perfect, just
>> clarify your thought.
>
> I still don't get what "replacements" actually mean.  Let's extend the
> example to
>
> for ()  (A)
>   {
>> for ()  (B)
>> {
>>   if (a)
>>     {
>>        inv1 = some_computations;
>>        if (inv1 < inv2)
>>             {
>>               inv3;
>>               inv4;
>>             }
>>        }
>> }
>  }
>
> then you say the target level of inv1 could be 'A' and the target
> level of inv3 'B'
> and that of inv4 is 'B' as well.
>
> That's true.  So we move inv1 to before 'A' and inv3/inv4 to before 'B' guarded
> with if (inv1 < inv2).  I fail to see what "copies" we need here.  'inv1' is
> available in all target levels below its own, no copies needed.
>

I meant the copies of conditions (COND_STMT: inv1 < inv2).

Let's modify this example: tgt_level (inv1) = A; tgt_level(inv3) = A;
tgt_level(inv4) = B.

 Then
if (a)
   inv1=some_computations;
if (inv1 < inv2)
   inv3;
<preheader edge for A, where phi for inv1(phi_for_inv1) is computed>
for ()  (A)
 {
 if (phi_for_inv1 <inv2)
   inv3
 for ()  (B)
   {
   if (a)
     {
        if (phi_for_inv1 < inv2)
          {
          }
     }
 }
So, we have 3 copies of inv1 < inv2 after lim (in some of them we use
inv1, in other phi_for_inv1). But this is the second priority for now
(this is all about whether we should move statements w/o phi node).

>>>
>>>> More details on 3.1 and 3.2
>>>>
>>>> 3.1 The patch is very similar for phi stmts and regular stmts (there
>>>> is some difference for regular stmts related to the case when we move
>>>> sequence instead of single stmt, letâs skip it in this description,
>>>> will describe later. BTW, there are comments, in corresponding parts).
>>>>
>>>> a) for each bb we call check_conditionally_executed and if bb is cond
>>>> executed then invariant statement  will be moved conditionally.
>>>>
>>>> b) prepare_edge_for_stmt collects information needed to create phi
>>>> nodes in 3.2(add_info_for_phi) AND prepares edge ie creates if()
>>>> structure needed to move statement
>>>> (get_block_for_predicated_statement, see below). All of this is done
>>>> for cond executed stmt. Nothing is done for non cond executed.
>>>>
>>>> BTW, there is some strange names for functions like missed ending
>>>> (prepare_edge_for_stm, w/o t in the end, this is because of my script
>>>> to add â â before â(â, will fix in the next patch version, ignore
>>>> please so far).
>>>>
>>>> Get_block_for_predicated_stmt:
>>>>
>>>> Create bb where to add cond stmt to (or return existing, each loop has
>>>> a map  (level_cond_data_map is map <loop, cond_data_map> ,
>>>> cond_data_map is map <gimple, basic_block>) to identify if bb for
>>>> given cond was created or not for given level).
>>>>
>>>> Parameters of this method:
>>>>
>>>> Cond â condition of stmt, branch â is it on true or false edge of
>>>> cond, cond_map  - see above, level (tgt_level), preheader â
>>>> preheader_edge for level.
>>>>
>>>> So first of all we check whether there is a basic block for given
>>>> condition in given loop.
>>>>
>>>> a)      If such bb exists then we trying to find corresponding edge,
>>>> destination of this edge shouldnât be a post dominator of bb
>>>> (find_branch_edge).
>>>>
>>>> a.1) if such edge exists we return the most remote in this branch bb
>>>> (or create it), see get_most_remote_in_branch;
>>>>
>>>> a.2) if such edge doesnât exists, we create corresponding bb, see make_branch
>>>>
>>>>        b)  If such bb doesnât exist we create it (recursively calling
>>>> get_block_for_predicated_stmt if required bb is conditionally
>>>> executed).
>>>>
>>>> Will not describe it further so far (let me know if you would like to
>>>> read additional details).
>>>>
>>>> Add_info_for_phi
>>>>
>>>> The main goal of this method is to identify situation where
>>>> stmt(parameter,ie the statement which we are moving) or more exactly
>>>> lhs of this statement
>>>>
>>>> a)      is still used in other loops, so we need to create phi node to
>>>> use phi name inside other loops.
>>>>
>>>> b)      Is used in phi node of original loop and this phi node is
>>>> going to be moved. BUT, as we move phi node as assignment (in the case
>>>> of phi node with 2 arguments) we need to create phi node to use lhs of
>>>> statement (param of add_info_for_phi).
>>>>
>>>> In these cases we make corresponding replacements and store
>>>> information needed to create phi nodes and make some other
>>>> replacements in 3.3 (replace_uses_in_conditions).
>>>>
>>>> Add_info_for_phi calls create_tmp_var which requires some explanation.
>>>>
>>>> Create_tmp_var
>>>>
>>>> To create new names for phi nodes and to create default def for phi
>>>> arguments I use make_ssa_name and get_or_create_ssa_default_def. These
>>>> methods have some asserts (being merged):   (TREE_CODE (old_var) ==
>>>> VAR_DECL   || TREE_CODE (old_var) == PARM_DECL   || TREE_CODE
>>>> (old_var) == RESULT_DECL).
>>>>
>>>> So in some cases (when I know that I need to use methods mentioned
>>>> above, see the start of create_tmp_var, ie the uses in other loops) I
>>>> create new tmp variable to overcome these asserts. To be honest I
>>>> donât like this but donât know how to deal with this better.
>>>
>>> Hmm, don't you just run into SSA names here?  That is, TREE_CODE
>>> (old_var) == SSA_NAME?
>>> You can use make_ssa_name (TREE_TYPE (old_var), "plim") in this case.
>>>
>>> Or you run into the issue that anonymous SSA names cannot have default
>>> defintions (I don't understand why you need all the fuss about default defs).
>>
>> I used default defs to write something in phi args when I have no values on
>> corresponding branches. Say there is some tmp value which I should
>> create phi for.
>> if (a)
>>    {
>>      tmpvar=something
>>      b = something(tmpvar);
>>     }
>> else
>>    {
>>    }
>> When I move if (a) {tmpvar = something} and create phi for tmpvar (to
>> use phi result
>> in the initial cycle) i need to write something for 'not a' phi arg.
>> BUT, in some cases I use
>>  build_zero_cst (which I started to use after default defs use. I mean
>> I didn't know about
>>  build_zero_cst while I started to use default defs.). So may be I can replace
>> uses of default defs in such cases by build_zero_cst? I mean just SOME values
>> (as I know that control flow will never go to this places actually). I
>> mean if !a tmpvar will not used.
>
> Ok, this is again about the case where you are not moving a PHI node but
> creating it because you want to conditionally execute invariant stmts.
> Yes, a default def is ok here, but you can always use a new VAR_DECL
> just for the default def creation like gimple_fold_call does for example:
>
>                           tree var = create_tmp_var (TREE_TYPE (lhs));
>                           tree def = get_or_create_ssa_default_def (cfun, var);
>
> Note that I think we don't need to bother about this case as we'd only
> want to move PHI nodes in the first place (so you already have values
> for all PHI args).
>
>>>
>>>> And a couple of words regarding where we store info for phi nodes:
>>>>
>>>> Each loop contains in its aux  phi_data structure. On this stage we
>>>> only add there stmt (if phi node will be required), phi name=phi node
>>>> result name in names_map or fl_names_map). See the comment about
>>>> phi_data in patch.
>>>>
>>>> If there should be created several phi nodes for given lhs (I mean the
>>>> first place for phi node doesnât dominate on corresponding preheader,
>>>> we add only the last name in names_map (as intermediate names could be
>>>> created later, but the last name in bb which dominates preheader
>>>> should be used for replacement in other loops. Replacements were
>>>> already made in walker).
>>>>
>>>> If lhs is used only in phi node, which are moved and transformed into
>>>> assignment, and there is no uses in other loops, we need to create
>>>> only first level phi node (donât need to create phi nodes till some bb
>>>> which dominates preheader), then we add such name to fl_names_map.
>>>>
>>>> 3.2 Create_phi_nodes
>>>>
>>>> For each of the loops we go over ((phi_data*) loop->aux)->stmts. These
>>>> are statements which were moved, but they have uses in the original
>>>> loop (more exactly, their uses in some other loops replaced by some
>>>> name from ((phi_data*) loop->aux)->names_map or ((phi_data*)
>>>> loop->aux)->fl_names_map.
>>>>
>>>> So for each of such stmts we create phi nodes (for its lhs) chain.
>>>>
>>>> Basic block for phi node is found in get_bb_for_phi  for given bb with
>>>> stmt. If basic block for phi node dominates preheader->src we end
>>>> chain creation. There is some special condition for the case when we
>>>> need to create only first level phi node. (I described this situation
>>>> above, but let me know If I need to add  more details.
>>>>
>>>> Basic blocks can have maps to identify if we created phi node for
>>>> given variable in given basic_block, so we only add an edge argument
>>>> for such case (phi_map = new hash_map<basic_block, hash_map<tree,
>>>> gphi*>*>).
>>>
>>> Ok.
>>>
>>> Well - I think it might be easier code-generation-wise to have a separate
>>> phase move_phis executed before move_computations, that just moves
>>> invariant PHI nodes (which this all is about - see above) and their
>>> controlling conditions and only then move their dependencies.
>> I didn;t understand so far (I think let's clarify previous questions,especially
>> about phi nodes then let's go back to this one.)
>
> Yeah, I think the PHI node thing is our major mis-understanding (or the
> main thing that I think complicates the patch for too little gain - at least
> initially).
>
> Thanks,
> Richard.
>
>>>
>>> Thanks,
>>> Richard.
>>>
>>>> In short, thatâs all.
>>>>
>>>> Thanks,
>>>>
>>>> Evgeniya
>>>>
>>>>
>>>>
>>>> On Sat, May 9, 2015 at 1:07 AM, Evgeniya Maenkova
>>>> <evgeniya.maenkova@gmail.com> wrote:
>>>>> Hi,
>>>>>
>>>>> Could you please review my patch for predicated lim?
>>>>>
>>>>> Let me note some details about it:
>>>>>
>>>>>
>>>>>
>>>>> 1)      Phi statements are still moved only if they have 1 or 2
>>>>> arguments. However, phi statements could be move under conditions (as
>>>>> itâs done for the other statements).  Probably, phi statement motion
>>>>> with 3 + arguments could be implemented in the next patch after
>>>>> predicated lim.
>>>>>
>>>>> 2)      Patch has limitations/features like (it was ok to me to
>>>>> implement it such way, maybe Iâm not correct. ):
>>>>>
>>>>> a)      Loop1
>>>>>
>>>>>     {
>>>>>
>>>>>               If (a)
>>>>>
>>>>>                  Loop2
>>>>>
>>>>>                      {
>>>>>
>>>>>                        Stmt - Invariant for Loop1
>>>>>
>>>>>                      }
>>>>>
>>>>>              }
>>>>>
>>>>>        In this case Stmt will be moved only out of Loop2, because of if (a).
>>>>>
>>>>> b)      Or
>>>>>
>>>>> Loop1
>>>>>
>>>>>     {
>>>>>
>>>>>          â
>>>>>
>>>>>          If (cond1)
>>>>>
>>>>>               If (cond2)
>>>>>
>>>>>                   If (cond3)
>>>>>
>>>>>                       Stmt;
>>>>>
>>>>>        }
>>>>>
>>>>> Stmt will be moved out only if cond1 is always executed in Loop1.
>>>>>
>>>>> c)       It took me a long time to write all of these code, so there
>>>>> might be other peculiarities which I forgot to mention. :)
>>>>>
>>>>>                        Letâs discuss these ones as you will review my patch.
>>>>>
>>>>> 3)      Patch consists of 9 files:
>>>>>
>>>>> a)      gcc/testsuite/gcc.dg/tree-ssa/loop-7.c,
>>>>> gcc/testsuite/gcc.dg/tree-ssa/recip-3.c â changed tests:
>>>>>
>>>>> -          gcc/testsuite/gcc.dg/tree-ssa/loop-7.c  changed as
>>>>> predicated lim moves 2 more statements out of the loop;
>>>>>
>>>>> -          gcc/testsuite/gcc.dg/tree-ssa/recip-3.c â with conditional
>>>>> lim recip optimization in this test doesnât work (the corresponding
>>>>> value is below threshold as I could see in the code for recip, 1<3).
>>>>> So to have recip working in this test I changed test a little bit.
>>>>>
>>>>> b)      gcc/tree-ssa-loop-im.c â the patched lim per se
>>>>>
>>>>> c)       gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-13.c,
>>>>> gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-14.c,
>>>>>
>>>>> gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-15.c,
>>>>> gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-16.c,
>>>>>
>>>>> gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-17.c,
>>>>> gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-18.c
>>>>>
>>>>> the tests for conditional lim.
>>>>>
>>>>> 4)      Patch testing:
>>>>>
>>>>> a)      make âk check (no difference in results for me for the clean
>>>>> build and the patched one,
>>>>>
>>>>> -          Revision: 222849,
>>>>>
>>>>> -          uname -a
>>>>>
>>>>>                Linux Istanbul 3.16.0-23-generic #31-Ubuntu SMP Tue Oct
>>>>> 21 18:00:35 UTC 2014 i686 i686 i686 GNU/Linux
>>>>>
>>>>> b)      Bootstrap.
>>>>>
>>>>>  It goes well now, however to fix it I have made a temporary hack in
>>>>> the lim code. And with this fix patch definitely shouldnât be
>>>>> committed.
>>>>>
>>>>> I did so, as I would like to discuss this issue first.
>>>>>
>>>>> The problem is: I got stage2-stage3 comparison failure on the single
>>>>> file (tree-vect-data-refs.o). After some investigation I understood
>>>>> that tree-vect-data-refs.o differs being compiled with and without
>>>>> â-gâ option (yes, more exactly on stage 2 this is  â-g âO2 âgtoggleâ,
>>>>> and for stage 3 this is â-g âO2â. But to simplify things I can
>>>>> reproduce this difference on the same build (even not bootstrapped),
>>>>> with option â âgâ and without it).
>>>>>
>>>>> Of course, the file compiled with âg option will contain debug
>>>>> information and will differ from the corresponding file without debug
>>>>> information. I mean there is the difference reported by
>>>>> contrib/compare-debug (I mean we talk about stripped files).
>>>>>
>>>>> Then I compared assemblers and lim dump files and made a conclusion
>>>>> the difference is:
>>>>>
>>>>> There is statement _484=something, which is conditionally moved out of
>>>>> loop X. In non debug cases no usages of _484 are left inside the loop
>>>>> X. In debug case, there is the single use in debug statement. So for
>>>>> debug case my patch generates phi statement for _484 to make it
>>>>> available inside the loop X. For non debug case, no such phi statement
>>>>> generated as there is no uses of _484.
>>>>>
>>>>> There is one more statement with lhs=_493, with exactly this pattern
>>>>> of use. But this is not important for our discussion.
>>>>>
>>>>>
>>>>>
>>>>> So, in my opinion itâs ok to generate additional phi node for debug
>>>>> case. But Iâm not a compiler expert and maybe there is some
>>>>> requirement that debug and non-debug versions should differ only by
>>>>> debug statements, I donât know.
>>>>>
>>>>>
>>>>>
>>>>> My temporary hack fix is just removing of such debug statements (no
>>>>> debug statements, no problems J).
>>>>>
>>>>>
>>>>>
>>>>> The alternatives I see are:
>>>>>
>>>>> -          Move debug statements outside the loop (not good in my opinion);
>>>>>
>>>>> -          Somehow reset values in debug statements (not good in my
>>>>> opinion, if I could provide correct information for them);
>>>>>
>>>>> -          Generate phi for debug statements and fix bootstrap (say,
>>>>> letâs have some list of files, which we have special check for. I mean
>>>>> for my case, the difference is not in stage2 and stage3, itâs in debug
>>>>> and non-debug). I like this variant. However, as I donât see such list
>>>>> of special files in the whole gcc, I think I might be missing
>>>>> something.
>>>>>
>>>>> What do you think?
>>>>>
>>>>>
>>>>>
>>>>> Thanks,
>>>>>
>>>>>
>>>>>
>>>>> Evgeniya



-- 
Thanks,

Evgeniya

perfstories.wordpress.com
References:
- Re: conditional lim
  - From: Richard Biener
- Re: conditional lim
  - From: Evgeniya Maenkova
- Re: conditional lim
  - From: Richard Biener
Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]