[patch] Lno branch merge part 12 -- induction variable optimizations

Sun Aug 29 09:07:00 GMT 2004

On Aug 28, 2004, at 1:58 PM, Zdenek Dvorak wrote:

> Hello,
>
>>> + /* Finds iterator for STMT.  */
>>> +
>>> + extern block_stmt_iterator
>>> + stmt_bsi (tree stmt)
>>> + {
>>> +   block_stmt_iterator bsi;
>>> +
>>> +   for (bsi = bsi_start (bb_for_stmt (stmt)); !bsi_end_p (bsi);
>>> bsi_next (&bsi))
>>> +     if (bsi_stmt (bsi) == stmt)
>>> +       return bsi;
>>> +
>>> +   abort ();
>>> + }
>>
>> I decided to not use this approach because of performance cost. I used
>> two alternatives instead, 1) supply bsi as parameter 2) use bitmap.
>
> 2) does not make any sense to me, with respect to what stmt_bsi does,
> could you please explain?

I use bitmap to mark statements that are candidate for removal. And 
then do one walk of statements to remove marked statements.

>
> 1) does not apply in some cases, especially if we obtained the 
> statement
> as a statement that defines a given ssa name.
>
>> If you think, performance implications are not considerable
>
> The performance definitely might be a problem, eventually.  I haven't
> seen the problems occur in practice yet, but I think that we should 
> have
> a way how to obtain a bsi for statement in a constant time (but this is
> something for a separate patch).
>
>> than make remove_statement() also externally visible :-).
>
> Why?

see above.

>  tree-ssa-loop-ivopts.c:remove_statement is a function that is
> fairly specific for what ivopts need, especially handling of ssa names
> in phi nodes is specifically tailored for it.  I.e. I do not think it
> is a good idea for other optimizers to use it.
>
>> Some of the new functions, e.g. force_gimple_operand(),
>> add_candidate_1(), get_address_cost(), does not have any comments
>> inside at all. And they are not small. Comments would be helpful to
>> someone down the road after few months.
>
> They are not small, but fairly straightforward (except for
> get_address_cost, which could definititely use some comments);
> it is hard to come up with comments that are not obvious
> in force_gimple_operand and add_candidate_1.
>
>>> + /* Per-ssa version information (induction variable descriptions,
>>> etc.).  */
>>> + struct version_info
>>> + {
>>
>> use prefix iv, iv_version_info ?
>
> Why? It would just make names longer.

other passes may want to use loop_data, version_info etc.. Using prefix 
makes it cleaner.

>
>>> + /* Information attached to loop.  */
>>> + struct loop_data
>>
>> same. few other structs also.
>>
>> You are already use iv prefix for some of them, e.g. iv_use, iv_cand
>> etc..
>
> Yes, for those that have "induction variable" as part of name; e.g.
> "induction variable use" and "induction variable candidate".

That's my point. Replacing 'loop_data' with 'iv_loop_data' or 
'ivopts_loop_data' is more meaningful.

>
>>> + /* Entry in a hashtable of already known costs for multiplication.
>>> */
>>> + struct mbc_entry
>>> + {
>>
>> Why not put all the data structures at the beginning?
>
> Why yes? This structure is for internal use of multiply_by_cost,
> so putting it somewhere else would just make things harder to read.

I think, someone asked me to do so. I do not know if there is a 
preferred way or not. If there isn't then it is up to you. I'd prefer 
to list them all at the beginning.

>>> + /* Returns cost of multiplication by constant CST in MODE.  */
>>> +
>>> + static unsigned
>>> + multiply_by_cost (HOST_WIDE_INT cst, enum machine_mode mode)
>>
>> rename as get_multiplication_cost() ?
>> No, comments explaining how the cost is calculated.
>
> ??? What would you like to write there?

You are intimately familiar with this code so everything is obvious for 
you. It is always better to write something than relay on other person 
to read your mind.

>  We check whether the result
> is not cached, and if it is not, we expand the multiplication to
> rtl and determine the cost.

Even this is useful. 100 seems like a "odd" (:-) number for initial 
size of hash table.

> I cannot think of comment that would not
> be of type "i++ /* Increase i by one.  */".
>> Diego asked me to put small example with input and output to explain
>> transformation at the beginning. I think it is a good idea.
>
> There already is a description of used algorithm in the beginning.
> I may put an example explaining what strength reduction is there,
> but I find it almost offensive to the reader :-)
>
>> Any data on SPEC or any other benchmark?
>
> We tested only whole new LNO branch optimizer (without vectorizer and
> loop interchange optimizations) versus the old rtl one.  Ivopts are the
> major tested component in this comparison.  People from IBM measured
> 2.2% improvement on specint and 4.3% on specfp on ppc.  I made some
> measurements on i686; I no longer can find exact numbers, but overall
> they seemed more or less unchanged (some tests were a bit better, some
> a bit worse).

Thank you,
-
Devang