RFA: Fix PR38785

Fri Mar 6 16:58:00 GMT 2009

On Fri, Mar 6, 2009 at 12:30 AM,  <amylaar@spamcop.net> wrote:
> Quoting Daniel Berlin <dberlin@dberlin.org>:
>
>> Uh
>>
>> +             /* If optimizing for size, insert at most one
>> +                new expression to avoid increasing code size.  */
>> +             if (optimize_function_for_speed_p (cfun)
>> +                 ? (1 || ppre_n_insert_for_speed_p (expr, block, inserts_needed))
>> +                 : EDGE_COUNT (block->preds) - inserts_needed == 1)
>> +               new_stuff |=
>> +                 insert_into_preds_of_block (block,
>> +                                             get_expression_id (expr),
>> +                                             avail);
>>
>> 1 || ?
>
> IIRC this was the place where I initially thought I would have to throttle
> the transformation, from my understanding of what Steven Bosscher
> suggested.  But it didn't really help.  Then I found this other
> place in do_partial_partial_insertion where I had to apply the throttle
> to avoid the exponential explosion.
> That helped quite a lot, and bitmnp with this throttled partial-pre was
> faster even than with partial-pre completely turned off.
> Then I slipped in this "1 ||" before my original call site of
> ppre_n_insert_for_speed_p and asked my colleague to also do an EEMBC run
> for that version, and it turned out that there was no difference in
> performance for any of the EEMBC benchmarks within the accuracy of the
> measurements.  So I left this test disabled.  It still sort-of documents
> what the original idea for the input values for that heuristic was.
> It would be interesting if someone could find out if that site is actually
> a useful place for this or another heuristic.
>
>> BTW, I find it mildly humorous that RTL level PRE, if improved, would
>> do exactly what I separated out into an option as partial-partial PRE.
>> (because RTL level PRE is a complete PRE).
>
> I don't think that is the case.
It's provable.
What is called "partial partial PRE" is simply partial availability +
partial anticipation.
LCM, which RTL PRE uses, does this by default; there is no way to turn
it off or remove it from the calculations.
Since RTL PRE has no cost metric at all, if the scanning was improved
to actually work sanely, it would indeed perform exactly the
optimizations you are turning off here.
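
To make the terminology concrete, here is a rough sketch of the
distinction between the two cases. It is illustrative only: the enum, the
function names and the bool-array representation are hypothetical and are
not GCC's data structures, and it assumes "availability" is tracked per
predecessor edge and "anticipation" per successor edge of a single block.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical classification of one expression at one block.  */
enum redundancy
{
  NOT_REDUNDANT,        /* nothing to eliminate here */
  FULLY_REDUNDANT,      /* available on every predecessor */
  PARTIALLY_REDUNDANT,  /* partial availability + full anticipation */
  PARTIAL_PARTIAL       /* partial availability + partial anticipation */
};

/* Count the set entries in FLAGS[0..N-1].  */
static size_t
count_true (const bool *flags, size_t n)
{
  size_t count = 0;
  for (size_t i = 0; i < n; i++)
    if (flags[i])
      count++;
  return count;
}

/* AVAIL_ON_PRED[i] is true if the expression is available at the end of
   predecessor i; ANTIC_ON_SUCC[j] is true if it is computed again along
   successor j before its operands change.  */
enum redundancy
classify (const bool *avail_on_pred, size_t n_preds,
          const bool *antic_on_succ, size_t n_succs)
{
  assert (n_preds > 0 && n_succs > 0);

  size_t avail = count_true (avail_on_pred, n_preds);
  size_t antic = count_true (antic_on_succ, n_succs);

  if (avail == n_preds)
    return FULLY_REDUNDANT;
  if (avail == 0 || antic == 0)
    return NOT_REDUNDANT;
  /* Partially available from here on.  */
  return antic == n_succs ? PARTIALLY_REDUNDANT : PARTIAL_PARTIAL;
}

/* Number of predecessors that would need a new copy of the expression.  */
size_t
inserts_needed (const bool *avail_on_pred, size_t n_preds)
{
  return n_preds - count_true (avail_on_pred, n_preds);
}
```

In this sketch only the PARTIAL_PARTIAL case can lengthen a path that
never uses the value, which is why a cost metric matters for it.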

--Dan