[PATCH PR62178]Improve candidate selecting in IVOPT, 2nd try.

Thu Dec 11 09:56:00 GMT 2014

On Wed, Dec 10, 2014 at 9:47 PM, Richard Biener
<richard.guenther@gmail.com> wrote:
> On Fri, Dec 5, 2014 at 1:15 PM, Bin Cheng <bin.cheng@arm.com> wrote:
>> Hi,
>> Though PR62178 is hidden by recent cost change in aarch64 backend, the ivopt
>> issue still exists.
>>
>> Current candidate selecting algorithm tends to select fewer candidates given
>> below reasons:
>>   1) to better handle loops with many induction uses but the best choice is
>> one generic basic induction variable;
>>   2) to keep compilation time low.
>>
>> One fundamental weakness of the strategy is the opposite situation can't be
>> handled properly sometimes.  For these cases the best choice is each
>> induction variable has its own candidate.
>> This patch fixes the problem by shuffling candidate set after fix-point is
>> reached by current implementation.  The reason why this strategy works is it
>> replaces candidate set by selecting local optimal candidate for some
>> induction uses, and the new candidate set (has lower cost) is exact what we
>> want in the mentioned case.  Instrumentation data shows this can find better
>> candidates set for ~6% loops in spec2006 on x86_64, and ~4% on aarch64.
>>
>> This patch actually is extension to the first version patch posted at
>> https://gcc.gnu.org/ml/gcc-patches/2014-09/msg02620.html, that only adds
>> another selecting pass with special seed set (more or less like the shuffled
>> set in this patch).  Data also confirms this patch can find optimal sets for
>> most loops found by the first one, as well as optimal sets for many new
>> loops.
>>
>> Bootstrap and test on x86_64, no regression on benchmarks.  Bootstrap and
>> test on aarch64.
>> Since this patch only selects candidate set with lower cost, any regressions
>> revealed are latent bugs of other components in GCC.
>> I also collected GCC bootstrap time on x86_64, no regression either.
>> Is this OK?
>
> The algorithm seems to be quadratic in the number of IV candidates
> (at least):
Yes, I worried about that too, that's why I measured the bootstrap
time.  One way is restrict this procedure one time for each loop.  I
already tried that and it can capture +90% loops.  Is this sounds
reasonable?

BTW, do we have some compilation time benchmarks for GCC?

Thanks,
bin
>
> +         for (i = 0; i < n_iv_cands (data); i++)
> +           {
> ...
> +             iv_ca_replace (data, ivs, cand, act_delta, &tmp_delta);
> ...
>
> and
>
> +static void
> +iv_ca_replace (struct ivopts_data *data, struct iv_ca *ivs,
> +              struct iv_cand *cand, struct iv_ca_delta *act_delta,
> +              struct iv_ca_delta **delta)
> +{
> ...
> +  for (i = 0; i < ivs->upto; i++)
> +    {
> ...
> +      if (data->consider_all_candidates)
> +       {
> +         for (j = 0; j < n_iv_cands (data); j++)
> +           {
>
> possibly cubic if ivs->upto is of similar value.
>
> I wonder if it is possible to restrict this to the single IV with
> the largest delta?  After all we are iterating try_improve_iv_set.
> Alternatively move the handling out of iteration completey,
> thus into the caller of try_improve_iv_set?
>
> Note that compile-time issues always arise in auto-generated code,
> not during GCC bootstrap.
>
> Richard.
>
>
>> 2014-12-03  Bin Cheng  bin.cheng@arm.com
>>
>>   PR tree-optimization/62178
>>   * tree-ssa-loop-ivopts.c (iv_ca_replace): New function.
>>   (try_improve_iv_set): Shuffle candidates set in order to handle
>>   case in which candidate wrto each iv use should be selected.
>>
>> gcc/testsuite/ChangeLog
>> 2014-12-03  Bin Cheng  bin.cheng@arm.com
>>
>>   PR tree-optimization/62178
>>   * gcc.target/aarch64/pr62178.c: New test.