[PATCH PR62178]Improve candidate selecting in IVOPT, 2nd try.

Wed Dec 10 06:10:00 GMT 2014

On Wed, Dec 10, 2014 at 6:58 AM, Jeff Law <law@redhat.com> wrote:
> On 12/05/14 05:15, Bin Cheng wrote:
>>
>> Hi,
>> Though PR62178 is hidden by recent cost change in aarch64 backend, the
>> ivopt
>> issue still exists.
>>
>> Current candidate selecting algorithm tends to select fewer candidates
>> given
>> below reasons:
>>    1) to better handle loops with many induction uses but the best choice
>> is
>> one generic basic induction variable;
>>    2) to keep compilation time low.
>>
>> One fundamental weakness of the strategy is the opposite situation can't
>> be
>> handled properly sometimes.  For these cases the best choice is each
>> induction variable has its own candidate.
>> This patch fixes the problem by shuffling candidate set after fix-point is
>> reached by current implementation.  The reason why this strategy works is
>> it
>> replaces candidate set by selecting local optimal candidate for some
>> induction uses, and the new candidate set (has lower cost) is exact what
>> we
>> want in the mentioned case.  Instrumentation data shows this can find
>> better
>> candidates set for ~6% loops in spec2006 on x86_64, and ~4% on aarch64.
>>
>> This patch actually is extension to the first version patch posted at
>> https://gcc.gnu.org/ml/gcc-patches/2014-09/msg02620.html, that only adds
>> another selecting pass with special seed set (more or less like the
>> shuffled
>> set in this patch).  Data also confirms this patch can find optimal sets
>> for
>> most loops found by the first one, as well as optimal sets for many new
>> loops.
>>
>> Bootstrap and test on x86_64, no regression on benchmarks.  Bootstrap and
>> test on aarch64.
>> Since this patch only selects candidate set with lower cost, any
>> regressions
>> revealed are latent bugs of other components in GCC.
>> I also collected GCC bootstrap time on x86_64, no regression either.
>> Is this OK?
>>
>> 2014-12-03  Bin Chengbin.cheng@arm.com
>>
>>    PR tree-optimization/62178
>>    * tree-ssa-loop-ivopts.c (iv_ca_replace): New function.
>>    (try_improve_iv_set): Shuffle candidates set in order to handle
>>    case in which candidate wrto each iv use should be selected.
>>
>> gcc/testsuite/ChangeLog
>> 2014-12-03  Bin Chengbin.cheng@arm.com
>>
>>    PR tree-optimization/62178
>>    * gcc.target/aarch64/pr62178.c: New test.
>>
>>
>> pr62178-20141202.txt
>>
>>
>> Index: gcc/tree-ssa-loop-ivopts.c
>> ===================================================================
>> --- gcc/tree-ssa-loop-ivopts.c  (revision 217828)
>> +++ gcc/tree-ssa-loop-ivopts.c  (working copy)
>> @@ -5718,6 +5718,85 @@ iv_ca_extend (struct ivopts_data *data, struct iv_
>>     return cost;
>>   }
>>
>> +/* Try replacing candidates in IVS which are recorded by list ACT_DELTA
>> to
>> +   lower cost candidates.  CAND is the one won't be replaced.
>> Replacement
>> +   of candidate is recorded in list DELTA.  */

Thanks for the review.

>
> Is this better written as:
>
> Try replacing candidates in IVS with IVs related to CAND (which is not
> changed) if doing so lowers the IV cost.  ACT_DELTA is the recorded
> list of candidates.  Replacement of candidate is recorded in list DELTA.
>
> ?

Will refine that.

>
>
>> +      if (data->consider_all_candidates)
>> +       {
>> +         for (j = 0; j < n_iv_cands (data); j++)
>> +           {
>> +             if (j == old_cp->cand->id)
>> +               continue;
>> +
>> +             cnd = iv_cand (data, j);
>> +             cp = get_use_iv_cost (data, use, cnd);
>> +             if (!cp)
>> +               continue;
>> +
>> +             if (best_cp == NULL || cheaper_cost_pair (cp, best_cp))
>> +               best_cp = cp;
>> +           }
>> +       }
>> +      else
>> +       {
>> +         EXECUTE_IF_SET_IN_BITMAP (use->related_cands, 0, j, bi)
>> +           {
>> +             if (j == old_cp->cand->id)
>> +               continue;
>> +
>> +             cnd = iv_cand (data, j);
>> +             cp = get_use_iv_cost (data, use, cnd);
>> +             if (!cp)
>> +               continue;
>> +
>> +             if (best_cp == NULL || cheaper_cost_pair (cp, best_cp))
>> +               best_cp = cp;
>> +           }
>> +       }
>
> The loop bodies here are duplicated.  Can you factor them into a function so
> that this looks something like
>
> if (data->consider_all_candidates)
>   {
>     for (...)
>       refactored_code (some arguments)
>   }
> else
>   {
>      EXECUTE_IF_SET_IN_BITMAP (...)
>        refactored_code (some arguments)
>

Will do that.

>> @@ -6042,8 +6121,50 @@ try_improve_iv_set (struct ivopts_data *data, stru
>>         /* Try removing the candidates from the set instead.  */
>>         best_cost = iv_ca_prune (data, ivs, NULL, &best_delta);
>>
>> -      /* Nothing more we can do.  */
>>         if (!best_delta)
>> +       {
>> +         /* So far candidate selecting algorithm tends to choose fewer
>> IVs
>> +            so that it can handle cases in which loops have many
>> variables
>> +            but the best choice is often to use only one general biv.
>> One
>> +            weakness is it can't handle opposite cases, in which
>> different
>> +            candidates should be chosen with respect to each use.  To
>> solve
>> +            the problem, we replace candidate of some uses with lower
>> cost
>> +            one, thus general algorithm can have a chance to find optimal
>> +            set for these cases.  */
>
> So, in essence we've computed a best cost with "minimal" IVs and you're
> using that result as an initial state for expanding the IV set.  You expand
> on a candidate by candidate basis if and only if the estimated cost lowers.
> Right?
Yes.

>
> At each expansion step you keep a single candidate fixed and you try to
> derive other IVs from that fixed IV if doing so lowers the cost?  Right?
More or less, it like below process:
Candidates set IVS (fixed point by current algorithm) --> Try extend
IVS with a new one CAND if it has lower local cost for any uses(There
will be a candidate set IVS', candidates in this subset of IVS are
replaced with CAND for some uses) --> Replace rest candidates in IVS
if they are in the set of IVS' with any other candidates having lower
local cost)  --> Prune the new set.

The first/last steps are actually from current implementation, I added
the middle step.  Generally yes, it expands fixed point got by current
algorithm to see if there is lower cost choice.  Maybe the comment
wasn't clear enough, I will see how to refine it.

>
>
> That's what it looks like to me when reading the code/comments.  Just want
> to make sure it's working the way I think it is.
>
> FWIW, if I read the slides correctly, GCC's IV code was called out as being
> one of the reasons GCC is generating better code for AArch64 than LLVM at a
> recent LLVM conference.  Glad to see that we're called out in that way and
> that we continue to try to improve in that space as well.
>
> Jeff

I don't know what to say...  IVOPT doesn't work very well on aarch64
and I have a not short todo list for it.  Of course some of them are
target indenpendent.

Thanks,
bin