[PATCH PR62178]Improve candidate selecting in IVOPT, 2nd try.
Bin.Cheng
amker.cheng@gmail.com
Wed Dec 10 06:10:00 GMT 2014
On Wed, Dec 10, 2014 at 6:58 AM, Jeff Law <law@redhat.com> wrote:
> On 12/05/14 05:15, Bin Cheng wrote:
>>
>> Hi,
>> Though PR62178 is hidden by recent cost change in aarch64 backend, the
>> ivopt
>> issue still exists.
>>
>> Current candidate selecting algorithm tends to select fewer candidates
>> given
>> below reasons:
>> 1) to better handle loops with many induction uses but the best choice
>> is
>> one generic basic induction variable;
>> 2) to keep compilation time low.
>>
>> One fundamental weakness of the strategy is the opposite situation can't
>> be
>> handled properly sometimes. For these cases the best choice is each
>> induction variable has its own candidate.
>> This patch fixes the problem by shuffling candidate set after fix-point is
>> reached by current implementation. The reason why this strategy works is
>> it
>> replaces candidate set by selecting local optimal candidate for some
>> induction uses, and the new candidate set (has lower cost) is exact what
>> we
>> want in the mentioned case. Instrumentation data shows this can find
>> better
>> candidates set for ~6% loops in spec2006 on x86_64, and ~4% on aarch64.
>>
>> This patch actually is extension to the first version patch posted at
>> https://gcc.gnu.org/ml/gcc-patches/2014-09/msg02620.html, that only adds
>> another selecting pass with special seed set (more or less like the
>> shuffled
>> set in this patch). Data also confirms this patch can find optimal sets
>> for
>> most loops found by the first one, as well as optimal sets for many new
>> loops.
>>
>> Bootstrap and test on x86_64, no regression on benchmarks. Bootstrap and
>> test on aarch64.
>> Since this patch only selects candidate set with lower cost, any
>> regressions
>> revealed are latent bugs of other components in GCC.
>> I also collected GCC bootstrap time on x86_64, no regression either.
>> Is this OK?
>>
>> 2014-12-03 Bin Chengbin.cheng@arm.com
>>
>> PR tree-optimization/62178
>> * tree-ssa-loop-ivopts.c (iv_ca_replace): New function.
>> (try_improve_iv_set): Shuffle candidates set in order to handle
>> case in which candidate wrto each iv use should be selected.
>>
>> gcc/testsuite/ChangeLog
>> 2014-12-03 Bin Chengbin.cheng@arm.com
>>
>> PR tree-optimization/62178
>> * gcc.target/aarch64/pr62178.c: New test.
>>
>>
>> pr62178-20141202.txt
>>
>>
>> Index: gcc/tree-ssa-loop-ivopts.c
>> ===================================================================
>> --- gcc/tree-ssa-loop-ivopts.c (revision 217828)
>> +++ gcc/tree-ssa-loop-ivopts.c (working copy)
>> @@ -5718,6 +5718,85 @@ iv_ca_extend (struct ivopts_data *data, struct iv_
>> return cost;
>> }
>>
>> +/* Try replacing candidates in IVS which are recorded by list ACT_DELTA
>> to
>> + lower cost candidates. CAND is the one won't be replaced.
>> Replacement
>> + of candidate is recorded in list DELTA. */
Thanks for the review.
>
> Is this better written as:
>
> Try replacing candidates in IVS with IVs related to CAND (which is not
> changed) if doing so lowers the IV cost. ACT_DELTA is the recorded
> list of candidates. Replacement of candidate is recorded in list DELTA.
>
> ?
Will refine that.
>
>
>> + if (data->consider_all_candidates)
>> + {
>> + for (j = 0; j < n_iv_cands (data); j++)
>> + {
>> + if (j == old_cp->cand->id)
>> + continue;
>> +
>> + cnd = iv_cand (data, j);
>> + cp = get_use_iv_cost (data, use, cnd);
>> + if (!cp)
>> + continue;
>> +
>> + if (best_cp == NULL || cheaper_cost_pair (cp, best_cp))
>> + best_cp = cp;
>> + }
>> + }
>> + else
>> + {
>> + EXECUTE_IF_SET_IN_BITMAP (use->related_cands, 0, j, bi)
>> + {
>> + if (j == old_cp->cand->id)
>> + continue;
>> +
>> + cnd = iv_cand (data, j);
>> + cp = get_use_iv_cost (data, use, cnd);
>> + if (!cp)
>> + continue;
>> +
>> + if (best_cp == NULL || cheaper_cost_pair (cp, best_cp))
>> + best_cp = cp;
>> + }
>> + }
>
> The loop bodies here are duplicated. Can you factor them into a function so
> that this looks something like
>
> if (data->consider_all_candidates)
> {
> for (...)
> refactored_code (some arguments)
> }
> else
> {
> EXECUTE_IF_SET_IN_BITMAP (...)
> refactored_code (some arguments)
>
Will do that.
>> @@ -6042,8 +6121,50 @@ try_improve_iv_set (struct ivopts_data *data, stru
>> /* Try removing the candidates from the set instead. */
>> best_cost = iv_ca_prune (data, ivs, NULL, &best_delta);
>>
>> - /* Nothing more we can do. */
>> if (!best_delta)
>> + {
>> + /* So far candidate selecting algorithm tends to choose fewer
>> IVs
>> + so that it can handle cases in which loops have many
>> variables
>> + but the best choice is often to use only one general biv.
>> One
>> + weakness is it can't handle opposite cases, in which
>> different
>> + candidates should be chosen with respect to each use. To
>> solve
>> + the problem, we replace candidate of some uses with lower
>> cost
>> + one, thus general algorithm can have a chance to find optimal
>> + set for these cases. */
>
> So, in essence we've computed a best cost with "minimal" IVs and you're
> using that result as an initial state for expanding the IV set. You expand
> on a candidate by candidate basis if and only if the estimated cost lowers.
> Right?
Yes.
>
> At each expansion step you keep a single candidate fixed and you try to
> derive other IVs from that fixed IV if doing so lowers the cost? Right?
More or less, it like below process:
Candidates set IVS (fixed point by current algorithm) --> Try extend
IVS with a new one CAND if it has lower local cost for any uses(There
will be a candidate set IVS', candidates in this subset of IVS are
replaced with CAND for some uses) --> Replace rest candidates in IVS
if they are in the set of IVS' with any other candidates having lower
local cost) --> Prune the new set.
The first/last steps are actually from current implementation, I added
the middle step. Generally yes, it expands fixed point got by current
algorithm to see if there is lower cost choice. Maybe the comment
wasn't clear enough, I will see how to refine it.
>
>
> That's what it looks like to me when reading the code/comments. Just want
> to make sure it's working the way I think it is.
>
> FWIW, if I read the slides correctly, GCC's IV code was called out as being
> one of the reasons GCC is generating better code for AArch64 than LLVM at a
> recent LLVM conference. Glad to see that we're called out in that way and
> that we continue to try to improve in that space as well.
>
> Jeff
I don't know what to say... IVOPT doesn't work very well on aarch64
and I have a not short todo list for it. Of course some of them are
target indenpendent.
Thanks,
bin
More information about the Gcc-patches
mailing list