This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: [PATCH PR62178]Improve candidate selecting in IVOPT, 2nd try.
- From: "Bin.Cheng" <amker dot cheng at gmail dot com>
- To: Richard Biener <richard dot guenther at gmail dot com>
- Cc: Bin Cheng <bin dot cheng at arm dot com>, GCC Patches <gcc-patches at gcc dot gnu dot org>, Zdenek Dvorak <ook at ucw dot cz>
- Date: Tue, 16 Dec 2014 19:53:07 +0800
- Subject: Re: [PATCH PR62178]Improve candidate selecting in IVOPT, 2nd try.
- Authentication-results: sourceware.org; auth=none
- References: <001001d01085$2a1e7bc0$7e5b7340$ at arm dot com> <CAFiYyc1vQfnMMDT8ckTcgs_34vzTYz8untzK14Hkgaws_gOBgQ at mail dot gmail dot com> <CAHFci2_aGXwd20MeiVhhR-FX_RNz7DtJxe3jdKYEPDEjuGPxHQ at mail dot gmail dot com> <CAFiYyc1MvyggnVYr_diZnv=8pHCg8ygsWPHerVfTiCDP_kuZfA at mail dot gmail dot com> <CAHFci28-jC_nwzLXgdKCfHLSpJD29h51ygkPHxBdJFK757wvTQ at mail dot gmail dot com>
Please ignore this one, I will further refine it. Sorry for disturbing!
Thanks,
bin
On Tue, Dec 16, 2014 at 4:42 PM, Bin.Cheng <amker.cheng@gmail.com> wrote:
> On Thu, Dec 11, 2014 at 8:08 PM, Richard Biener
> <richard.guenther@gmail.com> wrote:
>> On Thu, Dec 11, 2014 at 10:56 AM, Bin.Cheng <amker.cheng@gmail.com> wrote:
>>> On Wed, Dec 10, 2014 at 9:47 PM, Richard Biener
>>> <richard.guenther@gmail.com> wrote:
>>>> On Fri, Dec 5, 2014 at 1:15 PM, Bin Cheng <bin.cheng@arm.com> wrote:
>>>>> Hi,
>>>>> Though PR62178 is hidden by recent cost change in aarch64 backend, the ivopt
>>>>> issue still exists.
>>>>>
>>>>> Current candidate selecting algorithm tends to select fewer candidates given
>>>>> below reasons:
>>>>> 1) to better handle loops with many induction uses but the best choice is
>>>>> one generic basic induction variable;
>>>>> 2) to keep compilation time low.
>>>>>
>>>>> One fundamental weakness of the strategy is the opposite situation can't be
>>>>> handled properly sometimes. For these cases the best choice is each
>>>>> induction variable has its own candidate.
>>>>> This patch fixes the problem by shuffling candidate set after fix-point is
>>>>> reached by current implementation. The reason why this strategy works is it
>>>>> replaces candidate set by selecting local optimal candidate for some
>>>>> induction uses, and the new candidate set (has lower cost) is exact what we
>>>>> want in the mentioned case. Instrumentation data shows this can find better
>>>>> candidates set for ~6% loops in spec2006 on x86_64, and ~4% on aarch64.
>>>>>
>>>>> This patch actually is extension to the first version patch posted at
>>>>> https://gcc.gnu.org/ml/gcc-patches/2014-09/msg02620.html, that only adds
>>>>> another selecting pass with special seed set (more or less like the shuffled
>>>>> set in this patch). Data also confirms this patch can find optimal sets for
>>>>> most loops found by the first one, as well as optimal sets for many new
>>>>> loops.
>>>>>
>>>>> Bootstrap and test on x86_64, no regression on benchmarks. Bootstrap and
>>>>> test on aarch64.
>>>>> Since this patch only selects candidate set with lower cost, any regressions
>>>>> revealed are latent bugs of other components in GCC.
>>>>> I also collected GCC bootstrap time on x86_64, no regression either.
>>>>> Is this OK?
>>>>
>>>> The algorithm seems to be quadratic in the number of IV candidates
>>>> (at least):
>>> Yes, I worried about that too, that's why I measured the bootstrap
>>> time. One way is restrict this procedure one time for each loop. I
>>> already tried that and it can capture +90% loops. Is this sounds
>>> reasonable?
>>
>> Yes. That's my suggestion to handle it in the caller of try_improve_iv_set?
>>
>>> BTW, do we have some compilation time benchmarks for GCC?
>>
>> There are various testcases linked from PR47344, I don't remember
>> any particular one putting load on IVOPTs (but I do remember seeing
>> IVOPTs in the ~25% area in -ftime-report for some testcases).
>
>
> Hi Jeff & Richard,
> I updated patch according to your review comments. Is this version looks good?
> I didn't find cases in PR47344 which exercising IVOPT, but I produced
> one case from PR53852 which runs ivopt for ~17% of total time (28s).
> This patch does increase IVOPT time to 18%. Unfortunately, I tried
> the other restriction, it doesn't work as well as this one on spec2k6,
> if I understood the method correctly.
>
> Hi Sebastian,
> Thanks for help! I managed to run llvm compilation time tests
> successfully as you suggested. Case
> Multisource/Benchmarks/mafft/pairlocalalign is regressed but I can't
> reproduce it in cmd. The running time of compilation of
> pairlocalalign.c is too small comparing to the results. I also tried
> to invoke it by using RunSafely.sh but no lucky either. So any
> documentation on this? Thanks very much!
>
> Thanks,
> bin
>
> 2014-12-16 Bin Cheng <bin.cheng@arm.com>
>
> PR tree-optimization/62178
> * tree-ssa-loop-ivopts.c (cheaper_cost_with_cand): New function.
> (iv_ca_replace): New function.
> (try_improve_iv_set): New parameter try_replace_p.
> Replace candidates in IVS by calling iv_ca_replace.
> (find_optimal_iv_set_1): Pass new argument to try_improve_iv_set.
>
> gcc/testsuite/ChangeLog
> 2014-12-16 Bin Cheng <bin.cheng@arm.com>
>
> PR tree-optimization/62178
> * gcc.target/aarch64/pr62178.c: New test.