This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Fwd: [PATCH PR62178]Improve candidate selecting in IVOPT, 2nd try.


CCing Sebastian.

Thanks,
bin

---------- Forwarded message ----------
From: Bin.Cheng <amker.cheng@gmail.com>
Date: Tue, Dec 16, 2014 at 4:42 PM
Subject: Re: [PATCH PR62178]Improve candidate selecting in IVOPT, 2nd try.
To: Richard Biener <richard.guenther@gmail.com>
Cc: Bin Cheng <bin.cheng@arm.com>, GCC Patches
<gcc-patches@gcc.gnu.org>, Zdenek Dvorak <ook@ucw.cz>


On Thu, Dec 11, 2014 at 8:08 PM, Richard Biener
<richard.guenther@gmail.com> wrote:
> On Thu, Dec 11, 2014 at 10:56 AM, Bin.Cheng <amker.cheng@gmail.com> wrote:
>> On Wed, Dec 10, 2014 at 9:47 PM, Richard Biener
>> <richard.guenther@gmail.com> wrote:
>>> On Fri, Dec 5, 2014 at 1:15 PM, Bin Cheng <bin.cheng@arm.com> wrote:
>>>> Hi,
>>>> Though PR62178 is hidden by recent cost change in aarch64 backend, the ivopt
>>>> issue still exists.
>>>>
>>>> Current candidate selecting algorithm tends to select fewer candidates given
>>>> below reasons:
>>>>   1) to better handle loops with many induction uses but the best choice is
>>>> one generic basic induction variable;
>>>>   2) to keep compilation time low.
>>>>
>>>> One fundamental weakness of the strategy is the opposite situation can't be
>>>> handled properly sometimes.  For these cases the best choice is each
>>>> induction variable has its own candidate.
>>>> This patch fixes the problem by shuffling candidate set after fix-point is
>>>> reached by current implementation.  The reason why this strategy works is it
>>>> replaces candidate set by selecting local optimal candidate for some
>>>> induction uses, and the new candidate set (has lower cost) is exact what we
>>>> want in the mentioned case.  Instrumentation data shows this can find better
>>>> candidates set for ~6% loops in spec2006 on x86_64, and ~4% on aarch64.
>>>>
>>>> This patch actually is extension to the first version patch posted at
>>>> https://gcc.gnu.org/ml/gcc-patches/2014-09/msg02620.html, that only adds
>>>> another selecting pass with special seed set (more or less like the shuffled
>>>> set in this patch).  Data also confirms this patch can find optimal sets for
>>>> most loops found by the first one, as well as optimal sets for many new
>>>> loops.
>>>>
>>>> Bootstrap and test on x86_64, no regression on benchmarks.  Bootstrap and
>>>> test on aarch64.
>>>> Since this patch only selects candidate set with lower cost, any regressions
>>>> revealed are latent bugs of other components in GCC.
>>>> I also collected GCC bootstrap time on x86_64, no regression either.
>>>> Is this OK?
>>>
>>> The algorithm seems to be quadratic in the number of IV candidates
>>> (at least):
>> Yes, I worried about that too, that's why I measured the bootstrap
>> time.  One way is restrict this procedure one time for each loop.  I
>> already tried that and it can capture +90% loops.  Is this sounds
>> reasonable?
>
> Yes.  That's my suggestion to handle it in the caller of try_improve_iv_set?
>
>> BTW, do we have some compilation time benchmarks for GCC?
>
> There are various testcases linked from PR47344, I don't remember
> any particular one putting load on IVOPTs (but I do remember seeing
> IVOPTs in the ~25% area in -ftime-report for some testcases).


Hi Jeff & Richard,
I updated patch according to your review comments.  Is this version looks good?
I didn't find cases in PR47344 which exercising IVOPT, but I produced
one case from PR53852 which runs ivopt for ~17% of total time (28s).
This patch does increase IVOPT time to 18%.  Unfortunately, I tried
the other restriction, it doesn't work as well as this one on spec2k6,
if I understood the method correctly.

Hi Sebastian,
Thanks for help!  I managed to run llvm compilation time tests
successfully as you suggested.  Case
Multisource/Benchmarks/mafft/pairlocalalign is regressed but I can't
reproduce it in cmd.  The running time of compilation of
pairlocalalign.c is too small comparing to the results.  I also tried
to invoke it by using RunSafely.sh but no lucky either.  So any
documentation on this?  Thanks very much!

Thanks,
bin

2014-12-16  Bin Cheng  <bin.cheng@arm.com>

    PR tree-optimization/62178
    * tree-ssa-loop-ivopts.c (cheaper_cost_with_cand): New function.
    (iv_ca_replace): New function.
    (try_improve_iv_set): New parameter try_replace_p.
    Replace candidates in IVS by calling iv_ca_replace.
    (find_optimal_iv_set_1): Pass new argument to try_improve_iv_set.

gcc/testsuite/ChangeLog
2014-12-16  Bin Cheng  <bin.cheng@arm.com>

    PR tree-optimization/62178
    * gcc.target/aarch64/pr62178.c: New test.

Attachment: pr62178-20141216.txt
Description: Text document


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]