This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.
Index Nav: | [Date Index] [Subject Index] [Author Index] [Thread Index] | |
---|---|---|
Message Nav: | [Date Prev] [Date Next] | [Thread Prev] [Thread Next] |
Other format: | [Raw text] |
CCing Sebastian. Thanks, bin ---------- Forwarded message ---------- From: Bin.Cheng <amker.cheng@gmail.com> Date: Tue, Dec 16, 2014 at 4:42 PM Subject: Re: [PATCH PR62178]Improve candidate selecting in IVOPT, 2nd try. To: Richard Biener <richard.guenther@gmail.com> Cc: Bin Cheng <bin.cheng@arm.com>, GCC Patches <gcc-patches@gcc.gnu.org>, Zdenek Dvorak <ook@ucw.cz> On Thu, Dec 11, 2014 at 8:08 PM, Richard Biener <richard.guenther@gmail.com> wrote: > On Thu, Dec 11, 2014 at 10:56 AM, Bin.Cheng <amker.cheng@gmail.com> wrote: >> On Wed, Dec 10, 2014 at 9:47 PM, Richard Biener >> <richard.guenther@gmail.com> wrote: >>> On Fri, Dec 5, 2014 at 1:15 PM, Bin Cheng <bin.cheng@arm.com> wrote: >>>> Hi, >>>> Though PR62178 is hidden by recent cost change in aarch64 backend, the ivopt >>>> issue still exists. >>>> >>>> Current candidate selecting algorithm tends to select fewer candidates given >>>> below reasons: >>>> 1) to better handle loops with many induction uses but the best choice is >>>> one generic basic induction variable; >>>> 2) to keep compilation time low. >>>> >>>> One fundamental weakness of the strategy is the opposite situation can't be >>>> handled properly sometimes. For these cases the best choice is each >>>> induction variable has its own candidate. >>>> This patch fixes the problem by shuffling candidate set after fix-point is >>>> reached by current implementation. The reason why this strategy works is it >>>> replaces candidate set by selecting local optimal candidate for some >>>> induction uses, and the new candidate set (has lower cost) is exact what we >>>> want in the mentioned case. Instrumentation data shows this can find better >>>> candidates set for ~6% loops in spec2006 on x86_64, and ~4% on aarch64. >>>> >>>> This patch actually is extension to the first version patch posted at >>>> https://gcc.gnu.org/ml/gcc-patches/2014-09/msg02620.html, that only adds >>>> another selecting pass with special seed set (more or less like the shuffled >>>> set in this patch). Data also confirms this patch can find optimal sets for >>>> most loops found by the first one, as well as optimal sets for many new >>>> loops. >>>> >>>> Bootstrap and test on x86_64, no regression on benchmarks. Bootstrap and >>>> test on aarch64. >>>> Since this patch only selects candidate set with lower cost, any regressions >>>> revealed are latent bugs of other components in GCC. >>>> I also collected GCC bootstrap time on x86_64, no regression either. >>>> Is this OK? >>> >>> The algorithm seems to be quadratic in the number of IV candidates >>> (at least): >> Yes, I worried about that too, that's why I measured the bootstrap >> time. One way is restrict this procedure one time for each loop. I >> already tried that and it can capture +90% loops. Is this sounds >> reasonable? > > Yes. That's my suggestion to handle it in the caller of try_improve_iv_set? > >> BTW, do we have some compilation time benchmarks for GCC? > > There are various testcases linked from PR47344, I don't remember > any particular one putting load on IVOPTs (but I do remember seeing > IVOPTs in the ~25% area in -ftime-report for some testcases). Hi Jeff & Richard, I updated patch according to your review comments. Is this version looks good? I didn't find cases in PR47344 which exercising IVOPT, but I produced one case from PR53852 which runs ivopt for ~17% of total time (28s). This patch does increase IVOPT time to 18%. Unfortunately, I tried the other restriction, it doesn't work as well as this one on spec2k6, if I understood the method correctly. Hi Sebastian, Thanks for help! I managed to run llvm compilation time tests successfully as you suggested. Case Multisource/Benchmarks/mafft/pairlocalalign is regressed but I can't reproduce it in cmd. The running time of compilation of pairlocalalign.c is too small comparing to the results. I also tried to invoke it by using RunSafely.sh but no lucky either. So any documentation on this? Thanks very much! Thanks, bin 2014-12-16 Bin Cheng <bin.cheng@arm.com> PR tree-optimization/62178 * tree-ssa-loop-ivopts.c (cheaper_cost_with_cand): New function. (iv_ca_replace): New function. (try_improve_iv_set): New parameter try_replace_p. Replace candidates in IVS by calling iv_ca_replace. (find_optimal_iv_set_1): Pass new argument to try_improve_iv_set. gcc/testsuite/ChangeLog 2014-12-16 Bin Cheng <bin.cheng@arm.com> PR tree-optimization/62178 * gcc.target/aarch64/pr62178.c: New test.
Attachment:
pr62178-20141216.txt
Description: Text document
Index Nav: | [Date Index] [Subject Index] [Author Index] [Thread Index] | |
---|---|---|
Message Nav: | [Date Prev] [Date Next] | [Thread Prev] [Thread Next] |