Created attachment 40927
reduced test from pr79930

This PR is a sequel to pr79930. With gcc6 and trunk (7.0), the attached test runs more than twice as slowly when it is compiled with -fipa-cp-clone in addition to -O2 -fpeel-loops -finline-functions:

% gfca sum_xy.f90 -O2 -fpeel-loops -finline-functions
% ./a.out
 Using SUM, time: 9.28960443E-02
 sum = -19609.087488337318
% gfca sum_xy.f90 -O2 -fpeel-loops -fipa-cp-clone -finline-functions
% ./a.out
 Using SUM, time: 0.264802039
 sum = -56108.398701889098
% gfc6 sum_xy.f90 -O2 -fpeel-loops -finline-functions
% ./a.out
 Using SUM, time: 9.48660374E-02
 sum = -27582.388828175150
% gfc6 sum_xy.f90 -O2 -fpeel-loops -fipa-cp-clone -finline-functions
% ./a.out
 Using SUM, time: 0.279235005
 sum = -37131.668569316826
% gfc5 sum_xy.f90 -O2 -fpeel-loops -finline-functions
% ./a.out
 Using SUM, time: 0.106678963
 sum = 6552.5940282615447
% gfc5 sum_xy.f90 -O2 -fpeel-loops -fipa-cp-clone -finline-functions
% ./a.out
 Using SUM, time: 0.104918003
 sum = 28692.268429922191

gfca is trunk at r245974, gfc6 is 6.3.1 at r245745, and gfc5 is 5.4.1 at r245752.

Looking at my archives, it seems that the regression appeared in gcc5 before branching but has since been reverted there, while it has always been present on the gcc6 branch.
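To make the "more than twice slower" claim concrete, the slowdown factor can be computed from the timings reported above. This is only a small sketch: the dictionary layout and the `slowdown` helper are illustrative names, not part of the test case or of GCC.

```python
# Timings in seconds, copied from the runs reported in this comment.
# "base" is -O2 -fpeel-loops -finline-functions; "ipa_cp_clone" adds -fipa-cp-clone.
timings = {
    "trunk": {"base": 9.28960443e-02, "ipa_cp_clone": 0.264802039},
    "gcc6": {"base": 9.48660374e-02, "ipa_cp_clone": 0.279235005},
    "gcc5": {"base": 0.106678963, "ipa_cp_clone": 0.104918003},
}


def slowdown(entry):
    """Return how many times slower the -fipa-cp-clone build ran (hypothetical helper)."""
    return entry["ipa_cp_clone"] / entry["base"]


ratios = {name: slowdown(entry) for name, entry in timings.items()}

for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2f}x")
```

With these numbers the ratio for trunk and gcc6 comes out well above 2, while for gcc5 it stays close to 1.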
Confirmed. Here are the numbers for the releases (OK means time < 0.15):

4.7.0 (93c5ebd73a4d1626)(22 Mar 2012 07:11): result: FAILED 0.520000041
4.7.1 (0e3097e7d505b7be)(14 Jun 2012 08:32): result: FAILED 0.508000016
4.7.2 (c9b304ada7111264)(20 Sep 2012 06:54): result: FAILED 0.512000024
4.7.3 (f22940cb824859bd)(11 Apr 2013 07:57): result: FAILED 0.516000032
4.7.4 (ae10eb82fe34c186)(12 Jun 2014 12:08): result: FAILED 0.515999913
4.8.0 (e9c762ec4671d77e)(22 Mar 2013 10:05): result: FAILED 0.300000012
4.8.1 (caa62b4636bfed71)(31 May 2013 09:02): result: FAILED 0.300000012
4.8.2 (9bcca88e24e64d4e)(16 Oct 2013 07:20): result: FAILED 0.300000012
4.8.3 (6bbf0dec66c0e719)(22 May 2014 09:10): result: FAILED 0.307999969
4.8.4 (1a97fa0bb3fa5669)(19 Dec 2014 11:43): result: FAILED 0.315999925
4.8.5 (cf82a597b0d18985)(23 Jun 2015 07:54): result: FAILED 0.324000061
4.9.0 (a7aa383874520cd5)(22 Apr 2014 09:43): result: FAILED 0.352000058
4.9.1 (c6fa1b4126635939)(16 Jul 2014 10:04): result: FAILED 0.356000006
4.9.2 (c1283af40b65f1ad)(30 Oct 2014 08:27): result: FAILED 0.359999955
4.9.3 (876d41ed80ce13e0)(26 Jun 2015 17:57): result: FAILED 0.368000031
4.9.4 (d3191480f376c780)(03 Aug 2016 05:07): result: FAILED 0.359999955
5.1.0 (d5ad84b309d0d97d)(22 Apr 2015 08:43): result: FAILED 0.23999995
5.2.0 (7b26e3896e268cd4)(16 Jul 2015 09:13): result: FAILED 0.23999995
5.3.0 (2bc376d60753a58b)(04 Dec 2015 10:45): result: OK 0.0920000076
5.4.0 (9d0507742960aa9f)(03 Jun 2016 08:41): result: OK 0.0920000076
6.1.0 (c441d9e8e0438dcf)(27 Apr 2016 08:20): result: FAILED 0.231999993
6.2.0 (6ac74a62ba725829)(22 Aug 2016 08:01): result: FAILED 0.244000018
6.3.0 (4b5e15daff8b5444)(21 Dec 2016 07:51): result: FAILED 0.232000053

Thus only 5.3.0 and 5.4.0 are fast. I'll isolate the commit that is responsible for that.
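The OK/FAILED column above follows from a single threshold (time < 0.15). A minimal sketch of that classification, using the times listed in this comment, might look as follows; the `classify` helper and the variable names are assumptions made for illustration and are not the script that produced the list.

```python
# Run times in seconds per release, copied from the list above.
release_times = {
    "4.7.0": 0.520000041, "4.7.1": 0.508000016, "4.7.2": 0.512000024,
    "4.7.3": 0.516000032, "4.7.4": 0.515999913,
    "4.8.0": 0.300000012, "4.8.1": 0.300000012, "4.8.2": 0.300000012,
    "4.8.3": 0.307999969, "4.8.4": 0.315999925, "4.8.5": 0.324000061,
    "4.9.0": 0.352000058, "4.9.1": 0.356000006, "4.9.2": 0.359999955,
    "4.9.3": 0.368000031, "4.9.4": 0.359999955,
    "5.1.0": 0.23999995, "5.2.0": 0.23999995,
    "5.3.0": 0.0920000076, "5.4.0": 0.0920000076,
    "6.1.0": 0.231999993, "6.2.0": 0.244000018, "6.3.0": 0.232000053,
}

THRESHOLD = 0.15  # "OK means time < 0.15"


def classify(seconds, threshold=THRESHOLD):
    """Label a run OK if it beats the threshold, FAILED otherwise (hypothetical helper)."""
    return "OK" if seconds < threshold else "FAILED"


results = {release: classify(t) for release, t in release_times.items()}
fast_releases = [release for release, label in results.items() if label == "OK"]

print(fast_releases)
```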
So the revision that makes the difference is r230550, where Richi added various backports:

2015-11-18 Richard Biener <rguenther@suse.de>

    Backport from mainline
    2015-11-07 Jan Hubicka <hubicka@ucw.cz>

    PR ipa/68057
    PR ipa/68220
    * ipa-polymorphic-call.c
    (ipa_polymorphic_call_context::restrict_to_inner_type): Fix ordering
    issue when offset is out of range.
    (contains_type_p): Fix out of range check, clear dynamic flag.

    * g++.dg/torture/pr68220.C: New testcase.
    * g++.dg/lto/pr68057_0.C: Likewise.
    * g++.dg/lto/pr68057_1.C: Likewise.

    2015-10-23 Jan Hubicka <hubicka@ucw.cz>

    PR ipa/pr67600
    * ipa-polymorphic-call.c
    (ipa_polymorphic_call_context::get_dynamic_type): Do not confuse
    instance offset with offset of outer type.

    * g++.dg/torture/pr67600.C: New testcase.

    2015-10-12 Richard Biener <rguenther@suse.de>

    PR ipa/67783
    * ipa-inline-analysis.c (estimate_function_body_sizes): Re-add
    code that analyzes IVs on each stmt but in a cheaper way avoiding
    quadratic behavior.

    2015-10-11 Jan Hubicka <hubicka@ucw.cz>

    PR ipa/67056
    * ipa-polymorphic-call.c (possible_placement_new): If cur_offset
    is negative we don't know the type.
    (check_stmt_for_type_change): Skip constructors of non-polymorphic
    types as those won't help devirutalization.

    * g++.dg/ipa/pr67056.C: New testcase.

    2015-08-11 Manuel López-Ibáñez <manu@gcc.gnu.org>

    PR c/66098
    PR c/66711
    * diagnostic.c (diagnostic_classify_diagnostic): Take -Werror
    into account when deciding what was the command-line status.

    * gcc.dg/pragma-diag-3.c: New test.
    * gcc.dg/pragma-diag-4.c: New test.
The difference in the inlining decisions is:

Inlining mysum to tp_sum with frequency 1000
Inlining tp_sum to runtptests with frequency 99000
Inlining mysum to runtptests with frequency 100000

vs. just

Inlining mysum.constprop to tp_sum with frequency 1000

We don't even consider inlining tp_sum when -fipa-cp-clone is enabled.
GCC 6.4 is being released, adjusting target milestone.
Fixed on trunk.
Author: hubicka
Date: Sun Feb 4 17:15:36 2018
New Revision: 257367

URL: https://gcc.gnu.org/viewcvs?rev=257367&root=gcc&view=rev
Log:
    PR middle-end/79966
    * gfortran.dg/pr79966.f90: New testcase

Added:
    trunk/gcc/testsuite/gfortran.dg/pr79966.f90
Modified:
    trunk/gcc/testsuite/ChangeLog
GCC 6 branch is being closed
The testcase fails for every target on trunk (gcc-9):

FAIL: gfortran.dg/pr79966.f90 -O scan-ipa-dump inline "Inlined tp_sum/[0-9]+ into runtptests/[0-9]+"
GCC 9.1 has been released.
The run time on the 9 branch and on trunk, with or without -fipa-cp-clone, is now as slow as on the 8 branch with -fipa-cp-clone:

% gfc9 pr79966.f90 -O2 -fpeel-loops -finline-functions
% time ./a.out
 Using SUM, time: 0.291947961
GCC 9.2 has been released.
GCC 9.3.0 has been released, adjusting target milestone.
I can confirm this, even on current trunk. The reason is that the call runtptests/3 -> tp_sum/5 is not inlined because it exceeds max-inline-insns-auto. I have to set the param to 43 (the default is 15) for the function to be inlined, and then the result is fast again. In GCC 8 the value of the param was 30. After a very superficial glance at the function in the release_ssa dump, it does not seem to be any bigger than it was in GCC 8.
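For anyone who wants to repeat this experiment, the sketch below builds the compiler command lines for the three parameter values mentioned above (15, 30 and 43). It only constructs the argument lists and does not run anything. The `build_command` helper, the `gfortran` driver name, and the choice of base flags (taken from the earlier comments) are assumptions; substitute your own compiler binary.

```python
def build_command(source, max_inline_insns_auto, compiler="gfortran"):
    """Build a compile command with an explicit max-inline-insns-auto value.

    Hypothetical helper: the compiler name and base flags are assumptions
    based on the command lines quoted earlier in this report.
    """
    return [
        compiler,
        source,
        "-O2",
        "-fpeel-loops",
        "-finline-functions",
        "--param",
        f"max-inline-insns-auto={max_inline_insns_auto}",
    ]


# 15: default reported here, 30: GCC 8 value, 43: value needed for inlining.
commands = [build_command("pr79966.f90", value) for value in (15, 30, 43)]

for command in commands:
    print(" ".join(command))
```

Each list can be passed to `subprocess.run` and the resulting `a.out` timed, to check at which value tp_sum starts being inlined on a given compiler build.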
And if I had looked at the testcase that is in the repo, I would have discovered PR 88711, which is exactly about this. See Honza's comments about the problem there. (I'd prefer to keep that bug open and close this one, but Honza decided otherwise, so I'm changing the name because it is not an IPA-CP issue any more.)
GCC 9.4 is being released, retargeting bugs to GCC 9.5.
GCC 9 branch is being closed
GCC 10.4 is being released, retargeting bugs to GCC 10.5.
GCC 10 branch is being closed.