Bug 79966 - [11/12/13/14 Regression] Test gfortran.dg/pr79966.f90 slow again, inliner hits max-inline-insns-auto
Status: NEW
Alias: None
Product: gcc
Classification: Unclassified
Component: ipa
Version: 7.0.1
Importance: P2 normal
Target Milestone: 11.5
Assignee: Not yet assigned to anyone
URL:
Keywords: missed-optimization
Depends on: 88711
Blocks:
 
Reported: 2017-03-08 23:01 UTC by Dominique d'Humieres
Modified: 2023-07-07 10:32 UTC
CC List: 4 users

See Also:
Host:
Target:
Build:
Known to work: 5.3.0, 5.4.0
Known to fail: 5.2.0, 6.3.0, 7.0
Last reconfirmed: 2017-03-09 00:00:00


Attachments
reduced test from pr79930 (1.14 KB, text/plain)
2017-03-08 23:01 UTC, Dominique d'Humieres

Description Dominique d'Humieres 2017-03-08 23:01:08 UTC
Created attachment 40927
reduced test from pr79930

This PR is a sequel to pr79930. On gcc6 and trunk (7.0), the run time more than doubles when the attached test is compiled with -fipa-cp-clone in addition to -O2 -fpeel-loops -finline-functions:

% gfca sum_xy.f90 -O2 -fpeel-loops -finline-functions
% ./a.out
 Using SUM, time:          9.28960443E-02
 sum =  -19609.087488337318     
% gfca sum_xy.f90 -O2 -fpeel-loops -fipa-cp-clone -finline-functions
% ./a.out
 Using SUM, time:         0.264802039    
 sum =  -56108.398701889098     
% gfc6 sum_xy.f90 -O2 -fpeel-loops -finline-functions
% ./a.out
 Using SUM, time:          9.48660374E-02
 sum =  -27582.388828175150     
% gfc6 sum_xy.f90 -O2 -fpeel-loops -fipa-cp-clone -finline-functions
% ./a.out
 Using SUM, time:         0.279235005    
 sum =  -37131.668569316826     
% gfc5 sum_xy.f90 -O2 -fpeel-loops -finline-functions
% ./a.out
 Using SUM, time:         0.106678963    
 sum =   6552.5940282615447     
% gfc5 sum_xy.f90 -O2 -fpeel-loops -fipa-cp-clone -finline-functions
% ./a.out
 Using SUM, time:         0.104918003    
 sum =   28692.268429922191     

gfca is trunk at r245974, gfc6 is 6.3.1 at r245745, and gfc5 is 5.4.1 at r245752.
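The "more than twice" slowdown can be checked directly from the quoted timings. A minimal Python sketch that only divides the two times printed for each compiler (the numbers are copied from the runs above; nothing is re-measured):

```python
# Times printed by the test above: (without, with) -fipa-cp-clone.
TIMINGS = {
    "trunk r245974": (9.28960443e-02, 0.264802039),
    "6.3.1 r245745": (9.48660374e-02, 0.279235005),
    "5.4.1 r245752": (0.106678963, 0.104918003),
}


def slowdown(without: float, with_clone: float) -> float:
    """How many times longer the -fipa-cp-clone run takes."""
    return with_clone / without


for compiler, (without, with_clone) in TIMINGS.items():
    print(f"{compiler}: {slowdown(without, with_clone):.2f}x")
```

For trunk and 6.3.1 the ratio is close to 3, while for 5.4.1 the two builds run in essentially the same time.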

Looking at my archives, it seems that the regression appeared in gcc5 before branching but has since been reverted there, while it has always been present in the gcc6 branch.
Comment 1 Martin Liška 2017-03-09 08:54:36 UTC
Confirmed. Here are the numbers for releases (OK means time < 0.15):

  4.7.0 (93c5ebd73a4d1626)(22 Mar 2012 07:11):  result: FAILED  time: 0.520000041
  4.7.1 (0e3097e7d505b7be)(14 Jun 2012 08:32):  result: FAILED  time: 0.508000016
  4.7.2 (c9b304ada7111264)(20 Sep 2012 06:54):  result: FAILED  time: 0.512000024
  4.7.3 (f22940cb824859bd)(11 Apr 2013 07:57):  result: FAILED  time: 0.516000032
  4.7.4 (ae10eb82fe34c186)(12 Jun 2014 12:08):  result: FAILED  time: 0.515999913
  4.8.0 (e9c762ec4671d77e)(22 Mar 2013 10:05):  result: FAILED  time: 0.300000012
  4.8.1 (caa62b4636bfed71)(31 May 2013 09:02):  result: FAILED  time: 0.300000012
  4.8.2 (9bcca88e24e64d4e)(16 Oct 2013 07:20):  result: FAILED  time: 0.300000012
  4.8.3 (6bbf0dec66c0e719)(22 May 2014 09:10):  result: FAILED  time: 0.307999969
  4.8.4 (1a97fa0bb3fa5669)(19 Dec 2014 11:43):  result: FAILED  time: 0.315999925
  4.8.5 (cf82a597b0d18985)(23 Jun 2015 07:54):  result: FAILED  time: 0.324000061
  4.9.0 (a7aa383874520cd5)(22 Apr 2014 09:43):  result: FAILED  time: 0.352000058
  4.9.1 (c6fa1b4126635939)(16 Jul 2014 10:04):  result: FAILED  time: 0.356000006
  4.9.2 (c1283af40b65f1ad)(30 Oct 2014 08:27):  result: FAILED  time: 0.359999955
  4.9.3 (876d41ed80ce13e0)(26 Jun 2015 17:57):  result: FAILED  time: 0.368000031
  4.9.4 (d3191480f376c780)(03 Aug 2016 05:07):  result: FAILED  time: 0.359999955
  5.1.0 (d5ad84b309d0d97d)(22 Apr 2015 08:43):  result: FAILED  time: 0.23999995
  5.2.0 (7b26e3896e268cd4)(16 Jul 2015 09:13):  result: FAILED  time: 0.23999995
  5.3.0 (2bc376d60753a58b)(04 Dec 2015 10:45):  result: OK      time: 0.0920000076
  5.4.0 (9d0507742960aa9f)(03 Jun 2016 08:41):  result: OK      time: 0.0920000076
  6.1.0 (c441d9e8e0438dcf)(27 Apr 2016 08:20):  result: FAILED  time: 0.231999993
  6.2.0 (6ac74a62ba725829)(22 Aug 2016 08:01):  result: FAILED  time: 0.244000018
  6.3.0 (4b5e15daff8b5444)(21 Dec 2016 07:51):  result: FAILED  time: 0.232000053
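The OK/FAILED verdicts in the table follow from the 0.15 threshold alone. A small Python sketch that re-derives them from the quoted times; the dictionary simply restates the table and nothing is re-measured:

```python
# Release run times as listed in the table above.
RELEASE_TIMES = {
    "4.7.0": 0.520000041, "4.7.1": 0.508000016, "4.7.2": 0.512000024,
    "4.7.3": 0.516000032, "4.7.4": 0.515999913,
    "4.8.0": 0.300000012, "4.8.1": 0.300000012, "4.8.2": 0.300000012,
    "4.8.3": 0.307999969, "4.8.4": 0.315999925, "4.8.5": 0.324000061,
    "4.9.0": 0.352000058, "4.9.1": 0.356000006, "4.9.2": 0.359999955,
    "4.9.3": 0.368000031, "4.9.4": 0.359999955,
    "5.1.0": 0.23999995, "5.2.0": 0.23999995,
    "5.3.0": 0.0920000076, "5.4.0": 0.0920000076,
    "6.1.0": 0.231999993, "6.2.0": 0.244000018, "6.3.0": 0.232000053,
}
THRESHOLD = 0.15  # "OK means time < 0.15"


def verdict(seconds: float) -> str:
    """Classify one run time the same way the table does."""
    return "OK" if seconds < THRESHOLD else "FAILED"


# Dicts keep insertion order, so this lists releases in table order.
fast_releases = [rel for rel, t in RELEASE_TIMES.items() if verdict(t) == "OK"]
print("fast releases:", fast_releases)
```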

Thus only 5.3.0 and 5.4.0 are fast. I'll isolate a commit that's responsible for that.
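Isolating the responsible commit amounts to a bisection between a slow and a fast revision. A generic Python sketch, assuming the revisions are ordered and the "fast" property flips exactly once between the two endpoints. `first_good` and `is_fast` are hypothetical names; in practice the predicate would build the compiler at that revision and time the test. The usage below relies on a stand-in predicate and a hypothetical contiguous revision range purely for illustration:

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def first_good(revs: Sequence[T], is_good: Callable[[T], bool]) -> T:
    """Return the earliest revision for which is_good() holds.

    Assumes revs is ordered, revs[0] is bad, revs[-1] is good, and the
    predicate changes from False to True exactly once.
    """
    lo, hi = 0, len(revs) - 1  # invariant: revs[lo] bad, revs[hi] good
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_good(revs[mid]):
            hi = mid
        else:
            lo = mid
    return revs[hi]


# Hypothetical stand-in: a real predicate would build GCC at `rev` and time the test.
def is_fast(rev: int) -> bool:
    return rev >= 230550


print(first_good(range(230000, 231001), is_fast))
```

Each probe costs a full compiler build, so halving the range each time matters: about ten probes are enough to cover a thousand revisions.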
Comment 2 Martin Liška 2017-03-09 09:33:29 UTC
So the revision that makes the difference is r230550, where Richi added various back-ports:

    2015-11-18  Richard Biener  <rguenther@suse.de>
    
            Backport from mainline
            2015-11-07  Jan Hubicka  <hubicka@ucw.cz>
    
            PR ipa/68057
            PR ipa/68220
            * ipa-polymorphic-call.c
            (ipa_polymorphic_call_context::restrict_to_inner_type): Fix ordering
            issue when offset is out of range.
            (contains_type_p): Fix out of range check, clear dynamic flag.
    
            * g++.dg/torture/pr68220.C: New testcase.
            * g++.dg/lto/pr68057_0.C: Likewise.
            * g++.dg/lto/pr68057_1.C: Likewise.
    
            2015-10-23  Jan Hubicka  <hubicka@ucw.cz>
    
            PR ipa/pr67600
            * ipa-polymorphic-call.c
            (ipa_polymorphic_call_context::get_dynamic_type): Do not confuse
            instance offset with offset of outer type.
    
            * g++.dg/torture/pr67600.C: New testcase.
    
            2015-10-12  Richard Biener  <rguenther@suse.de>
    
            PR ipa/67783
            * ipa-inline-analysis.c (estimate_function_body_sizes): Re-add
            code that analyzes IVs on each stmt but in a cheaper way avoiding
            quadratic behavior.
    
            2015-10-11  Jan Hubicka  <hubicka@ucw.cz>
    
            PR ipa/67056
            * ipa-polymorphic-call.c (possible_placement_new): If cur_offset
            is negative we don't know the type.
            (check_stmt_for_type_change): Skip constructors of non-polymorphic
            types as those won't help devirutalization.
    
            * g++.dg/ipa/pr67056.C: New testcase.
    
            2015-08-11  Manuel López-Ibáñez  <manu@gcc.gnu.org>
    
            PR c/66098
            PR c/66711
            * diagnostic.c (diagnostic_classify_diagnostic): Take -Werror into
            account when deciding what was the command-line status.
    
            * gcc.dg/pragma-diag-3.c: New test.
            * gcc.dg/pragma-diag-4.c: New test.
Comment 3 Richard Biener 2017-03-10 12:57:55 UTC
The inlining decisions differ as follows:

Inlining mysum to tp_sum with frequency 1000
Inlining tp_sum to runtptests with frequency 99000
Inlining mysum to runtptests with frequency 100000

vs. just

Inlining mysum.constprop to tp_sum with frequency 1000

We don't even consider inlining tp_sum when -fipa-cp-clone is enabled.
Comment 4 Richard Biener 2017-07-04 08:49:07 UTC
GCC 6.4 is being released, adjusting target milestone.
Comment 5 Jan Hubicka 2018-02-04 17:13:18 UTC
Fixed on trunk.
Comment 6 Jan Hubicka 2018-02-04 17:16:17 UTC
Author: hubicka
Date: Sun Feb  4 17:15:36 2018
New Revision: 257367

URL: https://gcc.gnu.org/viewcvs?rev=257367&root=gcc&view=rev
Log:

	PR middle-end/79966
	* gfortran.dg/pr79966.f90: New testcase

Added:
    trunk/gcc/testsuite/gfortran.dg/pr79966.f90
Modified:
    trunk/gcc/testsuite/ChangeLog
Comment 7 Jakub Jelinek 2018-10-26 10:11:31 UTC
GCC 6 branch is being closed
Comment 8 Uroš Bizjak 2019-01-31 12:42:27 UTC
The testcase fails for every target on trunk (gcc-9):

FAIL: gfortran.dg/pr79966.f90   -O   scan-ipa-dump inline "Inlined tp_sum/[0-9]+ into runtptests/[0-9]+"
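The failing check searches the IPA `inline` dump for a line recording that `tp_sum` was inlined into `runtptests`. A sketch of the same pattern using Python's `re` module (this particular pattern has the same meaning there as in the testsuite's Tcl regex). The sample dump lines are hypothetical, with node numbers borrowed from the `runtptests/3 -> tp_sum/5` edge mentioned later in this report:

```python
import re

# The regex from the failing scan-ipa-dump check.
INLINED_TP_SUM = re.compile(r"Inlined tp_sum/[0-9]+ into runtptests/[0-9]+")


def tp_sum_inlined(dump_text: str) -> bool:
    """True if some line of the inline dump matches the testsuite pattern."""
    return INLINED_TP_SUM.search(dump_text) is not None


# Hypothetical dump fragments, for illustration only.
passing = "Inlined tp_sum/5 into runtptests/3\n"
failing = "Inlined mysum/2 into tp_sum/5\n"
print(tp_sum_inlined(passing), tp_sum_inlined(failing))
```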
Comment 9 Jakub Jelinek 2019-05-03 09:17:11 UTC
GCC 9.1 has been released.
Comment 10 Dominique d'Humieres 2019-05-21 12:37:49 UTC
The run time on the 9 branch and on trunk, with or without -fipa-cp-clone, is now as slow as on the 8 branch with -fipa-cp-clone:

% gfc9 pr79966.f90 -O2 -fpeel-loops -finline-functions
% time ./a.out
 Using SUM, time:         0.291947961
Comment 11 Jakub Jelinek 2019-08-12 08:56:19 UTC
GCC 9.2 has been released.
Comment 12 Jakub Jelinek 2020-03-12 11:58:50 UTC
GCC 9.3.0 has been released, adjusting target milestone.
Comment 13 Martin Jambor 2021-01-07 15:53:42 UTC
I can confirm this, even on current trunk.

The reason is that the call runtptests/3 -> tp_sum/5 is not inlined
because it exceeds max-inline-insns-auto.  I have to set the param to
43 (the default is 15) for the function to be inlined, and then the
result is fast again.  In GCC 8 the value of the param was 30.  After a
very superficial glance at the function in the release_ssa dump, it
does not seem to be any bigger than it was in GCC 8.
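The experiment above can be reproduced by sweeping the parameter on the command line with GCC's `--param max-inline-insns-auto=N`. A sketch, assuming `gfortran` is on `PATH` and the test is saved as `pr79966.f90`; `compile_cmd` is a hypothetical helper that only builds and prints the argument lists, and the values 15, 30 and 43 are the ones quoted in the comment:

```python
import shlex
from typing import List, Optional

# Flags used in the original report.
BASE_FLAGS = ["-O2", "-fpeel-loops", "-finline-functions"]


def compile_cmd(source: str, insns_auto: Optional[int] = None) -> List[str]:
    """Build a gfortran command line, optionally overriding max-inline-insns-auto."""
    cmd = ["gfortran", source, *BASE_FLAGS]
    if insns_auto is not None:
        cmd += ["--param", f"max-inline-insns-auto={insns_auto}"]
    return cmd


# 15: default cited in the comment, 30: GCC 8 value, 43: value needed for inlining.
for value in (15, 30, 43):
    print(shlex.join(compile_cmd("pr79966.f90", value)))
    # To actually compile: subprocess.run(compile_cmd("pr79966.f90", value), check=True)
```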
Comment 14 Martin Jambor 2021-01-07 16:20:52 UTC
Had I looked at the testcase that is in the repository, I would have
discovered PR 88711, which is exactly about this.  See Honza's comments
about the problem there.

(I'd prefer to keep that bug open and close this one, but Honza
decided otherwise, so I'm changing the title because it is not an
IPA-CP issue any more.)
Comment 15 Richard Biener 2021-06-01 08:08:46 UTC
GCC 9.4 is being released, retargeting bugs to GCC 9.5.
Comment 16 Richard Biener 2022-05-27 09:37:07 UTC
GCC 9 branch is being closed
Comment 17 Jakub Jelinek 2022-06-28 10:33:07 UTC
GCC 10.4 is being released, retargeting bugs to GCC 10.5.
Comment 18 Richard Biener 2023-07-07 10:32:06 UTC
GCC 10 branch is being closed.