[Bug middle-end/54394] New: fatigue2 -flto run time regression
jamborm at gcc dot gnu.org
gcc-bugzilla@gcc.gnu.org
Tue Aug 28 22:32:00 GMT 2012
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54394
Bug #: 54394
Summary: fatigue2 -flto run time regression
Classification: Unclassified
Product: gcc
Version: 4.8.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: middle-end
AssignedTo: unassigned@gcc.gnu.org
ReportedBy: jamborm@gcc.gnu.org
CC: rguenth@gcc.gnu.org
Host: x86_64-linux-gnu
Target: x86_64-linux-gnu
Revision 190346 caused a large run time regression of the fatigue2
Polyhedron benchmark when compiled with -Ofast -flto. On an x86_64-linux
box, the run time went from 150 seconds to 215 seconds (about 43%
slower) and there is a similar percentage increase on my i686-linux
desktop.
The commit leading to that revision is:
2012-08-13 Richard Guenther <rguenther@suse.de>
* basic-block.h (struct basic_block): Remove loop_depth
member, move flags and index members next to each other.
* cfgloop.h (bb_loop_depth): New inline function.
* cfghooks.c (split_block): Do not set loop_depth.
(duplicate_block): Likewise.
* cfgloop.c (flow_loop_nodes_find): Likewise.
(flow_loops_find): Likewise.
(add_bb_to_loop): Likewise.
(remove_bb_from_loops): Likewise.
* cfgrtl.c (force_nonfallthru_and_redirect): Likewise.
* gimple-streamer-in.c (input_bb): Do not stream loop_depth.
* gimple-streamer-out.c (output_bb): Likewise.
* bt-load.c: Include cfgloop.h.
(migrate_btr_defs): Use bb_loop_depth.
* cfg.c (dump_bb_info): Likewise.
* final.c (compute_alignments): Likewise.
* ira.c (update_equiv_regs): Likewise.
* tree-ssa-copy.c (init_copy_prop): Likewise.
* tree-ssa-dom.c (loop_depth_of_name): Likewise.
* tree-ssa-forwprop.c: Include cfgloop.h.
(forward_propagate_addr_expr): Use bb_loop_depth.
* tree-ssa-pre.c (insert_into_preds_of_block): Likewise.
* tree-ssa-sink.c (select_best_block): Likewise.
* ipa-inline-analysis.c: Include cfgloop.h.
(estimate_function_body_sizes): Use bb_loop_depth.
* Makefile.in (tree-ssa-forwprop.o): Depend on $(CFGLOOP_H).
(ipa-inline-analysis.o): Likewise.
(bt-load.o): Likewise.
* gcc.dg/tree-prof/update-loopch.c: Adjust.
I believe the patch was not supposed to alter compiler output in any
(significant) way. However, inlining decisions are different (file 1
is the dump before the patch, file 2 with it):
In file 1: extra inlining into function MAIN__.2477/17
Function __computer_time_m_MOD_computer_time/13 inlined 1 times (as opposed to 0 times)
Function __perdida_m_MOD_perdida/16 inlined 1 times (as opposed to 0 times)
In file 2: extra inlining into function MAIN__.2477/17
Function __free_input_MOD_convert_lower_case/9 inlined 1 times (as opposed to 0 times)
Function __free_input_MOD_convert_lower_case.part.2.2390/62 inlined 1 times (as opposed to 0 times)
Function __read_input_m_MOD_read_input/12 inlined 1 times (as opposed to 0 times)
In file 2: extra un-inlined function __perdida_m_MOD_perdida/16
Callers: 1, Callees: 27, Inlinees: 0
In file 1: extra un-inlined function __read_input_m_MOD_read_input.constprop.0/122
Originally a clone of __read_input_m_MOD_read_input/12
Callers: 1, Callees: 530, Inlinees: 22
At the same time this does not seem to be an LTO issue because the
inline dump of the compilation (as opposed to linking) before the
patch contains lines:
__perdida_m_MOD_perdida/9 function not considered for inlining
loop depth: 2 freq:53666 size:21 time: 30 callee size: 0 stack: 0
which the patch changes to:
__perdida_m_MOD_perdida/9 function not considered for inlining
loop depth: 0 freq:53666 size:21 time: 30 callee size: 0 stack: 0
The only thing LTO contributes is that it lets the heuristics inline
perdida as a function called just once. Loop depth 0 makes the
candidate look not beneficial/cold even when we know there are no
other callers.
Loop depth is zero because, at the time of inlining analysis,
bb->loop_father is NULL. So it seems we need to compute loops at the
beginning of inline summary generation?
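
To illustrate the suspected mechanism, here is a minimal self-contained
sketch. It is not the GCC source: struct loop_sketch, struct bb_sketch
and every function name below are hypothetical stand-ins, and the body
of bb_loop_depth_sketch is only my assumption of how an accessor that
derives the depth from loop_father would behave when no loop
information has been computed yet.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins for the loop and
   basic-block structures; the real ones carry many more fields.  */
struct loop_sketch
{
  struct loop_sketch *outer;        /* Enclosing loop, NULL for the root.  */
};

struct bb_sketch
{
  struct loop_sketch *loop_father;  /* NULL until loops are computed.  */
};

/* Nesting depth of LOOP: number of enclosing loops, root is 0.  */
static int
loop_depth_sketch (const struct loop_sketch *loop)
{
  int depth = 0;
  for (; loop->outer != NULL; loop = loop->outer)
    depth++;
  return depth;
}

/* Assumed behaviour of the new accessor: depth of the innermost loop
   containing BB, or 0 when BB has no loop information attached.  */
static int
bb_loop_depth_sketch (const struct bb_sketch *bb)
{
  return bb->loop_father ? loop_depth_sketch (bb->loop_father) : 0;
}

/* Depth reported for a block nested two loops deep, with loop
   information available (HAVE_LOOPS != 0) or missing (HAVE_LOOPS == 0).  */
static int
depth_seen_by_inliner (int have_loops)
{
  struct loop_sketch root = { NULL };
  struct loop_sketch outer = { &root };
  struct loop_sketch inner = { &outer };
  struct bb_sketch bb = { have_loops ? &inner : NULL };
  return bb_loop_depth_sketch (&bb);
}

/* Self-check mirroring the two dump excerpts: depth 2 versus depth 0.  */
void
check_sketch (void)
{
  assert (depth_seen_by_inliner (1) == 2);
  assert (depth_seen_by_inliner (0) == 0);
}
```

With a stored loop_depth field the value survives even when the loop
tree is absent; with a derived value it silently degrades to 0, which
would match the "loop depth: 2" versus "loop depth: 0" lines in the
dumps above.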