[Bug rtl-optimization/67396] [4.9/5.0 regression] Performance regression compiling variadic function with many arguments
miyuki at gcc dot gnu.org
gcc-bugzilla@gcc.gnu.org
Mon Aug 31 09:20:00 GMT 2015
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67396
Mikhail Maltsev <miyuki at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Target|x86_64-*-* |
CC| |miyuki at gcc dot gnu.org
Version|5.2.0 |unknown
Target Milestone|4.9.4 |---
Summary|[4.9/5/6 regression] |[4.9/5.0 regression]
|Performance regression |Performance regression
|compiling variadic function |compiling variadic function
|with many arguments in RTL |with many arguments
|DSE |
--- Comment #7 from Mikhail Maltsev <miyuki at gcc dot gnu.org> ---
(In reply to Paul Pluzhnikov from comment #5)
> for j in 500 1000 2000 3000; do perl ./gen_bz18872.pl $j > t.c ; (time
> gcc-svn-r227326-rel/bin/gcc -c -O2 t.c) |& grep user; done
>
> user 0m1.488s
> user 0m12.010s
> user 1m40.353s
> user 5m47.668s
Rough fitting suggests that it is O(n^3).
I did some profiling and it looks like we get rather deep recursion in alias
analysis. Most time is spent in the following call chain: dse_step1 ->
scan_insn -> record_store -> canon_true_dependence -> true_dependence1. It has
2 callees, which start deep recursion: memrefs_conflict_p and find_base_term
(the latter is invoked twice, the profiler suggests that only the second call,
i.e.
rtx mem_base = find_base_term (true_mem_addr);
is a hot spot). Then we start to recur via these 2 (interleaving) calls:
f = val->locs;
/* Temporarily reset val->locs to avoid infinite recursion. */
val->locs = NULL;
for (l = f; l; l = l->next)
if (GET_CODE (l->loc) == VALUE
&& CSELIB_VAL_PTR (l->loc)->locs
&& !CSELIB_VAL_PTR (l->loc)->locs->next
&& CSELIB_VAL_PTR (l->loc)->locs->loc == x)
continue;
else if ((ret = find_base_term (l->loc)) != 0)
1865 else if ((ret = find_base_term (l->loc)) != 0)
(gdb) p l->loc
$6 = (rtx_def *) (plus:DI (value:DI 2:4321 @0x1abc518/0x1ab8a20)
(const_int -8 [0xfffffffffffffff8]))
and:
/* Go ahead and find the base term for both operands. If either base
term is from a pointer or is a named object or a special address
(like an argument or stack reference), then use it for the
base term. */
rtx base = find_base_term (tmp1);
if (base != NULL_RTX
&& ((REG_P (tmp1) && REG_POINTER (tmp1))
|| known_base_value_p (base)))
return base;
#1 0x000000000062b92a in find_base_term (x=<optimized out>) at
/home/miyuki/gcc/src/gcc/alias.c:1917
1917 rtx base = find_base_term (tmp1);
(gdb) p tmp1
$7 = (rtx_def *) (value:DI 2:4321 @0x1abc518/0x1ab8a20)
It's harder to say what happens in memrefs_conflict_p because it is
tail-recursive. This function has ~20 calls of itself. Perhaps basic block
profiling will tell more.
P.S. The profiler shows an estimate: alias.c:get_addr is invoked ~300000000
times (that is for 1000 printf arguments).
More information about the Gcc-bugs
mailing list