Created attachment 33409 [details]
For the attached testcase (from GHC) PRE blows up memory and compile-time wise
at -O2, -O2 -fno-tree-pre and -O1 are fine.
Usual suspect is AVAIL_OUT which was only eliminated for FRE.
Date: Fri Aug 29 12:39:50 2014
New Revision: 214727
2014-08-29 Richard Biener <firstname.lastname@example.org>
* tree-ssa-pre.c (sorted_array_from_bitmap_set): Reserve
exactly the vector size needed and use quick_push.
(phi_translate_1): Adjust comment.
(valid_in_sets): Remove block argument and remove pointless
checking of NAMEs.
(dependent_clean): Adjust for removal of block argument.
As said in the mail for the just applied patch:
"The second patch (once done) will refactor insertion phase
to do a dominator walk similar to what elimination does
computing AVAIL_OUT on-the-fly (and only keeping it live
for a single block). It's a bit more complicated than
for elimination as insertion looks at a block and all its
predecessors, it's also a bit more costly as we iterate
insertion. It's also more costly because computing
AVAIL_OUT on-the-fly requires looking at all statements
which insertion currently doesn't do at all.
So I've not yet settled on a final idea how to best re-factor it,
eventually insertion and elimination can be merged (and then
Ideas welcome ;)"
So what remains is insertion computing whether a value is available on some
predecessor (partial redundancy) or all (full redundancy) which needs AVAIL_OUT
for the predecessors and the immediate dominator.
Then the actual insertion code needs to lookup leaders for expression operands
it wants to insert. I think this part can be easily delayed until eliminate
if we store all necessary information in some new data structure.
The first part is harder - especially as it is that what needs to be iterated
(with taking into account newly inserted stuff). One could imagine
iterating over sucessors of a block (and immediate dominated-bys) and compute
"candidate" sets, both pruning and extending it during the iteration.
Btw, the testcase is absymally slow in compiling even with -O1 if you enable
glibc memory checking with both(!) MALLOC_CHECK_=3 and MALLOC_PERTURB_=69
I have no idea why... (constant overhead seems to be on the order of a
factor of tree for pure malloc/free/realloc workload, but compiling
the testcase goes from 30s to infinity - aborted after an hour).
We certainly do some stupid reallocs (c.f. vec::safe_grow[_cleared]
doing a reserve_exact and thus some vectors grow element-by-element).