Related to PR 15524 but unlike that case where we are the same compile time for that test, this one we are three times slower. #define CL0(a) case a: b=a+1; goto c; #define CL1(a) CL0(a##0) CL0(a##1) CL0(a##2) CL0(a##3) CL0(a##4) CL0(a##5) \ CL0(a##6) CL0(a##7) CL0(a##8) CL0(a##9) #define CL2(a) CL1(a##0) CL1(a##1) CL1(a##2) CL1(a##3) CL1(a##4) CL1(a##5) \ CL1(a##6) CL1(a##7) CL1(a##8) CL1(a##9) #define CL3(a) CL2(a##0) CL2(a##1) CL2(a##2) CL2(a##3) CL2(a##4) CL2(a##5) \ CL2(a##6) CL2(a##7) CL2(a##8) CL2(a##9) void f(int b); void a(int b) { c: switch (b) { CL3(1) CL3(2) } }
Most (95%) of the time is spent updating the dominator information.
The loop which is taking the time is tree-cfg.c:4085-4099.
If you call free_dominance_info in cleanup_tree_cfg.c right before calling thread_jumps, the compile time problem basically vanishes.
Zdenek since you added the code to thread_jumps to incremental update the DOM information, could you comment on this bug? (Note I have seen this in more than just this testcase which is superficial). I almost want to say if we reach a threshold to just free the DOM information since it is no longer profitable to do the incremental update.
A better way than my comment #3 would be to call free_dominance_info when the worklist in thread_jumps is nonempty (or contains more elements than some threshold as Andrew Pinski has suggested).
The code in this testcase looks basically this way: c: switch (b) { case 1: b = 2; goto c; case 2: b = 3; goto c; ... } What happens is that dom peforms jump threading over the switch, thus turning the code into switch (b) { case 1: goto l1; case 2: goto l2; case 3: goto l3; } l1: b = ...; goto l2; l2: b = ...; goto l3; l3: b = ...; goto l4; ... The assignments to b get removed in dce, and cfgcleanup jump starts propagating the edges from the switch statement over the chain of thousands of empty blocks. I.e. the first of them is threaded over 1 block, next over 2, ..., the last over 2000, resulting in a quadratic behavior by itself. The other source for quadratic behavior in this case indeed is updating of dominators (concretely the recount_dominators step for the last basic block in the chain -- its degree increases by one in each step, and recount_dominator is by itself linear in number of predecessors of a block. I will try to come up with something for at least the later problem. Throwing the dominance information away is of course an option, but I think we should be able to avoid it easily.
Patch: http://gcc.gnu.org/ml/gcc-patches/2004-11/msg02089.html
Subject: Bug 18601 CVSROOT: /cvs/gcc Module name: gcc Changes by: kazu@gcc.gnu.org 2004-12-06 20:22:02 Modified files: gcc : ChangeLog tree-cfg.c tree-flow.h Log message: PR tree-optimization/18601 * tree-cfg.c (thread_jumps, thread_jumps_from_bb): Removed. (tree_forwarder_block_p): Do not consider blocks that are its own successors forwarders. (cleanup_forwarder_blocks, remove_forwarder_block): New functions. (cleanup_tree_cfg): Use cleanup_forwarder_blocks instead of thread_jumps. * tree-flow.h (bb_ann_d): Remove forwardable. Patches: http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/ChangeLog.diff?cvsroot=gcc&r1=2.6726&r2=2.6727 http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/tree-cfg.c.diff?cvsroot=gcc&r1=2.132&r2=2.133 http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/tree-flow.h.diff?cvsroot=gcc&r1=2.73&r2=2.74
Just checked in a patch.