18601 – [4.0 Regression] tree cfglceanup is slow

Bug 18601 - [4.0 Regression] tree cfglceanup is slow

Summary: [4.0 Regression] tree cfglceanup is slow

Status:	RESOLVED FIXED

Alias:	None

Product:	gcc
Classification:	Unclassified
Component:	tree-optimization (show other bugs)
Version:	4.0.0

Importance:	P2 normal
Target Milestone:	4.0.0
Assignee:	Zdenek Dvorak

URL:
Keywords:	compile-time-hog, patch

Depends on:
Blocks:

Reported:	2004-11-21 21:08 UTC by Andrew Pinski
Modified:	2004-12-06 20:23 UTC (History)
CC List:	2 users (show)

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed:	2004-11-21 21:42:35

Attachments
Add an attachment (proposed patch, testcase, etc.)

Note You need to log in before you can comment on or make changes to this bug.

Description Andrew Pinski 2004-11-21 21:08:54 UTC

Related to PR 15524 but unlike that case where we are the same compile time for that test, this one we 
are three times slower.
#define CL0(a) case a: b=a+1;  goto c;
 #define CL1(a) CL0(a##0) CL0(a##1) CL0(a##2) CL0(a##3) CL0(a##4) CL0(a##5) \
 CL0(a##6) CL0(a##7) CL0(a##8) CL0(a##9)
 #define CL2(a) CL1(a##0) CL1(a##1) CL1(a##2) CL1(a##3) CL1(a##4) CL1(a##5) \
 CL1(a##6) CL1(a##7) CL1(a##8) CL1(a##9)
 #define CL3(a) CL2(a##0) CL2(a##1) CL2(a##2) CL2(a##3) CL2(a##4) CL2(a##5) \
 CL2(a##6) CL2(a##7) CL2(a##8) CL2(a##9)

 void f(int b);

 void a(int b) {
  c: switch (b) {
         CL3(1)
         CL3(2)
     }
 }

Comment 1 Andrew Pinski 2004-11-21 21:15:26 UTC

Most (95%) of the time is spent updating the dominator information.

Comment 2 Andrew Pinski 2004-11-21 21:16:37 UTC

The loop which is taking the time is tree-cfg.c:4085-4099.

Comment 3 Kazu Hirata 2004-11-21 21:43:25 UTC

If you call free_dominance_info in cleanup_tree_cfg.c right before
calling thread_jumps, the compile time problem basically vanishes.

Comment 4 Andrew Pinski 2004-11-22 06:39:29 UTC

Zdenek since you added the code to thread_jumps to incremental update the DOM information, could 
you comment on this bug?

(Note I have seen this in more than just this testcase which is superficial).

I almost want to say if we reach a threshold to just free the DOM information since it is no longer 
profitable to do the incremental update.

Comment 5 Kazu Hirata 2004-11-22 14:38:52 UTC

A better way than my comment #3 would be to call free_dominance_info
when the worklist in thread_jumps is nonempty (or contains more elements
than some threshold as Andrew Pinski has suggested).

Comment 6 Zdenek Dvorak 2004-11-22 16:45:51 UTC

The code in this testcase looks basically this way:

c:
switch (b)
  {
    case 1:
      b = 2; goto c;
    case 2:
      b = 3; goto c;
    ...
  }

What happens is that dom peforms jump threading over the switch, thus turning
the code into

switch (b)
  {
    case 1: goto l1;
    case 2: goto l2;
    case 3: goto l3;
  }

l1: b = ...; goto l2;
l2: b = ...; goto l3;
l3: b = ...; goto l4;
...

The assignments to b get removed in dce, and cfgcleanup jump starts propagating
the edges from the switch statement over the chain of thousands of empty blocks.
I.e. the first of them is threaded over 1 block, next over 2, ..., the last over
2000, resulting in a quadratic behavior by itself.

The other source for quadratic behavior in this case indeed is updating of
dominators (concretely the recount_dominators step for the last basic block
in the chain -- its degree increases by one in each step, and
recount_dominator is by itself linear in number of predecessors of a block.
I will try to come up with something for at least the later problem.  Throwing
the dominance information away is of course an option, but I think we should be
able to avoid it easily.

Comment 7 Zdenek Dvorak 2004-11-25 01:02:14 UTC

Patch:

http://gcc.gnu.org/ml/gcc-patches/2004-11/msg02089.html

Comment 8 GCC Commits 2004-12-06 20:22:09 UTC

Subject: Bug 18601

CVSROOT:	/cvs/gcc
Module name:	gcc
Changes by:	kazu@gcc.gnu.org	2004-12-06 20:22:02

Modified files:
	gcc            : ChangeLog tree-cfg.c tree-flow.h 

Log message:
	PR tree-optimization/18601
	* tree-cfg.c (thread_jumps, thread_jumps_from_bb): Removed.
	(tree_forwarder_block_p): Do not consider blocks that are its own
	successors forwarders.
	(cleanup_forwarder_blocks, remove_forwarder_block): New functions.
	(cleanup_tree_cfg): Use cleanup_forwarder_blocks instead of
	thread_jumps.
	* tree-flow.h (bb_ann_d): Remove forwardable.

Patches:
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/ChangeLog.diff?cvsroot=gcc&r1=2.6726&r2=2.6727
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/tree-cfg.c.diff?cvsroot=gcc&r1=2.132&r2=2.133
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/tree-flow.h.diff?cvsroot=gcc&r1=2.73&r2=2.74

Comment 9 Kazu Hirata 2004-12-06 20:23:13 UTC

Just checked in a patch.