Bug 32540

Summary: [4.3 Regression] Exponential time behavior in PRE
Product: gcc Reporter: Falk Hueffner <falk>
Component: tree-optimizationAssignee: Not yet assigned to anyone <unassigned>
Status: RESOLVED FIXED    
Severity: normal CC: dberlin, gcc-bugs, rguenth
Priority: P1 Keywords: compile-time-hog
Version: 4.3.0   
Target Milestone: 4.3.0   
Host: Target:
Build: Known to work:
Known to fail: Last reconfirmed: 2007-10-20 08:43:07
Attachments: Original test case

Description Falk Hueffner 2007-06-28 20:14:50 UTC
int f(void);
void acceptloop_th(int *t) {
    int options = 0;
    if (f()) options |= 0x1 << 0;
    if (f()) options |= 0x1 << 1;
    if (f()) options |= 0x1 << 2;
    if (f()) options |= 0x1 << 3;
    if (f()) options |= 0x1 << 4;
    if (f()) options |= 0x1 << 5;
    if (f()) options |= 0x1 << 6;
    if (f()) options |= 0x1 << 7;
    if (f()) options |= 0x1 << 8;
    if (f()) options |= 0x1 << 9;
    if (f()) options |= 0x1 << 10;
    if (f()) options |= 0x1 << 11;
    if (f()) options |= 0x1 << 12;
    if (f()) options |= 0x1 << 13;
    if (f()) options |= 0x1 << 14;
    if (f()) options |= 0x1 << 15;
    if (f()) options |= 0x1 << 16;
    if (f()) options |= 0x1 << 17;
    if (f()) options |= 0x1 << 18;
    if (f()) options |= 0x1 << 19;
    if (f()) options |= 0x1 << 20;
    if (f()) options |= 0x1 << 21;
    if (f()) options |= 0x1 << 22;
    if (f()) options |= 0x1 << 23;
    if (f()) options |= 0x1 << 24;
    if (f()) options |= 0x1 << 25;
    if (f()) options |= 0x1 << 26;
    *t = options;
}

takes many minutes to compile with 4.3. No problem with 4.2. Timing report shows PRE, and -fno-tree-pre makes it go really fast.

Some timings, where the first number is the number of "if"-lines:
10 gcc -c -O3 mini2.c -DN=$n  0.17s user 0.01s system 97% cpu 0.184 total
11 gcc -c -O3 mini2.c -DN=$n  0.20s user 0.02s system 97% cpu 0.221 total
12 gcc -c -O3 mini2.c -DN=$n  0.28s user 0.01s system 95% cpu 0.306 total
13 gcc -c -O3 mini2.c -DN=$n  0.42s user 0.03s system 97% cpu 0.463 total
14 gcc -c -O3 mini2.c -DN=$n  0.76s user 0.02s system 97% cpu 0.805 total
15 gcc -c -O3 mini2.c -DN=$n  1.52s user 0.03s system 97% cpu 1.604 total
16 gcc -c -O3 mini2.c -DN=$n  3.24s user 0.07s system 97% cpu 3.396 total
17 gcc -c -O3 mini2.c -DN=$n  7.00s user 0.12s system 97% cpu 7.314 total
18 gcc -c -O3 mini2.c -DN=$n  15.01s user 0.21s system 96% cpu 15.748 total
19 gcc -c -O3 mini2.c -DN=$n  33.21s user 0.49s system 94% cpu 35.600 total
20 gcc -c -O3 mini2.c -DN=$n  76.29s user 0.87s system 96% cpu 1:19.94 total

I'll also attach the original source this test case was extracted from.
Comment 1 Falk Hueffner 2007-06-28 20:15:57 UTC
Created attachment 13801 [details]
Original test case
Comment 2 Richard Biener 2007-06-28 20:25:29 UTC
Confirmed.
Comment 3 Daniel Berlin 2007-06-30 14:15:43 UTC
Subject: Bug 32540

Author: dberlin
Date: Sat Jun 30 14:15:26 2007
New Revision: 126149

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=126149
Log:
2007-06-30  Daniel Berlin  <dberlin@dberlin.org>
	
	Fix PR tree-optimization/32540
	Fix PR tree-optimization/31651

	* tree-ssa-sccvn.c: New file.

	* tree-ssa-sccvn.h: Ditto.
	
	* tree-vn.c: Include tree-ssa-sccvn.h
	(val_expr_paid_d): Removed.
	(value_table): Ditto.
	(vn_compute): Ditto.
	(val_expr_pair_hash): Ditto.
	(val_expr_pair_expr_eq): Ditto.
	(copy_vuses_from_stmt): Ditto.
	(vn_delete): Ditto.
	(vn_init): Ditto.
	(shared_vuses_from_stmt): Ditto.
	(print_creation_to_file): Moved up.
	(sort_vuses): Ditto.
	(sort_vuses_heap): Ditto.
	(set_value_handle): Make non-static.
	(make_value_handle): Ditto.
	(vn_add): Rewritten to use sccvn lookups.
	(vn_add_with_vuses): Ditto.
	(vn_lookup): Ditto (and second argument removed).
	(vn_lookup_with_vuses): Ditto.
	(vn_lookup_or_add): Ditto (and second argument removed);
	(vn_lookup_or_add_with_vuses): Ditto.
	(vn_lookup_with_stmt): New.
	(vn_lookup_or_add_with_stmt): Ditto.
	(create_value_handle_for_expr): Ditto.

	* tree-ssa-pre.c: Include tree-ssa-sccvn.h.
	(seen_during_translate): New function.
	(phi_trans_lookup): Use iterative_hash_expr, not vn_compute.
	(phi_trans_add): Ditto.
	(constant_expr_p): FIELD_DECL is always constant.
	(phi_translate_1): Renamed from phi_translate, add seen bitmap.
	Use constant_expr_p.
	Avoid infinite recursion on mutually valued expressions.
	Change callers of vn_lookup_or_add.
	(phi_translate): New function.
	(compute_antic_safe): Allow phi nodes.
	(create_component_ref_by_pieces): Update for FIELD_DECL change.
	(find_or_generate_expression): Rewrite slightly.
	(create_expression_by_pieces): Updated for vn_lookup_or_add
	change.
	Update VN_INFO for new names.
	(insert_into_preds_of_block): Update for new names.
	(add_to_exp_gen): New function.
	(add_to_sets): Use vn_lookup_or_add_with_stmt.
	(find_existing_value_expr): Rewrite to changed vn_lookup.
	(create_value_expr_from): Ditto, and use add_to_exp_gen.
	(try_look_through_load): Removed.
	(try_combine_conversion): Ditto.
	(get_sccvn_value): New function.
	(make_values_for_phi): Ditto.
	(make_values_for_stmt): Ditto.
	(compute_avail): Rewritten for vn_lookup_or_add changes and to use
	SCCVN.
	(init_pre): Update for SCCVN changes.
	(fini_pre): Ditto.
	(execute_pre): Ditto.

	* tree-flow.h (make_value_handle): Declare.
	(set_value_handle): Ditto.
	(sort_vuses_heap): Ditto.
	(vn_lookup_or_add_with_stmt): Ditto.
	(vn_lookup_with_stmt): Ditto.
	(vn_compute): Remove.
	(vn_init): Ditto.
	(vn_delete): Ditto.
	(vn_lookup): Update arguments.

	* Makefile.in (tree-ssa-pre.o): Add tree-ssa-sccvn.h
	(tree-vn.o): Ditto.
	(tree-ssa-sccvn.o): New.
	(OBJS-common): Add tree-ssa-sccvn.o
	


Added:
    trunk/gcc/testsuite/gcc.dg/tree-ssa/ssa-sccvn-1.c
      - copied unchanged from r125553, branches/gcc-pre-vn/gcc/testsuite/gcc.dg/tree-ssa/ssa-sccvn-1.c
    trunk/gcc/testsuite/gcc.dg/tree-ssa/ssa-sccvn-2.c
      - copied unchanged from r125553, branches/gcc-pre-vn/gcc/testsuite/gcc.dg/tree-ssa/ssa-sccvn-2.c
    trunk/gcc/testsuite/gcc.dg/tree-ssa/ssa-sccvn-3.c
      - copied unchanged from r125553, branches/gcc-pre-vn/gcc/testsuite/gcc.dg/tree-ssa/ssa-sccvn-3.c
    trunk/gcc/testsuite/gcc.dg/tree-ssa/ssa-sccvn-4.c
      - copied unchanged from r125553, branches/gcc-pre-vn/gcc/testsuite/gcc.dg/tree-ssa/ssa-sccvn-4.c
    trunk/gcc/tree-ssa-sccvn.c
      - copied, changed from r125553, branches/gcc-pre-vn/gcc/tree-ssa-sccvn.c
    trunk/gcc/tree-ssa-sccvn.h
      - copied, changed from r125553, branches/gcc-pre-vn/gcc/tree-ssa-sccvn.h
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/Makefile.in
    trunk/gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-1.c
    trunk/gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-3.c
    trunk/gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-4.c
    trunk/gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-5.c
    trunk/gcc/tree-flow.h
    trunk/gcc/tree-ssa-operands.h
    trunk/gcc/tree-ssa-pre.c
    trunk/gcc/tree-vn.c

Comment 4 Daniel Berlin 2007-06-30 14:16:53 UTC
Fixed
Comment 5 Falk Hueffner 2007-09-24 20:18:26 UTC
As noted by Edvin Török, the bug is not fixed for the original test case (although it is fixed for the small test case). A small test case that still fails is

int f(void);
void acceptloop_th(int *t) {
    int options = 0;
    if (f()) options |= 0x1 << 0;
    if (f()) options |= 0x1 << 1;
    if (f()) options |= 0x1 << 2;
    if (f()) options |= 0x1 << 3;
    if (f()) options |= 0x1 << 4;
    if (f()) options |= 0x1 << 5;
    if (f()) options |= 0x1 << 6;
    if (f()) options |= 0x1 << 7;
    if (f()) options |= 0x1 << 8;
    if (f()) options |= 0x1 << 9;
    if (f()) options |= 0x1 << 10;
    if (f()) options |= 0x1 << 11;
    if (f()) options |= 0x1 << 12;
    if (f()) options |= 0x1 << 13;
    if (f()) options |= 0x1 << 14;
    if (f()) options |= 0x1 << 15;
    if (f()) options |= 0x1 << 16;
    if (f()) options |= 0x1 << 17;
    if (f()) options |= 0x1 << 18;
    if (f()) options |= 0x1 << 19;
    if (f()) options |= 0x1 << 20;
    if (f()) options |= 0x1 << 21;
    if (f()) options |= 0x1 << 22;
    if (f()) options |= 0x1 << 23;
    if (f()) options |= 0x1 << 24;
    if (f()) options |= 0x1 << 25;
    if (f()) options |= 0x1 << 26;
    if (f()) *t = options;
}

(the only change is that the last line is conditional).
Comment 6 Martin Michlmayr 2007-10-20 08:02:52 UTC
Adding Danny to CC since this is not yet fixed.
Comment 7 Richard Biener 2007-10-20 10:14:53 UTC
I guess we just compute all 2**26 constants that can end up at the conditional
store.  And indeed, the number of 'Created value .*' in the dump matches this
(modulo some constant offset).  This is PPRE at work, which probably should
be limited to a sub-CFG somehow.
Comment 8 Daniel Berlin 2007-10-20 20:49:42 UTC
Subject: Re:  [4.3 Regression] Exponential time behavior in PRE

We may just want to disable PPRE of constants entirely :)


On 20 Oct 2007 10:14:53 -0000, rguenth at gcc dot gnu dot org
<gcc-bugzilla@gcc.gnu.org> wrote:
>
>
> ------- Comment #7 from rguenth at gcc dot gnu dot org  2007-10-20 10:14 -------
> I guess we just compute all 2**26 constants that can end up at the conditional
> store.  And indeed, the number of 'Created value .*' in the dump matches this
> (modulo some constant offset).  This is PPRE at work, which probably should
> be limited to a sub-CFG somehow.
>
>
> --
>
>
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32540
>
> ------- You are receiving this mail because: -------
> You are on the CC list for the bug, or are watching someone who is.
>
Comment 9 rguenther@suse.de 2007-10-20 20:52:00 UTC
Subject: Re:  [4.3 Regression] Exponential time
 behavior in PRE

On Sat, 20 Oct 2007, dberlin at dberlin dot org wrote:

> ------- Comment #8 from dberlin at gcc dot gnu dot org  2007-10-20 20:49 -------
> Subject: Re:  [4.3 Regression] Exponential time behavior in PRE
> 
> We may just want to disable PPRE of constants entirely :)

If you don't initialize the variable to zero we or in all the bits then
the value will be not constant, but still 'var | constant' with 2**24
different constants...

Richard.
Comment 10 Richard Biener 2007-11-01 15:26:31 UTC
That is, the following testcase:

int f(void);
void acceptloop_th(int *t, int options) {
    if (f()) options |= 0x1 << 0;
    if (f()) options |= 0x1 << 1;
    if (f()) options |= 0x1 << 2;
    if (f()) options |= 0x1 << 3;
    if (f()) options |= 0x1 << 4;
    if (f()) options |= 0x1 << 5;
    if (f()) options |= 0x1 << 6;
    if (f()) options |= 0x1 << 7;
    if (f()) options |= 0x1 << 8;
    if (f()) options |= 0x1 << 9;
    if (f()) options |= 0x1 << 10;
    if (f()) options |= 0x1 << 11;
    if (f()) options |= 0x1 << 12;
    if (f()) options |= 0x1 << 13;
    if (f()) options |= 0x1 << 14;
    if (f()) options |= 0x1 << 15;
    if (f()) options |= 0x1 << 16;
    if (f()) options |= 0x1 << 17;
    if (f()) options |= 0x1 << 18;
    if (f()) options |= 0x1 << 19;
    if (f()) options |= 0x1 << 20;
    if (f()) options |= 0x1 << 21;
    if (f()) options |= 0x1 << 22;
    if (f()) options |= 0x1 << 23;
    if (f()) options |= 0x1 << 24;
    if (f()) options |= 0x1 << 25;
    if (f()) options |= 0x1 << 26;
    if (f()) *t = options;
}

it's interesting that this one is not different from the zero-initialized
options case just because PHIOPT which runs before PRE changes

<bb 2>:
  D.1179_5 = f ();
  if (D.1179_5 != 0)
    goto <bb 3>;
  else
    goto <bb 4>;

<bb 3>:

<bb 4>:
  # options_1 = PHI <0(2), 1(3)>

to

<bb 2>:
  D.1179_5 = f ();
  options_1 = D.1179_5 != 0;

which then causes PHI translation to not be able to figure out the constants
(but create value expressions).  With PHIOPT disabled we even perform PRE
on this testcase ;)  So, take the following modified testcase as well:

int f(void);
void acceptloop_th(int *t) {
    int options = 0;
    if (f()) options |= 0x1 << 1;
    if (f()) options |= 0x1 << 2;
    if (f()) options |= 0x1 << 3;
    if (f()) options |= 0x1 << 4;
    if (f()) options |= 0x1 << 5;
    if (f()) options |= 0x1 << 6;
    if (f()) options |= 0x1 << 7;
    if (f()) options |= 0x1 << 8;
    if (f()) options |= 0x1 << 9;
    if (f()) options |= 0x1 << 10;
    if (f()) options |= 0x1 << 11;
    if (f()) options |= 0x1 << 12;
    if (f()) options |= 0x1 << 13;
    if (f()) options |= 0x1 << 14;
    if (f()) options |= 0x1 << 15;
    if (f()) options |= 0x1 << 16;
    if (f()) options |= 0x1 << 17;
    if (f()) options |= 0x1 << 18;
    if (f()) options |= 0x1 << 19;
    if (f()) options |= 0x1 << 20;
    if (f()) options |= 0x1 << 21;
    if (f()) options |= 0x1 << 22;
    if (f()) options |= 0x1 << 23;
    if (f()) options |= 0x1 << 24;
    if (f()) options |= 0x1 << 25;
    if (f()) options |= 0x1 << 26;
    if (f()) *t = options;
}

which causes an exponential number of PHI nodes to be inserted.  Like for
example (shortened testcase):

<bb 4>:
  # prephitmp.9_30 = PHI <16(13), 18(3)>
  # prephitmp.9_29 = PHI <24(13), 26(3)>
  # prephitmp.9_28 = PHI <8(13), 10(3)>
  # prephitmp.9_27 = PHI <20(13), 22(3)>
  # prephitmp.9_26 = PHI <28(13), 30(3)>
  # prephitmp.9_25 = PHI <12(13), 14(3)>
  # prephitmp.9_24 = PHI <4(13), 6(3)>
  # options_1 = PHI <0(13), 2(3)>
  # SMT.4_18 = VDEF <SMT.4_17>
  D.1180_8 = f ();
  if (D.1180_8 != 0)
    goto <bb 5>;
  else
    goto <bb 14>;

where all the computed constants materialize! :)

How interesting.
Comment 11 Jakub Jelinek 2007-11-01 21:16:45 UTC
This isn't just a compile time hog, but sometimes (see PR33922) it creates many times bigger and far slower code as well.
Comment 12 Daniel Berlin 2007-11-01 21:24:20 UTC
Subject: Re:  [4.3 Regression] Exponential time behavior in PRE

Yes, the heuristics can sometimes generate a very large number of
copies to eliminate a single redundancy.
This is jsut the way the standard PRE heuristics work.
If you want to try to come up with a better one, you are welcome to :)


On 1 Nov 2007 21:16:45 -0000, jakub at gcc dot gnu dot org
<gcc-bugzilla@gcc.gnu.org> wrote:
>
>
> ------- Comment #11 from jakub at gcc dot gnu dot org  2007-11-01 21:16 -------
> This isn't just a compile time hog, but sometimes (see PR33922) it creates many
> times bigger and far slower code as well.
>
>
> --
>
>
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32540
>
> ------- You are receiving this mail because: -------
> You are on the CC list for the bug, or are watching someone who is.
>
Comment 13 sebpop@gmail.com 2007-11-03 05:19:06 UTC
Subject: Re:  [4.3 Regression] Exponential time behavior in PRE

> Yes, the heuristics can sometimes generate a very large number of
> copies to eliminate a single redundancy.
> This is jsut the way the standard PRE heuristics work.
> If you want to try to come up with a better one, you are welcome to :)
>

What about stopping the computation when we see that there are too
many values that are anticipable?  Here is a patch that restores the
compile time on all the reported testcases.  The constant should be a
param, and the default value should be higher probably.

Index: tree-ssa-pre.c
===================================================================
--- tree-ssa-pre.c	(revision 129775)
+++ tree-ssa-pre.c	(working copy)
@@ -1847,6 +1847,13 @@ compute_partial_antic_aux (basic_block b
   if (block_has_abnormal_pred_edge)
     goto maybe_dump_sets;

+  /* If there are too many partially anticipatable values in the
+     block, phi_translate_set can take an exponential time: stop
+     before the translation starts.  */
+  if (single_succ_p (block)
+      && bitmap_count_bits (PA_IN (single_succ (block))->expressions) > 10)
+    goto maybe_dump_sets;
+
   old_PA_IN = PA_IN (block);
   PA_OUT = bitmap_set_new ();
Comment 14 sebpop@gmail.com 2007-11-03 05:26:13 UTC
Subject: Re:  [4.3 Regression] Exponential time behavior in PRE

With the patch, compile time goes down also for PR33922.
Comment 15 sebpop@gmail.com 2007-11-03 05:54:42 UTC
Subject: Re:  [4.3 Regression] Exponential time behavior in PRE

And I just saw that there is already a patch for this bug attached
unfortunately to PR32575.
Comment 16 Sebastian Pop 2007-11-05 15:42:38 UTC
Subject: Bug 32540

Author: spop
Date: Mon Nov  5 15:42:30 2007
New Revision: 129901

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=129901
Log:
2007-11-05  Nick Clifton  <nickc@redhat.com>
	    Sebastian Pop  <sebastian.pop@amd.com>

	PR tree-optimization/32540
	PR tree-optimization/33922
	* doc/invoke.texi: Document PARAM_MAX_PARTIAL_ANTIC_LENGTH.
	* tree-ssa-pre.c: Include params.h.
	(compute_partial_antic_aux): Use PARAM_MAX_PARTIAL_ANTIC_LENGTH
	to limit the maximum length of the PA set for a given block.
	* Makefile.in: Add a dependency upon params.h for tree-ssa-pre.c
	* params.def (PARAM_MAX_PARTIAL_ANTIC_LENGTH): New parameter.

	* gcc.dg/tree-ssa/pr32540-1.c: New.
	* gcc.dg/tree-ssa/pr32540-2.c: New.
	* gcc.dg/tree-ssa/pr33922.c: New.


Added:
    trunk/gcc/testsuite/gcc.dg/tree-ssa/pr32540-1.c
    trunk/gcc/testsuite/gcc.dg/tree-ssa/pr32540-2.c
    trunk/gcc/testsuite/gcc.dg/tree-ssa/pr33922.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/Makefile.in
    trunk/gcc/doc/invoke.texi
    trunk/gcc/params.def
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/tree-ssa-pre.c

Comment 17 Sebastian Pop 2007-11-05 15:44:58 UTC
Fixed.