int f(void); void acceptloop_th(int *t) { int options = 0; if (f()) options |= 0x1 << 0; if (f()) options |= 0x1 << 1; if (f()) options |= 0x1 << 2; if (f()) options |= 0x1 << 3; if (f()) options |= 0x1 << 4; if (f()) options |= 0x1 << 5; if (f()) options |= 0x1 << 6; if (f()) options |= 0x1 << 7; if (f()) options |= 0x1 << 8; if (f()) options |= 0x1 << 9; if (f()) options |= 0x1 << 10; if (f()) options |= 0x1 << 11; if (f()) options |= 0x1 << 12; if (f()) options |= 0x1 << 13; if (f()) options |= 0x1 << 14; if (f()) options |= 0x1 << 15; if (f()) options |= 0x1 << 16; if (f()) options |= 0x1 << 17; if (f()) options |= 0x1 << 18; if (f()) options |= 0x1 << 19; if (f()) options |= 0x1 << 20; if (f()) options |= 0x1 << 21; if (f()) options |= 0x1 << 22; if (f()) options |= 0x1 << 23; if (f()) options |= 0x1 << 24; if (f()) options |= 0x1 << 25; if (f()) options |= 0x1 << 26; *t = options; } takes many minutes to compile with 4.3. No problem with 4.2. Timing report shows PRE, and -fno-tree-pre makes it go really fast. Some timings, where the first number is the number of "if"-lines: 10 gcc -c -O3 mini2.c -DN=$n 0.17s user 0.01s system 97% cpu 0.184 total 11 gcc -c -O3 mini2.c -DN=$n 0.20s user 0.02s system 97% cpu 0.221 total 12 gcc -c -O3 mini2.c -DN=$n 0.28s user 0.01s system 95% cpu 0.306 total 13 gcc -c -O3 mini2.c -DN=$n 0.42s user 0.03s system 97% cpu 0.463 total 14 gcc -c -O3 mini2.c -DN=$n 0.76s user 0.02s system 97% cpu 0.805 total 15 gcc -c -O3 mini2.c -DN=$n 1.52s user 0.03s system 97% cpu 1.604 total 16 gcc -c -O3 mini2.c -DN=$n 3.24s user 0.07s system 97% cpu 3.396 total 17 gcc -c -O3 mini2.c -DN=$n 7.00s user 0.12s system 97% cpu 7.314 total 18 gcc -c -O3 mini2.c -DN=$n 15.01s user 0.21s system 96% cpu 15.748 total 19 gcc -c -O3 mini2.c -DN=$n 33.21s user 0.49s system 94% cpu 35.600 total 20 gcc -c -O3 mini2.c -DN=$n 76.29s user 0.87s system 96% cpu 1:19.94 total I'll also attach the original source this test case was extracted from.
Created attachment 13801 [details] Original test case
Confirmed.
Subject: Bug 32540 Author: dberlin Date: Sat Jun 30 14:15:26 2007 New Revision: 126149 URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=126149 Log: 2007-06-30 Daniel Berlin <dberlin@dberlin.org> Fix PR tree-optimization/32540 Fix PR tree-optimization/31651 * tree-ssa-sccvn.c: New file. * tree-ssa-sccvn.h: Ditto. * tree-vn.c: Include tree-ssa-sccvn.h (val_expr_paid_d): Removed. (value_table): Ditto. (vn_compute): Ditto. (val_expr_pair_hash): Ditto. (val_expr_pair_expr_eq): Ditto. (copy_vuses_from_stmt): Ditto. (vn_delete): Ditto. (vn_init): Ditto. (shared_vuses_from_stmt): Ditto. (print_creation_to_file): Moved up. (sort_vuses): Ditto. (sort_vuses_heap): Ditto. (set_value_handle): Make non-static. (make_value_handle): Ditto. (vn_add): Rewritten to use sccvn lookups. (vn_add_with_vuses): Ditto. (vn_lookup): Ditto (and second argument removed). (vn_lookup_with_vuses): Ditto. (vn_lookup_or_add): Ditto (and second argument removed); (vn_lookup_or_add_with_vuses): Ditto. (vn_lookup_with_stmt): New. (vn_lookup_or_add_with_stmt): Ditto. (create_value_handle_for_expr): Ditto. * tree-ssa-pre.c: Include tree-ssa-sccvn.h. (seen_during_translate): New function. (phi_trans_lookup): Use iterative_hash_expr, not vn_compute. (phi_trans_add): Ditto. (constant_expr_p): FIELD_DECL is always constant. (phi_translate_1): Renamed from phi_translate, add seen bitmap. Use constant_expr_p. Avoid infinite recursion on mutually valued expressions. Change callers of vn_lookup_or_add. (phi_translate): New function. (compute_antic_safe): Allow phi nodes. (create_component_ref_by_pieces): Update for FIELD_DECL change. (find_or_generate_expression): Rewrite slightly. (create_expression_by_pieces): Updated for vn_lookup_or_add change. Update VN_INFO for new names. (insert_into_preds_of_block): Update for new names. (add_to_exp_gen): New function. (add_to_sets): Use vn_lookup_or_add_with_stmt. (find_existing_value_expr): Rewrite to changed vn_lookup. (create_value_expr_from): Ditto, and use add_to_exp_gen. (try_look_through_load): Removed. (try_combine_conversion): Ditto. (get_sccvn_value): New function. (make_values_for_phi): Ditto. (make_values_for_stmt): Ditto. (compute_avail): Rewritten for vn_lookup_or_add changes and to use SCCVN. (init_pre): Update for SCCVN changes. (fini_pre): Ditto. (execute_pre): Ditto. * tree-flow.h (make_value_handle): Declare. (set_value_handle): Ditto. (sort_vuses_heap): Ditto. (vn_lookup_or_add_with_stmt): Ditto. (vn_lookup_with_stmt): Ditto. (vn_compute): Remove. (vn_init): Ditto. (vn_delete): Ditto. (vn_lookup): Update arguments. * Makefile.in (tree-ssa-pre.o): Add tree-ssa-sccvn.h (tree-vn.o): Ditto. (tree-ssa-sccvn.o): New. (OBJS-common): Add tree-ssa-sccvn.o Added: trunk/gcc/testsuite/gcc.dg/tree-ssa/ssa-sccvn-1.c - copied unchanged from r125553, branches/gcc-pre-vn/gcc/testsuite/gcc.dg/tree-ssa/ssa-sccvn-1.c trunk/gcc/testsuite/gcc.dg/tree-ssa/ssa-sccvn-2.c - copied unchanged from r125553, branches/gcc-pre-vn/gcc/testsuite/gcc.dg/tree-ssa/ssa-sccvn-2.c trunk/gcc/testsuite/gcc.dg/tree-ssa/ssa-sccvn-3.c - copied unchanged from r125553, branches/gcc-pre-vn/gcc/testsuite/gcc.dg/tree-ssa/ssa-sccvn-3.c trunk/gcc/testsuite/gcc.dg/tree-ssa/ssa-sccvn-4.c - copied unchanged from r125553, branches/gcc-pre-vn/gcc/testsuite/gcc.dg/tree-ssa/ssa-sccvn-4.c trunk/gcc/tree-ssa-sccvn.c - copied, changed from r125553, branches/gcc-pre-vn/gcc/tree-ssa-sccvn.c trunk/gcc/tree-ssa-sccvn.h - copied, changed from r125553, branches/gcc-pre-vn/gcc/tree-ssa-sccvn.h Modified: trunk/gcc/ChangeLog trunk/gcc/Makefile.in trunk/gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-1.c trunk/gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-3.c trunk/gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-4.c trunk/gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-5.c trunk/gcc/tree-flow.h trunk/gcc/tree-ssa-operands.h trunk/gcc/tree-ssa-pre.c trunk/gcc/tree-vn.c
Fixed
As noted by Edvin Török, the bug is not fixed for the original test case (although it is fixed for the small test case). A small test case that still fails is int f(void); void acceptloop_th(int *t) { int options = 0; if (f()) options |= 0x1 << 0; if (f()) options |= 0x1 << 1; if (f()) options |= 0x1 << 2; if (f()) options |= 0x1 << 3; if (f()) options |= 0x1 << 4; if (f()) options |= 0x1 << 5; if (f()) options |= 0x1 << 6; if (f()) options |= 0x1 << 7; if (f()) options |= 0x1 << 8; if (f()) options |= 0x1 << 9; if (f()) options |= 0x1 << 10; if (f()) options |= 0x1 << 11; if (f()) options |= 0x1 << 12; if (f()) options |= 0x1 << 13; if (f()) options |= 0x1 << 14; if (f()) options |= 0x1 << 15; if (f()) options |= 0x1 << 16; if (f()) options |= 0x1 << 17; if (f()) options |= 0x1 << 18; if (f()) options |= 0x1 << 19; if (f()) options |= 0x1 << 20; if (f()) options |= 0x1 << 21; if (f()) options |= 0x1 << 22; if (f()) options |= 0x1 << 23; if (f()) options |= 0x1 << 24; if (f()) options |= 0x1 << 25; if (f()) options |= 0x1 << 26; if (f()) *t = options; } (the only change is that the last line is conditional).
Adding Danny to CC since this is not yet fixed.
I guess we just compute all 2**26 constants that can end up at the conditional store. And indeed, the number of 'Created value .*' in the dump matches this (modulo some constant offset). This is PPRE at work, which probably should be limited to a sub-CFG somehow.
Subject: Re: [4.3 Regression] Exponential time behavior in PRE We may just want to disable PPRE of constants entirely :) On 20 Oct 2007 10:14:53 -0000, rguenth at gcc dot gnu dot org <gcc-bugzilla@gcc.gnu.org> wrote: > > > ------- Comment #7 from rguenth at gcc dot gnu dot org 2007-10-20 10:14 ------- > I guess we just compute all 2**26 constants that can end up at the conditional > store. And indeed, the number of 'Created value .*' in the dump matches this > (modulo some constant offset). This is PPRE at work, which probably should > be limited to a sub-CFG somehow. > > > -- > > > http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32540 > > ------- You are receiving this mail because: ------- > You are on the CC list for the bug, or are watching someone who is. >
Subject: Re: [4.3 Regression] Exponential time behavior in PRE On Sat, 20 Oct 2007, dberlin at dberlin dot org wrote: > ------- Comment #8 from dberlin at gcc dot gnu dot org 2007-10-20 20:49 ------- > Subject: Re: [4.3 Regression] Exponential time behavior in PRE > > We may just want to disable PPRE of constants entirely :) If you don't initialize the variable to zero we or in all the bits then the value will be not constant, but still 'var | constant' with 2**24 different constants... Richard.
That is, the following testcase: int f(void); void acceptloop_th(int *t, int options) { if (f()) options |= 0x1 << 0; if (f()) options |= 0x1 << 1; if (f()) options |= 0x1 << 2; if (f()) options |= 0x1 << 3; if (f()) options |= 0x1 << 4; if (f()) options |= 0x1 << 5; if (f()) options |= 0x1 << 6; if (f()) options |= 0x1 << 7; if (f()) options |= 0x1 << 8; if (f()) options |= 0x1 << 9; if (f()) options |= 0x1 << 10; if (f()) options |= 0x1 << 11; if (f()) options |= 0x1 << 12; if (f()) options |= 0x1 << 13; if (f()) options |= 0x1 << 14; if (f()) options |= 0x1 << 15; if (f()) options |= 0x1 << 16; if (f()) options |= 0x1 << 17; if (f()) options |= 0x1 << 18; if (f()) options |= 0x1 << 19; if (f()) options |= 0x1 << 20; if (f()) options |= 0x1 << 21; if (f()) options |= 0x1 << 22; if (f()) options |= 0x1 << 23; if (f()) options |= 0x1 << 24; if (f()) options |= 0x1 << 25; if (f()) options |= 0x1 << 26; if (f()) *t = options; } it's interesting that this one is not different from the zero-initialized options case just because PHIOPT which runs before PRE changes <bb 2>: D.1179_5 = f (); if (D.1179_5 != 0) goto <bb 3>; else goto <bb 4>; <bb 3>: <bb 4>: # options_1 = PHI <0(2), 1(3)> to <bb 2>: D.1179_5 = f (); options_1 = D.1179_5 != 0; which then causes PHI translation to not be able to figure out the constants (but create value expressions). With PHIOPT disabled we even perform PRE on this testcase ;) So, take the following modified testcase as well: int f(void); void acceptloop_th(int *t) { int options = 0; if (f()) options |= 0x1 << 1; if (f()) options |= 0x1 << 2; if (f()) options |= 0x1 << 3; if (f()) options |= 0x1 << 4; if (f()) options |= 0x1 << 5; if (f()) options |= 0x1 << 6; if (f()) options |= 0x1 << 7; if (f()) options |= 0x1 << 8; if (f()) options |= 0x1 << 9; if (f()) options |= 0x1 << 10; if (f()) options |= 0x1 << 11; if (f()) options |= 0x1 << 12; if (f()) options |= 0x1 << 13; if (f()) options |= 0x1 << 14; if (f()) options |= 0x1 << 15; if (f()) options |= 0x1 << 16; if (f()) options |= 0x1 << 17; if (f()) options |= 0x1 << 18; if (f()) options |= 0x1 << 19; if (f()) options |= 0x1 << 20; if (f()) options |= 0x1 << 21; if (f()) options |= 0x1 << 22; if (f()) options |= 0x1 << 23; if (f()) options |= 0x1 << 24; if (f()) options |= 0x1 << 25; if (f()) options |= 0x1 << 26; if (f()) *t = options; } which causes an exponential number of PHI nodes to be inserted. Like for example (shortened testcase): <bb 4>: # prephitmp.9_30 = PHI <16(13), 18(3)> # prephitmp.9_29 = PHI <24(13), 26(3)> # prephitmp.9_28 = PHI <8(13), 10(3)> # prephitmp.9_27 = PHI <20(13), 22(3)> # prephitmp.9_26 = PHI <28(13), 30(3)> # prephitmp.9_25 = PHI <12(13), 14(3)> # prephitmp.9_24 = PHI <4(13), 6(3)> # options_1 = PHI <0(13), 2(3)> # SMT.4_18 = VDEF <SMT.4_17> D.1180_8 = f (); if (D.1180_8 != 0) goto <bb 5>; else goto <bb 14>; where all the computed constants materialize! :) How interesting.
This isn't just a compile time hog, but sometimes (see PR33922) it creates many times bigger and far slower code as well.
Subject: Re: [4.3 Regression] Exponential time behavior in PRE Yes, the heuristics can sometimes generate a very large number of copies to eliminate a single redundancy. This is jsut the way the standard PRE heuristics work. If you want to try to come up with a better one, you are welcome to :) On 1 Nov 2007 21:16:45 -0000, jakub at gcc dot gnu dot org <gcc-bugzilla@gcc.gnu.org> wrote: > > > ------- Comment #11 from jakub at gcc dot gnu dot org 2007-11-01 21:16 ------- > This isn't just a compile time hog, but sometimes (see PR33922) it creates many > times bigger and far slower code as well. > > > -- > > > http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32540 > > ------- You are receiving this mail because: ------- > You are on the CC list for the bug, or are watching someone who is. >
Subject: Re: [4.3 Regression] Exponential time behavior in PRE > Yes, the heuristics can sometimes generate a very large number of > copies to eliminate a single redundancy. > This is jsut the way the standard PRE heuristics work. > If you want to try to come up with a better one, you are welcome to :) > What about stopping the computation when we see that there are too many values that are anticipable? Here is a patch that restores the compile time on all the reported testcases. The constant should be a param, and the default value should be higher probably. Index: tree-ssa-pre.c =================================================================== --- tree-ssa-pre.c (revision 129775) +++ tree-ssa-pre.c (working copy) @@ -1847,6 +1847,13 @@ compute_partial_antic_aux (basic_block b if (block_has_abnormal_pred_edge) goto maybe_dump_sets; + /* If there are too many partially anticipatable values in the + block, phi_translate_set can take an exponential time: stop + before the translation starts. */ + if (single_succ_p (block) + && bitmap_count_bits (PA_IN (single_succ (block))->expressions) > 10) + goto maybe_dump_sets; + old_PA_IN = PA_IN (block); PA_OUT = bitmap_set_new ();
Subject: Re: [4.3 Regression] Exponential time behavior in PRE With the patch, compile time goes down also for PR33922.
Subject: Re: [4.3 Regression] Exponential time behavior in PRE And I just saw that there is already a patch for this bug attached unfortunately to PR32575.
Subject: Bug 32540 Author: spop Date: Mon Nov 5 15:42:30 2007 New Revision: 129901 URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=129901 Log: 2007-11-05 Nick Clifton <nickc@redhat.com> Sebastian Pop <sebastian.pop@amd.com> PR tree-optimization/32540 PR tree-optimization/33922 * doc/invoke.texi: Document PARAM_MAX_PARTIAL_ANTIC_LENGTH. * tree-ssa-pre.c: Include params.h. (compute_partial_antic_aux): Use PARAM_MAX_PARTIAL_ANTIC_LENGTH to limit the maximum length of the PA set for a given block. * Makefile.in: Add a dependency upon params.h for tree-ssa-pre.c * params.def (PARAM_MAX_PARTIAL_ANTIC_LENGTH): New parameter. * gcc.dg/tree-ssa/pr32540-1.c: New. * gcc.dg/tree-ssa/pr32540-2.c: New. * gcc.dg/tree-ssa/pr33922.c: New. Added: trunk/gcc/testsuite/gcc.dg/tree-ssa/pr32540-1.c trunk/gcc/testsuite/gcc.dg/tree-ssa/pr32540-2.c trunk/gcc/testsuite/gcc.dg/tree-ssa/pr33922.c Modified: trunk/gcc/ChangeLog trunk/gcc/Makefile.in trunk/gcc/doc/invoke.texi trunk/gcc/params.def trunk/gcc/testsuite/ChangeLog trunk/gcc/tree-ssa-pre.c
Fixed.