User account creation filtered due to spam.

Bug 23286 - Missed code hoisting optimization
Missed code hoisting optimization
 Status: RESOLVED FIXED None gcc Unclassified tree-optimization (show other bugs) 4.1.0 P2 enhancement 7.0 Not yet assigned to anyone missed-optimization (view as bug list) 5738 15559 11832 21485 24568 29144 33315 21676 24001 33828 67879 70159 Show dependency tree / graph

 Reported: 2005-08-08 17:46 UTC by Paolo Bonzini 2016-07-12 13:32 UTC (History) 17 users (show) aldot astrange carrot d.g.gorbachev dann dberlin dimhen gcc-bugs glisse icera-sdkteam-gnu jingyu paulo pmike2001 rakdver rguenth steven xinliangli 4.1.0 2006-01-28 00:49:03

Attachments
Proof-of-concept patch (1.44 KB, patch)
2008-11-23 13:07 UTC, Steven Bosscher
Details | Diff
less unpolished version of tree code hoisting (3.74 KB, patch)
2008-11-27 15:26 UTC, Steven Bosscher
Details | Diff
patch to implement code hoisting in tree-ssa-pre.c (7.79 KB, patch)
2008-12-01 22:00 UTC, Steven Bosscher
Details | Diff
updated patch (6.90 KB, patch)
2010-01-17 12:34 UTC, Richard Biener
Details | Diff
forward ported patch (7.29 KB, patch)
2012-02-15 15:55 UTC, Richard Biener
Details | Diff

 Note You need to log in before you can comment on or make changes to this bug.
 Paolo Bonzini 2005-08-08 17:46:32 UTC ```In this code, a <<= 1 is fully redundant unsigned short f(unsigned short a) { if (a & 0x8000) a <<= 1, a = a ^ 0x1021; else a <<= 1; return a; } the body should be turned into unsigned short f(unsigned short a) { unsigned short b = a << 1; if (a & 0x8000) a = b, a = a ^ 0x1021; else a = b; return a; } or something similar. However PRE leaves the GIMPLE unchanged: f (a) { int D.1267; short int a.0; : a.0_3 = (short int) a_2; if (a.0_3 < 0) goto ; else goto ; :; a_7 = a_2 << 1; a_8 = a_7 ^ 4129; goto (); :; a_6 = a_2 << 1; # a_1 = PHI ; :; D.1267_4 = (int) a_1; return D.1267_4; } On PPC, this is only caught after reload. Paolo``` Andrew Pinski 2005-08-08 18:42:32 UTC ```Confirmed, for some reason the following is caught though: unsigned short f(unsigned short a) { unsigned short b = a <<1; if (a & 0x8000) a <<= 1, a = a ^ 0x1021; else a = b; return a; } as a<<1 is caught being redundant. FRE catches the above on the mainline. Hmm there is only one VH for the expression: Created value VH.0 for a_2 Created value VH.1 for (short int) VH.0 Created value VH.2 for VH.0 << 1 Created value VH.3 for VH.2 ^ 4129 Created value VH.4 for a_1 Created value VH.5 for (int) VH.4 Created value VH.6 for _5 exp_gen[-1] := { } tmp_gen[-1] := { a_2 (VH.0) } avail_out[-1] := { a_2 (VH.0) } exp_gen[0] := { a_2 (VH.0) , (short int) VH.0 (VH.1) } tmp_gen[0] := { a.0_3 (VH.1) } avail_out[0] := { a_2 (VH.0) , a.0_3 (VH.1) } exp_gen[1] := { a_2 (VH.0) , VH.0 << 1 (VH.2) , VH.2 ^ 4129 (VH.3) } tmp_gen[1] := { a_7 (VH.2) , a_8 (VH.3) } avail_out[1] := { a_2 (VH.0) , a.0_3 (VH.1) , a_7 (VH.2) , a_8 (VH.3) } exp_gen[2] := { a_2 (VH.0) , VH.0 << 1 (VH.2) } tmp_gen[2] := { a_6 (VH.2) } avail_out[2] := { a_2 (VH.0) , a.0_3 (VH.1) , a_6 (VH.2) } exp_gen[3] := { a_1 (VH.4) , (int) VH.4 (VH.5) } tmp_gen[3] := { D.1280_4 (VH.5) , _5 (VH.6) } avail_out[3] := { a_1 (VH.4) , a_2 (VH.0) , a.0_3 (VH.1) , D.1280_4 (VH.5) , _5 (VH.6) } exp_gen[-2] := { } tmp_gen[-2] := { } avail_out[-2] := { } ``` Daniel Berlin 2005-08-08 18:53:34 UTC ```(In reply to comment #1) > Confirmed, for some reason the following is caught though: because it's a fully redundant expression, that is available when we go to eliminate, as opposed to the original, which would require hoisting, and is *not* partially redundant. > Hmm there is only one VH for the expression: As one would expect. This is not a missed optimization for PRE. PRE is not a generic hoister. In fact, in this case, it doesn't actually save anything but size to perform this "optimization". All it will do is make b live over the use of a, adding another register to allocate that has a conflict. I don't see this as a bug at all, except maybe at -Os. Even then, it's a a hoisting issue, using very busy expressions, not a PRE issue.``` Daniel Berlin 2005-08-08 18:56:31 UTC ```(In reply to comment #1) > Confirmed, for some reason the following is caught though: If you thought about it, you'd notice that your example below has two computations of a = b along the if branch, which is not optimal. The original has one computation on each path which is already optimal. > unsigned short f(unsigned short a) > { > unsigned short b = a <<1; > if (a & 0x8000) > a <<= 1, a = a ^ 0x1021; > else > a = b; > > return a; > } ``` Andrew Pinski 2005-08-08 19:34:21 UTC `This is basically PR 15559 which was marked as invalid.` Andrew Pinski 2005-08-08 19:36:43 UTC ```and a dup of bug 5738 really. I almost want to close this fully as a dup of bug 5738.``` Andrew Pinski 2005-08-08 19:54:23 UTC ```Here is a stupid testcase which can be sped up by pulling the reduandant expressions up: int ii; static inline int f(int i, int ii) { return i/ ii; } int h(int) __attribute__((pure,const)); int g(int i) { int j, j1 = i; for (j = 0; j <1000; j++) { int ii1 = ii; if (h(j)) i = j1/ii1 + 2; else i = j1/ii1; } return i; } As we then pull the division before the loop. As I said this was stupid as this was made up and I don't know how often this happens in real life.``` Daniel Berlin 2005-08-08 20:29:40 UTC ```Subject: Re: missed fully redundant expression On Mon, 2005-08-08 at 19:54 +0000, pinskia at gcc dot gnu dot org wrote: > ------- Additional Comments From pinskia at gcc dot gnu dot org 2005-08-08 19:54 ------- > Here is a stupid testcase which can be sped up by pulling the reduandant expressions up: > int ii; > static inline int f(int i, int ii) > { > return i/ ii; > } > > int h(int) __attribute__((pure,const)); > > int g(int i) > { > int j, j1 = i; > for (j = 0; j <1000; j++) > { > int ii1 = ii; > if (h(j)) > i = j1/ii1 + 2; > else > i = j1/ii1; > } > return i; > } > Yes, if we hoisted this, we would then pull it out of the loop. ``` Daniel Berlin 2005-08-08 20:40:02 UTC ```Subject: Re: missed fully redundant expression On Mon, 2005-08-08 at 19:54 +0000, pinskia at gcc dot gnu dot org wrote: > ------- Additional Comments From pinskia at gcc dot gnu dot org 2005-08-08 19:54 ------- > Here is a stupid testcase which can be sped up by pulling the reduandant expressions up: > int ii; > static inline int f(int i, int ii) > { > return i/ ii; > } > > int h(int) __attribute__((pure,const)); > > int g(int i) > { > int j, j1 = i; > for (j = 0; j <1000; j++) > { > int ii1 = ii; > if (h(j)) > i = j1/ii1 + 2; > else > i = j1/ii1; > } > return i; > } > You are also going to have to hoist the ii1 = ii because it has a vuse. ``` Andrew Pinski 2005-08-29 12:33:15 UTC `*** Bug 23619 has been marked as a duplicate of this bug. ***` Andrew Pinski 2005-10-28 18:43:09 UTC ```Here is another example which comes from PR 24568 (which really comes from thedailywtf): int f(int i) { if (i < 0) return i/10+ i; return i/10 + i; } int f2(int i) { return i/10 + i; }``` Steven Bosscher 2007-10-10 22:59:32 UTC `*** Bug 32590 has been marked as a duplicate of this bug. ***` Steven Bosscher 2007-10-16 13:33:42 UTC `Does not really "block" 24001, but the test case for that bug would be fixed if code hoisting would be implemented properly.` Alexander Strange 2007-10-26 04:08:54 UTC `The closed #32590 is a 4.3 regression, while this is only enhancement.` Andrew Pinski 2008-02-23 05:22:43 UTC `*** Bug 35303 has been marked as a duplicate of this bug. ***` Steven Bosscher 2008-11-21 06:41:50 UTC `*** Bug 38204 has been marked as a duplicate of this bug. ***` Steven Bosscher 2008-11-23 13:07:55 UTC ```Created attachment 16751 [details] Proof-of-concept patch It is not terribly complicated to add hoisting to tree-ssa-pre.c. I have attached the result of roughly half a day of hacking. It is a proof-of-concept patch that demonstrates how hoisting could be added. It's not finished, tested, or even correct ;-) but it shows the general idea. For a "production-quality" version, one would have to make sure not to hoist expressions up too far (e.g. not further up than above the first block that generates the expression).``` Steven Bosscher 2008-11-23 13:11:06 UTC ```For the test case of comment #0, the proof-of-concept patch does the following in the .084t.pre dump (relevant excerpts only): VBEOUT[2] := { {lshift_expr,a_2(D),1} (0004) } Inserting expression 5 into AVAIL_OUT[2] Inserted pretmp.12_8 = a_2(D) << 1; in predecessor 2 AVAIL_OUT[2] := { a_2(D) (0002), a.0_3 (0003), pretmp.12_8 (0004) } Successor 1 not dominated by 5 -> empty set VBEOUT[5] := { } AVAIL_OUT[5] := { a_2(D) (0002), a.0_3 (0003), a_1 (0001), pretmp.12_8 (0004) } Successor 5 not dominated by 4 -> empty set VBEOUT[4] := { } AVAIL_OUT[4] := { a_2(D) (0002), a.0_3 (0003), pretmp.12_8 (0004) } Successor 5 not dominated by 3 -> empty set VBEOUT[3] := { } AVAIL_OUT[3] := { a_2(D) (0002), a.0_3 (0003), a_5 (0005), pretmp.12_8 (0004) } Replaced a_2(D) << 1 with pretmp.12_8 in a_4 = a_2(D) << 1; Replaced a_2(D) << 1 with pretmp.12_8 in a_6 = a_2(D) << 1; ... f (short unsigned int a) { short unsigned int pretmp.12; short int a.0; : a.0_3 = (short int) a_2(D); pretmp.12_8 = a_2(D) << 1; if (a.0_3 < 0) goto ; else goto ; : a_4 = pretmp.12_8; a_5 = a_4 ^ 4129; goto ; : a_6 = pretmp.12_8; : # a_1 = PHI return a_1; } which eventually leads to the following in the .126t.final_cleanup dump: ;; Function f (f) f (short unsigned int a) { short unsigned int a.22; : a.22 = a << 1; if ((short int) a < 0) goto ; else goto ; : a.22 = a.22 ^ 4129; : return a.22; } ``` Steven Bosscher 2008-11-23 13:22:44 UTC ```The test case of PR38204 shows one of the problems with proof-of-concept patch, namely the "don't move up too much" problem. The .pre dump looks like this: test (int a, int b, int c, int g) { int pretmp.11; int e; int d; int D.1257; int D.1256; : pretmp.11_11 = b_3(D) * c_4(D); pretmp.11_12 = g_8(D) + pretmp.11_11; // No need to move this up to here. if (a_2(D) != 0) goto ; else goto ; : d_5 = pretmp.11_11; goto ; : d_6 = b_3(D) - c_4(D); : # d_1 = PHI D.1256_7 = pretmp.11_11; e_9 = pretmp.11_12; D.1257_10 = e_9 + d_1; return D.1257_10; } Eventually this gives (in the .final_cleanup dump): ;; Function test (test) test (int a, int b, int c, int g) { int d.21; int d; : d.21 = c * b; if (a != 0) goto ; else goto ; : d = d.21; goto ; : d = b - c; : return (d.21 + g) + d; } ``` Richard Biener 2008-11-23 13:32:25 UTC `Nice.` Richard Biener 2008-11-23 13:43:47 UTC ```We also need to make sure not to do hoisting where we should do sinking like for int foo(int b, int i) { int res; if (b) res = i + 1; else res = i + 1; return res; } (add some more code that shows the increased life-range of res would hurt) I guess sinking doesn't really fit the PRE framework.``` Steven Bosscher 2008-11-23 14:20:42 UTC `I'll work on something that bootstraps and passes testing. But cost-related decisions (like the one from comment #20) are not on my TODO list right now. The pass that should do this is called sched1 ;-)` Daniel Berlin 2008-11-23 18:30:27 UTC ```Subject: Re: missed fully redundant expression Sinking fits into the reverse framework. Apparently the SSUPRE person plans on submitting when 4.5 opens, and you can fit sinking frameworks into there. On Sun, Nov 23, 2008 at 8:43 AM, rguenth at gcc dot gnu dot org wrote: > > > ------- Comment #20 from rguenth at gcc dot gnu dot org 2008-11-23 13:43 ------- > We also need to make sure not to do hoisting where we should do sinking like > for > > int foo(int b, int i) > { > int res; > if (b) > res = i + 1; > else > res = i + 1; > return res; > } > > (add some more code that shows the increased life-range of res would hurt) > > I guess sinking doesn't really fit the PRE framework. > > > -- > > > http://gcc.gnu.org/bugzilla/show_bug.cgi?id=23286 > > ------- You are receiving this mail because: ------- > You are on the CC list for the bug, or are watching someone who is. > ``` Steven Bosscher 2008-11-27 15:26:20 UTC ```Created attachment 16784 [details] less unpolished version of tree code hoisting This fixes some bugs in the proof-of-concept patch: 1. Don't hoist things out of loops. We have to look at ANTIC_IN of the block we hoist into, not the union of ANTIC_IN of all its successors. 2. Add a constraint that the hoisted expression must be computed in one of the successors of the block we're hoisting too. Also added comment so that the uninitiated don't have their eye balls dropping out in confusion ;-) I might actually post this for GCC 4.5 if it subsumes the RTL code hoisting pass.``` Steven Bosscher 2008-12-01 22:00:50 UTC ```Created attachment 16803 [details] patch to implement code hoisting in tree-ssa-pre.c This passes bootstrap+testing on ia64-linux and amd64-linux. It causes a few vectorizer failures (4 on ia64) that I have not analyzed yet.``` Steven Bosscher 2009-08-04 09:05:53 UTC `*** Bug 40956 has been marked as a duplicate of this bug. ***` Richard Biener 2010-01-17 12:34:28 UTC ```Created attachment 19634 [details] updated patch I updated the patch to apply to current trunk. I also notice that on the testcase for PR21485 comment #36: /* { dg-do compile } */ /* { dg-options "-O2 -fdump-tree-pre" } */ long NumSift (long *array, int b, unsigned long k) { if (b) if (array[k] < array[k + 1L]) ++k; return array[k]; } /* There should be two loads left. */ /* { dg-final { scan-tree-dump-times "= \\\*" 2 "pre" } } */ /* { dg-final { cleanup-tree-dump "pre" } } */ we properly hoist array[k] before if (b) but still insert on the else edge of that branch for eliminating a partial redundancy (and elimination doesn't remove that again because the value-numbers of the inserted expressions are not equal). This is because while we put hoisted expressions into AVAIL_OUT of the block we inserted to we do not propagate this to its successors so PRE insertion does not find leaders for it. In this case, starting from : if (b_2(D) != 0) goto ; else goto ; : goto ; : D.1955_4 = k_3(D) * 4; D.1956_6 = array_5(D) + D.1955_4; D.1957_7 = *D.1956_6; k_8 = k_3(D) + 1; D.1959_9 = k_8 * 4; D.1960_10 = array_5(D) + D.1959_9; D.1961_11 = *D.1960_10; if (D.1957_7 < D.1961_11) goto ; else goto ; : goto ; : : # k_1 = PHI D.1955_13 = k_1 * 4; D.1956_14 = array_5(D) + D.1955_13; D.1964_15 = *D.1956_14; return D.1964_15; we hoist into block 2 and then PRE-insert for block 5, in pred 6 the expressions are not yet available. In reality they are available but they didn't go through NEW set processing yet. I wonder if we have to extend NEW set processing to either cover immediate dominators of dominators or succs with single preds we hoisted to (no idea if either would confuse PRE though).``` Richard Biener 2010-01-17 12:35:44 UTC `Hoh, patches from steven but steven not in CC ...` Richard Biener 2010-01-17 12:56:44 UTC ```sth like @@ -3893,7 +3893,29 @@ insert_aux (basic_block block, bool do_p /* Insert expressions for hoisting. */ if (do_hoist && EDGE_COUNT (block->succs) >= 2) - new_stuff |= do_hoist_insertion (block); + { + edge e; + edge_iterator ei; + + new_stuff |= do_hoist_insertion (block); + + /* Immediately update AVAIL_OUT of immediately dominated blocks + with the hoisted expressions. */ + newset = NEW_SETS (block); + if (newset) + FOR_EACH_EDGE (e, ei, block->succs) + { + if (EDGE_COUNT (e->dest->preds) != 1) + continue; + + FOR_EACH_EXPR_ID_IN_SET (newset, i, bi) + { + pre_expr expr = expression_for_id (i); + bitmap_value_replace_in_set (NEW_SETS (e->dest), expr); + bitmap_value_replace_in_set (AVAIL_OUT (e->dest), expr); + } + } + } } } for (son = first_dom_son (CDI_DOMINATORS, block); on top of the last patch``` Steven Bosscher 2010-01-17 13:00:09 UTC ```I intended to work on this for GCC 4.6. There are many vectorizer failures with the v3 patch. My local copy of the patch is at v6 :-) I'll see if I can make some time to "forward-port" the differences between my v3 and v6 to Richi's updated patch. ``` Steven Bosscher 2010-02-24 09:40:33 UTC `*** Bug 43159 has been marked as a duplicate of this bug. ***` Richard Biener 2010-11-16 14:39:50 UTC ```(In reply to comment #29) > I intended to work on this for GCC 4.6. There are many vectorizer failures > with the v3 patch. My local copy of the patch is at v6 :-) I'll see if I can > make some time to "forward-port" the differences between my v3 and v6 to > Richi's updated patch. I don't remember if you did send me your v6 privately (I at least can't find it), can you simply dump your most current variant here? Thx.``` Andrew Pinski 2011-07-20 00:40:39 UTC `Anyone working on this? I have found this happens internally a bit (after expanding of bitfields accesses).` Andrew Pinski 2011-07-29 20:11:29 UTC ```(In reply to comment #26) > Created attachment 19634 [details] > updated patch This patch fails on the trunk now with: /data/src/gcc-cavium/gcc-4.6-upgrade/src/libstdc++-v3/src/strstream.cc: In destructor ‘virtual std::strstream::~strstream()’: /data/src/gcc-cavium/gcc-4.6-upgrade/src/libstdc++-v3/src/strstream.cc:399:3: error: statement marked for throw in middle of block # .MEM_23 = VDEF <.MEM_22> std::strstreambuf::~strstreambuf (D.20970_7);``` Richard Biener 2012-02-15 11:19:55 UTC `*** Bug 52256 has been marked as a duplicate of this bug. ***` Richard Biener 2012-02-15 15:55:52 UTC ```Created attachment 26665 [details] forward ported patch I made the patch apply and build again (unbootstrapped/tested). Steven, if you happen to have a "latest" patch around from yours somewhere I'd appreciate if you can make it appear in my inbox somehow ... I'll integrate it with my changes. Thx.``` Steven Bosscher 2012-02-15 18:37:40 UTC `The patch was on one of the gsyprf machines, which are gone (didn't I already tell you this before??). So the "latest" patch is lost...` rguenther@suse.de 2012-02-16 09:37:54 UTC ```On Wed, 15 Feb 2012, steven at gcc dot gnu.org wrote: > http://gcc.gnu.org/bugzilla/show_bug.cgi?id=23286 > > --- Comment #36 from Steven Bosscher 2012-02-15 18:37:40 UTC --- > The patch was on one of the gsyprf machines, which are gone (didn't I already > tell you this before??). So the "latest" patch is lost... Not sure, I might simply have forgotten it ;) I'm going to make the patch bootstrap again (still fails with the EH problem right now).``` Richard Biener 2012-02-16 12:44:31 UTC ```Incremental patch to fix the EH related ICEs: Index: gcc/tree-ssa-pre.c =================================================================== --- gcc/tree-ssa-pre.c.orig 2012-02-16 12:08:57.000000000 +0100 +++ gcc/tree-ssa-pre.c 2012-02-16 12:08:50.000000000 +0100 @@ -3937,7 +3937,8 @@ do_hoist_insertion (basic_block block AT find_or_generate_expression (block, expr, &stmts, NULL); last = gsi_last_bb (block); - if (gsi_end_p (last) || is_ctrl_stmt (gsi_stmt (last))) + if (gsi_end_p (last) + || stmt_ends_bb_p (gsi_stmt (last))) gsi_insert_seq_before (&last, stmts, GSI_SAME_STMT); else gsi_insert_seq_after (&last, stmts, GSI_SAME_STMT); when hoisting into # BLOCK 19 # PRED: 18 (fallthru,exec) [LP 15] D.2722_10 = CYapfBaseT::PfGetSettings (D.2723_9); goto ; # SUCC: 20 (fallthru,exec) 113 (eh,exec) but of course we'd have to verify we can hoist across such throwing stmt (it might be a store clobbering what we insert), so maybe disabling hoisting in that case would be more appropriate. With Index: gcc/tree-ssa-pre.c =================================================================== --- gcc/tree-ssa-pre.c.orig 2012-02-16 12:28:04.000000000 +0100 +++ gcc/tree-ssa-pre.c 2012-02-16 12:27:37.000000000 +0100 @@ -3874,6 +3874,7 @@ do_hoist_insertion (basic_block block AT bitmap_iterator bi; bitmap_set_t hoistable_set; bitmap availout_in_some; + gimple_stmt_iterator last; /* At least two successors, or else... */ gcc_assert (EDGE_COUNT (block->succs) >= 2); @@ -3886,6 +3887,14 @@ do_hoist_insertion (basic_block block AT if (! single_pred_p (e->dest)) return false; + /* Determine the insertion point. If we cannot safely insert before + the last stmt if we'd have to, bail out. */ + last = gsi_last_bb (block); + if (!gsi_end_p (last) + && !is_ctrl_stmt (gsi_stmt (last)) + && stmt_ends_bb_p (gsi_stmt (last))) + return false; + hoistable_set = bitmap_set_new (); availout_in_some = BITMAP_ALLOC (&grand_bitmap_obstack); @@ -3916,7 +3925,6 @@ do_hoist_insertion (basic_block block AT pre_expr expr = expression_for_id (i); unsigned int value_id = get_expr_value_id (expr); gimple_seq stmts = NULL; - gimple_stmt_iterator last; /* If the value of this expression is not available in at least one successor, do not hoist the value. */ @@ -3936,7 +3944,6 @@ do_hoist_insertion (basic_block block AT find_or_generate_expression (block, expr, &stmts, NULL); - last = gsi_last_bb (block); if (gsi_end_p (last) || is_ctrl_stmt (gsi_stmt (last))) gsi_insert_seq_before (&last, stmts, GSI_SAME_STMT); else Which leaves us with /abuild/rguenther/obj/./gcc/xgcc -B/abuild/rguenther/obj/./gcc/ -B/usr/local/x86_64-unknown-linux-gnu/bin/ -B/usr/local/x86_64-unknown-linux-gnu/lib/ -isystem /usr/local/x86_64-unknown-linux-gnu/include -isystem /usr/local/x86_64-unknown-linux-gnu/sys-include -c -g -O2 -m32 -fpic -W -Wall -gnatpg -nostdinc -m32 a-nlrear.ads -o a-nlrear.o raised STORAGE_ERROR : stack overflow or erroneous memory access make[9]: *** [a-nlrear.o] Error 1 during stage3 libada build ...``` Richard Biener 2012-02-20 11:50:00 UTC ```Apart from that libada bootstrap issue (reproduces with the stage1 compiler), the testsuite is clean apart from vectorizer testcases which show that do_hoist_insertion is hoisting stmts across loops for (i = 0; i < N; i++) { diff = 0; for (j = k; j < M; j+=4) { diff += in[j+i]*coeff[j]; } out[i] = out[i] + diff; } is turned into for (i = 0; i < N; i++) { diff = 0; tem = out[i]; for (j = k; j < M; j+=4) { diff += in[j+i]*coeff[j]; } out[i] = tem + diff; } which confuses outer loop vectorization enough to make it fail for FAIL: gcc.dg/vect/vect-outer-fir-big-array.c -flto scan-tree-dump-times vect "OU TER LOOP VECTORIZED" 2 FAIL: gcc.dg/vect/vect-outer-fir-lb-big-array.c -flto scan-tree-dump-times vect "OUTER LOOP VECTORIZED" 2 FAIL: gcc.dg/vect/vect-outer-fir-lb.c -flto scan-tree-dump-times vect "OUTER LOO P VECTORIZED" 2 FAIL: gcc.dg/vect/vect-outer-fir.c -flto scan-tree-dump-times vect "OUTER LOOP V ECTORIZED" 2``` Bernhard Reutner-Fischer 2012-02-23 13:34:37 UTC ```The ATTRIBUTE_UNUSED of do_hoist_insertion can be removed. diff --git a/gcc/tree-ssa-pre.c b/gcc/tree-ssa-pre.c index 0f777b4..bfc7a92 100644 --- a/gcc/tree-ssa-pre.c +++ b/gcc/tree-ssa-pre.c @@ -3865,7 +3865,7 @@ do_pre_partial_partial_insertion (basic_block block, basic_block dom) The caller has to make sure that BLOCK has at least two successors. */ static bool -do_hoist_insertion (basic_block block ATTRIBUTE_UNUSED) +do_hoist_insertion (basic_block block) { edge e; edge_iterator ei; @@ -3878,6 +3878,13 @@ do_hoist_insertion (basic_block block ATTRIBUTE_UNUSED) /* At least two successors, or else... */ gcc_assert (EDGE_COUNT (block->succs) >= 2); + /* We cheat about AVAIL_OUT in the first block + so pretend we are done in the second iteration. */ + if (block->prev_bb + && block->prev_bb->index == ENTRY_BLOCK + && pre_stats.hoist_insert) + return false; + /* Check that all successors of BLOCK are dominated by block. We could use dominated_by_p() for this, but actually there is a much quicker check: any successor that is dominated by BLOCK can't have @@ -3890,9 +3897,12 @@ do_hoist_insertion (basic_block block ATTRIBUTE_UNUSED) availout_in_some = BITMAP_ALLOC (&grand_bitmap_obstack); /* A hoistable value must be in ANTIC_IN(block) - but not in AVAIL_OUT(BLOCK). */ + but not in AVAIL_OUT(BLOCK). + To give more opportunity to hoisting, + cheat by disregarding AVAIL_OUT of the ENTRY_BLOCK. */ bitmap_set_copy (hoistable_set, ANTIC_IN (block)); - bitmap_set_subtract_values (hoistable_set, AVAIL_OUT (block)); + if (block->prev_bb && block->prev_bb->index != ENTRY_BLOCK) + bitmap_set_subtract_values (hoistable_set, AVAIL_OUT (block)); /* Short-cut for a common case: hoistable_set is empty. */ if (bitmap_empty_p (&hoistable_set->values)) so for a simplified PR5738 \$ cat pr5738.c struct foo { unsigned short *p; }; #define foo_s s void func (struct foo *foo_s, unsigned int *coord, _Bool delta) { unsigned short change; if (delta) { change = *((foo_s)->p++); *coord += change; } else { change = *((foo_s)->p++); *coord += change; // *coord += *((foo_s)->p++) << 8; } } we end up a little bit better, with something like func (struct foo * sD.1705, unsigned intD.9 * coordD.1706, _BoolD.1685 deltaD.1707) { unsigned intD.9 pretmp.6D.1727; short unsigned intD.16 * pretmp.5D.1726; short unsigned intD.16 pretmp.4D.1725; short unsigned intD.16 * pretmp.3D.1724; short unsigned intD.16 changeD.1710; unsigned intD.9 D.1718; unsigned intD.9 D.1717; unsigned intD.9 D.1716; short unsigned intD.16 * D.1715; short unsigned intD.16 * D.1714; # BLOCK 2 freq:10000 # PRED: ENTRY [100.0%] (fallthru,exec) # VUSE <.MEMD.1720_17(D)> # PT = nonlocal escaped pretmp.3D.1724_22 = sD.1705_2(D)->pD.1704; # VUSE <.MEMD.1720_17(D)> pretmp.4D.1725_23 = *pretmp.3D.1724_22; # PT = nonlocal escaped pretmp.5D.1726_24 = pretmp.3D.1724_22 + 2; # VUSE <.MEMD.1720_17(D)> pretmp.6D.1727_25 = *coordD.1706_6(D); pretmp.6D.1727_26 = (unsigned intD.9) pretmp.4D.1725_23; pretmp.6D.1727_27 = pretmp.6D.1727_25 + pretmp.6D.1727_26; if (deltaD.1707_1(D) != 0) goto ; else goto ; # SUCC: 3 [39.0%] (true,exec) 4 [61.0%] (false,exec) # BLOCK 3 freq:3900 # PRED: 2 [39.0%] (true,exec) # .MEMD.1720_18 = VDEF <.MEMD.1720_17(D)> sD.1705_2(D)->pD.1704 = pretmp.5D.1726_24; # .MEMD.1720_19 = VDEF <.MEMD.1720_18> *coordD.1706_6(D) = pretmp.6D.1727_27; goto ; # SUCC: 5 [100.0%] (fallthru,exec) # BLOCK 4 freq:6100 # PRED: 2 [61.0%] (false,exec) # .MEMD.1720_20 = VDEF <.MEMD.1720_17(D)> sD.1705_2(D)->pD.1704 = pretmp.5D.1726_24; # .MEMD.1720_21 = VDEF <.MEMD.1720_20> *coordD.1706_6(D) = pretmp.6D.1727_27; # SUCC: 5 [100.0%] (fallthru,exec) # BLOCK 5 freq:10000 # PRED: 3 [100.0%] (fallthru,exec) 4 [100.0%] (fallthru,exec) # .MEMD.1720_16 = PHI <.MEMD.1720_19(3), .MEMD.1720_21(4)> # VUSE <.MEMD.1720_16> return; # SUCC: EXIT [100.0%] } which translates to nearly proper code: func: .LFB0: .cfi_startproc movq (%rdi), %rax # sD.1705_2(D)->pD.1704, pretmp.3D.1724 leaq 2(%rax), %rcx #, pretmp.5D.1726 movzwl (%rax), %eax # *pretmp.3D.1724_22, pretmp.6D.1727 addl (%rsi), %eax # *coordD.1706_6(D), pretmp.6D.1727 testb %dl, %dl # deltaD.1707 movq %rcx, (%rdi) # pretmp.5D.1726, sD.1705_2(D)->pD.1704 movl %eax, (%rsi) # pretmp.6D.1727, *coordD.1706_6(D) je .L2 #, ret .L2: ret .cfi_endproc where the expected code would be something like (i think): func: .LFB0: .cfi_startproc movq (%rdi), %rax # sD.1705_2(D)->pD.1704, D.1714 movzwl (%rax), %edx #* D.1714, changeD.1710 addq \$2, %rax #, tmp77 movq %rax, (%rdi) # tmp77, sD.1705_2(D)->pD.1704 addl %edx, (%rsi) # changeD.1710, *coordD.1706_6(D) ret .cfi_endproc .LFE0: So we just need to recognize that BB3 and BB4 are identical (everything in BB3 can be hoisted and BB4 is dead).``` Richard Biener 2016-07-12 13:32:36 UTC ```Author: rguenth Date: Tue Jul 12 13:32:04 2016 New Revision: 238242 URL: https://gcc.gnu.org/viewcvs?rev=238242&root=gcc&view=rev Log: 2016-07-12 Steven Bosscher Richard Biener PR tree-optimization/23286 PR tree-optimization/70159 * doc/invoke.texi: Document -fcode-hoisting. * common.opt (fcode-hoisting): New flag. * opts.c (default_options_table): Enable -fcode-hoisting at -O2+. * tree-ssa-pre.c (pre_stats): Add hoist_insert. (do_regular_insertion): Rename to ... (do_pre_regular_insertion): ... this and amend general comments on insertion strathegy. (do_partial_partial_insertion): Rename to ... (do_pre_partial_partial_insertion): ... this. (do_hoist_insertion): New function. (insert_aux): Take flags on whether to do PRE and/or hoist insertion and call do_hoist_insertion properly. (insert): Adjust. (pass_pre::gate): Enable also if -fcode-hoisting is enabled. (pass_pre::execute): Register hoist_insert stats. * gcc.dg/tree-ssa/ssa-pre-11.c: Disable code hosting. * gcc.dg/tree-ssa/ssa-pre-27.c: Likewise. * gcc.dg/tree-ssa/ssa-pre-28.c: Likewise. * gcc.dg/tree-ssa/ssa-pre-2.c: Likewise. * gcc.dg/tree-ssa/pr35286.c: Likewise. * gcc.dg/tree-ssa/pr35287.c: Likewise. * gcc.dg/hoist-register-pressure-1.c: Likewise. * gcc.dg/hoist-register-pressure-2.c: Likewise. * gcc.dg/hoist-register-pressure-3.c: Likewise. * gcc.dg/pr51879-12.c: Likewise. * gcc.dg/strlenopt-9.c: Likewise. * gcc.dg/tree-ssa/pr47392.c: Likewise. * gcc.dg/tree-ssa/pr68619-4.c: Likewise. * gcc.dg/tree-ssa/split-path-5.c: Likewise. * gcc.dg/tree-ssa/slsr-35.c: Likewise. * gcc.dg/tree-ssa/slsr-36.c: Likewise. * gcc.dg/tree-ssa/loadpre3.c: Adjust so hosting doesn't apply. * gcc.dg/tree-ssa/pr43491.c: Scan optimized dump for desired result. * gcc.dg/tree-ssa/ssa-pre-31.c: Adjust expected outcome for hoisting. * gcc.dg/tree-ssa/ssa-hoist-1.c: New testcase. * gcc.dg/tree-ssa/ssa-hoist-2.c: New testcase. * gcc.dg/tree-ssa/ssa-hoist-3.c: New testcase. * gcc.dg/tree-ssa/ssa-hoist-4.c: New testcase. * gcc.dg/tree-ssa/ssa-hoist-5.c: New testcase. * gcc.dg/tree-ssa/ssa-hoist-6.c: New testcase. * gfortran.dg/pr43984.f90: Adjust expected outcome. Added: trunk/gcc/testsuite/gcc.dg/tree-ssa/ssa-hoist-1.c trunk/gcc/testsuite/gcc.dg/tree-ssa/ssa-hoist-2.c trunk/gcc/testsuite/gcc.dg/tree-ssa/ssa-hoist-3.c trunk/gcc/testsuite/gcc.dg/tree-ssa/ssa-hoist-4.c trunk/gcc/testsuite/gcc.dg/tree-ssa/ssa-hoist-6.c Modified: trunk/gcc/ChangeLog trunk/gcc/common.opt trunk/gcc/doc/invoke.texi trunk/gcc/opts.c trunk/gcc/testsuite/ChangeLog trunk/gcc/testsuite/gcc.dg/hoist-register-pressure-1.c trunk/gcc/testsuite/gcc.dg/hoist-register-pressure-2.c trunk/gcc/testsuite/gcc.dg/hoist-register-pressure-3.c trunk/gcc/testsuite/gcc.dg/pr51879-12.c trunk/gcc/testsuite/gcc.dg/strlenopt-9.c trunk/gcc/testsuite/gcc.dg/tree-ssa/loadpre3.c trunk/gcc/testsuite/gcc.dg/tree-ssa/pr35286.c trunk/gcc/testsuite/gcc.dg/tree-ssa/pr35287.c trunk/gcc/testsuite/gcc.dg/tree-ssa/pr43491.c trunk/gcc/testsuite/gcc.dg/tree-ssa/pr47392.c trunk/gcc/testsuite/gcc.dg/tree-ssa/pr68619-4.c trunk/gcc/testsuite/gcc.dg/tree-ssa/slsr-35.c trunk/gcc/testsuite/gcc.dg/tree-ssa/slsr-36.c trunk/gcc/testsuite/gcc.dg/tree-ssa/ssa-pre-11.c trunk/gcc/testsuite/gcc.dg/tree-ssa/ssa-pre-2.c trunk/gcc/testsuite/gcc.dg/tree-ssa/ssa-pre-27.c trunk/gcc/testsuite/gcc.dg/tree-ssa/ssa-pre-28.c trunk/gcc/testsuite/gcc.dg/tree-ssa/ssa-pre-31.c trunk/gcc/testsuite/gfortran.dg/pr43984.f90 trunk/gcc/tree-ssa-pre.c``` Richard Biener 2016-07-12 13:32:57 UTC `Fixed for GCC 7.`