Bug 23286 - missed fully redundant expression
missed fully redundant expression
Status: NEW
Product: gcc
Classification: Unclassified
Component: tree-optimization
4.1.0
: P2 enhancement
: ---
Assigned To: Not yet assigned to anyone
: missed-optimization
: 32590 35303 38204 43159 52256 (view as bug list)
Depends on: 5738 15559
Blocks: 11832 21485 24568 29144 33315 21676 24001 33828
  Show dependency treegraph
 
Reported: 2005-08-08 17:46 UTC by Paolo Bonzini
Modified: 2013-04-26 12:00 UTC (History)
17 users (show)

See Also:
Host:
Target:
Build:
Known to work:
Known to fail: 4.1.0
Last reconfirmed: 2006-01-28 00:49:03


Attachments
Proof-of-concept patch (4.16 KB, patch)
2008-11-23 13:07 UTC, Steven Bosscher
Details | Diff
less unpolished version of tree code hoisting (12.60 KB, patch)
2008-11-27 15:26 UTC, Steven Bosscher
Details | Diff
patch to implement code hoisting in tree-ssa-pre.c (25.27 KB, patch)
2008-12-01 22:00 UTC, Steven Bosscher
Details | Diff
updated patch (26.65 KB, patch)
2010-01-17 12:34 UTC, Richard Biener
Details | Diff
forward ported patch (28.45 KB, patch)
2012-02-15 15:55 UTC, Richard Biener
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Paolo Bonzini 2005-08-08 17:46:32 UTC
In this code, a <<= 1 is fully redundant

  unsigned short f(unsigned short a)
  {
    if (a & 0x8000)
      a <<= 1, a = a ^ 0x1021;
    else
      a <<= 1;

    return a;
  }

the body should be turned into

  unsigned short f(unsigned short a)
  {
    unsigned short b = a << 1;
    if (a & 0x8000)
      a = b, a = a ^ 0x1021;
    else
      a = b;
  
    return a;
  }

or something similar.  However PRE leaves the GIMPLE unchanged:

f (a)
{
  int D.1267;
  short int a.0;
<bb 0>:
  a.0_3 = (short int) a_2;
  if (a.0_3 < 0) goto <L0>; else goto <L1>;

<L0>:;
  a_7 = a_2 << 1;
  a_8 = a_7 ^ 4129;
  goto <bb 3> (<L2>);
<L1>:;
  a_6 = a_2 << 1;

  # a_1 = PHI <a_8(1), a_6(2)>;
<L2>:;
  D.1267_4 = (int) a_1;
  return D.1267_4;
}

On PPC, this is only caught after reload.

Paolo
Comment 1 Andrew Pinski 2005-08-08 18:42:32 UTC
Confirmed, for some reason the following is caught though:
  unsigned short f(unsigned short a)
  {
    unsigned short b = a <<1;
    if (a & 0x8000)
      a <<= 1, a = a ^ 0x1021;
    else
      a = b;

    return a;
  }
as a<<1 is caught being redundant.  FRE catches the above on the mainline.

Hmm there is only one VH for the expression:
Created value VH.0 for a_2
Created value VH.1 for (short int) VH.0
Created value VH.2 for VH.0 << 1
Created value VH.3 for VH.2 ^ 4129
Created value VH.4 for a_1
Created value VH.5 for (int) VH.4
Created value VH.6 for <retval>_5
exp_gen[-1] := {  }
tmp_gen[-1] := { a_2 (VH.0)  }
avail_out[-1] := { a_2 (VH.0)  }
exp_gen[0] := { a_2 (VH.0) , (short int) VH.0 (VH.1)  }
tmp_gen[0] := { a.0_3 (VH.1)  }
avail_out[0] := { a_2 (VH.0) , a.0_3 (VH.1)  }
exp_gen[1] := { a_2 (VH.0) , VH.0 << 1 (VH.2) , VH.2 ^ 4129 (VH.3)  }
tmp_gen[1] := { a_7 (VH.2) , a_8 (VH.3)  }
avail_out[1] := { a_2 (VH.0) , a.0_3 (VH.1) , a_7 (VH.2) , a_8 (VH.3)  }
exp_gen[2] := { a_2 (VH.0) , VH.0 << 1 (VH.2)  }
tmp_gen[2] := { a_6 (VH.2)  }
avail_out[2] := { a_2 (VH.0) , a.0_3 (VH.1) , a_6 (VH.2)  }
exp_gen[3] := { a_1 (VH.4) , (int) VH.4 (VH.5)  }
tmp_gen[3] := { D.1280_4 (VH.5) , <retval>_5 (VH.6)  }
avail_out[3] := { a_1 (VH.4) , a_2 (VH.0) , a.0_3 (VH.1) , D.1280_4 (VH.5) , <retval>_5 (VH.6)  }
exp_gen[-2] := {  }
tmp_gen[-2] := {  }
avail_out[-2] := {  }
Comment 2 Daniel Berlin 2005-08-08 18:53:34 UTC
(In reply to comment #1)
> Confirmed, for some reason the following is caught though:
because it's a fully redundant expression, that is available when we go to
eliminate, as opposed to the original, which would require hoisting, and is
*not* partially redundant.

> Hmm there is only one VH for the expression:

As one would expect.

This is not a missed optimization for PRE.
PRE is not a generic hoister.
In fact, in this case, it doesn't actually save anything but size to perform
this "optimization".
All it will do is make b live over the use of a, adding another register to
allocate that has a conflict.
I don't see this as a bug at all, except maybe at -Os.
Even then, it's a a hoisting issue, using very busy expressions, not a PRE issue.
Comment 3 Daniel Berlin 2005-08-08 18:56:31 UTC
(In reply to comment #1)
> Confirmed, for some reason the following is caught though:

If you thought about it, you'd notice that your example below has two
computations of a = b along the if branch, which is not optimal.
The original has one computation on each path which is already optimal.
>   unsigned short f(unsigned short a)
>   {
>     unsigned short b = a <<1;
>     if (a & 0x8000)
>       a <<= 1, a = a ^ 0x1021;
>     else
>       a = b;
> 
>     return a;
>   }
Comment 4 Andrew Pinski 2005-08-08 19:34:21 UTC
This is basically PR 15559 which was marked as invalid.
Comment 5 Andrew Pinski 2005-08-08 19:36:43 UTC
and a dup of bug 5738 really.

I almost want to close this fully as a dup of bug 5738.
Comment 6 Andrew Pinski 2005-08-08 19:54:23 UTC
Here is a stupid testcase which can be sped up by pulling the reduandant expressions up:
int ii;
static inline int f(int i, int ii)
{
  return i/ ii;
}

int h(int) __attribute__((pure,const));

int g(int i)
{
  int j, j1 = i;
  for (j = 0; j <1000; j++)
    {
      int ii1 = ii;
      if (h(j))
        i = j1/ii1 + 2;
      else
        i = j1/ii1;
    }
  return i;
}

As we then pull the division before the loop.  As I said this was stupid as this was made up and I don't 
know how often this happens in real life.
Comment 7 Daniel Berlin 2005-08-08 20:29:40 UTC
Subject: Re:  missed fully redundant expression

On Mon, 2005-08-08 at 19:54 +0000, pinskia at gcc dot gnu dot org wrote:
> ------- Additional Comments From pinskia at gcc dot gnu dot org  2005-08-08 19:54 -------
> Here is a stupid testcase which can be sped up by pulling the reduandant expressions up:
> int ii;
> static inline int f(int i, int ii)
> {
>   return i/ ii;
> }
> 
> int h(int) __attribute__((pure,const));
> 
> int g(int i)
> {
>   int j, j1 = i;
>   for (j = 0; j <1000; j++)
>     {
>       int ii1 = ii;
>       if (h(j))
>         i = j1/ii1 + 2;
>       else
>         i = j1/ii1;
>     }
>   return i;
> }
> 

Yes, if we hoisted this, we would then pull it out of the loop.


Comment 8 Daniel Berlin 2005-08-08 20:40:02 UTC
Subject: Re:  missed fully redundant expression

On Mon, 2005-08-08 at 19:54 +0000, pinskia at gcc dot gnu dot org wrote:
> ------- Additional Comments From pinskia at gcc dot gnu dot org  2005-08-08 19:54 -------
> Here is a stupid testcase which can be sped up by pulling the reduandant expressions up:
> int ii;
> static inline int f(int i, int ii)
> {
>   return i/ ii;
> }
> 
> int h(int) __attribute__((pure,const));
> 
> int g(int i)
> {
>   int j, j1 = i;
>   for (j = 0; j <1000; j++)
>     {
>       int ii1 = ii;
>       if (h(j))
>         i = j1/ii1 + 2;
>       else
>         i = j1/ii1;
>     }
>   return i;
> }
> 
You are also going to have to hoist the ii1 = ii because it has a vuse.


Comment 9 Andrew Pinski 2005-08-29 12:33:15 UTC
*** Bug 23619 has been marked as a duplicate of this bug. ***
Comment 10 Andrew Pinski 2005-10-28 18:43:09 UTC
Here is another example which comes from PR 24568 (which really comes from thedailywtf):
int f(int i)
{
  if (i < 0) return i/10+ i;
  return i/10 + i;
}
int f2(int i)
{
  return i/10 + i;
}
Comment 11 Steven Bosscher 2007-10-10 22:59:32 UTC
*** Bug 32590 has been marked as a duplicate of this bug. ***
Comment 12 Steven Bosscher 2007-10-16 13:33:42 UTC
Does not really "block" 24001, but the test case for that bug would be fixed if code hoisting would be implemented properly.
Comment 13 Alexander Strange 2007-10-26 04:08:54 UTC
The closed #32590 is a 4.3 regression, while this is only enhancement.
Comment 14 Andrew Pinski 2008-02-23 05:22:43 UTC
*** Bug 35303 has been marked as a duplicate of this bug. ***
Comment 15 Steven Bosscher 2008-11-21 06:41:50 UTC
*** Bug 38204 has been marked as a duplicate of this bug. ***
Comment 16 Steven Bosscher 2008-11-23 13:07:55 UTC
Created attachment 16751 [details]
Proof-of-concept patch

It is not terribly complicated to add hoisting to tree-ssa-pre.c.  I have attached the result of roughly half a day of hacking.  It is a proof-of-concept patch that demonstrates how hoisting could be added.  It's not finished, tested, or even correct ;-) but it shows the general idea.  For a "production-quality" version, one would have to make sure not to hoist expressions up too far (e.g. not further up than above the first block that generates the expression).
Comment 17 Steven Bosscher 2008-11-23 13:11:06 UTC
For the test case of comment #0, the proof-of-concept patch does the following in the .084t.pre dump (relevant excerpts only):

VBEOUT[2] := { {lshift_expr,a_2(D),1} (0004) }
  Inserting expression 5 into AVAIL_OUT[2]
Inserted pretmp.12_8 = a_2(D) << 1;
 in predecessor 2
AVAIL_OUT[2] := { a_2(D) (0002), a.0_3 (0003), pretmp.12_8 (0004) }
Successor 1 not dominated by 5 -> empty set
VBEOUT[5] := {  }
AVAIL_OUT[5] := { a_2(D) (0002), a.0_3 (0003), a_1 (0001), pretmp.12_8 (0004) }
Successor 5 not dominated by 4 -> empty set
VBEOUT[4] := {  }
AVAIL_OUT[4] := { a_2(D) (0002), a.0_3 (0003), pretmp.12_8 (0004) }
Successor 5 not dominated by 3 -> empty set
VBEOUT[3] := {  }
AVAIL_OUT[3] := { a_2(D) (0002), a.0_3 (0003), a_5 (0005), pretmp.12_8 (0004) }
Replaced a_2(D) << 1 with pretmp.12_8 in a_4 = a_2(D) << 1;
Replaced a_2(D) << 1 with pretmp.12_8 in a_6 = a_2(D) << 1;

...

f (short unsigned int a)
{
  short unsigned int pretmp.12;
  short int a.0;

<bb 2>:
  a.0_3 = (short int) a_2(D);
  pretmp.12_8 = a_2(D) << 1;
  if (a.0_3 < 0)
    goto <bb 3>;
  else
    goto <bb 4>;

<bb 3>:
  a_4 = pretmp.12_8;
  a_5 = a_4 ^ 4129;
  goto <bb 5>;

<bb 4>:
  a_6 = pretmp.12_8;

<bb 5>:
  # a_1 = PHI <a_5(3), a_6(4)>
  return a_1;

}


which eventually leads to the following in the .126t.final_cleanup dump:


;; Function f (f)

f (short unsigned int a)
{
  short unsigned int a.22;

<bb 2>:
  a.22 = a << 1;
  if ((short int) a < 0)
    goto <bb 3>;
  else
    goto <bb 4>;

<bb 3>:
  a.22 = a.22 ^ 4129;

<bb 4>:
  return a.22;

}

Comment 18 Steven Bosscher 2008-11-23 13:22:44 UTC
The test case of PR38204 shows one of the problems with proof-of-concept patch, namely the "don't move up too much" problem.  The .pre dump looks like this:

test (int a, int b, int c, int g)
{
  int pretmp.11;
  int e;
  int d;
  int D.1257;
  int D.1256;

<bb 2>:
  pretmp.11_11 = b_3(D) * c_4(D);
  pretmp.11_12 = g_8(D) + pretmp.11_11;  // No need to move this up to here.
  if (a_2(D) != 0)
    goto <bb 3>;
  else
    goto <bb 4>;

<bb 3>:
  d_5 = pretmp.11_11;
  goto <bb 5>;

<bb 4>:
  d_6 = b_3(D) - c_4(D);

<bb 5>:
  # d_1 = PHI <d_5(3), d_6(4)>
  D.1256_7 = pretmp.11_11;
  e_9 = pretmp.11_12;
  D.1257_10 = e_9 + d_1;
  return D.1257_10;

}


Eventually this gives (in the .final_cleanup dump):

;; Function test (test)

test (int a, int b, int c, int g)
{
  int d.21;
  int d;

<bb 2>:
  d.21 = c * b;
  if (a != 0)
    goto <bb 3>;
  else
    goto <bb 4>;

<bb 3>:
  d = d.21;
  goto <bb 5>;

<bb 4>:
  d = b - c;

<bb 5>:
  return (d.21 + g) + d;

}

Comment 19 Richard Biener 2008-11-23 13:32:25 UTC
Nice.
Comment 20 Richard Biener 2008-11-23 13:43:47 UTC
We also need to make sure not to do hoisting where we should do sinking like for

int foo(int b, int i)
{
  int res;
  if (b)
    res = i + 1;
  else
    res = i + 1;
  return res;
}

(add some more code that shows the increased life-range of res would hurt)

I guess sinking doesn't really fit the PRE framework.
Comment 21 Steven Bosscher 2008-11-23 14:20:42 UTC
I'll work on something that bootstraps and passes testing.  But cost-related decisions (like the one from comment #20) are not on my TODO list right now.  The pass that should do this is called sched1 ;-)
Comment 22 Daniel Berlin 2008-11-23 18:30:27 UTC
Subject: Re:  missed fully redundant expression

Sinking fits into the reverse framework.
Apparently the SSUPRE person plans on submitting when 4.5 opens, and
you can fit sinking frameworks into there.


On Sun, Nov 23, 2008 at 8:43 AM, rguenth at gcc dot gnu dot org
<gcc-bugzilla@gcc.gnu.org> wrote:
>
>
> ------- Comment #20 from rguenth at gcc dot gnu dot org  2008-11-23 13:43 -------
> We also need to make sure not to do hoisting where we should do sinking like
> for
>
> int foo(int b, int i)
> {
>  int res;
>  if (b)
>    res = i + 1;
>  else
>    res = i + 1;
>  return res;
> }
>
> (add some more code that shows the increased life-range of res would hurt)
>
> I guess sinking doesn't really fit the PRE framework.
>
>
> --
>
>
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=23286
>
> ------- You are receiving this mail because: -------
> You are on the CC list for the bug, or are watching someone who is.
>
Comment 23 Steven Bosscher 2008-11-27 15:26:20 UTC
Created attachment 16784 [details]
less unpolished version of tree code hoisting

This fixes some bugs in the proof-of-concept patch:

1. Don't hoist things out of loops.  We have to look at ANTIC_IN of the block we hoist into, not the union of ANTIC_IN of all its successors.

2. Add a constraint that the hoisted expression must be computed in one of the successors of the block we're hoisting too.

Also added comment so that the uninitiated don't have their eye balls dropping out in confusion ;-)

I might actually post this for GCC 4.5 if it subsumes the RTL code hoisting pass.
Comment 24 Steven Bosscher 2008-12-01 22:00:50 UTC
Created attachment 16803 [details]
patch to implement code hoisting in tree-ssa-pre.c

This passes bootstrap+testing on ia64-linux and amd64-linux.  It causes a few vectorizer failures (4 on ia64) that I have not analyzed yet.
Comment 25 Steven Bosscher 2009-08-04 09:05:53 UTC
*** Bug 40956 has been marked as a duplicate of this bug. ***
Comment 26 Richard Biener 2010-01-17 12:34:28 UTC
Created attachment 19634 [details]
updated patch

I updated the patch to apply to current trunk.

I also notice that on the testcase for PR21485 comment #36:

/* { dg-do compile } */
/* { dg-options "-O2 -fdump-tree-pre" } */

long
NumSift (long *array, int b, unsigned long k)
{
  if (b)
    if (array[k] < array[k + 1L])
      ++k;
  return array[k];
}

/* There should be two loads left.  */

/* { dg-final { scan-tree-dump-times "= \\\*" 2 "pre" } } */
/* { dg-final { cleanup-tree-dump "pre" } } */


we properly hoist array[k] before if (b) but still insert on the else edge
of that branch for eliminating a partial redundancy (and elimination doesn't
remove that again because the value-numbers of the inserted expressions are
not equal).

This is because while we put hoisted expressions into AVAIL_OUT of the
block we inserted to we do not propagate this to its successors so
PRE insertion does not find leaders for it.

In this case, starting from

<bb 2>:
  if (b_2(D) != 0)
    goto <bb 3>;
  else
    goto <bb 6>;

<bb 6>:
  goto <bb 5>;

<bb 3>:
  D.1955_4 = k_3(D) * 4;
  D.1956_6 = array_5(D) + D.1955_4;
  D.1957_7 = *D.1956_6;
  k_8 = k_3(D) + 1;
  D.1959_9 = k_8 * 4;
  D.1960_10 = array_5(D) + D.1959_9;
  D.1961_11 = *D.1960_10;
  if (D.1957_7 < D.1961_11)
    goto <bb 4>;
  else
    goto <bb 7>;

<bb 7>:
  goto <bb 5>;

<bb 4>:

<bb 5>:
  # k_1 = PHI <k_3(D)(6), k_3(D)(7), k_8(4)>
  D.1955_13 = k_1 * 4;
  D.1956_14 = array_5(D) + D.1955_13;
  D.1964_15 = *D.1956_14;
  return D.1964_15;

we hoist into block 2 and then PRE-insert for block 5, in pred 6 the
expressions are not yet available.  In reality
they are available but they didn't go through NEW set processing yet.

I wonder if we have to extend NEW set processing to either cover
immediate dominators of dominators or succs with single preds we
hoisted to (no idea if either would confuse PRE though).
Comment 27 Richard Biener 2010-01-17 12:35:44 UTC
Hoh, patches from steven but steven not in CC ...
Comment 28 Richard Biener 2010-01-17 12:56:44 UTC
sth like

@@ -3893,7 +3893,29 @@ insert_aux (basic_block block, bool do_p
 
 	  /* Insert expressions for hoisting.  */
 	  if (do_hoist && EDGE_COUNT (block->succs) >= 2)
-	    new_stuff |= do_hoist_insertion (block);
+	    {
+	      edge e;
+	      edge_iterator ei;
+
+	      new_stuff |= do_hoist_insertion (block);
+
+	      /* Immediately update AVAIL_OUT of immediately dominated blocks
+	         with the hoisted expressions.  */
+	      newset = NEW_SETS (block);
+	      if (newset)
+		FOR_EACH_EDGE (e, ei, block->succs)
+		  {
+		    if (EDGE_COUNT (e->dest->preds) != 1)
+		      continue;
+
+		    FOR_EACH_EXPR_ID_IN_SET (newset, i, bi)
+		      {
+			pre_expr expr = expression_for_id (i);
+			bitmap_value_replace_in_set (NEW_SETS (e->dest), expr);
+			bitmap_value_replace_in_set (AVAIL_OUT (e->dest), expr);
+		      }
+		  }
+	    }
 	}
     }
   for (son = first_dom_son (CDI_DOMINATORS, block);


on top of the last patch
Comment 29 Steven Bosscher 2010-01-17 13:00:09 UTC
I intended to work on this for GCC 4.6.  There are many vectorizer failures with the v3 patch. My local copy of the patch is at v6 :-)  I'll see if I can make some time to "forward-port" the differences between my v3 and v6 to Richi's updated patch.
Comment 30 Steven Bosscher 2010-02-24 09:40:33 UTC
*** Bug 43159 has been marked as a duplicate of this bug. ***
Comment 31 Richard Biener 2010-11-16 14:39:50 UTC
(In reply to comment #29)
> I intended to work on this for GCC 4.6.  There are many vectorizer failures
> with the v3 patch. My local copy of the patch is at v6 :-)  I'll see if I can
> make some time to "forward-port" the differences between my v3 and v6 to
> Richi's updated patch.

I don't remember if you did send me your v6 privately (I at least can't find it),
can you simply dump your most current variant here?

Thx.
Comment 32 Andrew Pinski 2011-07-20 00:40:39 UTC
Anyone working on this?  I have found this happens internally a bit (after expanding of bitfields accesses).
Comment 33 Andrew Pinski 2011-07-29 20:11:29 UTC
(In reply to comment #26)
> Created attachment 19634 [details]
> updated patch
This patch fails on the trunk now with:
/data/src/gcc-cavium/gcc-4.6-upgrade/src/libstdc++-v3/src/strstream.cc: In destructor ‘virtual std::strstream::~strstream()’:
/data/src/gcc-cavium/gcc-4.6-upgrade/src/libstdc++-v3/src/strstream.cc:399:3: error: statement marked for throw in middle of block
# .MEM_23 = VDEF <.MEM_22>
std::strstreambuf::~strstreambuf (D.20970_7);
Comment 34 Richard Biener 2012-02-15 11:19:55 UTC
*** Bug 52256 has been marked as a duplicate of this bug. ***
Comment 35 Richard Biener 2012-02-15 15:55:52 UTC
Created attachment 26665 [details]
forward ported patch

I made the patch apply and build again (unbootstrapped/tested).

Steven, if you happen to have a "latest" patch around from yours somewhere
I'd appreciate if you can make it appear in my inbox somehow ... I'll integrate
it with my changes.  Thx.
Comment 36 Steven Bosscher 2012-02-15 18:37:40 UTC
The patch was on one of the gsyprf machines, which are gone (didn't I already tell you this before??). So the "latest" patch is lost...
Comment 37 rguenther@suse.de 2012-02-16 09:37:54 UTC
On Wed, 15 Feb 2012, steven at gcc dot gnu.org wrote:

> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=23286
> 
> --- Comment #36 from Steven Bosscher <steven at gcc dot gnu.org> 2012-02-15 18:37:40 UTC ---
> The patch was on one of the gsyprf machines, which are gone (didn't I already
> tell you this before??). So the "latest" patch is lost...

Not sure, I might simply have forgotten it ;)  I'm going to make the
patch bootstrap again (still fails with the EH problem right now).
Comment 38 Richard Biener 2012-02-16 12:44:31 UTC
Incremental patch to fix the EH related ICEs:

Index: gcc/tree-ssa-pre.c
===================================================================
--- gcc/tree-ssa-pre.c.orig     2012-02-16 12:08:57.000000000 +0100
+++ gcc/tree-ssa-pre.c  2012-02-16 12:08:50.000000000 +0100
@@ -3937,7 +3937,8 @@ do_hoist_insertion (basic_block block AT
       find_or_generate_expression (block, expr, &stmts, NULL);
 
       last = gsi_last_bb (block);
-      if (gsi_end_p (last) || is_ctrl_stmt (gsi_stmt (last)))
+      if (gsi_end_p (last)
+         || stmt_ends_bb_p (gsi_stmt (last)))
        gsi_insert_seq_before (&last, stmts, GSI_SAME_STMT);
       else
        gsi_insert_seq_after (&last, stmts, GSI_SAME_STMT);

when hoisting into

  # BLOCK 19
  # PRED: 18 (fallthru,exec)
  [LP 15] D.2722_10 = CYapfBaseT::PfGetSettings (D.2723_9);
  goto <bb 20>;
  # SUCC: 20 (fallthru,exec) 113 (eh,exec)

but of course we'd have to verify we can hoist across such throwing
stmt (it might be a store clobbering what we insert),
so maybe disabling hoisting in that case would be more appropriate.
With

Index: gcc/tree-ssa-pre.c
===================================================================
--- gcc/tree-ssa-pre.c.orig     2012-02-16 12:28:04.000000000 +0100
+++ gcc/tree-ssa-pre.c  2012-02-16 12:27:37.000000000 +0100
@@ -3874,6 +3874,7 @@ do_hoist_insertion (basic_block block AT
   bitmap_iterator bi;
   bitmap_set_t hoistable_set;
   bitmap availout_in_some;
+  gimple_stmt_iterator last;
 
   /* At least two successors, or else...  */
   gcc_assert (EDGE_COUNT (block->succs) >= 2);
@@ -3886,6 +3887,14 @@ do_hoist_insertion (basic_block block AT
     if (! single_pred_p (e->dest))
       return false;
 
+  /* Determine the insertion point.  If we cannot safely insert before
+     the last stmt if we'd have to, bail out.  */
+  last = gsi_last_bb (block);
+  if (!gsi_end_p (last)
+      && !is_ctrl_stmt (gsi_stmt (last))
+      && stmt_ends_bb_p (gsi_stmt (last)))
+    return false;
+
   hoistable_set = bitmap_set_new ();
   availout_in_some = BITMAP_ALLOC (&grand_bitmap_obstack);
 
@@ -3916,7 +3925,6 @@ do_hoist_insertion (basic_block block AT
       pre_expr expr = expression_for_id (i);
       unsigned int value_id = get_expr_value_id (expr);
       gimple_seq stmts = NULL;
-      gimple_stmt_iterator last;
 
       /* If the value of this expression is not available in at least one
         successor, do not hoist the value.  */
@@ -3936,7 +3944,6 @@ do_hoist_insertion (basic_block block AT
 
       find_or_generate_expression (block, expr, &stmts, NULL);
 
-      last = gsi_last_bb (block);
       if (gsi_end_p (last) || is_ctrl_stmt (gsi_stmt (last)))
        gsi_insert_seq_before (&last, stmts, GSI_SAME_STMT);
       else

Which leaves us with

/abuild/rguenther/obj/./gcc/xgcc -B/abuild/rguenther/obj/./gcc/ -B/usr/local/x86_64-unknown-linux-gnu/bin/ -B/usr/local/x86_64-unknown-linux-gnu/lib/ -isystem /usr/local/x86_64-unknown-linux-gnu/include -isystem /usr/local/x86_64-unknown-linux-gnu/sys-include    -c -g -O2 -m32 -fpic  -W -Wall -gnatpg -nostdinc -m32  a-nlrear.ads -o a-nlrear.o

raised STORAGE_ERROR : stack overflow or erroneous memory access
make[9]: *** [a-nlrear.o] Error 1

during stage3 libada build ...
Comment 39 Richard Biener 2012-02-20 11:50:00 UTC
Apart from that libada bootstrap issue (reproduces with the stage1 compiler),
the testsuite is clean apart from vectorizer testcases which show that
do_hoist_insertion is hoisting stmts across loops


  for (i = 0; i < N; i++) {
    diff = 0;
    for (j = k; j < M; j+=4) {
      diff += in[j+i]*coeff[j];
    }
    out[i] = out[i] + diff;
  }

is turned into

  for (i = 0; i < N; i++) {
    diff = 0;
    tem = out[i];
    for (j = k; j < M; j+=4) {
      diff += in[j+i]*coeff[j];
    }
    out[i] = tem + diff;
  }

which confuses outer loop vectorization enough to make it fail for

FAIL: gcc.dg/vect/vect-outer-fir-big-array.c -flto scan-tree-dump-times vect "OU
TER LOOP VECTORIZED" 2
FAIL: gcc.dg/vect/vect-outer-fir-lb-big-array.c -flto scan-tree-dump-times vect 
"OUTER LOOP VECTORIZED" 2
FAIL: gcc.dg/vect/vect-outer-fir-lb.c -flto scan-tree-dump-times vect "OUTER LOO
P VECTORIZED" 2
FAIL: gcc.dg/vect/vect-outer-fir.c -flto scan-tree-dump-times vect "OUTER LOOP V
ECTORIZED" 2
Comment 40 Bernhard Reutner-Fischer 2012-02-23 13:34:37 UTC
The ATTRIBUTE_UNUSED of do_hoist_insertion can be removed.

diff --git a/gcc/tree-ssa-pre.c b/gcc/tree-ssa-pre.c
index 0f777b4..bfc7a92 100644
--- a/gcc/tree-ssa-pre.c
+++ b/gcc/tree-ssa-pre.c
@@ -3865,7 +3865,7 @@ do_pre_partial_partial_insertion (basic_block block, basic_block dom)
    The caller has to make sure that BLOCK has at least two successors.  */
 
 static bool
-do_hoist_insertion (basic_block block ATTRIBUTE_UNUSED)
+do_hoist_insertion (basic_block block)
 {
   edge e;
   edge_iterator ei;
@@ -3878,6 +3878,13 @@ do_hoist_insertion (basic_block block ATTRIBUTE_UNUSED)
   /* At least two successors, or else...  */
   gcc_assert (EDGE_COUNT (block->succs) >= 2);
 
+  /* We cheat about AVAIL_OUT in the first block
+     so pretend we are done in the second iteration.  */
+  if (block->prev_bb
+      && block->prev_bb->index == ENTRY_BLOCK
+      && pre_stats.hoist_insert)
+    return false;
+
   /* Check that all successors of BLOCK are dominated by block.
      We could use dominated_by_p() for this, but actually there is a much
      quicker check: any successor that is dominated by BLOCK can't have
@@ -3890,9 +3897,12 @@ do_hoist_insertion (basic_block block ATTRIBUTE_UNUSED)
   availout_in_some = BITMAP_ALLOC (&grand_bitmap_obstack);
 
   /* A hoistable value must be in ANTIC_IN(block)
-     but not in AVAIL_OUT(BLOCK).  */
+     but not in AVAIL_OUT(BLOCK).
+     To give more opportunity to hoisting,
+     cheat by disregarding AVAIL_OUT of the ENTRY_BLOCK.  */
   bitmap_set_copy (hoistable_set, ANTIC_IN (block));
-  bitmap_set_subtract_values (hoistable_set, AVAIL_OUT (block));
+  if (block->prev_bb && block->prev_bb->index != ENTRY_BLOCK)
+    bitmap_set_subtract_values (hoistable_set, AVAIL_OUT (block));
 
   /* Short-cut for a common case: hoistable_set is empty.  */
   if (bitmap_empty_p (&hoistable_set->values))


so for a simplified PR5738
$ cat pr5738.c
struct foo
{
  unsigned short *p;
};

#define foo_s s
void
func (struct foo *foo_s, unsigned int *coord, _Bool delta)
{
  unsigned short change;

  if (delta)
    {
      change = *((foo_s)->p++);
      *coord += change;
    }
  else
    {
      change = *((foo_s)->p++);
      *coord += change;
//      *coord += *((foo_s)->p++) << 8;
    }
}

we end up a little bit better, with something like

func (struct foo * sD.1705, unsigned intD.9 * coordD.1706, _BoolD.1685 deltaD.1707)
{
  unsigned intD.9 pretmp.6D.1727;
  short unsigned intD.16 * pretmp.5D.1726;
  short unsigned intD.16 pretmp.4D.1725;
  short unsigned intD.16 * pretmp.3D.1724;
  short unsigned intD.16 changeD.1710;
  unsigned intD.9 D.1718;
  unsigned intD.9 D.1717;
  unsigned intD.9 D.1716;
  short unsigned intD.16 * D.1715;
  short unsigned intD.16 * D.1714;

  # BLOCK 2 freq:10000
  # PRED: ENTRY [100.0%]  (fallthru,exec)
  # VUSE <.MEMD.1720_17(D)>
  # PT = nonlocal escaped 
  pretmp.3D.1724_22 = sD.1705_2(D)->pD.1704;
  # VUSE <.MEMD.1720_17(D)>
  pretmp.4D.1725_23 = *pretmp.3D.1724_22;
  # PT = nonlocal escaped 
  pretmp.5D.1726_24 = pretmp.3D.1724_22 + 2;
  # VUSE <.MEMD.1720_17(D)>
  pretmp.6D.1727_25 = *coordD.1706_6(D);
  pretmp.6D.1727_26 = (unsigned intD.9) pretmp.4D.1725_23;
  pretmp.6D.1727_27 = pretmp.6D.1727_25 + pretmp.6D.1727_26;
  if (deltaD.1707_1(D) != 0)
    goto <bb 3>;
  else
    goto <bb 4>;
  # SUCC: 3 [39.0%]  (true,exec) 4 [61.0%]  (false,exec)

  # BLOCK 3 freq:3900
  # PRED: 2 [39.0%]  (true,exec)
  # .MEMD.1720_18 = VDEF <.MEMD.1720_17(D)>
  sD.1705_2(D)->pD.1704 = pretmp.5D.1726_24;
  # .MEMD.1720_19 = VDEF <.MEMD.1720_18>
  *coordD.1706_6(D) = pretmp.6D.1727_27;
  goto <bb 5>;
  # SUCC: 5 [100.0%]  (fallthru,exec)

  # BLOCK 4 freq:6100
  # PRED: 2 [61.0%]  (false,exec)
  # .MEMD.1720_20 = VDEF <.MEMD.1720_17(D)>
  sD.1705_2(D)->pD.1704 = pretmp.5D.1726_24;
  # .MEMD.1720_21 = VDEF <.MEMD.1720_20>
  *coordD.1706_6(D) = pretmp.6D.1727_27;
  # SUCC: 5 [100.0%]  (fallthru,exec)

  # BLOCK 5 freq:10000
  # PRED: 3 [100.0%]  (fallthru,exec) 4 [100.0%]  (fallthru,exec)
  # .MEMD.1720_16 = PHI <.MEMD.1720_19(3), .MEMD.1720_21(4)>
  # VUSE <.MEMD.1720_16>
  return;
  # SUCC: EXIT [100.0%] 

}
which translates to nearly proper code:

func:
.LFB0:
        .cfi_startproc
        movq    (%rdi), %rax    # sD.1705_2(D)->pD.1704, pretmp.3D.1724
        leaq    2(%rax), %rcx   #, pretmp.5D.1726
        movzwl  (%rax), %eax    # *pretmp.3D.1724_22, pretmp.6D.1727
        addl    (%rsi), %eax    # *coordD.1706_6(D), pretmp.6D.1727
        testb   %dl, %dl        # deltaD.1707
        movq    %rcx, (%rdi)    # pretmp.5D.1726, sD.1705_2(D)->pD.1704
        movl    %eax, (%rsi)    # pretmp.6D.1727, *coordD.1706_6(D)
        je      .L2     #,
        ret
.L2:
        ret
        .cfi_endproc

where the expected code would be something like (i think):

func:
.LFB0:
        .cfi_startproc
        movq    (%rdi), %rax    # sD.1705_2(D)->pD.1704, D.1714
        movzwl  (%rax), %edx    #* D.1714, changeD.1710
        addq    $2, %rax        #, tmp77
        movq    %rax, (%rdi)    # tmp77, sD.1705_2(D)->pD.1704
        addl    %edx, (%rsi)    # changeD.1710, *coordD.1706_6(D)
        ret
        .cfi_endproc
.LFE0:

So we just need to recognize that BB3 and BB4 are identical (everything in BB3 can be hoisted and BB4 is dead).