User account creation filtered due to spam.

Bug 78812 - [5/6 Regression] Wrong code generation due to hoisting memory load across function call
Summary: [5/6 Regression] Wrong code generation due to hoisting memory load across fun...
Status: NEW
Alias: None
Product: gcc
Classification: Unclassified
Component: rtl-optimization (show other bugs)
Version: 6.2.1
: P2 normal
Target Milestone: 5.5
Assignee: Jeffrey A. Law
URL:
Keywords: wrong-code
Depends on:
Blocks:
 
Reported: 2016-12-14 16:19 UTC by Ulrich Weigand
Modified: 2017-01-06 07:44 UTC (History)
8 users (show)

See Also:
Host:
Target: s390x-linux
Build:
Known to work:
Known to fail:
Last reconfirmed: 2016-12-15 00:00:00


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Ulrich Weigand 2016-12-14 16:19:24 UTC
Compiling the following test case with g++ -Os -fpic on s390x-ibm-linux results in abort() being called unexpectedly:

extern "C" void abort (void) __attribute__ ((__noreturn__));

class Transaction
{
public:
  bool Aborted;

  Transaction () : Aborted (false) { }

  ~Transaction ()
  {
    if (!Aborted)
      abort ();
  }
};

void test (Transaction &Trans) __attribute__ ((noinline));
void test (Transaction &Trans)
{
  Trans.Aborted = true;
}

int main (void)
{
  Transaction T;
  test (T);
}


What happens is that the destructor for Transaction is inlined into main (twice, once in the regular exit and once in the exception path).  The load of the Aborted member is then moved by the code hoisting pass (pass_rtl_hoist) in gcse.c to a single location in basic block 2.

However, that basic block ends with the call to the "test" routine, and for some reason, code hoisting thinks it therefore needs to move the hoisted instruction to *before* that call (see the CALL_P case of insert_insn_end_basic_block).  This causes wrong code generation since that call actually modifies the memory that is being loaded.

There seems to be other code that appears intended to recognize that case and avoid hoisting then (see prune_expressions), but that apparently only looks for abnormal edges, which we don't have here.

Not sure how this is supposed to work correctly ...
Comment 1 Jakub Jelinek 2016-12-15 09:21:41 UTC
Doesn't seem to be a recent regression, at least already r187763 is doing that.
Comment 2 Jakub Jelinek 2016-12-15 11:41:04 UTC
But in the end it actually is a regression that started r166555 (previously it has been latent).

Slightly adjusted testcase:
struct T
{
  bool a;
  T () : a (false) {}
  ~T () { if (!a) __builtin_abort (); }
};

__attribute__((noinline))
void
test (T &x)
{
  x.a = true;
}

int
main ()
{
  T T;
  test (T);
}
Comment 3 Jakub Jelinek 2016-12-15 16:30:15 UTC
The problem is that the hoisting code seems to assume the hoisting would be performed at the end of a bb, but that is not what insert_insn_end_basic_block will then actually do.  But by the time insert_insn_end_basic_block is called, it is too late to punt or find another bb to hoist it to.
compute_transp only analyzes transparency across the whole bb (to my surprise, it considers all calls as potentially clobbering MEMs, not just non-const/pure functions).
So I think we want to have some predicate for what insert_insn_end_basic_block does to emit before the end of bb while searching for which bb to hoist to, and if that predicate is true, if it is a CALL (non-const/pure?) then not consider that bb for expressions containing MEMs (dunno if jumps could clobber MEMs too).
Or just check for this and occrs_to_hoist.release (); if it happens in the
if (hoistable > 1 && dbg_cnt (hoist_insn)) block or so?
Comment 4 Richard Biener 2016-12-16 09:44:10 UTC
Can't we insert on the edge and commit edge insertions late?  Or simply split the block for this case?
Comment 5 Jakub Jelinek 2016-12-16 09:50:30 UTC
(In reply to Richard Biener from comment #4)
> Can't we insert on the edge and commit edge insertions late?  Or simply
> split the block for this case?

Well, for the case of calls there typically are 2 edges, not just one (the abnormal one and normal fallthru), except for noreturn throwing calls.
So inserting on the edges would usually mean undoing the hoisting.
Note if the expression doesn't contain MEM (and isn't clobbered by the call, but that should just affect hard registers), hoisting it before the call is just fine.
Comment 6 rguenther@suse.de 2016-12-16 09:56:15 UTC
On Fri, 16 Dec 2016, jakub at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78812
> 
> --- Comment #5 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
> (In reply to Richard Biener from comment #4)
> > Can't we insert on the edge and commit edge insertions late?  Or simply
> > split the block for this case?
> 
> Well, for the case of calls there typically are 2 edges, not just one (the
> abnormal one and normal fallthru), except for noreturn throwing calls.
> So inserting on the edges would usually mean undoing the hoisting.
> Note if the expression doesn't contain MEM (and isn't clobbered by the call,
> but that should just affect hard registers), hoisting it before the call is
> just fine.

But then sth is wrong with the dataflow problem ... maybe it somehow
assumes that a signle outgoing edge is always fallthru (and not EH)?
That is, inserting on EH edges is "tricky" at best...
Comment 7 Jakub Jelinek 2016-12-16 10:10:24 UTC
(In reply to rguenther@suse.de from comment #6)
> On Fri, 16 Dec 2016, jakub at gcc dot gnu.org wrote:
> 
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78812
> > 
> > --- Comment #5 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
> > (In reply to Richard Biener from comment #4)
> > > Can't we insert on the edge and commit edge insertions late?  Or simply
> > > split the block for this case?
> > 
> > Well, for the case of calls there typically are 2 edges, not just one (the
> > abnormal one and normal fallthru), except for noreturn throwing calls.
> > So inserting on the edges would usually mean undoing the hoisting.
> > Note if the expression doesn't contain MEM (and isn't clobbered by the call,
> > but that should just affect hard registers), hoisting it before the call is
> > just fine.
> 
> But then sth is wrong with the dataflow problem ... maybe it somehow
> assumes that a signle outgoing edge is always fallthru (and not EH)?
> That is, inserting on EH edges is "tricky" at best...

Dunno, haven't looked into it in detail.  The hoisting has been from 2 spots, one after the call's fallthrough and another from the EH landing pad of the call (which is why emitting the expression after the call's fallthrough and duplicating after the EH landing pad would effectively undo the optimization).
Comment 8 rguenther@suse.de 2016-12-16 11:30:49 UTC
On Fri, 16 Dec 2016, jakub at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78812
> 
> --- Comment #7 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
> (In reply to rguenther@suse.de from comment #6)
> > On Fri, 16 Dec 2016, jakub at gcc dot gnu.org wrote:
> > 
> > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78812
> > > 
> > > --- Comment #5 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
> > > (In reply to Richard Biener from comment #4)
> > > > Can't we insert on the edge and commit edge insertions late?  Or simply
> > > > split the block for this case?
> > > 
> > > Well, for the case of calls there typically are 2 edges, not just one (the
> > > abnormal one and normal fallthru), except for noreturn throwing calls.
> > > So inserting on the edges would usually mean undoing the hoisting.
> > > Note if the expression doesn't contain MEM (and isn't clobbered by the call,
> > > but that should just affect hard registers), hoisting it before the call is
> > > just fine.
> > 
> > But then sth is wrong with the dataflow problem ... maybe it somehow
> > assumes that a signle outgoing edge is always fallthru (and not EH)?
> > That is, inserting on EH edges is "tricky" at best...
> 
> Dunno, haven't looked into it in detail.  The hoisting has been from 2 spots,
> one after the call's fallthrough and another from the EH landing pad of the
> call (which is why emitting the expression after the call's fallthrough and
> duplicating after the EH landing pad would effectively undo the optimization).

Ok, but we obviously can't hoist to this place so presumambly have to
reflect that in the dataflow somehow (or prune its solution).  We have

          if (!pre_p && MEM_P (expr->expr))
            /* Note memory references that can be clobbered by a call.
               We do not split abnormal edges in hoisting, so would
               a memory reference get hoisted along an abnormal edge,
               it would be placed /before/ the call.  Therefore, only
               constant memory references can be hoisted along abnormal
               edges.  */
            {

but exactly the same reasoning applies to EH edges.  Thus

Index: gcse.c
===================================================================
--- gcse.c      (revision 243738)
+++ gcse.c      (working copy)
@@ -1757,7 +1757,7 @@ prune_expressions (bool pre_p)
         and, hence, there's no need to conservatively prune expressions 
on
         "intermediate" set-and-jump instructions.  */
       FOR_EACH_EDGE (e, ei, bb->preds)
-       if ((e->flags & EDGE_ABNORMAL)
+       if ((e->flags & (EDGE_ABNORMAL|EDGE_EH))
            && (pre_p || CALL_P (BB_END (e->src))))
          {
            bitmap_and_compl (antloc[bb->index],

doing that if only pre_p is probably not necessary (but it'll be
only !CALL_P for non-call-exceptions).
Comment 9 Jeffrey A. Law 2016-12-21 22:21:39 UTC
So it's been a long time...  And IIRC, rth was the one that fixed this wart in gcse.


A BB that ends with a call that can throw or any other abnormal edge is supposed to suppress hoisting for precisely the reasons outlined in this BZ.  Essentially we need to insert on the edge, but we can't (easily) do that for the EH/abnormal edge and we can't safely insert before the call.

Theother possibility was that we kill the expressions in the EH landing pads.  I thought we looked at that as well.
Comment 10 Jeffrey A. Law 2017-01-04 23:09:12 UTC
So EDGE_EH implies EDGE_ABNORMAL, so the patch in c#8 isn't going to have any impact.  The real bug is more subtle.

prune_expressions walks the expression table looking for expressions that occur in blocks with abnormal edges.  We then remove those expressions from antic/transp.

The bug is it uses MEM_P.  Which is (of course) false if the given RTX has an embedded MEM.  A good example would be:

(zero_extend:DI (mem/c:QI (plus:DI (reg/f:DI 34 %fp)
            (const_int -1 [0xffffffffffffffff])) [2 T.a+0 S1 A8]))

Since that's not a MEM_P, the expression isn't removed from antic/transp which makes it subject to hoisting across the abnormal edge.

This could be easily fixed by walking into the ZERO/SIGN_EXTEND rtx, but there may be other places where an embedded MEM might appear (hell, on a cisc machine it would be just about anywhere).  So I want to do a bit of auditing of gcse.c.
Comment 11 Jakub Jelinek 2017-01-04 23:16:09 UTC
(In reply to Jeffrey A. Law from comment #10)
> Since that's not a MEM_P, the expression isn't removed from antic/transp
> which makes it subject to hoisting across the abnormal edge.
> 
> This could be easily fixed by walking into the ZERO/SIGN_EXTEND rtx, but
> there may be other places where an embedded MEM might appear (hell, on a
> cisc machine it would be just about anywhere).  So I want to do a bit of
> auditing of gcse.c.

Can't you just use contains_mem_rtx_p (move it from ifcvt.c to rtlanal.c and export)?
Comment 12 Jeffrey A. Law 2017-01-04 23:33:21 UTC
I'm mostly concerned about other places where we assume that a memory reference is supposed to show up at the toplevel of a source/dest.

For example, it looks like we don't properly handle the case where we have a REG_EQUAL note of the form (any_extend (mem (...)).

THankfully, most of gcse.c will DTRT with such an rtx and the cases for REG_EQUAL notes don't appear to be correctness issues.  The only correctness issue so far is prune_expressions.

I need to look at postreload-gcse.c as well.

All those cases will need to use a helper of some sort to look inside the rtx.  The mechanics of that (such as using contains_mem_rtx_p) aren't terribly concerning/interesting at this stage.
Comment 13 Jeffrey A. Law 2017-01-04 23:56:08 UTC
So I  see a case in postreload-gcse.c where we might mis-handle when the destination is a ZERO_EXTRACT or STRICT_LOW_PART.  Neither happen often which is probably why we've never noticed.
Comment 14 Jeffrey A. Law 2017-01-05 07:39:20 UTC
Author: law
Date: Thu Jan  5 07:38:48 2017
New Revision: 244093

URL: https://gcc.gnu.org/viewcvs?rev=244093&root=gcc&view=rev
Log:
	PR tree-optimizatin/78812
	* rtl.h (contains_mem_rtx_p): Prototype.
	* ifcvt.c (containts_mem_rtx_p): Move from here to...
	* rtlanal.c (contains_mem_rtx_p): Here and remvoe static linkage.
	* gcse.c (prune_expressions): Use contains_mem_rtx_p to discover
	and prune MEMs that are not at the toplevel of a SET_SRC rtx.  Look
	through ZERO_EXTEND and SIGN_EXTEND when trying to avoid pruning
	MEMs.

	PR tree-optimization/78812
	* g++.dg/torture/pr78812.C: New test.

Added:
    trunk/gcc/testsuite/g++.dg/torture/pr78812.C
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/gcse.c
    trunk/gcc/ifcvt.c
    trunk/gcc/rtl.h
    trunk/gcc/rtlanal.c
    trunk/gcc/testsuite/ChangeLog
Comment 15 rguenther@suse.de 2017-01-05 08:12:38 UTC
On Wed, 4 Jan 2017, law at redhat dot com wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78812
> 
> Jeffrey A. Law <law at redhat dot com> changed:
> 
>            What    |Removed                     |Added
> ----------------------------------------------------------------------------
>            Assignee|unassigned at gcc dot gnu.org      |law at redhat dot com
> 
> --- Comment #10 from Jeffrey A. Law <law at redhat dot com> ---
> So EDGE_EH implies EDGE_ABNORMAL, so the patch in c#8 isn't going to have any
> impact.  The real bug is more subtle.

EDGE_EH does not imply EDGE_ABNORMAL.
Comment 16 Jeffrey A. Law 2017-01-05 17:56:49 UTC
ABNORMAL_CALL and EH imply ABNORMAL, at least for RTL CFG according to their documentation.  I haven't gone through the trouble to see where that is enforced, but I did very that the edges had EH and ABNORMAL set in the dumps.
Comment 17 Eric Botcazou 2017-01-05 18:02:13 UTC
> ABNORMAL_CALL and EH imply ABNORMAL, at least for RTL CFG according to their
> documentation.  I haven't gone through the trouble to see where that is
> enforced, but I did very that the edges had EH and ABNORMAL set in the dumps.

This was historically the case in GIMPLE too but Jan changed it at some point.