Bug 56982 - [4.8 Regression] Bad optimization with setjmp()
Summary: [4.8 Regression] Bad optimization with setjmp()
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: tree-optimization (show other bugs)
Version: 4.8.0
: P3 normal
Target Milestone: 4.9.0
Assignee: Richard Biener
URL:
Keywords: wrong-code
Depends on: 57055 57026 57036 57050 57075 57147 57190 57349 57584
Blocks: 57067
  Show dependency treegraph
 
Reported: 2013-04-16 14:31 UTC by Jeroen Demeyer
Modified: 2014-12-10 12:37 UTC (History)
2 users (show)

See Also:
Host:
Target: x86_64-pc-linux-gnu
Build:
Known to work: 4.9.0
Known to fail: 4.8.4
Last reconfirmed: 2013-04-16 00:00:00


Attachments
Bug program (268 bytes, text/x-csrc)
2013-04-16 14:31 UTC, Jeroen Demeyer
Details
Bug program (preprocessed) (8.95 KB, application/octet-stream)
2013-04-16 14:31 UTC, Jeroen Demeyer
Details
patch (1.80 KB, patch)
2013-04-17 09:57 UTC, Richard Biener
Details | Diff
another example of wrong compilation (198 bytes, text/plain)
2013-07-02 22:26 UTC, Bernd Edlinger
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Jeroen Demeyer 2013-04-16 14:31:12 UTC
Created attachment 29883 [details]
Bug program

Compile the example program with gcc -Og jmp.c -o jmp

Run the program ./jmp and the output is

Returning 1
x = 0, n = 1
Returning 0
x = 42, n = 1
Aborted

The function g() is returning 0 the second time (after longjmp()) but the return value, assigned to n, equals 1. With other optimization levels or with earlier versions of gcc or with -Og -fno-tree-dominator-opts, the output is what I expect:

Returning 1
x = 0, n = 1
Returning 0
x = 42, n = 0

This is with gcc version 4.8.0, GNU libc version 2.15.
Comment 1 Jeroen Demeyer 2013-04-16 14:31:54 UTC
Created attachment 29884 [details]
Bug program (preprocessed)
Comment 2 Jakub Jelinek 2013-04-16 15:46:43 UTC
Also happens with -Os.  Started with http://gcc.gnu.org/r190284
Eyeballing the difference between r190283 and r190284 -Os dumps, I see that
*.copyrename4 is still the same appart from losing some SSA_NAME_VARs, but
*.uncprop has one difference:
-  # n_13 = PHI <0(5), 1(6)>
+  # _13 = PHI <_3(5), 1(6)>

The missing n is fine, but _3 instead of 0 there supposedly isn't.
_3 is set before the setjmp call, while this PHI is after the returns-twice function, and while for non-zero _3 the code before setjmp exits early, so _3 should contain 0, perhaps while extending the lifetime of _3 over the returns twice function it should have added SSA_NAME_OCCURS_IN_ABNORMAL_PHI or give up on it.  Richard, can you please have a look?
Comment 3 Jakub Jelinek 2013-04-16 16:04:10 UTC
At RTL time (besides it being a pessimization), the thing is that _3 is assigned a pseudo (compared to before the change, where it had only a single use and thus has been TERed), that pseudo is initialized from *e, and initialized to 1 in the bb where the PHI had 1, and the pseudo is spilled to the stack.  So it will initially contain the 0 value (correct), then that is overwritten with 1 (fine for the first setjmp returning case), but when setjmp returns second time, the value is still 1 rather than 0.

So the questions are:
- is it desirable that uncprop does anything to SSA_NAME_VAR == NULL phis?
- shouldn't something like that be not performed if current function calls setjmp (or more narrowly, if there is a returns twice function somewhere in between the considered setter and user)?
- what other optimizations might be similarly problematic across returns twice calls?
Comment 4 Richard Biener 2013-04-17 08:26:20 UTC
I will have a look.
Comment 5 Richard Biener 2013-04-17 08:48:33 UTC
So the questions are:
- is it desirable that uncprop does anything to SSA_NAME_VAR == NULL phis?

sure - it is all about improving out-of-SSA coalescing opportunities
and avoiding copies

- shouldn't something like that be not performed if current function calls
setjmp (or more narrowly, if there is a returns twice function somewhere in
between the considered setter and user)?

the testcase shows that uncprop extends the lifetime of an SSA name
across a setjmp call - but it can only do so because it's an SSA name.
Which means the testcase is questionable as 'n' is not declared volatile, no?

- what other optimizations might be similarly problematic across returns twice
calls?

every optimization pass that performs hosting.  PRE comes to my mind for

  if (x)
    tem = expr;
  setjmp ()
  var = expr;

which would happily eliminate the partial redundancy, moving expr to the
else arm of the if () and thus extending the lifetime of 'var' across
the setjmp call.

We do not explicitely model the abnormal control flow for setjmp / longjmp
which is the reason all these issues may appear.  So I believe the
correct fix is to either declare the testcase invalid or to model
the abnormal control flow explicitely. Add abnormal edges from all
call sites in the function that may end up calling longjmp _and_ eventually an
abnormal edge from function entry as we can call longjmp from callers
as well (though that may be invalid and thus we do not have to care?).

I don't see an "easy" fix for the issue (well, maybe the specific testcase).
That it happens only after my patch is probably pure luck because of for
example the PRE issue.  Testcase for that:

int f (int a, int flag)
{
  int tem;
  if (flag)
    tem = a + 1;
  int x = setjmp (env);
  int tem2 = a + 1;
  if (x)
    return tem2;
  return tem;
}

validity of course is questionable, but we clearly use tem only on the
normal path and tem2 on the abnormal path.  PRE does the transform
I indicated, proper abnormal edges would disable the transform.
Comment 6 Jakub Jelinek 2013-04-17 08:56:00 UTC
I don't see how we could declare the testcase invalid, why would n need to be volatile?  It isn't live across the setjmp call, it is even declared after the setjmp call, and it is always initialized after the setjmp call.
Comment 7 rguenther@suse.de 2013-04-17 09:07:10 UTC
On Wed, 17 Apr 2013, jakub at gcc dot gnu.org wrote:

> 
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56982
> 
> --- Comment #6 from Jakub Jelinek <jakub at gcc dot gnu.org> 2013-04-17 08:56:00 UTC ---
> I don't see how we could declare the testcase invalid, why would n need to be
> volatile?  It isn't live across the setjmp call, it is even declared after the
> setjmp call, and it is always initialized after the setjmp call.

Then there is no other way but to model the abnormal control flow
properly.  Even simple CSE can break things otherwise.  Consider

int tmp = a + 1;
setjmp ()
int tmp2 = a + 1;

even on RTL CSE would break that, no?  setjmp doesn't even
forcefully start a new basic-block.

Hmm, maybe doing that, start a new BB for all returns-twice
calls and add an abnormal edge from FN entry is enough to
avoid all possibly dangerous transforms.

Richard.
Comment 8 Jakub Jelinek 2013-04-17 09:28:55 UTC
#include <stdio.h>
#include <setjmp.h>

static sigjmp_buf env;
static inline int g (int x)
{
  if (x)
    {   
      fprintf (stderr, "Returning 0\n");
      return 0;
    }
  else
    { 
      fprintf (stderr, "Returning 1\n");
      return 1;
    }
}
__attribute__ ((noinline))
void bar (int n)
{
  if (n == 0)
    exit (0);
  static int x;
  if (x++) abort ();
  longjmp (env, 42);
}
int
f (int *e)
{
  int n = *e;
  if (n)
    return 1;
  int x = setjmp (env);
  n = g (x);
  fprintf (stderr, "x = %i, n = %i\n", x, n);
  bar (n);
}
int
main ()                     
{
  int v = 0;
  return f (&v);
}

Adjusted testcase that fails even with GCC 4.7.2 at -O2, works with -O2 -fno-dominator-opts (which disables uncprop).  Again, I don't see how
this could be declared invalid, while n is declared before the setjmp, it is
not live across the setjmp call.  This adjusted testcase regressed in April 2005 (i.e. 4.1+ regression).
Comment 9 Richard Biener 2013-04-17 09:57:52 UTC
Created attachment 29889 [details]
patch

Untested patch.  The patch handles setjmp similar to a non-local label,
thus force it to start a new basic-block and get abnormal edges from all
sites that can make a non-local goto or call longjmp.

Fixes the testcase for me.  Somewhat reduced:

#include <stdio.h>
#include <stdlib.h>
#include <setjmp.h>

static sigjmp_buf env;

static inline int g(int x)
{
    if (x)
    {
        fprintf(stderr, "Returning 0\n");
        return 0;
    }
    else
    {
        fprintf(stderr, "Returning 1\n");
        return 1;
    }
}

int f(int *e)
{
    if (*e)
      return 1;

    int x = setjmp(env);
    int n = g(x);
    if (n == 0)
      exit(0);
    if (x)
      abort();
    longjmp(env, 42);
}

int main(int argc, char** argv)
{
    int v = 0;
    return f(&v);
}

but I cannot remove the remaining printfs, so it's not appropriate for the
testsuite yet.
Comment 10 Leif Leonhardy 2013-04-18 01:17:31 UTC
"One proposed requirement on setjmp is that it be usable like any other function, that is, that it be callable in *any* expression context, and that the expression evaluate correctly whether the return from setjmp is direct or via a call to longjmp. Unfortunately, any implementation of setjmp as a conventional called function cannot know enough about the calling environment to save any temporary registers or dynamic stack locations used part way through an expression evaluation. [...] The temporaries may be correct on the initial call to setjmp, but are not likely to be on any return initiated by a corresponding call to longjmp. These considerations dictated the constraint that setjmp be called only from within fairly simple expressions, ones not likely to need temporary storage.

An alternative proposal considered by the C89 Committee was to require that implementations recognize that calling setjmp is a special case, and hence that they take whatever precautions are necessary to restore the setjmp environment properly upon a longjmp call. This proposal was rejected on grounds of consistency: implementations are currently allowed to implement library functions specially, but no other situations require special treatment."


So according to this (The C99 Rationale [1], page 139 ff., likewise the Single UNIX Specification), here setjmp() is simply used in an invalid context (i.e., in an assignment statement). ;-)

Still, with -Og at least, GCC 4.8.0 produces wrong code even if setjmp() is used in an "allowed" context (as in e.g. if (setjmp(...)>0) ..., or switch (setjmp(...)) { ... }), and no matter whether n is declared volatile or not.


[1] http://www.open-std.org/jtc1/sc22/wg14/www/C99RationaleV5.10.pdf
Comment 11 Richard Biener 2013-04-19 13:40:05 UTC
Author: rguenth
Date: Fri Apr 19 13:39:16 2013
New Revision: 198096

URL: http://gcc.gnu.org/viewcvs?rev=198096&root=gcc&view=rev
Log:
2013-04-19  Richard Biener  <rguenther@suse.de>

	PR tree-optimization/56982
	* builtins.def (BUILT_IN_LONGJMP): longjmp is not a leaf
	function.
	* gimplify.c (gimplify_call_expr): Notice special calls.
	(gimplify_modify_expr): Likewise.
	* tree-cfg.c (make_abnormal_goto_edges): Handle setjmp-like
	abnormal control flow receivers.
	(call_can_make_abnormal_goto): Handle cfun->calls_setjmp
	in the same way as cfun->has_nonlocal_labels.
	(gimple_purge_dead_abnormal_call_edges): Likewise.
	(stmt_starts_bb_p): Make setjmp-like abnormal control flow
	receivers start a basic-block.

	* gcc.c-torture/execute/pr56982.c: New testcase.

Added:
    trunk/gcc/testsuite/gcc.c-torture/execute/pr56982.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/builtins.def
    trunk/gcc/gimplify.c
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/tree-cfg.c
Comment 12 Jakub Jelinek 2013-05-31 10:58:28 UTC
GCC 4.8.1 has been released.
Comment 13 Bernd Edlinger 2013-07-02 22:26:34 UTC
Created attachment 30431 [details]
another example of wrong compilation

This is another example where the optimization can
go wrong.

The attached program produces expected results if
compiled with -O0:
x=0, a=1
x=1, a=1
a=1

But if compiled with -O3 and if the value "a" is placed
in a register the result is like this:
x=0, a=1
x=1, a=0
a=0

That is because longjmp has more semantic than just a branch:
It branches to the setjmp, and restores all callee saved registers to
the previos value.
Comment 14 Mikael Pettersson 2013-07-02 23:13:51 UTC
(In reply to Bernd Edlinger from comment #13)
> Created attachment 30431 [details]
> another example of wrong compilation
> 
> This is another example where the optimization can
> go wrong.
> 
> The attached program produces expected results if
> compiled with -O0:
> x=0, a=1
> x=1, a=1
> a=1
> 
> But if compiled with -O3 and if the value "a" is placed
> in a register the result is like this:
> x=0, a=1
> x=1, a=0
> a=0
> 
> That is because longjmp has more semantic than just a branch:
> It branches to the setjmp, and restores all callee saved registers to
> the previos value.

Your example is invalid C.  Referring to WG14 n1494.pdf (there may be more recent C1x documents, but it's the one I had available right now):

- you violate 7.13.1.1 which specifies where setjmp() may be called, an assignment statement is not one of the permitted contexts

- more importantly, your auto variable a is not volatile-qualified, which means that its value is indeterminate after the longjmp (7.13.2.1).

Please fix these issues and check again if it yields wrong results.
Comment 15 Bernd Edlinger 2013-07-03 08:05:28 UTC
(In reply to Mikael Pettersson from comment #14)
> Your example is invalid C.  Referring to WG14 n1494.pdf (there may be more
> recent C1x documents, but it's the one I had available right now):
> 
> - you violate 7.13.1.1 which specifies where setjmp() may be called, an
> assignment statement is not one of the permitted contexts
> 
> - more importantly, your auto variable a is not volatile-qualified, which
> means that its value is indeterminate after the longjmp (7.13.2.1).
> 
> Please fix these issues and check again if it yields wrong results.

Thanks for pointing that out.

When I add volatile to the auto variable, the code is OK.
Comment 16 Jakub Jelinek 2013-10-16 09:49:02 UTC
GCC 4.8.2 has been released.
Comment 17 Richard Biener 2014-05-22 09:03:58 UTC
GCC 4.8.3 is being released, adjusting target milestone.
Comment 18 Richard Biener 2014-12-10 12:37:04 UTC
wontfix for 4.8, the fix triggered very many latent issues.