Bug 56881 - Miscompilation (optimisation failure?) causing NULL dereference and segfault at runtime
Summary: Miscompilation (optimisation failure?) causing NULL dereference and segfault ...
Status: RESOLVED INVALID
Alias: None
Product: gcc
Classification: Unclassified
Component: other (show other bugs)
Version: 4.8.0
: P3 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords: wrong-code
Depends on:
Blocks:
 
Reported: 2013-04-08 16:48 UTC by devspam
Modified: 2013-04-14 15:24 UTC (History)
1 user (show)

See Also:
Host: x86_64-linux-gnu / glibc 2.17
Target: x86_64-linux-gnu
Build:
Known to work: 4.7.2
Known to fail: 4.8.0
Last reconfirmed:


Attachments
Source which triggers the problem at -O2 (14.09 KB, text/x-csrc)
2013-04-08 16:48 UTC, devspam
Details
Full source of the problem program, both in original form and fully pre-processed (135.39 KB, application/x-gzip)
2013-04-10 14:49 UTC, devspam
Details

Note You need to log in before you can comment on or make changes to this bug.
Description devspam 2013-04-08 16:48:28 UTC
Created attachment 29828 [details]
Source which triggers the problem at -O2

We (Unvanquished devs) have found what looks like an optimisation bug in gcc 4.8.0. The ‘offending’ code is this (lines 60 to 62 of src/tools/lcc/cpp/hideset.c):

  hs1 = (Hideset)domalloc(len*sizeof(Hideset));
  memmove(hs1, nhs, len*sizeof(Hideset));
  hidesets[nhidesets] = hs1;

The allocation succeeds (domalloc is a malloc wrapper), the memmove is fine, but hs1 somehow becomes NULL. So next time this function is called, there's a NULL pointer which gets dereferenced a few lines above.

Putting a function call in between the first two lines above works around the problem, as does swapping the latter two lines (as is done in https://github.com/Unvanquished/Unvanquished/commit/9157ac0d3668fc059ce001620bbfa45ccf66c8df).

Pre-processed source is attached. I'm able to cause the problem with -Os, -O2 or -O3 but not -O0 or -O; I can try enabling or disabling specific optimisations.

I'm using stock gcc-4.8 4.8.0-2 (Debian experimental), but another of us, an Arch user, is using core/gcc 4.8.0-1 (base-devel) and is seeing the same problem (which is why I'm reporting it here rather than in the Debian BTS).

Architecture is amd64 in both cases.
Comment 1 Chung-Ju Wu 2013-04-09 09:52:35 UTC
(In reply to comment #0)
> 
>   hs1 = (Hideset)domalloc(len*sizeof(Hideset));
>   memmove(hs1, nhs, len*sizeof(Hideset));
>   hidesets[nhidesets] = hs1;
> 

IMHO, if domalloc() does return NULL for some cases,
having NULL-checking statement before/inside memmove is required.
Comment 2 devspam 2013-04-09 13:57:08 UTC
(In reply to comment #1)
> IMHO, if domalloc() does return NULL for some cases,
> having NULL-checking statement before/inside memmove is required.

It doesn't return NULL – if malloc() returns null, domalloc() will report that and exit. But even if it did, that doesn't explain the problem occurring only at some optimisation levels.

Anyway. I've done some more testing. The problem is (or is related to) -fcaller-saves: -O2 and -Os both trigger the problem, but add -fno-caller-saves and all is well.
Comment 3 Mikael Pettersson 2013-04-09 14:45:33 UTC
The test case is incomplete, as it lacks both main() and domalloc().  Please add those (in a separate file if you like) so that the test case can be compiled to an executable, and the presence or absence of a runtime failure can be observed.
Comment 4 devspam 2013-04-10 14:49:24 UTC
Created attachment 29850 [details]
Full source of the problem program, both in original form and fully pre-processed
Comment 5 devspam 2013-04-10 15:02:54 UTC
The tarball which I've attached also provides its own test case – compile it then pass it one of its own source files. It'll either segfault or not depending on compile-time optimisation settings.
Comment 6 Mikael Pettersson 2013-04-13 17:53:30 UTC
Thanks for the complete test case.  I can reproduce the apparent wrong-code (runtime SEGV) on x86_64-linux w/ glibc-2.15 with gcc 4.9-20130407 and 4.8-20130411, but not with 4.7-20130406 or 4.6-20130405.
Comment 7 Mikael Pettersson 2013-04-13 20:39:03 UTC
Started with Bernd Schmidt's "Optimize calls to functions that return one of their arguments" patch in http://gcc.gnu.org/r187459, originally proposed in
<http://gcc.gnu.org/ml/gcc-patches/2012-04/msg01817.html>.
Comment 8 Mikael Pettersson 2013-04-14 08:56:47 UTC
The error is in the test case.  It overrides the libc memmove() with its own implementation, but that implementation fails to follow the specification.  In particular, it returns NULL rather than memmove()'s first parameter.

GCC now optimizes based on this aspect of the specification, so things go wrong at runtime.

Correcting the test case as follows allows it to work with gcc 4.8 and 4.9:

--- unix.c.~1~  2013-03-06 23:17:26.000000000 +0100
+++ unix.c      2013-04-14 10:45:24.651407693 +0200
@@ -110,7 +110,7 @@ memmove(void *dp, const void *sp, size_t
        unsigned char *cdp, *csp;
 
        if (n<=0)
-               return 0;
+               return dp;
        cdp = dp;
        csp = (unsigned char *)sp;
        if (cdp < csp) {
@@ -124,6 +124,6 @@ memmove(void *dp, const void *sp, size_t
                        *--cdp = *--csp;
                } while (--n);
        }
-       return 0;
+       return dp;
 }
 #endif

Not a bug in GCC.  Please close as INVALID.
Comment 9 devspam 2013-04-14 15:03:13 UTC
Hmm, interesting…

I'm taking your patch (crediting you) and paraphrasing where necessary to make a reasonable commit message of it. I'll also forward to other projects which I know use the same code.
Comment 10 Steven Bosscher 2013-04-14 15:24:28 UTC
Not a gcc bug.