Bug 66230 - Using optimizations causes program to segfault
Status: RESOLVED INVALID
Alias: None
Product: gcc
Classification: Unclassified
Component: c
Version: 4.9.2
Importance: P3 major
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2015-05-20 22:05 UTC by gpnuma
Modified: 2015-05-21 15:53 UTC

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed: 2015-05-21 00:00:00


Description gpnuma 2015-05-20 22:05:35 UTC
Hello,

First, I'd like to point out that the code producing this error compiles and runs fine with gcc 4.8.4-1 on Linux and OS X, and with Clang 3.5 and 3.6 (Linux) and 6.1 (OS X), but fails with gcc 4.9.2 when using -O3 on both Linux and OS X (a debug build works fine).
The platform used for all these tests was x86_64.

To reproduce: clone and build https://github.com/centaurean/density with gcc 4.9.2.
Then run the following command:
./benchmark -3 some_file

This results in a segfault.

When I add the line [printf("anything");] just before DENSITY_MEMCPY(... here : https://github.com/centaurean/density/blob/master/src/kernel_lion_decode.c#L187, the program runs again normally without any segfault... that's really super strange.

The function that fails is called via an array of function pointers, but I don't think that's the problem since it works with any other compiler.
Comment 1 Markus Trippelsdorf 2015-05-21 06:03:17 UTC
Please attach a small self-contained testcase.
Nobody here has the time to clone and build random projects.

And normally issues like these are caused by invoking undefined behavior.
Try to build the project with -fsanitize=undefined and see what runtime
errors it reports.
Comment 2 gpnuma 2015-05-21 09:32:36 UTC
I understand you're short of time but this problem is very difficult to reproduce !!

I did try to compile and link with -fsanitize=undefined this morning, now here's the interesting part :
* no warning was generated by ubsan 
* everything works fine
As soon as I remove -fsanitize=undefined, I get the segfault again, so I suspect the problem happens during the optimization stages.

The fact that adding a useless line of code like printf("...") at the start of the called function makes the problem go away makes me wonder whether the function pointer is not properly "captured" by gcc, or whether it "changes" after optimizations.

Here is what I'm doing to be more accurate :
1) I have a set of functions at the top of a file (functionA, functionB, ...)
2) At the bottom of that file I have another function which stores the function pointers of these functions using &functionA, &functionB etc... in an array.
3) Later on, I access the functions using an index to that array, and with gcc 4.8 / -O3 *only*, this fails and segfaults.
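
For readers following along, the setup described in steps 1) to 3) can be sketched roughly as follows. This is only an illustration and not the actual density code: the names decode_fn, function_a, function_b, decode_table and dispatch are hypothetical, and the table is filled by a static initializer for brevity, whereas the report says a separate function stores the pointers.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical signature; the real project's functions take other arguments. */
typedef int (*decode_fn)(int);

/* 1) A set of functions at the top of the file. */
static int function_a(int x) { return x + 1; }
static int function_b(int x) { return x * 2; }

/* 2) Their addresses stored in an array (static initializer for brevity). */
static const decode_fn decode_table[] = { &function_a, &function_b };

/* 3) Later, a function is selected by index and called through the pointer. */
int dispatch(size_t index, int arg)
{
    const size_t count = sizeof decode_table / sizeof decode_table[0];
    if (index >= count)
        return -1; /* out-of-range index: refuse rather than call a bad pointer */
    assert(decode_table[index] != NULL); /* every slot is expected to be filled */
    return decode_table[index](arg);
}
```

Calling through such a table is well-defined C on its own, so a crash at the call site does not by itself show that the pointer is wrong; the fault may lie in what the called function does.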

So my thinking is maybe the function pointers are stored correctly, but then the optimizer "changes" this function's address or the function itself making the initial pointer wrong which leads to a segfault... just a wild guess.
I think that adding the printf or a void function maybe adds some sort of "unoptimizable" code at the start (like IO) and therefore the initial stored pointer is unchanged after optimizations.
Oh yeah, it's worth mentioning that otherwise (if I don't put a bogus printf) the first line of code of the function is a __builtin_memcpy which is probably highly optimizable.

I'll try to come up with a short code example if I get the time later on.

Thank you
Guillaume
Comment 3 Markus Trippelsdorf 2015-05-21 09:38:38 UTC
Another thing you might try is to use: -fno-strict-aliasing -fwrapv -fno-aggressive-loop-optimizations (as per http://gcc.gnu.org/bugs/)
and see if the issue goes away, too.
Comment 4 gpnuma 2015-05-21 09:55:38 UTC
Sorry I meant gcc 4.9.2 / -O3 of course, 4.8 works fine.
Comment 5 gpnuma 2015-05-21 10:04:50 UTC
Ok I did just try "-fno-strict-aliasing -fwrapv -fno-aggressive-loop-optimizations" and the issue is still there.

If I add the printf("something"); at the top of the function, everything works normally.
Comment 6 Markus Trippelsdorf 2015-05-21 10:12:03 UTC
OK you got me interested, so I've downloaded and built the app.
With gcc-5 and -fsanitize=undefined I get many alignment errors:

 Pre-heating ...
../src/kernel_lion_encode.c:182:54: runtime error: store to misaligned address 0x7fa7274ed3d4 for type 'uint64_t', which requires 8 byte alignment
0x7fa7274ed3d4: note: pointer points here
  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
              ^ 
../src/kernel_lion_encode.c:169:52: runtime error: load of misaligned address 0x7fa7273d5da8 for type '__int128 unsigned', which requires 16 byte alignment
0x7fa7273d5da8: note: pointer points here
 00 00 00 00  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  00 00 00 00
              ^ 
../src/kernel_lion_encode.c:182:54: runtime error: store to misaligned address 0x7fa72758b204 for type 'uint64_t', which requires 8 byte alignment
0x7fa72758b204: note: pointer points here
  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
              ^ 
../src/kernel_lion_encode.c:169:52: runtime error: load of misaligned address 0x7fa727415a98 for type '__int128 unsigned', which requires 16 byte alignment
0x7fa727415a98: note: pointer points here
 00 00 00 00  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  00 00 00 00
              ^ 
../src/kernel_lion_encode.c:182:56: runtime error: load of misaligned address 0x7fa727504e9c for type 'uint64_t', which requires 8 byte alignment
0x7fa727504e9c: note: pointer points here
  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
              ^ 
../src/kernel_lion_encode.c:182:56: runtime error: load of misaligned address 0x7fa72754de44 for type 'uint64_t', which requires 8 byte alignment
0x7fa72754de44: note: pointer points here
  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
              ^ 
../src/kernel_lion_encode.c:182:56: runtime error: load of misaligned address 0x7fa72751446c for type 'uint64_t', which requires 8 byte alignment
0x7fa72751446c: note: pointer points here
  00 00 00 00 32 20 33 20  33 20 34 0a 32 20 33 20  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
              ^ 
../src/kernel_lion_encode.c:169:52: runtime error: load of misaligned address 0x7fa7273b128c for type '__int128 unsigned', which requires 16 byte alignment
0x7fa7273b128c: note: pointer points here
  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  66 66 65 72 00 00 00 00
              ^ 
../src/kernel_lion_encode.c:182:54: runtime error: store to misaligned address 0x7fa727571c5c for type 'uint64_t', which requires 8 byte alignment
0x7fa727571c5c: note: pointer points here
  23 20 32 33 23 20 31 39  65 78 74 65 00 00 00 00  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
              ^ 
../src/buffer.c:38:55: runtime error: load of misaligned address 0x0000020fd8cc for type 'uint_fast64_t', which requires 8 byte alignment
0x0000020fd8cc: note: pointer points here
  00 00 00 00 b5 98 00 00  00 00 00 00 43 3c 00 00  00 00 00 00 03 00 00 00  03 00 00 00 03 00 00 00
              ^ 
../src/buffer.c:38:80: runtime error: load of misaligned address 0x0000020fd8d4 for type 'uint_fast64_t', which requires 8 byte alignment
0x0000020fd8d4: note: pointer points here
  00 00 00 00 43 3c 00 00  00 00 00 00 03 00 00 00  03 00 00 00 03 00 00 00  00 00 00 00 b5 98 00 00
              ^ 
../src/kernel_lion_decode.c:84:50: runtime error: store to misaligned address 0x00000223dd54 for type 'uint64_t', which requires 8 byte alignment
0x00000223dd54: note: pointer points here
  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
              ^ 
../src/kernel_lion_decode.c:90:28: runtime error: load of misaligned address 0x000002126728 for type '__int128 unsigned', which requires 16 byte alignment
0x000002126728: note: pointer points here
 00 00 00 00  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  00 00 00 00
              ^ 
../src/kernel_lion_decode.c:84:50: runtime error: store to misaligned address 0x000002259b94 for type 'uint64_t', which requires 8 byte alignment
0x000002259b94: note: pointer points here
  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
              ^ 
../src/kernel_lion_decode.c:84:52: runtime error: load of misaligned address 0x00000225581c for type 'uint64_t', which requires 8 byte alignment
0x00000225581c: note: pointer points here
  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
              ^ 
../src/kernel_lion_decode.c:84:52: runtime error: load of misaligned address 0x000002275754 for type 'uint64_t', which requires 8 byte alignment
0x000002275754: note: pointer points here
  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
              ^ 
../src/kernel_lion_decode.c:84:50: runtime error: store to misaligned address 0x0000022dbb84 for type 'uint64_t', which requires 8 byte alignment
0x0000022dbb84: note: pointer points here
  22 3c 62 75 22 77 74 66  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
              ^ 
../src/kernel_lion_decode.c:84:52: runtime error: load of misaligned address 0x0000022f5d34 for type 'uint64_t', which requires 8 byte alignment
0x0000022f5d34: note: pointer points here
  00 00 00 00 20 33 36 35  20 31 20 22 20 32 32 20  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
              ^ 
../src/kernel_lion_decode.c:84:52: runtime error: load of misaligned address 0x0000022c5704 for type 'uint64_t', which requires 8 byte alignment
0x0000022c5704: note: pointer points here
  00 00 00 00 2c 20 5f 5f  5f 5f 20 28 2c 20 2e 2e  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
              ^ 
../src/kernel_lion_decode.c:84:50: runtime error: store to misaligned address 0x000002299be4 for type 'uint64_t', which requires 8 byte alignment
0x000002299be4: note: pointer points here
  6c 6f 6e 67 73 68 6f 72  63 68 61 72 00 00 00 00  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
              ^ 
../src/kernel_lion_decode.c:90:28: runtime error: load of misaligned address 0x00000222ec4c for type '__int128 unsigned', which requires 16 byte alignment
0x00000222ec4c: note: pointer points here
  00 00 00 00 3e 3e 20 38  5f 5f 61 74 00 00 00 00  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
              ^ 
../src/kernel_lion_decode.c:84:52: runtime error: load of misaligned address 0x0000022fa774 for type 'uint64_t', which requires 8 byte alignment
0x0000022fa774: note: pointer points here
  00 00 00 00 63 74 20 74  63 74 0a 20 63 74 20 74  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
              ^ 
../src/kernel_lion_decode.c:84:50: runtime error: store to misaligned address 0x0000022e36e4 for type 'uint64_t', which requires 8 byte alignment
0x0000022e36e4: note: pointer points here
  72 65 74 75 20 20 75 6e  72 65 74 75 00 00 00 00  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
              ^ 
../src/buffer.c:38:55: runtime error: load of misaligned address 0x0000020fd95c for type 'uint_fast64_t', which requires 8 byte alignment
0x0000020fd95c: note: pointer points here
  01 00 00 00 43 3c 00 00  00 00 00 00 b5 98 00 00  00 00 00 00 00 0c 03 03  00 00 00 00 00 00 00 00
              ^ 
../src/buffer.c:38:80: runtime error: load of misaligned address 0x0000020fd964 for type 'uint_fast64_t', which requires 8 byte alignment
0x0000020fd964: note: pointer points here
  00 00 00 00 b5 98 00 00  00 00 00 00 00 0c 03 03  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
              ^ 

This is undefined behavior. So closing this bug as invalid.
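
To make the sanitizer messages above concrete: an address is suitably aligned for a type when it is a multiple of that type's alignment requirement. The sketch below checks this arithmetically for two addresses taken from the log. The helper names is_aligned_addr and misalignment are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true if addr is a multiple of alignment.
   alignment must be a power of two, e.g. 8 for uint64_t on x86_64. */
static bool is_aligned_addr(uint64_t addr, uint64_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (addr & (alignment - 1)) == 0;
}

/* Hypothetical helper: number of bytes by which addr lies past the
   previous boundary of the given power-of-two alignment. */
static uint64_t misalignment(uint64_t addr, uint64_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return addr & (alignment - 1);
}
```

For example, misalignment(0x7fa7274ed3d4, 8) is 4, so the uint64_t store reported at kernel_lion_encode.c:182 is 4 bytes off an 8-byte boundary. The address 0x7fa7273d5da8 is a multiple of 8 but not of 16, which is why it is rejected for '__int128 unsigned'.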
Comment 7 Markus Trippelsdorf 2015-05-21 10:16:20 UTC
See https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65709 for a similar bug
in LZ4.
Comment 8 gpnuma 2015-05-21 10:32:02 UTC
Thanks Markus I didn't think these alignment issues were actually the problem, it goes a long way.

By doing memmoves instead of pointer cast allocations I got rid of the segfault, but of course things are much slower... this "undefined behaviour" is really treacherous !!

Is there any way to ensure proper alignment so I don't fall into this trap and still benefit from maximum speed ?
Comment 9 gpnuma 2015-05-21 10:35:45 UTC
What I mean is the structs I was using the pointer cast allocations with are instantiated by the program itself, so there could be a way to instantiate them with the required alignment, I suppose.
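
One possible direction for the question above, sketched under assumptions: C11 offers alignas and aligned_alloc for requesting a specific alignment when the program creates the object itself. The struct work_block and the functions alloc_aligned16 and starts_on_16 are hypothetical and not taken from density. This only aligns the start of an object; a pointer that later advances through a byte stream by arbitrary offsets can still end up misaligned.

```c
#include <assert.h>   /* static_assert */
#include <stdalign.h> /* alignas, alignof */
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical struct: alignas places every instance on a 16-byte boundary. */
struct work_block {
    alignas(16) unsigned char bytes[64];
};
static_assert(alignof(struct work_block) == 16,
              "work_block must be 16-byte aligned");

/* Allocate a heap buffer whose start is 16-byte aligned.
   C11 aligned_alloc expects a size that is a multiple of the alignment,
   so the request is rounded up. Release the result with free(). */
void *alloc_aligned16(size_t size)
{
    if (size > SIZE_MAX - 15u)
        return NULL; /* rounding up would overflow */
    size_t rounded = (size + 15u) & ~(size_t)15u;
    if (rounded == 0)
        rounded = 16;
    return aligned_alloc(16, rounded);
}

/* True if p lies on a 16-byte boundary. */
int starts_on_16(const void *p)
{
    return ((uintptr_t)p & 15u) == 0;
}
```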
Comment 10 Markus Trippelsdorf 2015-05-21 11:26:45 UTC
(In reply to gpnuma from comment #8)
> Thanks Markus I didn't think these alignment issues were actually the
> problem, it goes a long way.
> 
> By doing memmoves instead of pointer cast allocations I got rid of the
> segfault, but of course things are much slower... this "undefined behaviour"
> is really treacherous !!
> 
> Is there any way to ensure proper alignment so I don't fall into this trap
> and still benefit from maximum speed ?

I'm afraid there is no general recipe that would ensure proper alignment.
But using memcpy doesn't necessarily have to be "much slower".
And trading undefined behavior for a little more speed isn't a good idea in general.
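
As an illustration of the memcpy approach mentioned here: a fixed-size memcpy into a local variable is well defined for any source address, and compilers commonly turn it into a single load or store on targets that allow unaligned access, such as x86_64. That lowering is an expectation, not a guarantee. The function names load_u64 and store_u64 are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

static_assert(sizeof(uint64_t) == 8, "sketch assumes 8-bit bytes");

/* Read 8 bytes from any address, aligned or not. */
static inline uint64_t load_u64(const void *src)
{
    uint64_t value;
    memcpy(&value, src, sizeof value);
    return value;
}

/* Write 8 bytes to any address, aligned or not. */
static inline void store_u64(void *dst, uint64_t value)
{
    memcpy(dst, &value, sizeof value);
}

/* By contrast, *(const uint64_t *)src is undefined behavior when src is
   not 8-byte aligned, which is the pattern the sanitizer reported. */
```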
Comment 11 gpnuma 2015-05-21 15:53:00 UTC
(In reply to Markus Trippelsdorf from comment #10)
> (In reply to gpnuma from comment #8)
> > Thanks Markus I didn't think these alignment issues were actually the
> > problem, it goes a long way.
> > 
> > By doing memmoves instead of pointer cast allocations I got rid of the
> > segfault, but of course things are much slower... this "undefined behaviour"
> > is really treacherous !!
> > 
> > Is there any way to ensure proper alignment so I don't fall into this trap
> > and still benefit from maximum speed ?
> 
> I'm afraid there is no general recipe that would ensure proper alignment.
> But using memcpy doesn't necessarily have to be "much slower".
> And trading undefined behavior for a little more speed isn't a good idea in
> general.

Thanks. Actually, the code with __builtin_memmove, compiled with gcc 4.9.2 or 4.8, is 30% slower than the pointer cast version compiled with 4.8 (I can't say for 4.9 because of the segfault).

However after testing with gcc 5.1 I had the pleasant surprise to see that it's performing at the same speed as before, which means 30% faster than gcc 4.9.

30% faster is huge, you've obviously done a great job in the optimization stages for 5.1 !