Bug 63829 - _Lock_policy used in thread.cc can cause incompatibilities with binaries using different -mcpu
Summary: _Lock_policy used in thread.cc can cause incompatibilities with binaries using different -mcpu
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: libstdc++
Version: 4.9.2
Importance: P3 normal
Target Milestone: 9.0
Assignee: Not yet assigned to anyone
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2014-11-12 01:27 UTC by Aaron Graham
Modified: 2020-11-23 13:13 UTC
3 users

See Also:
Host: x86_64-unknown-linux-gnu
Target: arm-unknown-linux-gnueabi
Build:
Known to work:
Known to fail:
Last reconfirmed: 2014-12-06 00:00:00


Attachments
Disassembly of crashing result. (3.94 KB, text/plain)
2014-11-12 01:27 UTC, Aaron Graham
Disassembly of good result. (4.03 KB, text/plain)
2014-11-12 01:28 UTC, Aaron Graham

Description Aaron Graham 2014-11-12 01:27:04 UTC
ARM processor, Raspberry Pi

Offending code:

  #include <memory>
  thread_local std::unique_ptr<int> tls_test;

  struct foo {
    foo() { tls_test.reset(new int(42)); }
  } const foo_instance;

  int main() {}

The following works:
g++ test.cc -std=c++14

The following crashes:
g++ test.cc -std=c++14 -mcpu=arm1176jzf-s

I will attach the full disassembly for both.

Here's the basic gdb output:
Core was generated by `./a.out'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x00008668 in __tls_init ()
(gdb) bt
#0  0x00008668 in __tls_init ()
#1  0x00008a14 in TLS wrapper function for tls_test ()
#2  0x000086ec in foo::foo() ()
#3  0x00008648 in __static_initialization_and_destruction_0(int, int) ()
#4  0x000086d0 in _GLOBAL__sub_I_tls_test ()
#5  0x00008a78 in __libc_csu_init ()
#6  0x4f508f18 in __libc_start_main ()
   from /opt/armtools/20141030/arm-brcm-linux-gnueabi/sysroot/lib/libc.so.6
#7  0x000084fc in _start ()
(gdb) info reg
r0             0x10da0  69024
r1             0xffff   65535
r2             0xc      12
r3             0x0      0
r4             0x2      2
r5             0x10c64  68708
r6             0x2      2
r7             0x1      1
r8             0xafb35654       2947765844
r9             0xafb3565c       2947765852
r10            0x4f3a7000       1329229824
r11            0xafb354ac       2947765420
r12            0xffff0fff       4294905855
sp             0xafb354a8       0xafb354a8
lr             0x8a14   35348
pc             0x8668   0x8668 <__tls_init+16>
cpsr           0x60000010       1610612752
(gdb)
Comment 1 Aaron Graham 2014-11-12 01:27:44 UTC
Created attachment 33945 [details]
Disassembly of crashing result.
Comment 2 Aaron Graham 2014-11-12 01:28:09 UTC
Created attachment 33946 [details]
Disassembly of good result.
Comment 3 Aaron Graham 2014-12-06 01:39:48 UTC
This is a C++ standard library problem.

If the toolchain is compiled for generic arm processors, the C++ standard library uses the "_S_mutex" _Lock_policy (see ext/concurrence.h).

    // Compile-time constant that indicates the preferred locking policy
    // in the current configuration.
    static const _Lock_policy __default_lock_policy = 
  #ifdef __GTHREADS
  #if (defined(__GCC_HAVE_SYNC_COMPARE_AND_SWAP_2) \
       && defined(__GCC_HAVE_SYNC_COMPARE_AND_SWAP_4))
    _S_atomic;
  #else
    _S_mutex;
  #endif
  #else
    _S_single;
  #endif


If the compiler is then used to build binaries using -mcpu=arm1176jzf-s (or cortex-a9 or just about anything else) then those binaries use the "_S_atomic" _Lock_policy and are *incompatible* with the standard library built with the compiler.

Here's some simpler code that was failing because of this problem:
  #include <thread>
  void foo() {}
  int main() { std::thread(&foo).join(); }

This fails because execute_native_thread_routine accesses a shared_ptr, so every binary that links against it must use the same locking policy.

I've solved this problem in my own setup by building the toolchain and application binaries with the same -mcpu.

A more general solution might be to move more code out of places like thread.cc and into the headers.
Comment 4 Jonathan Wakely 2014-12-06 16:35:29 UTC
(In reply to Aaron Graham from comment #3)
> I've solved this problem in my own setup by building the toolchain and
> application binaries with the same -mcpu.

Yes, that's basically what you need to do if -mcpu can affect the availability of atomics.

This has been known for some time (the same problem exists for i386 vs i486+) but it seems to be causing problems on ARM more and more often. I think PR 42734 is the same issue.
Comment 5 Jonathan Wakely 2014-12-06 19:34:14 UTC
(In reply to Aaron Graham from comment #3)
> I've solved this problem in my own setup by building the toolchain and
> application binaries with the same -mcpu.
> 
> A more general solution might be to move more code out of places like
> thread.cc and into the headers.

I think the right solution is to determine the lock policy when GCC is configured and set it once and not have it change at compile-time based on flags such as -mcpu.
Comment 6 Richard Earnshaw 2015-01-16 18:54:48 UTC
arm linux code should always be using the _S_atomic sequences.  When the processor doesn't have the required instructions, kernel helper routines will be used.  This ensures that code is forwards compatible to later architectures.  It's a really bad idea for the headers used in user code to be changing this decision dynamically.
Comment 7 Jonathan Wakely 2017-01-05 17:18:57 UTC
(In reply to Richard Earnshaw from comment #6)
> arm linux code should always be using the _S_atomic sequences.  When the
> processor doesn't have the required instructions, kernel helper routines
> will be used.

Richard, do you mean this should happen via libatomic, or by libstdc++ calling the kernel helpers directly?

It would be relatively easy to make ARM always use _S_atomic, but for some programs that would add a dependency on libatomic, which IIUC is optional today.

Solving the backwards-compatibility problem would be harder (code compiled with existing GCC releases would be using the _S_mutex implementation and mixing object code would run into the same problems described above, but now even when using the same -mcpu settings).
Comment 8 Richard Earnshaw 2017-01-06 13:52:03 UTC
From a discussion on IRC:

What's the general story on lock policies?
Are environments supposed to support all three?
(if possible)
no. it's something only used in std::shared_ptr, and generally entirely invisible to users
What's supposed to happen if one (can't be supported)?
most people will only ever use the default one, which gets chosen according to the target
_S_single always works, it just isn't thread-safe
But that's surely bogus if detected dynamically
if _S_atomic doesn't work for a given target then users should not try to use it
(i.e. based on the CPU architecture)
_S_single is only ever the default if gcc is built with --disable-threads 
Ok, so we can ignore that one, then
yeah
but seems that mutex and atomic can't be a compile time choice; it must be selected at configure time
based on the base architecture used at that point
yes (or you have to ensure every translation unit in the program uses the same choice)
(including the libstdc++.so linked to)
But we can't guarantee that if you use the __GCC_HAVE_SYNC_COMPARE_AND_SWAP_<N>
macros
since those are dependent on certain patterns being enabled in the back-end of the compiler, and that's dependent on the architecture.
yep
I think it would be better to move the _atomic _mutex check into configure and create a new macro based on the result of a test performed then
at least then a build of the library would always work with code compiled with that compiler
agreed
(even if it isn't quite as efficient if you normally have a later rev of the architecture)
OK, I'll put that in the PR
so IIUC, if gcc is configured for armv7 (and so chooses the _S_atomic policy) and then somebody builds their code with -mcpu=armv5 they will need to link to libatomic to get the necessary atomic ops
if that's correct, it seems fine to me. when this lock policy stuff was invented we didn't have libatomic, and so some arches simply didn't have atomics, period
Yes, provided the compiler correctly falls back into the library code in that case.
now we can punt to libatomic, so can use _S_atomic for any arch with a libatomic implementation
It would be a bit odd-ball to build for ARMv5 if your library is v7 anyway (you'd still need a v7 device to run it on)
yep
So I won't lose too much sleep over that combination
people seem to do that though
maybe just for testing, I don't know
Probably for that
ok, so not a real case we need to support
good
Not performant, anyway
Do you mind if I just paste the above directly into the PR?
Comment 9 Jonathan Wakely 2020-11-23 13:13:36 UTC
This was fixed in gcc-9 by the patch for PR libstdc++/67843