Bug 93731 - [10 regression] asan tests cause kernel panic on Darwin 11
Summary: [10 regression] asan tests cause kernel panic on Darwin 11
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: sanitizer
Version: 10.0
Importance: P4 normal
Target Milestone: 10.0
Assignee: Not yet assigned to anyone
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2020-02-13 10:33 UTC by Rainer Orth
Modified: 2020-03-02 12:55 UTC
CC List: 5 users

See Also:
Host: x86_64-apple-darwin11.4.2
Target: x86_64-apple-darwin11.4.2
Build: x86_64-apple-darwin11.4.2
Known to work:
Known to fail:
Last reconfirmed:


Description Rainer Orth 2020-02-13 10:33:44 UTC
Sometime after 20191101, my mainline bootstraps on Mac OS X 10.7/Darwin 11 began
to fail completely.  Initially it seemed the Mac minis I've been using remotely
had just been turned off willy-nilly, but even after I had been assured that this
wasn't the case, the machines still stopped in the middle of make check without
any indication of what had happened.

Only after I'd run such a bootstrap in a VirtualBox VM (with Mac OS X 10.7.5) did
I see that the machines (the VM, and evidently the bare-metal ones too) crashed
with a kernel panic for some asan tests (I've seen alloca_big_alignment.exe,
alloca_detect_custom_size. and bitfield-1.exe).  Only asan tests seem to be
affected (I didn't try any more given the tedious nature of the failure) and
probably only 64-bit ones (I do run multilib tests on Darwin if possible).

As expected, the ubsan tests still work.

Here's the gist of the panics (I do have screen shots if need be):

panic(cpu 0 caller 0xffffff8002c4794): Kernel trap at 0xffffff800053ae2, type 14=page fault, registers:
[...]

Debugger called: <panic>
Backtrace (CPU 0),Frame : Return Address
[...] mach_kernel : _panic + 0x252
                    _kernel_trap + 0x6a4
                    _return_from_trap + 0xcd
                    _fdexec + 0x172
                    _kco_ma_addsample + 0x162c
                    _kco_ma_addsample + 0x2a80
                    _posix_spawn + 0xab6
                    _unix_syscall64 + 0x1fb
                    _hndl_unix_scall64 + 0x13

BSD process name corresponding to current thread: alloca_big_align

The obvious immediate fix is to disable libsanitizer on Darwin 11.  While in
theory one could keep the ubsan tests and, if it really turns out that they
continue to work, the 32-bit asan ones, it's probably not worth the effort given
the age of the OS version and the missing provision for enabling ubsan separately.
Comment 1 Jakub Jelinek 2020-02-13 10:51:29 UTC
So you could just disable asan and keep ubsan (set ASAN_SUPPORTED=no in libsanitizer/configure.tgt for a particular darwin OS version, and if it is 32-bit only, also test x$ac_cv_sizeof_void_p = x4)?
Of course, trying to work around kernel bugs this way is weird, but if it isn't supported anymore or Apple isn't willing to fix their bugs...
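
The suggestion in this comment might look roughly like the shell fragment below. It is only a sketch under stated assumptions: ASAN_SUPPORTED is a hypothetical variable modelled on the existing TSAN_SUPPORTED/LSAN_SUPPORTED ones, the function wrapper and its name exist purely so the logic can be exercised on its own, and the target patterns are illustrative.

```shell
# Sketch only: ASAN_SUPPORTED is hypothetical and modelled on the existing
# [LT]SAN_SUPPORTED variables; the function name is made up for illustration.
# Usage: asan_supported_for TARGET SIZEOF_VOID_P
asan_supported_for () {
  target=$1
  ac_cv_sizeof_void_p=$2   # autoconf cache variable: pointer size in bytes
  ASAN_SUPPORTED=yes
  case "${target}" in
    x86_64-*-darwin11* | i?86-*-darwin11*)
      # Disable asan on Darwin 11, but keep it for the 32-bit multilib.
      if test x$ac_cv_sizeof_void_p = x4; then
        ASAN_SUPPORTED=yes
      else
        ASAN_SUPPORTED=no
      fi
      ;;
  esac
  echo "$ASAN_SUPPORTED"
}
```

ubsan is untouched here because only the asan switch changes; other targets and later Darwin versions fall through with the default.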
Comment 2 ro@CeBiTec.Uni-Bielefeld.DE 2020-02-13 12:25:19 UTC
> --- Comment #1 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
> So you could just disable asan and keep ubsan (set ASAN_SUPPORTED=no in
> libsanitizer/configure.tgt for a particular darwin OS version, and if it is
> 32-bit only, also test x$ac_cv_sizeof_void_p = x4)?

Right now there's only [LT]SAN_SUPPORTED in configure.{ac,tgt}.  Sure
ASAN_SUPPORTED (and/or UBSAN_SUPPORTED) could be added, but I doubt it's
worth the effort.

I have a prototype patch that just sets UNSUPPORTED=1 for *86*-apple-darwin11*.

> Of course, trying to workaround kernel bugs this way is weird, but if it isn't
> supported anymore or Apple isn't willing to fix their bugs...

Mac OS X 10.7 is almost 9 years old by now and long past support.  I
don't feel particularly inclined to reghunt which gcc/sanitizer change
caused this, let alone debug the Darwin kernel either.
Comment 3 Iain Sandoe 2020-02-13 17:17:03 UTC
These systems are EOL so we can't expect any fixes to the systems themselves.

The question is "is the latest imported asan version even supposed to support 10.7"?

I have a patch to unsupport the sanitiser for <= 10.6 [where it has been unsupported upstream since at least the last release].  That is something that I can apply immediately.

If the latest sanitiser code is _supposed_ to work on 10.7 - we should at least take a cursory look at why/where it's failing before punting.

I agree that spending much time on making the sanitisers work on EOL machines is not a priority.  I don't have access to my 10.7 box right now - but will take a look next week.
Comment 4 Eric Gallager 2020-02-13 20:47:44 UTC
(In reply to Iain Sandoe from comment #3)
> These systems are EOL so we can't expect any fixes to the systems themselves.
> 
> The question is "is the latest imported asan version even supposed to
> support 10.7"?
> 
> I have a patch to unsupport the sanitiser for <= 10.6 [where it has been
> unsupported upstream since at least the last release].  That is something
> that I can apply immediately.
> 
> If the latest sanitiser code is _supposed_ to work on 10.7 - we should at
> least take a cursory look at why/where it's failing before punting.
> 
> I agree that spending much time on making the sanitisers work on EOL
> machines is not a priority.  I don't have access to my 10.7 box right now -
> but will take a look next week.

I'm on 10.6 and have been configuring with --disable-libsanitizer for some time now anyway, so it won't be too much of a loss if that becomes the default.
Comment 5 ro@CeBiTec.Uni-Bielefeld.DE 2020-02-14 10:33:25 UTC
> --- Comment #3 from Iain Sandoe <iains at gcc dot gnu.org> ---
> These systems are EOL so we can't expect any fixes to the systems themselves.
>
> The question is "is the latest imported asan version even supposed to support
> 10.7"?

When I tried to build all of LLVM before the 9.0 release and ran into a
couple of issues, I asked the same question.  Getting an answer was like
pulling teeth, unfortunately, and in the end no one was able or willing
to state which macOS versions are supposed to be supported by LLVM.  Very
disappointing IMO, but given this precedent I don't expect anything
better now.

> I agree that spending much time on making the sanitisers work on EOL machines
> is not a priority.  I don't have access to my 10.7 box right now - but will
> take a look next week.

I'm only building mainline on 10.7 because we happen to have a couple of
old Mac minis running 10.7 still around that I can use for the purpose.
Given the nightmarish slowdowns since 10.13, that is still a decent
option.

That said, when experimenting with bootstraps inside VirtualBox VMs, I
had also tried 10.11 where unlike 10.7 I could run with 4 virtual cpus,
getting reasonable build times.
Comment 6 Jeffrey A. Law 2020-02-27 19:02:15 UTC
Only affecting EOL systems, moving to P4.
Comment 7 ro@CeBiTec.Uni-Bielefeld.DE 2020-02-29 15:11:40 UTC
> --- Comment #2 from ro at CeBiTec dot Uni-Bielefeld.DE <ro at CeBiTec dot
> Uni-Bielefeld.DE> ---
>> --- Comment #1 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
[...]
>> Of course, trying to workaround kernel bugs this way is weird, but if it isn't
>> supported anymore or Apple isn't willing to fix their bugs...
>
> Mac OS X 10.7 is almost 9 years old by now and long past support.  I
> don't feel particularly inclined to reghunt which gcc/sanitizer change
> caused this, let alone debug the Darwin kernel either.

I've since experimented a bit more: 32-bit 10.7 is affected just the
same.  Afterwards, I've copied both the 32 and 64-bit
alloca_big_alignment.exe and the corresponding libasan.6.dylib and
libgcc_s.1.dylib to a 10.8 VM where they run just fine, so this is
obviously a 10.7-only issue.

While working on this, I've created VirtualBox VMs for every single
macOS release between 10.7 and 10.15, each with the latest updates and
last supported Xcode version installed and ready for experiments if
needed.
Comment 8 Iain Sandoe 2020-02-29 15:41:26 UTC
(In reply to ro@CeBiTec.Uni-Bielefeld.DE from comment #7)
> > --- Comment #2 from ro at CeBiTec dot Uni-Bielefeld.DE <ro at CeBiTec dot
> > Uni-Bielefeld.DE> ---
> >> --- Comment #1 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
> [...]
> >> Of course, trying to workaround kernel bugs this way is weird, but if it isn't
> >> supported anymore or Apple isn't willing to fix their bugs...
> >
> > Mac OS X 10.7 is almost 9 years old by now and long past support.  I
> > don't feel particularly inclined to reghunt which gcc/sanitizer change
> > caused this, let alone debug the Darwin kernel either.
> 
> I've since experimented a bit more: 32-bit 10.7 is affected just the
> same.  Afterwards, I've copied both the 32 and 64-bit
> alloca_big_alignment.exe and the corresponding libasan.6.dylib and
> libgcc_s.1.dylib to a 10.8 VM where they run just fine, so this is
> obviously a 10.7-only issue.

Yeah, I'm just waiting for the x86_64-darwin13 run to finish with libsanitizer disabled
(the fault repeats for me on 64b).

It's fairly low on my priority list to look at this with the current sanitiser output, since that is emitting a different ABI for Darwin than clang does (so the emitted code would be the first thing to fix).

> While working on this, I've created VirtualBox VMs for every single
> macOS release between 10.7 and 10.15, each with the latest updates and
> last supported Xcode version installed and ready for experiments if
> needed.

VB is more reliable for some versions than others (which might have little to do with VB, of course ;) ).  It's pretty hard to get anything < 10.6 to work there, and obv. it is of no use for ppc.

----

Right now, I'm thinking of disabling the sanitizer by default for master on <= 10.7 and for 9.x on <= 10.6.  I'll do that today or tomorrow since I want to make the 9.3 deadline.
Comment 9 Iain Sandoe 2020-02-29 16:44:58 UTC
One additional point.

For earlier OS versions the 'atos' version installed is not sufficient to get sensible output from the sanitizer (characterised by very long timeouts on failed tests).

In that case, it is better to install llvm-symbolizer from a recentish LLVM and to set ASAN_SYMBOLIZER_PATH=/path/to/llvm-symbolizer before running tests (FWIW, I tend to do this about 50% of the time even on recent OS versions to ensure that the fails seen are from the sanitiser, not atos).  atos is closed-source so we can't fix/rebuild it.

Unfortunately, the llvm-symbolizer exe is not part of the Xcode distributions, so it has to be built from source.

In the case of the x86_64-darwin11 kernel panics, this made no difference to the observed fails.
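
The symbolizer setup described in this comment could be scripted along the following lines. This is a minimal sketch: the helper name is hypothetical, the path remains the placeholder used above, and the make invocation in the trailing comment is an assumption about how the asan tests are being run.

```shell
# Point the sanitizer runtime at llvm-symbolizer instead of atos.
# use_llvm_symbolizer is a hypothetical helper, not part of GCC or LLVM.
# Usage: use_llvm_symbolizer /path/to/llvm-symbolizer
use_llvm_symbolizer () {
  if test -x "$1"; then
    ASAN_SYMBOLIZER_PATH=$1
    export ASAN_SYMBOLIZER_PATH
    return 0
  fi
  # Leave the environment untouched so the runtime keeps its default.
  echo "no executable symbolizer at $1; ASAN_SYMBOLIZER_PATH not set" >&2
  return 1
}

# Example (assumed test invocation; adjust to the local setup):
#   use_llvm_symbolizer /path/to/llvm-symbolizer &&
#     make check-gcc RUNTESTFLAGS="asan.exp"
```

Checking that the file is executable first avoids exporting a stale path, which would otherwise make symbolization failures harder to tell apart from genuine test failures.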
Comment 10 GCC Commits 2020-03-01 14:42:03 UTC
The master branch has been updated by Iain D Sandoe <iains@gcc.gnu.org>:

https://gcc.gnu.org/g:63cc547f6d85819192afa795e9ade14f0800eda9

commit r10-6951-g63cc547f6d85819192afa795e9ade14f0800eda9
Author: Iain Sandoe <iain@sandoe.co.uk>
Date:   Sun Mar 1 14:40:57 2020 +0000

    Darwin, libsanitizer: Adjust minimum supported Darwin version (PR93731).
    
    The current imported libsanitizer code produces kernel panics for
    Darwin 11 (macOS 10.7) and is unsupported for earlier versions already.
    
    It is not clear if the current sources are even intended to be supported
    on Darwin 11, so this patch causes the default to be build without
    sanitizers for Darwin <= 11.
    
    2020-03-01  Iain Sandoe  <iain@sandoe.co.uk>
    
    	PR sanitizer/93731
    	* configure.tgt (x86_64-*-darwin*, i?86-*-darwin*): Enable by
    	default only for Darwin versions greater than 12 (macOS 10.8).
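
For illustration, the policy stated in the commit message (no sanitizers by default for Darwin <= 11) can be sketched as a check on the Darwin major version in the target triple. This is not the committed configure.tgt change: the function name and its output strings are hypothetical, and the real file decides via case patterns on the target rather than a helper like this.

```shell
# Sketch, not the committed patch: decide the libsanitizer default from the
# Darwin major version embedded in the target triple.
# Usage: sanitizer_default_for TARGET
sanitizer_default_for () {
  case "$1" in
    x86_64-*-darwin[0-9]* | i?86-*-darwin[0-9]*)
      major=${1##*-darwin}   # e.g. "11.4.2"
      major=${major%%.*}     # e.g. "11"
      if test "$major" -ge 12; then
        echo supported       # Darwin 12 (macOS 10.8) and later
      else
        echo unsupported     # Darwin 11 (Mac OS X 10.7) and earlier
      fi
      ;;
    *)
      echo not-darwin-x86    # outside the scope of this sketch
      ;;
  esac
}
```

For example, `sanitizer_default_for x86_64-apple-darwin11.4.2` reports the reporter's configuration as unsupported, while a Darwin 12 triple is reported as supported.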
Comment 11 Iain Sandoe 2020-03-02 12:55:04 UTC
I checked current gcc-9 on Darwin11 and the sanitiser builds and tests there without any such issue, so we only need to exclude for Darwin <= 10 for gcc-9.  This PR should be fixed now.