Announcement : An AArch64 (Arm64) Darwin port is planned for GCC12
Thu Sep 16 12:47:40 GMT 2021
As many of you know, Apple has now released an AArch64-based version of macOS and desktop/laptop platforms using the ‘M1’ chip to support it. This is in addition to the existing iOS mobile platforms (but shares some of their constraints).
There is considerable interest in the user-base for a GCC port (starting with https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96168) - and, with great kudos to the gfortran team, one of the main drivers is folks using Fortran.
Fortunately, I was able to obtain access to one of the DTKs, courtesy of the OSS folks, and using that managed to draft an initial attempt at the port last year (however, nowhere near ready for presentation in GCC11). Nevertheless (as an aside) despite being a prototype, the port is in use by many via Homebrew, MacPorts or self-builds - which has shaken out some of the fixable bugs.
The work done in the prototype identified three issues that could not be coded around without work on generic parts of the compiler.
I am very happy to say that two of our colleagues, Andrew Burgess and Maxim Blinov (both from Embecosm), have joined me in drafting a postable version of the port and we are seeking sponsorship to finish this in the GCC12 timeframe.
Maxim has a lightning talk on the GNU tools track at LPC (right after the steering committee session) that will focus on the two generic issues that we’re tackling (1 and 2 below).
Here is a short summary of the issues and proposed solutions (detailed discussion of any of the parts below would better be in new threads).
1. GCC’s default model for nested functions uses a trampoline on the stack, which requires an executable stack. Executable stack is prohibited by the security model for Arm64 macOS.
— We cannot punt on this because, in addition to the GCC extension to C to provide nested functions, the facility is used by Fortran (and Ada, of course), and many real-world examples fail without it (as reported to the prototype issues tracker).
— the prototype has a hacked implementation of the descriptor-based solution proposed some time ago for Ada (that uses a reserved bit in the address). This is, of course, completely unacceptable for the final port - and does not work when there are callbacks to system functions.
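For readers less familiar with the construct, here is a minimal sketch of the kind of code that needs a trampoline. It uses the GNU C nested-function extension, so it builds with GCC but not with a strict ISO C compiler. The names `apply` and `add_offset` are purely illustrative and do not come from the port.

```c
/* Calls through an ordinary function pointer; it has no way to pass
   a hidden "static chain" argument to the callee. */
int apply(int (*fn)(int), int x)
{
    return fn(x);
}

int add_offset(int offset, int x)
{
    /* GNU C nested function: it reads `offset` from the enclosing frame. */
    int add(int v) { return v + offset; }

    /* Taking the address of `add` is what makes GCC materialise a
       trampoline (by default on the stack) that loads the static chain
       and jumps to the real code. */
    return apply(add, x);
}
```

With the default model, `add_offset(3, 4)` works only if the stack page holding the trampoline can be executed, which is exactly what Arm64 macOS forbids.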
Andrew Burgess is pursuing a solution which is essentially based on our reasoning of the problem and the discussion in the descriptor thread here:
The mechanism would, of course, be opt-in (but is expected to be potentially useful to other OSs where the security model would require a non-executable stack).
The summary is to allocate a memory area to contain the trampolines and to allocate (and free) these with the nesting of function pointer uses. It is allowed for Arm64 macOS to have such a section of memory (granted by permissions on the executable). Such an area cannot be both writable and executable at the same time, so we have to consider the implications of switching when an allocation or free occurs.
This means modifying the nested function code to wrap allocations of trampolines in a cleanup.
The current design also uses builtin functions to implement the actual management of the trampoline page(s); these would be part of libgcc.
A note in passing: Apple’s implementation of libFFI and libObjC makes use of a similar technique - but the trampoline areas are placed in a code-signed SO (so that its authenticity is determined at load-time). This isn’t a suitable mechanism for GCC, since it would involve somehow getting a codesigned SO distributed with the OS. We are also assured that JIT code (which is more-or-less what this is) will be allowed for the foreseeable future in macOS.
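To make the shape of this concrete, here is a rough sketch (not the proposed libgcc code) of a trampoline area that is never writable and executable at the same time, with slots released by a cleanup. Everything named here (`tramp_pool`, `tramp_alloc`, `tramp_guard`, `SLOT_SIZE`) is hypothetical. It uses plain POSIX `mmap`/`mprotect`; on Arm64 macOS a real implementation would presumably need the platform's JIT facilities instead, plus an instruction-cache flush and thread safety, all of which are omitted.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE          /* for MAP_ANONYMOUS on glibc */
#endif
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#define SLOT_SIZE 32u            /* hypothetical bytes per trampoline */

struct tramp_pool { unsigned char *page; size_t size, used; };

/* Map one page for trampolines; it starts out read+execute. */
int pool_init(struct tramp_pool *p)
{
    long sz = sysconf(_SC_PAGESIZE);
    if (sz <= 0)
        return -1;
    void *m = mmap(NULL, (size_t)sz, PROT_READ | PROT_EXEC,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (m == MAP_FAILED)
        return -1;
    p->page = m;
    p->size = (size_t)sz;
    p->used = 0;
    return 0;
}

/* Push one slot: flip the page to writable, copy the code in, and flip
   it back to executable.  While the page is writable, no trampoline in
   it can run; that is the switching cost mentioned above. */
void *tramp_alloc(struct tramp_pool *p, const void *code, size_t len)
{
    if (len > SLOT_SIZE || p->used + SLOT_SIZE > p->size)
        return NULL;
    if (mprotect(p->page, p->size, PROT_READ | PROT_WRITE) != 0)
        return NULL;
    void *slot = p->page + p->used;
    memcpy(slot, code, len);
    if (mprotect(p->page, p->size, PROT_READ | PROT_EXEC) != 0)
        return NULL;
    p->used += SLOT_SIZE;
    return slot;
}

/* Pop the most recent slot; allocation follows function nesting (LIFO). */
void tramp_free(struct tramp_pool *p)
{
    if (p->used >= SLOT_SIZE)
        p->used -= SLOT_SIZE;
}

struct tramp_guard { struct tramp_pool *pool; void *slot; };

/* Cleanup handler: runs when the guard variable leaves scope. */
void tramp_guard_release(struct tramp_guard *g)
{
    if (g->slot)
        tramp_free(g->pool);
}

/* Stand-in for a function containing a nested function whose address
   is taken.  Returns the pool usage while the trampoline is live. */
size_t with_trampoline(struct tramp_pool *p)
{
    static const unsigned char stub[4] = { 0 };  /* placeholder, never run */
    __attribute__((cleanup(tramp_guard_release)))
    struct tramp_guard g = { p, tramp_alloc(p, stub, sizeof stub) };
    return g.slot ? p->used : 0;
}
```

After `pool_init(&pool)`, a call to `with_trampoline(&pool)` reports one slot in use during the call, and `pool.used` is back to zero once it returns, because the cleanup released the slot on scope exit.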
2. The darwinpcs (variant of the AAPCS64) has a lowering for function arguments that places them differently for ’normal’ and ‘variadic’ calls.
— the prototype has an outrageous hack to allow it to function.
Maxim Blinov is working on a proper solution to this, thus:
Many ports go to quite some lengths to track their register and stack use via the cumulative args mechanism. However, when the lowering is done to RTL for calls, the code there assumes that the layout of a stack-placed argument will be the same for named and unnamed cases.
For the darwinpcs, named arguments are passed naturally-aligned on the stack (with necessary padding) - but unnamed arguments are passed word-aligned.
The current proposed solution is to extend the use of the cumulative args mechanism to provide callbacks that allow the computed layout in the cum. args to be used when placing arguments on the stack.
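As an illustration of the named/unnamed difference only (this is not GCC code), the sketch below computes stack offsets under two simplifying assumptions: every argument is a scalar of at most 8 bytes whose natural alignment equals its size, and the stack word is 8 bytes. The helper names `align_up` and `place_arg` are hypothetical.

```c
#include <stddef.h>

#define STACK_WORD 8u   /* assumed stack word size on AArch64 */

/* Round `off` up to the next multiple of `align`. */
size_t align_up(size_t off, size_t align)
{
    return (off + align - 1) / align * align;
}

/* Place one stack-passed argument of `size` bytes.
   Named arguments are naturally aligned (alignment == size here);
   unnamed (variadic) arguments are aligned to the stack word.
   Returns the argument's offset and advances *next past it. */
size_t place_arg(size_t *next, size_t size, int is_named)
{
    size_t align = is_named ? size : STACK_WORD;
    size_t at = align_up(*next, align);
    *next = at + size;
    return at;
}
```

Under these assumptions, three 4-byte named arguments land at offsets 0, 4 and 8, whereas the same three passed as unnamed arguments land at 0, 8 and 16. This is why a lowering that assumes one layout for both cases goes wrong.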
3. GCC's current PCH model requires that we load the compiler executable at the same address each time; this is prohibited by the security model for Arm64 macOS, which does not allow non-PIE executables.
— at present, there is no solution proposed for this and we will initially, at least, live without PCH on the platform (the migration away from the current PCH model is an often-discussed issue, but without a firm plan yet).
Finally, there is a lot of cleanup (and some further lowering tweaks) needed to the AArch64 part of the back-end before it will be ready to post - plus a few changes to the generic Darwin port. These will fall to me to implement (and I’ve been gradually pushing the generic changes over the last months).
Andrew and Maxim
* current experimental branch here : https://github.com/iains/gcc-darwin-arm64