[PATCH 0/6] RFC: adding support to GCC for detecting trust boundaries

Thu Dec 9 00:06:30 GMT 2021

On Mon, 2021-12-06 at 13:40 -0600, Segher Boessenkool wrote:
> On Mon, Dec 06, 2021 at 11:12:00AM -0700, Martin Sebor wrote:
> > On 11/13/21 1:37 PM, David Malcolm via Gcc-patches wrote:
> > > Approach 1: Custom Address Spaces
> > > =================================
> > > 
> > > GCC's C frontend supports target-specific address spaces; see:
> > >   https://gcc.gnu.org/onlinedocs/gcc/Named-Address-Spaces.html
> > > Quoting the N1275 draft of ISO/IEC DTR 18037:
> > >   "Address space names are ordinary identifiers, sharing the same
> > > name
> > >   space as variables and typedef names.  Any such names follow the
> > > same
> > >   rules for scope as other ordinary identifiers (such as typedef
> > > names).
> > >   An implementation may provide an implementation-defined set of
> > >   intrinsic address spaces that are, in effect, predefined at the
> > > start
> > >   of every translation unit.  The names of intrinsic address spaces
> > > must
> > >   be reserved identifiers (beginning with an underscore and an
> > > uppercase
> > >   letter or with two underscores).  An implementation may also
> > >   optionally support a means for new address space names to be
> > > defined
> > >   within a translation unit."
> > > 
> > > Patch 1a in the following patch kit for GCC implements such a means
> > > to
> > > define new address spaces names in a translation unit, via a
> > > pragma:
> > >   #prgama GCC custom_address_space(NAME_OF_ADDRESS_SPACE)
> > > 
> > > For example, the Linux kernel could perhaps write:
> > > 
> > >   #define __kernel
> > >   #pragma GCC custom_address_space(__user)
> > >   #pragma GCC custom_address_space(__iomem)
> > >   #pragma GCC custom_address_space(__percpu)
> > >   #pragma GCC custom_address_space(__rcu)
> > > 
> > > and thus the C frontend can complain about code that mismatches
> > > __user
> > > and kernel pointers, e.g.:
> > > 
> > > custom-address-space-1.c: In function ‘test_argpass_to_p’:
> > > custom-address-space-1.c:29:14: error: passing argument 1 of 
> > > ‘accepts_p’
> > > from pointer to non-enclosed address space
> > >    29 |   accepts_p (p_user);
> > >       |              ^~~~~~
> > > custom-address-space-1.c:21:24: note: expected ‘void *’ but
> > > argument is
> > > of type ‘__user void *’
> > >    21 | extern void accepts_p (void *);
> > >       |                        ^~~~~~
> > > custom-address-space-1.c: In function ‘test_cast_k_to_u’:
> > > custom-address-space-1.c:135:12: warning: cast to ‘__user’ address 
> > > space
> > > pointer from disjoint generic address space pointer
> > >   135 |   p_user = (void __user *)p_kernel;
> > >       |            ^
> > 
> > This seems like an excellent use of named address spaces :)
> 
> It has some big problems though.

Thanks for raising these points.

> 
> Named address spaces are completely target-specific.  Defining them
> with
> a pragma like this does not allow you to set the pointer mode or
> anything related to a custom LEGITIMATE_ADDRESS_P.

My thinking was that each custom address space is based on an existing
address space, but is disjoint from it, where "based on" means "what it
looks like in terms of RTL generation" (clearly I'm handwaving here).  

In patch 1a, the custom address spaces are all based on the generic
address space (but disjoint from it); syntax could be added to base
them on one of the target-specific address spaces.

>   It does not allow
> you to sayy zero pointers are invalid in some address spaces and not in
> others.

Syntax could be added for this, I suppose.

> You cannot provide any of the DWARF address space stuff this
> way.

True.  I confess that I haven't thought about the debugging experience,
and I'd need to think what would happen at the DWARF level.

> But most importantly, there are only four bits for the address
> space field internally, and they are used by however a backend wants to
> use them.

One of the ideas of patch 1a is to divide up this 4-bit space between
the target-specific and the custom address spaces.  The backend code
would need to be tweaked to decode the 4-bit value to get at the
underlying target-specific address space value.  This is done by the
function ensure_builtin_addr_space in patch 1a, though I've likely
missed some places.

IIRC, the target that's currently using the most address spaces is avr,
which I believe has 8 target-specific address spaces, in addition to
the generic one, i.e. 9 builtin address spaces, which would leave room
for up to 6 user-defined address spaces.  The Linux kernel's smatch
annotations currently effectively introduce 4 custom address spaces,
__user, __iomem, __percpu, and __rcu (assuming that __kernel is the
generic address space), so it's something of a tight squeeze, but it
does fit.  This doesn't account for out-of-tree backends, of course.

> 
> None of this cannot be solved, but all of it will have to be solved.

(nods)

> 
> IMO it will be best to not mix this with address spaces in the user
> interface (it is of course fine to *implement* it like that, or with
> big overlap at least).

I was thinking the other way around, in that it should look like
address spaces in terms of the user's source code, but has some
implementation differences.

> 
> > > The patch doesn't yet maintain a good distinction between implicit
> > > target-specific address spaces and user-defined address spaces,
> 
> And that will have to be fixed in the user code syntax at least.
> 
> > > has at
> > > least one known major bug, and has only been lightly tested.  I can
> > > fix these issues, but was hoping for feedback that this approach is
> > > the
> > > right direction from both the GCC and Linux development
> > > communities.
> 
> Allowing the user to define new address spaces does not jibe well with
> how targets do (validly!) use them.

I think from a user's perspective it's a nice approach - my feeling is
that it makes certain things easier for the user, whilst complicating
things from a backend implementation perspective.

Plus you've raised various technical issues which I'd have to resolve
if we went in this direction.

> 
> > > Approach 2: An "untrusted" attribute
> > > ====================================
> > > 
> > > Alternatively, patch 1b in the kit implements:
> > > 
> > >   __attribute__((untrusted))
> > > 
> > > which can be applied to types as a qualifier (similarly to const,
> > > volatile, etc) to mark a trust boundary, hence the kernel could
> > > have:
> > > 
> > >   #define __user __attribute__((untrusted))
> > > 
> > > where my patched GCC treats
> > >   T *
> > > vs
> > >   T __attribute__((untrusted)) *
> > > as being different types and thus the C frontend can complain (even
> > > without
> > > -fanalyzer) about e.g.:
> > > 
> > > extern void accepts_p(void *);
> > > 
> > > void test_argpass_to_p(void __user *p_user)
> > > {
> > >   accepts_p(p_user);
> > > }
> > > 
> > > untrusted-pointer-1.c: In function ‘test_argpass_to_p’:
> > > untrusted-pointer-1.c:22:13: error: passing argument 1 of
> > > ‘accepts_p’
> > > from pointer with different trust level
> > >    22 |   accepts_p(p_user);
> > >       |              ^~~~~~
> > > untrusted-pointer-1.c:14:23: note: expected ‘void *’ but argument
> > > is of
> > > type ‘__attribute__((untrusted)) void *’
> > >    14 | extern void accepts_p(void *);
> > >       |                        ^~~~~~
> > > 
> > > So you'd get enforcement of __user vs non-__user pointers as part
> > > of
> > > GCC's regular type-checking.  (You need an explicit cast to convert
> > > between the untrusted vs trusted types).
> > 
> > As with the named address space idea, this approach also looks
> > reasonable to me.  If you anticipate using the attribute only
> > in the analyzer I would suggest to consider introducing it in
> > the analyzer's namespace (e.g., analyzer::untrusted, or even
> > gnu::analyzer::untrusted).
> 
> I don't see any fundamental problems with this approach.  It also is
> very much in line with how Perl handles this (and some copycat
> languages
> do as well), the "tainted" flag on data.

(nods)

> 
> > > This approach is much less expressive that the custom addres space
> > > approach; it would only cover the trust boundary aspect; it
> > > wouldn't
> > > cover any differences between generic pointers and __user, vs
> > > __iomem,
> > > __percpu, and __rcu which I admit I only dimly understand.
> 
> Yes, it does not have any of the big problems that come with those
> address spaces either!  :-)

Indeed - I think this "untrusted attribute" approach is much simpler
implementation-wise than the "custom address space" approach, which is
also in its favor.

I'm wondering if anyone from the kernel development community has
strong opinions here, since the custom address space approach is
potentially much more expressive.

Otherwise I think we're both preferring the "untrusted attribute"
approach (patch 1b).

> 
> > > Other attributes
> > > ================
> > > 
> > > Patch 2 in the kit adds:
> > >   __attribute__((returns_zero_on_success))
> > > and
> > >   __attribute__((returns_nonzero_on_success))
> > > as hints to the analyzer that it's worth bifurcating the analysis
> > > of
> > > such functions (to explore failure vs success, and thus to better
> > > explore error-handling paths).  It's also a hint to the human
> > > reader of
> > > the source code.
> > 
> > I thing being able to express something along these lines would
> > be useful even outside the analyzer, both for warnings and, when
> > done right, perhaps also for optimization.  So I'm in favor of
> > something like this.  I'll just reiterate here the comment on
> > this attribute I sent you privately some time ago.
> 
> What is "success" though?  You probably want it so some checker can
> make
> sure you do handle failure some way, but how do you see what is
> handling
> failure and what is handling the successful case?

"success" and "failure" in this case are purely in terms of how we
label events for the user in the analyzer, such as in event (3) in the
following:

infoleak-antipatterns-1.c: In function ‘infoleak_stack_unchecked_err’:
infoleak-antipatterns-1.c:118:10: warning: potential exposure of
sensitive information by copying uninitialized data from stack across
trust boundary [CWE-200] [-Wanalyzer-exposure-through-uninit-copy]
  118 |   err |= copy_to_user (dst, &st, sizeof(st));
      |          ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  ‘infoleak_stack_unchecked_err’: events 1-4
    |
    |  110 |   struct infoleak_buf st;
    |      |                       ^~
    |      |                       |
    |      |                       (1) source region created on stack here
    |      |                       (2) capacity: 256 bytes
    |......
    |  117 |   int err = copy_from_user (&st, src, sizeof(st));
    |      |             ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    |      |             |
    |      |             (3) when ‘copy_from_user’ fails, returning non-zero
    |  118 |   err |= copy_to_user (dst, &st, sizeof(st));
    |      |          ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    |      |          |
    |      |          (4) uninitialized data copied from stack here
    |

i.e. it's allows the analyzer to provide a hint to the reader of the
analyzer output.  The attribute is also a hint to the human reader of
the source code.

Thanks
Dave