[PATCH 0/6] RFC: adding support to GCC for detecting trust boundaries

Martin Sebor msebor@gmail.com
Mon Dec 6 18:12:00 GMT 2021


On 11/13/21 1:37 PM, David Malcolm via Gcc-patches wrote:
> [Crossposting between gcc-patches@gcc.gnu.org and
> linux-toolchains@vger.kernel.org; sorry about my lack of kernel
> knowledge, in case of the following seems bogus]
> 
> I've been trying to turn my prototype from the LPC2021 session on
> "Adding kernel-specific test coverage to GCC's -fanalyzer option"
> ( https://linuxplumbersconf.org/event/11/contributions/1076/ ) into
> something that can go into GCC upstream without adding kernel-specific
> special cases, or requiring a GCC plugin.  The prototype simply
> specialcased "copy_from_user" and "copy_to_user" in GCC, which is
> clearly not OK.
> 
> This GCC patch kit implements detection of "trust boundaries", aimed at
> detection of "infoleaks" and of use of unsanitized attacker-controlled
> values ("taint") in the Linux kernel.
> 
> For example, here's an infoleak diagnostic (using notes to
> express what fields and padding within a struct have not been
> initialized):
> 
> infoleak-CVE-2011-1078-2.c: In function ‘test_1’:
> infoleak-CVE-2011-1078-2.c:28:9: warning: potential exposure of sensitive
>    information by copying uninitialized data from stack across trust
>    boundary [CWE-200] [-Wanalyzer-exposure-through-uninit-copy]
>     28 |         copy_to_user(optval, &cinfo, sizeof(cinfo));
>        |         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>    ‘test_1’: events 1-3
>      |
>      |   21 |         struct sco_conninfo cinfo;
>      |      |                             ^~~~~
>      |      |                             |
>      |      |                             (1) region created on stack here
>      |      |                             (2) capacity: 6 bytes
>      |......
>      |   28 |         copy_to_user(optval, &cinfo, sizeof(cinfo));
>      |      |         ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>      |      |         |
>      |      |         (3) uninitialized data copied from stack here
>      |
> infoleak-CVE-2011-1078-2.c:28:9: note: 1 byte is uninitialized
>     28 |         copy_to_user(optval, &cinfo, sizeof(cinfo));
>        |         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> infoleak-CVE-2011-1078-2.c:14:15: note: padding after field ‘dev_class’ is uninitialized (1 byte)
>     14 |         __u8  dev_class[3];
>        |               ^~~~~~~~~
> infoleak-CVE-2011-1078-2.c:21:29: note: suggest forcing zero-initialization by providing a ‘{0}’ initializer
>     21 |         struct sco_conninfo cinfo;
>        |                             ^~~~~
>        |                                   = {0}
> 
> I have to come up with a way of expressing trust boundaries in a way
> that will be:
> - acceptable to the GCC community (not be too kernel-specific), and
> - useful to the Linux kernel community.
> 
> At LPC it was pointed out that the kernel already has various
> annotations e.g. "__user" for different kinds of pointers, and that it
> would be best to reuse those.
> 
> 
> Approach 1: Custom Address Spaces
> =================================
> 
> GCC's C frontend supports target-specific address spaces; see:
>    https://gcc.gnu.org/onlinedocs/gcc/Named-Address-Spaces.html
> Quoting the N1275 draft of ISO/IEC DTR 18037:
>    "Address space names are ordinary identifiers, sharing the same name
>    space as variables and typedef names.  Any such names follow the same
>    rules for scope as other ordinary identifiers (such as typedef names).
>    An implementation may provide an implementation-defined set of
>    intrinsic address spaces that are, in effect, predefined at the start
>    of every translation unit.  The names of intrinsic address spaces must
>    be reserved identifiers (beginning with an underscore and an uppercase
>    letter or with two underscores).  An implementation may also
>    optionally support a means for new address space names to be defined
>    within a translation unit."
> 
> Patch 1a in the following patch kit for GCC implements such a means to
> define new address spaces names in a translation unit, via a pragma:
>    #prgama GCC custom_address_space(NAME_OF_ADDRESS_SPACE)
> 
> For example, the Linux kernel could perhaps write:
> 
>    #define __kernel
>    #pragma GCC custom_address_space(__user)
>    #pragma GCC custom_address_space(__iomem)
>    #pragma GCC custom_address_space(__percpu)
>    #pragma GCC custom_address_space(__rcu)
> 
> and thus the C frontend can complain about code that mismatches __user
> and kernel pointers, e.g.:
> 
> custom-address-space-1.c: In function ‘test_argpass_to_p’:
> custom-address-space-1.c:29:14: error: passing argument 1 of ‘accepts_p’
> from pointer to non-enclosed address space
>     29 |   accepts_p (p_user);
>        |              ^~~~~~
> custom-address-space-1.c:21:24: note: expected ‘void *’ but argument is
> of type ‘__user void *’
>     21 | extern void accepts_p (void *);
>        |                        ^~~~~~
> custom-address-space-1.c: In function ‘test_cast_k_to_u’:
> custom-address-space-1.c:135:12: warning: cast to ‘__user’ address space
> pointer from disjoint generic address space pointer
>    135 |   p_user = (void __user *)p_kernel;
>        |            ^

This seems like an excellent use of named address spaces :)

I'm familiar with TR 18037 but I'm not an expert on this stuff
so I can't really say a whole lot more.

My only suggestion here is to follow the terminology from
there in the naming of the pragma, unless you have some reason
not to.  I'd also recommend to consider other implementations
of named address spaces, if there are any, especially those
that try to be compatible with GCC.  If there are none, rather
than custom_address_space I'd suggest either just address_space
or named_address_space.

I have not yet looked at the implementation so this is just
a high-level comment on the design.

> The patch doesn't yet maintain a good distinction between implicit
> target-specific address spaces and user-defined address spaces, has at
> least one known major bug, and has only been lightly tested.  I can
> fix these issues, but was hoping for feedback that this approach is the
> right direction from both the GCC and Linux development communities.
> 
> Implementation status: doesn't yet bootstrap; am running into stage2
> vs stage3 comparison issues.
> 
> 
> Approach 2: An "untrusted" attribute
> ====================================
> 
> Alternatively, patch 1b in the kit implements:
> 
>    __attribute__((untrusted))
> 
> which can be applied to types as a qualifier (similarly to const,
> volatile, etc) to mark a trust boundary, hence the kernel could have:
> 
>    #define __user __attribute__((untrusted))
> 
> where my patched GCC treats
>    T *
> vs
>    T __attribute__((untrusted)) *
> as being different types and thus the C frontend can complain (even without
> -fanalyzer) about e.g.:
> 
> extern void accepts_p(void *);
> 
> void test_argpass_to_p(void __user *p_user)
> {
>    accepts_p(p_user);
> }
> 
> untrusted-pointer-1.c: In function ‘test_argpass_to_p’:
> untrusted-pointer-1.c:22:13: error: passing argument 1 of ‘accepts_p’
> from pointer with different trust level
>     22 |   accepts_p(p_user);
>        |              ^~~~~~
> untrusted-pointer-1.c:14:23: note: expected ‘void *’ but argument is of
> type ‘__attribute__((untrusted)) void *’
>     14 | extern void accepts_p(void *);
>        |                        ^~~~~~
> 
> So you'd get enforcement of __user vs non-__user pointers as part of
> GCC's regular type-checking.  (You need an explicit cast to convert
> between the untrusted vs trusted types).

As with the named address space idea, this approach also looks
reasonable to me.  If you anticipate using the attribute only
in the analyzer I would suggest to consider introducing it in
the analyzer's namespace (e.g., analyzer::untrusted, or even
gnu::analyzer::untrusted).

I'll try to loook at the patch itself sometime later this week
and comment on the implementation there.

> 
> This approach is much less expressive that the custom addres space
> approach; it would only cover the trust boundary aspect; it wouldn't
> cover any differences between generic pointers and __user, vs __iomem,
> __percpu, and __rcu which I admit I only dimly understand.
> 
> Implementation status: bootstraps and passes regression testing.
> Builds most of the kernel, but am running into various conversion
> issues.  It would be good to have some clarity on what conversions
> the compiler ought to warn about, and what conversions should be OK.
> 
> 
> Approach 3: some kind of custom qualifier
> =========================================
> 
> Approach 1 extends the existing "named address space" machinery to add
> new values; approach 2 adds a new flag to cv-qualifiers.  Both of these
> approaches work in terms of cv-qualifiers.  We have some spare bits
> available for these; perhaps a third approach could be to add a new
> kind of user-defined qualifier, like named address spaces, but othogonal
> to them.   I haven't attempted to implement this.

I'm afraid I don't understand what this would be useful for
enough to comment.

> Other attributes
> ================
> 
> Patch 2 in the kit adds:
>    __attribute__((returns_zero_on_success))
> and
>    __attribute__((returns_nonzero_on_success))
> as hints to the analyzer that it's worth bifurcating the analysis of
> such functions (to explore failure vs success, and thus to better
> explore error-handling paths).  It's also a hint to the human reader of
> the source code.

I thing being able to express something along these lines would
be useful even outside the analyzer, both for warnings and, when
done right, perhaps also for optimization.  So I'm in favor of
something like this.  I'll just reiterate here the comment on
this attribute I sent you privately some time ago.

A more general attribute would also make it possible to specify
the value (or argument) on success and failure.  With those we
would be able to express the return values of the POSIX read and
write functions and others like it:

   ssize_t read (int fildes, void *buf, size_t nbyte);
   ssize_t write (int fildes, const void *buf, size_t nbyte);

I.e., it would be nice to express that the return value is
also the number of bytes (elements?) of the array the function
wrote into.  This, along with symbolic evaluation in the middle
end, would then let us detect uninitialized reads back in
the function's caller (after read) and similar.

This is just an idea, and there may be more general apoproaches
that would be even more expressive.  But it's probably too late
in the development cycle to design and add those to GCC 12.

As I promised, I'll try to look at the meat of each patch and
give you some comments, hopefully later this week.

Martin

> 
> Given the above, the kernel could then have:
> 
> extern int copy_from_user(void *to, const void __user *from, long n)
>    __attribute__((access (write_only, 1, 3),
> 		 access (read_only, 2, 3),
> 		 returns_zero_on_success));
> 
> extern long copy_to_user(void __user *to, const void *from, unsigned long n)
>    __attribute__((access (write_only, 1, 3),
> 		 access (read_only, 2, 3),
> 		 returns_zero_on_success));
> 
> with suitable macros in compiler.h or whatnot.
> 
> ("access" is an existing GCC attribute; see
>   https://gcc.gnu.org/onlinedocs/gcc/Common-Function-Attributes.html )
> 
> My patched GCC add a heuristic to -fanalyzer that a 3-argument function
> with a read_only buffer, a write_only buffer and a shared size argument
> is a "copy function", and treats it as a copy from *from to *to of up to
> n bytes that succeeds, or, given one of the above attributes can succeed
> or fail.  I'm wiring things up so that values read from *untrusted_ptr
> are tracked as tainted, and values written to *untrusted_ptr are treated
> as possible infoleaks (e.g. uninitialized values written to
> *untrusted_ptr are specifically called out).  This gets the extra
> checking for infoleaks and taint that my earlier prototype had, but is
> thus expressed via attributes, without having to have kernel-specific
> special cases.
> 
> Patch 3 of the kit adds infoleak detection to GCC's -fanalyzer (as
> in the example above).
> 
> Possibly silly question: is it always a bug for the value of a kernel
> pointer to leak into user space?  i.e. should I be complaining about an
> infoleak if the value of a trusted_ptr itself is written to
> *untrusted_ptr?  e.g.
> 
>    s.p = some_kernel_ptr;
>    copy_to_user(user_p, &s, sizeof (s));
>       /* value of some_kernel_ptr is written to user space;
>          is this something we should warn for?  */
> 
> Patch 4a/4b wire up the different implementations of "untrusted" into
> GCC's -fanalyzer, which is used by...
> 
> Patch 5 uses this so that "untrusted" values are used in taint detection
> in the analyzer, so that it can complain about attacker-controlled
> values being used without sanitization.
> 
> Patch 6 adds a new __attribute__ ((tainted)) allowing for further
> taint detection (e.g. identifying syscalls), with minimal patching of
> the kernel, and without requiring a lot of link-time interprocedural
> analysis.  I believe that some of this could work independently of
> the trust boundary marking from the rest of the patch kit.
> 
> The combined patch kit (using approach 2 i.e. the "b" patches)
> successfully bootstraps and passes regression testing on
> x86_64-pc-linux-gnu.
> 
> 
> Which of the 3 approaches looks best to:
> - the GCC community?
> - the Linux kernel community?
> 
> Does clang/LLVM have anything similar?
> 
> There are many examples in the patches, some of which are taken from
> historical kernel vulnerabilities, and others from my "antipatterns.ko"
> project ( https://github.com/davidmalcolm/antipatterns.ko ).
> 
> Thoughts?
> 
> Dave
> 
> 
> David Malcolm (6 or 8, depending how you count):
>    1a: RFC: Implement "#pragma GCC custom_address_space"
>    1b: Add __attribute__((untrusted))
>    2: Add returns_zero_on_success/failure attributes
>    3: analyzer: implement infoleak detection
>    4a: analyzer: implemention of region::untrusted_p in terms of custom
>      address spaces
>    4b: analyzer: implement region::untrusted_p in terms of
>      __attribute__((untrusted))
>    5: analyzer: use region::untrusted_p in taint detection
>    6: Add __attribute__ ((tainted))
> 
>   gcc/Makefile.in                               |   3 +-
>   gcc/analyzer/analyzer.opt                     |  20 +
>   gcc/analyzer/checker-path.cc                  | 104 +++
>   gcc/analyzer/checker-path.h                   |  47 +
>   gcc/analyzer/diagnostic-manager.cc            |  75 +-
>   gcc/analyzer/diagnostic-manager.h             |   3 +-
>   gcc/analyzer/engine.cc                        | 342 ++++++-
>   gcc/analyzer/exploded-graph.h                 |   3 +
>   gcc/analyzer/pending-diagnostic.cc            |  30 +
>   gcc/analyzer/pending-diagnostic.h             |  24 +
>   gcc/analyzer/program-state.cc                 |  26 +-
>   gcc/analyzer/region-model-impl-calls.cc       |  26 +-
>   gcc/analyzer/region-model.cc                  | 504 ++++++++++-
>   gcc/analyzer/region-model.h                   |  46 +-
>   gcc/analyzer/region.cc                        |  52 ++
>   gcc/analyzer/region.h                         |   4 +
>   gcc/analyzer/sm-taint.cc                      | 839 ++++++++++++++++--
>   gcc/analyzer/sm.h                             |   9 +
>   gcc/analyzer/store.h                          |   1 +
>   gcc/analyzer/trust-boundaries.cc              | 615 +++++++++++++
>   gcc/c-family/c-attribs.c                      | 132 +++
>   gcc/c-family/c-pretty-print.c                 |   2 +
>   gcc/c/c-typeck.c                              |  64 ++
>   gcc/doc/extend.texi                           |  63 +-
>   gcc/doc/invoke.texi                           |  80 +-
>   gcc/print-tree.c                              |   3 +
>   .../c-c++-common/attr-returns-zero-on-1.c     |  68 ++
>   gcc/testsuite/c-c++-common/attr-untrusted-1.c | 165 ++++
>   .../gcc.dg/analyzer/attr-tainted-1.c          |  88 ++
>   .../gcc.dg/analyzer/attr-tainted-misuses.c    |   6 +
>   .../gcc.dg/analyzer/copy-function-1.c         |  98 ++
>   .../gcc.dg/analyzer/copy_from_user-1.c        |  45 +
>   gcc/testsuite/gcc.dg/analyzer/infoleak-1.c    | 181 ++++
>   gcc/testsuite/gcc.dg/analyzer/infoleak-2.c    |  29 +
>   gcc/testsuite/gcc.dg/analyzer/infoleak-3.c    | 141 +++
>   gcc/testsuite/gcc.dg/analyzer/infoleak-5.c    |  35 +
>   .../analyzer/infoleak-CVE-2011-1078-1.c       | 134 +++
>   .../analyzer/infoleak-CVE-2011-1078-2.c       |  42 +
>   .../analyzer/infoleak-CVE-2014-1446-1.c       | 117 +++
>   .../analyzer/infoleak-CVE-2017-18549-1.c      | 101 +++
>   .../analyzer/infoleak-CVE-2017-18550-1.c      | 171 ++++
>   .../gcc.dg/analyzer/infoleak-antipatterns-1.c | 162 ++++
>   .../gcc.dg/analyzer/infoleak-fixit-1.c        |  22 +
>   gcc/testsuite/gcc.dg/analyzer/pr93382.c       |   2 +-
>   .../analyzer/taint-CVE-2011-0521-1-fixed.c    | 113 +++
>   .../gcc.dg/analyzer/taint-CVE-2011-0521-1.c   | 113 +++
>   .../analyzer/taint-CVE-2011-0521-2-fixed.c    |  93 ++
>   .../gcc.dg/analyzer/taint-CVE-2011-0521-2.c   |  93 ++
>   .../analyzer/taint-CVE-2011-0521-3-fixed.c    |  56 ++
>   .../gcc.dg/analyzer/taint-CVE-2011-0521-3.c   |  57 ++
>   .../gcc.dg/analyzer/taint-CVE-2011-0521-4.c   |  40 +
>   .../gcc.dg/analyzer/taint-CVE-2011-0521-5.c   |  42 +
>   .../gcc.dg/analyzer/taint-CVE-2011-0521-6.c   |  37 +
>   .../gcc.dg/analyzer/taint-CVE-2011-0521.h     | 136 +++
>   .../gcc.dg/analyzer/taint-CVE-2011-2210-1.c   |  93 ++
>   .../gcc.dg/analyzer/taint-CVE-2020-13143-1.c  |  38 +
>   .../gcc.dg/analyzer/taint-CVE-2020-13143-2.c  |  32 +
>   .../gcc.dg/analyzer/taint-CVE-2020-13143.h    |  91 ++
>   gcc/testsuite/gcc.dg/analyzer/taint-alloc-1.c |  64 ++
>   gcc/testsuite/gcc.dg/analyzer/taint-alloc-2.c |  27 +
>   gcc/testsuite/gcc.dg/analyzer/taint-alloc-3.c |  21 +
>   gcc/testsuite/gcc.dg/analyzer/taint-alloc-4.c |  31 +
>   .../gcc.dg/analyzer/taint-antipatterns-1.c    | 137 +++
>   .../gcc.dg/analyzer/taint-divisor-1.c         |  26 +
>   .../{taint-1.c => taint-read-index-1.c}       |  19 +-
>   .../gcc.dg/analyzer/taint-read-offset-1.c     | 128 +++
>   .../taint-read-through-untrusted-ptr-1.c      |  37 +
>   gcc/testsuite/gcc.dg/analyzer/taint-size-1.c  |  32 +
>   .../gcc.dg/analyzer/taint-write-index-1.c     | 132 +++
>   .../gcc.dg/analyzer/taint-write-offset-1.c    | 132 +++
>   gcc/testsuite/gcc.dg/analyzer/test-uaccess.h  |  19 +
>   .../torture/infoleak-net-ethtool-ioctl.c      |  78 ++
>   .../torture/infoleak-vfio_iommu_type1.c       |  39 +
>   gcc/tree-core.h                               |   6 +-
>   gcc/tree.c                                    |   1 +
>   gcc/tree.h                                    |  11 +-
>   76 files changed, 6558 insertions(+), 140 deletions(-)
>   create mode 100644 gcc/analyzer/trust-boundaries.cc
>   create mode 100644 gcc/testsuite/c-c++-common/attr-returns-zero-on-1.c
>   create mode 100644 gcc/testsuite/c-c++-common/attr-untrusted-1.c
>   create mode 100644 gcc/testsuite/gcc.dg/analyzer/attr-tainted-1.c
>   create mode 100644 gcc/testsuite/gcc.dg/analyzer/attr-tainted-misuses.c
>   create mode 100644 gcc/testsuite/gcc.dg/analyzer/copy-function-1.c
>   create mode 100644 gcc/testsuite/gcc.dg/analyzer/copy_from_user-1.c
>   create mode 100644 gcc/testsuite/gcc.dg/analyzer/infoleak-1.c
>   create mode 100644 gcc/testsuite/gcc.dg/analyzer/infoleak-2.c
>   create mode 100644 gcc/testsuite/gcc.dg/analyzer/infoleak-3.c
>   create mode 100644 gcc/testsuite/gcc.dg/analyzer/infoleak-5.c
>   create mode 100644 gcc/testsuite/gcc.dg/analyzer/infoleak-CVE-2011-1078-1.c
>   create mode 100644 gcc/testsuite/gcc.dg/analyzer/infoleak-CVE-2011-1078-2.c
>   create mode 100644 gcc/testsuite/gcc.dg/analyzer/infoleak-CVE-2014-1446-1.c
>   create mode 100644 gcc/testsuite/gcc.dg/analyzer/infoleak-CVE-2017-18549-1.c
>   create mode 100644 gcc/testsuite/gcc.dg/analyzer/infoleak-CVE-2017-18550-1.c
>   create mode 100644 gcc/testsuite/gcc.dg/analyzer/infoleak-antipatterns-1.c
>   create mode 100644 gcc/testsuite/gcc.dg/analyzer/infoleak-fixit-1.c
>   create mode 100644 gcc/testsuite/gcc.dg/analyzer/taint-CVE-2011-0521-1-fixed.c
>   create mode 100644 gcc/testsuite/gcc.dg/analyzer/taint-CVE-2011-0521-1.c
>   create mode 100644 gcc/testsuite/gcc.dg/analyzer/taint-CVE-2011-0521-2-fixed.c
>   create mode 100644 gcc/testsuite/gcc.dg/analyzer/taint-CVE-2011-0521-2.c
>   create mode 100644 gcc/testsuite/gcc.dg/analyzer/taint-CVE-2011-0521-3-fixed.c
>   create mode 100644 gcc/testsuite/gcc.dg/analyzer/taint-CVE-2011-0521-3.c
>   create mode 100644 gcc/testsuite/gcc.dg/analyzer/taint-CVE-2011-0521-4.c
>   create mode 100644 gcc/testsuite/gcc.dg/analyzer/taint-CVE-2011-0521-5.c
>   create mode 100644 gcc/testsuite/gcc.dg/analyzer/taint-CVE-2011-0521-6.c
>   create mode 100644 gcc/testsuite/gcc.dg/analyzer/taint-CVE-2011-0521.h
>   create mode 100644 gcc/testsuite/gcc.dg/analyzer/taint-CVE-2011-2210-1.c
>   create mode 100644 gcc/testsuite/gcc.dg/analyzer/taint-CVE-2020-13143-1.c
>   create mode 100644 gcc/testsuite/gcc.dg/analyzer/taint-CVE-2020-13143-2.c
>   create mode 100644 gcc/testsuite/gcc.dg/analyzer/taint-CVE-2020-13143.h
>   create mode 100644 gcc/testsuite/gcc.dg/analyzer/taint-alloc-1.c
>   create mode 100644 gcc/testsuite/gcc.dg/analyzer/taint-alloc-2.c
>   create mode 100644 gcc/testsuite/gcc.dg/analyzer/taint-alloc-3.c
>   create mode 100644 gcc/testsuite/gcc.dg/analyzer/taint-alloc-4.c
>   create mode 100644 gcc/testsuite/gcc.dg/analyzer/taint-antipatterns-1.c
>   create mode 100644 gcc/testsuite/gcc.dg/analyzer/taint-divisor-1.c
>   rename gcc/testsuite/gcc.dg/analyzer/{taint-1.c => taint-read-index-1.c} (72%)
>   create mode 100644 gcc/testsuite/gcc.dg/analyzer/taint-read-offset-1.c
>   create mode 100644 gcc/testsuite/gcc.dg/analyzer/taint-read-through-untrusted-ptr-1.c
>   create mode 100644 gcc/testsuite/gcc.dg/analyzer/taint-size-1.c
>   create mode 100644 gcc/testsuite/gcc.dg/analyzer/taint-write-index-1.c
>   create mode 100644 gcc/testsuite/gcc.dg/analyzer/taint-write-offset-1.c
>   create mode 100644 gcc/testsuite/gcc.dg/analyzer/test-uaccess.h
>   create mode 100644 gcc/testsuite/gcc.dg/analyzer/torture/infoleak-net-ethtool-ioctl.c
>   create mode 100644 gcc/testsuite/gcc.dg/analyzer/torture/infoleak-vfio_iommu_type1.c
> 



More information about the Gcc-patches mailing list