This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.
Re: [RFC PATCH] -fsanitize=pointer-overflow support (PR sanitizer/80998)
- From: Richard Biener <rguenther at suse dot de>
- To: Jakub Jelinek <jakub at redhat dot com>
- Cc: Martin Liška <mliska at suse dot cz>, gcc-patches at gcc dot gnu dot org
- Date: Tue, 20 Jun 2017 10:18:20 +0200 (CEST)
- References: <20170619182515.GA2123@tucnak> <alpine.LSU.2.20.1706200928120.22867@zhemvz.fhfr.qr> <20170620081348.GE2123@tucnak>
On Tue, 20 Jun 2017, Jakub Jelinek wrote:
> On Tue, Jun 20, 2017 at 09:41:43AM +0200, Richard Biener wrote:
> > On Mon, 19 Jun 2017, Jakub Jelinek wrote:
> >
> > > Hi!
> > >
> > > The following patch adds -fsanitize=pointer-overflow support,
> > > which adds instrumentation (included in -fsanitize=undefined) that checks
> > > that pointer arithmetic doesn't wrap. For ptr p+ off, if the offset is
> > > non-negative when treated as a signed value, we check whether the result
> > > is bigger (uintptr_t comparison) than ptr; if it is negative in ssizetype,
> > > we check whether the result is smaller than ptr; otherwise we check at
> > > runtime whether (ssizetype) off < 0 and do the check based on that.
> > > The patch checks both POINTER_PLUS_EXPR, as well as e.g. ADDR_EXPR of
> > > handled components, and even handled components themselves (exception
> > > is for constant offset when the base is an automatic non-VLA decl or
> > > decl that binds to current function where we can at compile time for
> > > sure guarantee it will fit).
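
The check described above can be sketched in plain C on integer values. This is only an illustration of the comparison, not the patch's instrumentation: the helper name ptr_add_wraps is hypothetical, and an offset of zero is assumed to count as "no wrap".

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: returns true if p + off would wrap around the
   address space.  Works on integer values so that the sketch itself
   never performs out-of-bounds pointer arithmetic.  */
bool
ptr_add_wraps (uintptr_t p, intptr_t off)
{
  /* Unsigned addition wraps modulo 2^N with defined behavior.  */
  uintptr_t result = p + (uintptr_t) off;

  if (off >= 0)
    /* Non-negative offset: a result below p means we wrapped.  */
    return result < p;

  /* Negative offset: a result above p means we wrapped.  */
  return result > p;
}
```

With this helper, ptr_add_wraps (0, -1) is true, which corresponds to the NULL to all-ones overflow discussed for libcpp/symtab.c further down.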
> >
> > Does this "properly" interact with any array-bound sanitizing we do?
> > Say, for
> >
> > &a->b[i].c.d
> >
> > ?
>
> It doesn't interact with it right now at all, and I think it shouldn't
> for the case you wrote: for &a->b[i].c.d, even if i is within bounds of the
> declared array, the pointer could still point to something that wraps around.
> struct S { struct T { struct U { int d; } c; } b[0x1000000]; } *a;
> could still in a buggy program point to something that is actually much
> shorter, like a = (struct S *) malloc (0x10000 * sizeof (int));
> or similar, and there could be a wrap around.
>
> What I have considered, but haven't implemented yet, is checking if there
> is UBSAN_BOUNDS sanitization and it is actually array or structure with
> array etc. (i.e. there is no pointer dereference) - for
> &q.b[i].c.d
> if the bounds check is present and we are sure about the object size
> (i.e. automatic variable or locally defined file scope var).
>
> > > Martin has said he'll write the sanopt part of optimization
> > > (if UBSAN_PTR for some pointer is dominated by UBSAN_PTR for the same
> > > pointer and the offset is constant in both cases and equal or absolute value
> > > bigger and same sign in the dominating UBSAN_PTR, we can avoid the dominated
> > > check).
> > >
> > > For the cases where there is a dereference (i.e. not ADDR_EXPR of the
> > > handled component or POINTER_PLUS_EXPR), I wonder if we couldn't ignore
> > > say constant offsets in range <-4096, 4096> or something similar, hoping
> > > people don't have anything mapped at the page 0 and -pagesize in hosted
> > > env. Thoughts on that?
> >
> > Not sure what the problem is here?
>
> It would be an attempt to avoid sanitizing int foo (int *p) { return p[10] + p[-5]; }
> (when the offset is constant and small and we dereference it).
> If there is no page mapped at NULL or at the highest page in the virtual
> address space, then the above will crash in case p + 10 or p - 5 wraps
> around.
Ah, so merely an optimization to avoid excessive instrumentation then.
Yes, this might work (maybe make the 4096 limit configurable via a --param
so that it can be disabled).
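
A rough sketch of how such a filter could look, assuming an inclusive range and assuming that a limit of 0 means "disabled". The function name, its parameters and the exact boundary handling are hypothetical and not taken from the patch:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical predicate: may the wrap check be skipped for this access?
   LIMIT plays the role of the proposed --param (4096 in the discussion);
   a LIMIT of 0 or less is assumed to disable the heuristic.  */
bool
can_skip_ptr_check (bool is_deref, bool off_is_const,
                    intptr_t off, intptr_t limit)
{
  if (limit <= 0)
    return false;               /* Heuristic disabled.  */
  if (!is_deref || !off_is_const)
    return false;               /* Only dereferences with constant offsets.  */

  /* Small offsets: if the addition wrapped, the access would land in the
     lowest or highest page, which is assumed to be unmapped.  */
  return off >= -limit && off <= limit;
}
```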
> > > I've bootstrapped/regtested the patch on x86_64-linux and i686-linux
> > > and additionally bootstrapped/regtested with bootstrap-ubsan on both too.
> > > The latter revealed a couple of issues I'd like to discuss:
> > >
> > > 1) libcpp/symtab.c contains a couple of spots reduced into:
> > > #define DELETED ((char *) -1)
> > > void bar (char *);
> > > void
> > > foo (char *p)
> > > {
> > >   if (p && p != DELETED)
> > >     bar (p);
> > > }
> > > where we fold it early into if ((p p+ -1) <= (char *) -3)
> > > and as the instrumentation is done during ubsan pass, if p is NULL,
> > > we diagnose this as invalid pointer overflow from NULL to 0xffff*f.
> > > Shall we change the folder so that during GENERIC folding it
> > > actually does the addition and comparison in pointer_sized_int
> > > instead (my preference), or shall I move the UBSAN_PTR instrumentation
> > > earlier into the FEs (but then I still risk stuff is folded earlier)?
> >
> > Aww, so we turn the pointer test into a range test ;) That it uses
> > a pointer type rather than an unsigned integer type is a bug, probably
> > caused by pointers being TYPE_UNSIGNED.
> >
> > Not sure if the folding itself is worthwhile to keep though, thus an
> > option would be to not generate range tests from pointers?
>
> I'll have a look. Maybe only do it during reassoc and not earlier.
It certainly looks somewhat premature in fold-const.c.
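
For illustration, here is the same range test done in an unsigned integer type rather than in a pointer type, so that the subtraction of 1 wraps with defined behavior and no pointer arithmetic on NULL is involved. The two function names are hypothetical, and this is a source-level sketch, not what the folder emits:

```c
#include <stdbool.h>
#include <stdint.h>

#define DELETED ((char *) -1)

/* The test as written in the reduced example.  */
bool
live_entry_ptr (const char *p)
{
  return p && p != DELETED;
}

/* The same test as a single range check in uintptr_t: subtracting 1
   maps NULL to the maximum value and DELETED to maximum - 1, so both
   fall outside the range [0, (uintptr_t) -3].  */
bool
live_entry_int (const char *p)
{
  return (uintptr_t) p - 1 <= (uintptr_t) -3;
}
```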
> > > 3) not really related to this patch, but something I also saw during the
> > > bootstrap-ubsan on i686-linux:
> > > ../../gcc/bitmap.c:141:12: runtime error: signed integer overflow: -2147426384 - 2147475412 cannot be represented in type 'int'
> > > ../../gcc/bitmap.c:141:12: runtime error: signed integer overflow: -2147426384 - 2147478324 cannot be represented in type 'int'
> > > ../../gcc/bitmap.c:141:12: runtime error: signed integer overflow: -2147450216 - 2147451580 cannot be represented in type 'int'
> > > ../../gcc/bitmap.c:141:12: runtime error: signed integer overflow: -2147450216 - 2147465664 cannot be represented in type 'int'
> > > ../../gcc/bitmap.c:141:12: runtime error: signed integer overflow: -2147469348 - 2147451544 cannot be represented in type 'int'
> > > ../../gcc/bitmap.c:141:12: runtime error: signed integer overflow: -2147482364 - 2147475376 cannot be represented in type 'int'
> > > ../../gcc/bitmap.c:141:12: runtime error: signed integer overflow: -2147483624 - 2147475376 cannot be represented in type 'int'
> > > ../../gcc/bitmap.c:141:12: runtime error: signed integer overflow: -2147483628 - 2147451544 cannot be represented in type 'int'
> > > ../../gcc/memory-block.cc:59:4: runtime error: signed integer overflow: -2147426384 - 2147475376 cannot be represented in type 'int'
> > > ../../gcc/memory-block.cc:59:4: runtime error: signed integer overflow: -2147450216 - 2147451544 cannot be represented in type 'int'
> > > The problem here is that we lower pointer subtraction, e.g.
> > > long foo (char *p, char *q) { return q - p; }
> > > as return (ptrdiff_t) ((ssizetype) q - (ssizetype) p);
> > > and even for a valid testcase where we have an array across
> > > the middle of the virtual address space, say the first one above
> > > is (char *) 0x8000dfb0 - (char *) 0x7fffdfd4 subtraction, even if
> > > there is 128KB array starting at 0x7fffd000, it will yield
> > > UB (not in the source, but in whatever the compiler lowered it into).
> > > So, shall we instead do the subtraction in sizetype and only then
> > > cast? For sizeof (*ptr) > 1 I think we have some outstanding PR,
> > > and it is more difficult to find out in what types to compute it.
> > > Or do we want to introduce POINTER_DIFF_EXPR?
> >
> > Just use uintptr_t for the difference computation (well, an unsigned
> > integer type of desired precision -- mind address-spaces), then cast
> > the result to signed.
>
> Ok (of course, will handle this separately from the rest).
Yes. Note I didn't look at the actual patch (yet).
Richard.
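
A small sketch of the uintptr_t suggestion, modelling the i686 case with fixed 32-bit types so that it behaves the same on any host. The function name addr_diff32 is hypothetical. The final conversion to a signed type is implementation-defined in ISO C when the value does not fit; GCC documents it as reduction modulo 2^N.

```c
#include <stdint.h>

/* Hypothetical model of q - p for char pointers on a 32-bit target:
   subtract in an unsigned type, where wrap-around is defined, and only
   then convert the result to a signed type.  */
int32_t
addr_diff32 (uint32_t q, uint32_t p)
{
  uint32_t d = q - p;           /* Wraps modulo 2^32, no signed overflow.  */
  return (int32_t) d;           /* Implementation-defined if d > INT32_MAX.  */
}
```

For the first bitmap.c report quoted above, addr_diff32 (0x8000dfb0, 0x7fffdfd4) yields 65500, whereas the (ssizetype) q - (ssizetype) p form computes -2147426384 - 2147475412 and overflows.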