[PATCH, RFC] Introduce -fsanitize=use-after-scope (v2)

Jakub Jelinek jakub@redhat.com
Mon Oct 3 09:39:00 GMT 2016


On Mon, Oct 03, 2016 at 11:27:38AM +0200, Martin Liška wrote:
> > Plus, as I've mentioned before, it would be nice to optimize - for ASAN_MARK
> > unpoison appearing strictly before (i.e. dominating) the first (non-shadow) memory read
> > or write in the function (explicit or possible through function calls etc.)
> > you really don't need to unpoison (depending on whether we follow LLVM as
> > mentioned above then it can be removed without anything, or the decl needs
> > to be somehow marked and tell asan_emit_stack_protection it shouldn't poison
> > it at the start), and for ASAN_MARK poisoning appearing after the last
> > load/store in the function (post dominating those, you don't care about
> > noreturn though) you can combine that (remove the ASAN_MARK) with letting
> > asan_emit_stack_protection know it doesn't need to unpoison.
> 
> Fully agree with that approach, however I would be happy to do that as a follow-up as
> it's not going to so trivial..

Ok.

> >> +	    char c = poison ? ASAN_STACK_MAGIC_USE_AFTER_SCOPE : 0;
> >> +	    for (unsigned i = 0; i < shadow_size; ++i)
> >> +	      {
> >> +		emit_move_insn (var_mem, gen_int_mode (c, QImode));
> >> +		var_mem = adjust_address (var_mem, QImode, 1);
> > 
> > When you combine it with the loop, you can also use the infrastructure to
> > handle it 4 bytes at a time.
> 
> Current implementation can handle up to 4 bytes at a time. I'm wondering we can
> do even better for targets with 64-bits memory stores? How can one get such
> info about a target?

It is not just the question of whether the target has fast 64-bit memory
stores, but also whether the constants you want to store are reasonably
cheap.  E.g. on x86_64, movabsq is kind of expensive, so storing 64-bit 0
is cheap, but storing 64-bit 0xfdfdfdfdfdfdfdfdULL might be better done as 2
32-bit stores, perhaps both for speed and size.

> > 
> > Another thing I've noticed is that the inline expansion of
> > __asan_unpoison_stack_memory you emit looks buggy.
> > In use-after-scope-1.c I see:
> >   _9 = (unsigned long) &my_char;
> >   _10 = _9 >> 3;
> >   _11 = _10 + 2147450880;
> >   _12 = (signed char *) _11;
> >   MEM[(short int *)_12] = 0;
> > 
> > That would be fine only for 16 byte long my_char, but we have instead 9 byte
> > one.  So I believe in that case we need to store
> > 0x00, 0x01 bytes, for little endian thus 0x0100.  You could use for it
> > a function similarly to asan_shadow_cst, just build INTEGER_CST rather than
> > CONST_INT out of it.  In general, poisioning is storing 0xf8 to all affected
> > shadow bytes, unpoisioning should restore the state what we would emit
> > without use-after-scope sanitization, which is all but the last byte 0, and
> > the last byte 0 only if the var size is a multiple of 8, otherwise number
> > of valid bytes (1-7).
> 
> Fixed in the newer patch.
> 
> > 
> > As for the option, it seems clang uses now
> > -fsanitize-address-use-after-scope option, while I don't like that much, if
> > they have already released some version with that option or if they are
> > unwilling to change, I'd go with their option.
> 
> I also do not like the option, but 3.9.0 has already the functionality. Thus,
> I'm copying LLVM behavior.
> 
> > 
> >> +         if (flag_stack_reuse != SR_NONE
> >> +             && flag_openacc
> >> +             && oacc_declare_returns != NULL)
> > 
> > This actually looks like preexisting OpenACC bug, I doubt the OpenACC
> > behavior should depend on -fstack-reuse= setting.
> 
> The generated diff for this hunk is bit misleading, I simplified that
> in the second version.
> 
> > 
> > +      bool unpoison_var = asan_poisoned_variables.contains (t);
> > +      if (asan_sanitize_use_after_scope ()
> > +         && unpoison_var)
> > +       asan_poisoned_variables.remove (t);
> > 
> > Similarly to asan_handled_variables, I'd prefer it to be a pointer to
> > hash_set or something similar, so that it costs as few as possible for the
> > general case (no sanitization).  Similarly, querying the hash_set even for
> > no use-after-scope sanitization looks wrong.
> 
> Sure, fixed.
> 
> > 
> > +         if ((asan_sanitize_stack_p () || asan_sanitize_use_after_scope ())
> > 
> > I would say if asan_sanitize_stack_p () is false, then we should not be
> > doing use-after-scope sanitization (error if user requested that
> > explicitly).
> 
> Done by adding '&& ASAN_STACK' to asan_sanitize_use_after_scope.
> 
> > 
> > Don't remember if I've mentioned it earlier, but for vars that are
> > TREE_ADDRESSABLE only because of ASAN_MARK calls, we should probably turn
> > them non-addressable and remove those ASAN_MARK calls, those shouldn't leak.
> > You can have a look at the r237814 change for how similarly
> > compare and exchange is special cased for the
> > addressables discovery (though, the ASAN_MARK case would be easier, just
> > drop it rather than turn it into something different).
> 
> I like the approach to not to handle local variables that have address not taken.
> My implementation in gimplify.c handles only vars with TREE_ADDRESSABLE set to true,
> which should cover all variables which have an address taken in the original code.
> I'm aware of cases where one can hit another variable by *(&local_var1 + magic_offset) = 12345,
> but that would make the check even more slower.
> 
> > 
> > I think I miss some testsuite coverage for what has been discussed, stuff
> > like:
> >   switch (x)
> >     {	
> >       int a;
> >     case 1:
> >       int b;
> >       foo (&a);
> >       foo (&b);
> >       /* FALLTHRU */
> >     case 2:
> >       int c;
> >       foo (&a);
> >       foo (&b);
> >       foo (&c);
> > ...
> >     }
> 
> One can't write such code because of:
> 
> use-after-scope-switch.c: In function ‘main’:
> use-after-scope-switch.c:16:7: error: a label can only be part of a statement and a declaration is not a statement
>        int b;
>        ^~~
> use-after-scope-switch.c:21:7: error: a label can only be part of a statement and a declaration is not a statement
>        int c;
>        ^~~

So try:
  switch (x)
    {	
      int a;
    case 1:;
      int b;
      foo (&a);
      foo (&b);
      /* FALLTHRU */
    case 2:;
      int c;
      foo (&a);
      foo (&b);
      foo (&c);
...
    }
then?

> I know that this is still a challenging area. As I've read C/C++ FE, we really have
> data structures that are used to identify cross initialization in gotos like:
> 
> -Wjump-misses-init
> cross.c: In function ‘main’:
> cross.c:46:5: warning: jump skips variable initialization [-Wjump-misses-init]
>      goto label;
>      ^~~~
> cross.c:25:7: note: label ‘label’ defined here
>        label:
>        ^~~~~
> cross.c:13:16: note: ‘a’ declared here
>        struct A a = { 123 };
> 
> However, it's connected to scopes and I'm not sure whether this data structures
> are up to date during gimplification and whether on can find a mapping between
> LABEL_DECLs and these scopes.
> 
> As an initial implementation, I decided to register all LABEL_DECLs that are either
> used in goto, or have address taken. For those, I defensively generate unpoisoning
> code. Hope it's reasonable initial implementation?

I guess we can live with that for initial implementation, but would be nice
to get it improved in stage1 or stage3.

I'll try to look at the new patch in detail soon.

	Jakub



More information about the Gcc-patches mailing list