[PATCH 3/X] [libsanitizer] Add option to bootstrap using HWASAN

Wed Nov 20 15:46:00 GMT 2019

On 20/11/2019 14:33, Martin Liška wrote:
> On 11/13/19 4:24 PM, Matthew Malcomson wrote:
>> On 12/11/2019 12:08, Martin Liška wrote:
>>> On 11/11/19 5:03 PM, Matthew Malcomson wrote:
>>>> Ah!
>>>> My apologies -- I sent up a series with a few documentation mistakes.
> 
>>
>> b) Marking 'ptr' and 'mem' in the dump sounds like a good idea to me.
>>      Exactly how I'm not sure -- maybe with a colourscheme?  Do you
>>      have a marking in mind?
> 
> Libsanitizer is capable of using colors for report printing.
> I can help with that and come up with a patch for upstream.
> 
>>
>>      Uninitialised shadow space has the zero tag, however, there are a
>>      few extra details that help understanding these traces:
>>
>>      On the stack, zero is both uninitialized and "the background" (i.e.
>>      the tag for anything not specially instrumented, like register
>>      spills and parameters passed on the stack).
>>      However, accessible tagged objects can be given a zero tag.
> 
> Question here would be if we should use non-zero tags here? Maybe related
> to my comment about skipping of HWASAN_STACK_BACKGROUND tag?

Unfortunately we can't avoid the zero tag at compile time when using a 
random frame tag, because we don't know at compile time what the random 
frame tag will be.

On each entry to a frame a "base tag" is generated randomly at runtime.
Each local object in the frame has a compile-time constant offset from 
this base tag; that offset is what gets calculated in 
`hwasan_increment_tag`.
The tag assigned to a local object is the runtime random frame tag plus 
the compile-time constant offset.
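
As a rough illustration of that arithmetic (a sketch only, not the code 
in the patch: `local_object_tag` and its parameters are names made up 
for this example, and I'm assuming tags are 8 bits wide so the sum wraps 
modulo 256):

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model, not the GCC implementation: the tag of a local
   object is the runtime base tag of the frame plus a compile-time
   constant offset, wrapping within the (assumed) 8-bit tag space.  */
static uint8_t
local_object_tag (uint8_t frame_base_tag, unsigned offset)
{
  /* Offsets are assumed to fit in the tag space.  */
  assert (offset < 256);

  /* The truncation to 8 bits wraps modulo 256.  No runtime check is
     made, so the result can be zero for some base tags.  */
  return (uint8_t) (frame_base_tag + offset);
}
```

With a base tag of 0xfe and an offset of 2 this wraps round to 0, which 
is how an accessible object can end up with the zero tag.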

I could avoid HWASAN_STACK_BACKGROUND as a tag when the parameter 
`hwasan-random-frame-tag` is false, since then there is no runtime 
random base tag (instead I start with zero).

I'll be happy to add that in if you'd like -- I decided against it since 
it would only matter when a function has 256 or more variables, but I 
flip-flopped on the decision a few times.

> 
>>      We allow this to avoid runtime checks when incrementing the random
>>      frame tag to get the tag for a new local variable.
>>      We can easily avoid the zero tag at compile-time if we don't use a
>>      random tag for each frame.  I had this in development at one point
>>      and found it quite useful for verification.  I already have an
>>      option to disable random tags for each frame that this ability
>>      could go under.
>>      I don't believe (but am not 100% certain) this option is in LLVM.
>>
>>      On the heap uninitialised is tag zero, but memory that has been
>>      `free`d is given a random tag, so non-zero in a dump does not mean a
>>      valid object.
>>
>> c) Is there an alternate notation you have in mind?
>>      I would guess the "dots" are there to say "this granule has no
>>      short-tag information", and I'm not sure what would be a better
>>      way to demonstrate that.
> 
> Now I've got it here. Dot means that top-byte of a pointer equals to zero.
> Right?

Ah!
I think I never described the "short-tag" functionality, and its 
appearance in the debug output is causing confusion.

This will also be part of answering your question "c)", and question 
"h)" in the other email 
https://gcc.gnu.org/ml/gcc-patches/2019-11/msg01950.html .

----------

The main tagging behaviour as described has a natural limitation.
Invalid accesses that do not cross a 16 byte boundary are not caught, 
since each shadow-memory tag applies to a 16 byte chunk.

To account for this, HWASAN has a "short-tag" functionality.
This functionality was introduced in llvm-svn revision 365551.

Usually, a shadow-memory byte records the *tag* that is valid for access 
to the relevant 16 byte granule in normal memory.
When using short-tags, if an object fills only part of a 16 byte granule 
in normal memory, the corresponding shadow-memory byte stores the 
*number of bytes at the start of this granule that are valid to access*.
The *tag* is then stored in the last byte of the 16 byte granule in 
normal memory.
(We know that last byte is unused by the object, since this is a "short" 
granule).

Now, checking a memory access consists of two parts.
1) A normal tag comparison.
2) A fallback in the tag-mismatch case.
    This fallback checks that the access ends no further into the
    granule than the length given in shadow-memory.
    If that's the case it also checks that the pointer's tag matches the
    last byte in this 16 byte granule.
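
The two steps above can be sketched roughly as follows.  This is a 
simplified model for illustration, not the libsanitizer or GCC code: 
`struct granule` and `access_ok` are hypothetical names, and the shadow 
byte and the granule contents are bundled into one struct purely to keep 
the example self-contained.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define GRANULE_SIZE 16u

/* Hypothetical model of one 16 byte granule plus its shadow byte.  */
struct granule
{
  uint8_t shadow;              /* Full tag, or valid length 1..15.  */
  uint8_t bytes[GRANULE_SIZE]; /* A short granule keeps its tag in
                                  bytes[15].  */
};

/* Check an access of SIZE bytes starting OFFSET bytes into granule G,
   made through a pointer carrying PTR_TAG.  */
static bool
access_ok (const struct granule *g, uint8_t ptr_tag,
           size_t offset, size_t size)
{
  /* This model only handles accesses within a single granule.  */
  assert (size > 0 && offset + size <= GRANULE_SIZE);

  /* 1) Normal tag comparison.  */
  if (ptr_tag == g->shadow)
    return true;

  /* 2) Fallback: interpret the shadow byte as a short-granule length.
     Only 1..15 are valid lengths.  */
  if (g->shadow < 1 || g->shadow >= GRANULE_SIZE)
    return false;

  /* The access must end within the valid prefix of the granule.  */
  if (offset + size > g->shadow)
    return false;

  /* The real tag lives in the last byte of the granule.  */
  return ptr_tag == g->bytes[GRANULE_SIZE - 1];
}
```

For a granule with shadow byte 5 and 0xab in its last byte, a pointer 
tagged 0xab may access bytes 0 to 4 but not byte 5.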

That is a little difficult to explain clearly in text, so I apologise if 
the above doesn't make sense.

----------

The hwasan error-reporting output lists both memory tags *and* short-tags.
These are the two sections under the titles of "Memory tags ..." and 
"Tags for short granules ...".

The first printed section shows what is stored in shadow-memory.
This is usually the tag, but can be a length if using "short" tags.
The second section contains the "last byte of a granule" for every 
granule whose shadow-memory byte could be a length.

This is why the majority of the "Memory tags ..." section is zero 
(uninitialized).
The majority of the "Tags for short granules ..." section is dots to 
represent that this granule can't be a "short" granule.
Hwasan knows those granules can't be "short" granules since their 
corresponding byte in shadow memory is not a valid length for a 
short-granule interpretation.
Valid lengths are in the range 1 to 15.

It is up to the user to disambiguate the two possibilities in the output.
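
To make the dots concrete, here is a sketch of how one cell of the 
second section could be chosen.  `format_short_cell` is a hypothetical 
helper written for this email, and the exact two-character cell format 
is an assumption on my part rather than a copy of what libsanitizer does.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: format one cell of the "Tags for short
   granules" section.  SHADOW is the shadow-memory byte for the
   granule, LAST_BYTE is the last byte of the granule in normal
   memory.  OUT receives two characters and a terminating NUL.  */
static void
format_short_cell (uint8_t shadow, uint8_t last_byte, char out[3])
{
  assert (out != NULL);

  if (shadow >= 1 && shadow <= 15)
    /* The shadow byte could be a length, so show the candidate tag.  */
    snprintf (out, 3, "%02x", last_byte);
  else
    /* Not a valid length: this granule can't be "short".  */
    snprintf (out, 3, "..");
}
```

Note the helper can't tell a real short granule from a full tag whose 
value happens to be in the range 1 to 15; both print the last byte.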

----------

I have not implemented setting up short-tags for the stack (and do not 
intend to for GCC 10).

This explains your question "h)" -- the testcase you found accesses a 
stack-allocated buffer.
The access is outside the buffer, but not outside the 16 byte granule 
that object is in.  Hence without short-tags this cannot be detected by 
hwasan.

It also explains your question "c)": there are no granules that could be 
interpreted as "short" because GCC doesn't yet set any granules as "short".

----------

Note: You will likely see some stack error-reports that do include 
"short" tag information.  This is not because the compiler has generated 
short tag information, but because the tags that have been generated 
could be interpreted as valid short tags.
This could cause some rare false-passes, but hwasan is already a 
probabilistic sanitizer.

Note 2: Adding short-tags later is backwards-compatible -- especially 
since I have not added inline tests yet.
The compatibility story for adding short-tags is:
- If you generate short-tags you must have short-tag checking.
- Having short-tag checking without generating short-tags can add
   rare false-passes.

Note 3: Short-tags are not a feature of MTE.

> 
>>
>> d) I agree, an address offset annotation on each line of the shadow
>>      memory sounds useful.
> 
> I can come up with an upstream patch as well.
> 
> Thank you,
> Martin
> 
>>
>> Cheers,
>> MM
>>
>>>
>>> Thanks,
>>> Martin
>>>
>>>>
>>>> I'm attaching the entire updated patch series (with the other
>>>> documentation fixes in it too) and the fixed patch for just this
>>>> part in case you just want to compile and test right now.
>>>
>>
>