[PATCH 6/X] libsanitizer: Add hwasan pass and associated gimple changes
Richard Sandiford
richard.sandiford@arm.com
Wed Oct 14 18:37:15 GMT 2020
Matthew Malcomson <matthew.malcomson@arm.com> writes:
> @@ -133,6 +137,13 @@ enum asan_mark_flags
> #undef DEF
> };
>
> +enum hwasan_mark_flags
> +{
> +#define DEF(X) HWASAN_MARK_##X
> + IFN_ASAN_MARK_FLAGS
> +#undef DEF
> +};
Are these used anywhere? It looks like expand_HWASAN_MARK uses the
plain asan versions.
> @@ -640,6 +684,85 @@ handle_builtin_alloca (gcall *call, gimple_stmt_iterator *iter)
> = DECL_FUNCTION_CODE (callee) == BUILT_IN_ALLOCA
> ? 0 : tree_to_uhwi (gimple_call_arg (call, 1));
>
> + if (hwasan_sanitize_allocas_p ())
> + {
> + /*
> + HWASAN needs a different expansion.
> +
> + addr = __builtin_alloca (size, align);
> +
> + should be replaced by
> +
> + new_size = size rounded up to HWASAN_TAG_GRANULE_SIZE byte alignment;
> + untagged_addr = __builtin_alloca (new_size, align);
> + tag = __hwasan_choose_alloca_tag ();
> + addr = __hwasan_tag_pointer (untagged_addr, tag);
> + __hwasan_tag_memory (untagged_addr, tag, new_size);
> + */
> + /* Ensure alignment at least HWASAN_TAG_GRANULE_SIZE bytes so we start on
> + a tag granule. */
> + align = align > HWASAN_TAG_GRANULE_SIZE ? align : HWASAN_TAG_GRANULE_SIZE;
> +
> + uint8_t tg_mask = HWASAN_TAG_GRANULE_SIZE - 1;
> + /* tree new_size = (old_size + tg_mask) & ~tg_mask; */
> + tree old_size = gimple_call_arg (call, 0);
> + tree tree_mask = build_int_cst (size_type_node, tg_mask);
> + g = gimple_build_assign (make_ssa_name (size_type_node), PLUS_EXPR,
> + old_size, tree_mask);
> + gsi_insert_before (iter, g, GSI_SAME_STMT);
> + tree oversize = gimple_assign_lhs (g);
> +
> + g = gimple_build_assign (make_ssa_name (size_type_node), BIT_NOT_EXPR,
> + tree_mask);
> + tree mask = gimple_assign_lhs (g);
> + gsi_insert_before (iter, g, GSI_SAME_STMT);
Seems simpler to use:
tree mask = build_int_cst (size_type_node, -HWASAN_TAG_GRANULE_SIZE);
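That also means the BIT_NOT_EXPR statement isn't needed at all, since
-HWASAN_TAG_GRANULE_SIZE == ~(HWASAN_TAG_GRANULE_SIZE - 1) for any
power-of-2 granule size, i.e. (untested):

  tree mask = build_int_cst (size_type_node, -HWASAN_TAG_GRANULE_SIZE);
  g = gimple_build_assign (make_ssa_name (size_type_node), BIT_AND_EXPR,
                           oversize, mask);
  gsi_insert_before (iter, g, GSI_SAME_STMT);
  tree new_size = gimple_assign_lhs (g);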
> +
> + g = gimple_build_assign (make_ssa_name (size_type_node), BIT_AND_EXPR,
> + oversize, mask);
> + gsi_insert_before (iter, g, GSI_SAME_STMT);
> + tree new_size = gimple_assign_lhs (g);
> +
> + /* emit the alloca call */
> + tree fn = builtin_decl_implicit (BUILT_IN_ALLOCA_WITH_ALIGN);
> + gg = gimple_build_call (fn, 2, new_size,
> + build_int_cst (size_type_node, align));
> + tree untagged_addr = make_ssa_name (ptr_type, gg);
> + gimple_call_set_lhs (gg, untagged_addr);
> + gsi_insert_before (iter, gg, GSI_SAME_STMT);
> +
> + /* Insert code choosing the tag.
> + Here we use an internal function so we can choose the tag at expand
> + time. We need the decision to be made after stack variables have been
> + assigned their tag (i.e. once the tag_offset variable has been set to
> + one after the last stack variable's tag). */
> +
> + gg = gimple_build_call_internal (IFN_HWASAN_CHOOSE_TAG, 0);
> + tree tag = make_ssa_name (unsigned_char_type_node, gg);
> + gimple_call_set_lhs (gg, tag);
> + gsi_insert_before (iter, gg, GSI_SAME_STMT);
> +
> + /* Insert code adding tag to pointer. */
> + fn = builtin_decl_implicit (BUILT_IN_HWASAN_TAG_PTR);
> + gg = gimple_build_call (fn, 2, untagged_addr, tag);
> + tree addr = make_ssa_name (ptr_type, gg);
> + gimple_call_set_lhs (gg, addr);
> + gsi_insert_before (iter, gg, GSI_SAME_STMT);
> +
> + /* Insert code tagging shadow memory.
> + NOTE: require using `untagged_addr` here for libhwasan API. */
> + fn = builtin_decl_implicit (BUILT_IN_HWASAN_TAG_MEM);
> + gg = gimple_build_call (fn, 3, untagged_addr, tag, new_size);
> + gsi_insert_before (iter, gg, GSI_SAME_STMT);
> +
> + /* Finally, replace old alloca ptr with NEW_ALLOCA. */
> + replace_call_with_value (iter, addr);
> + return;
> + }
> +
> + tree last_alloca = get_last_alloca_addr ();
> + const HOST_WIDE_INT redzone_mask = ASAN_RED_ZONE_SIZE - 1;
> +
> +
> /* If ALIGN > ASAN_RED_ZONE_SIZE, we embed left redzone into first ALIGN
> bytes of allocated space. Otherwise, align alloca to ASAN_RED_ZONE_SIZE
> manually. */
> @@ -792,6 +915,31 @@ get_mem_refs_of_builtin_call (gcall *call,
> break;
>
> case BUILT_IN_STRLEN:
> + /* Special case strlen here since its length is taken from its return
> + value.
> +
> + The approach taken by the sanitizers is to check a memory access
> + before it's taken. For ASAN strlen is intercepted by libasan, so no
> + check is inserted by the compiler.
> +
> + This function still returns `true` and provides a length to the rest
> + of the ASAN pass in order to record what areas have been checked,
> + avoiding superfluous checks later on.
> +
> + HWASAN does not intercept any of these internal functions.
> + This means that checks for memory accesses must be inserted by the
> + compiler.
> + strlen is a special case, because we can tell the length from the
> + return of the function, but that is not known until after the function
> + has returned.
> +
> + Hence we can't check the memory access before it happens.
> + We could check the memory access after it has already happened, but
> + for now I'm choosing to just ignore `strlen` calls.
> + This decision was simply made because that means the special case is
> + limited to this one case of this one function. */
> + if (hwasan_sanitize_p ())
> + return false;
What does LLVM do here?
s/I'm choosing/we choose/
> @@ -1361,6 +1509,149 @@ asan_redzone_buffer::flush_if_full (void)
> flush_redzone_payload ();
> }
>
> +
> +/* HWAddressSanitizer (hwasan) is a probabilistic method for detecting
> + out-of-bounds and use-after-free bugs.
> + Read more:
> + http://code.google.com/p/address-sanitizer/
> +
> + Similar to AddressSanitizer (asan) it consists of two parts: the
> + instrumentation module in this file, and a run-time library.
> +
> + The instrumentation module adds a run-time check before every memory insn in
> + the same manner as asan (see the block comment for AddressSanitizer above).
> + Currently, hwasan only adds out-of-line instrumentation, where each check is
> + implemented as a function call to the run-time library. Hence a check for a
> + load of N bytes from address X would be implemented with a function call to
> + __hwasan_loadN(X), and checking a store of N bytes from address X would be
> + implemented with a function call to __hwasan_storeN(X).
> +
> + The main difference between hwasan and asan is in the information stored to
> + help this checking. Both sanitizers use a shadow memory area which stores
> + data recording the state of main memory at a corresponding address.
> +
> + For hwasan, each 16 byte granule in main memory has a corresponding 1 byte
> + in shadow memory. This shadow address can be calculated with equation:
> + (addr >> HWASAN_TAG_SHIFT_SIZE) + __hwasan_shadow_memory_dynamic_address;
> + The conversion between real and shadow memory for asan is given in the block
> + comment at the top of this file.
> + The description of how this shadow memory is laid out for asan is in the
Maybe “is also in”?
> + block comment at the top of this file, here we describe how this shadow
> + memory is used for hwasan.
> +
> + For hwasan, each variable is assigned a byte-sized 'tag'. The extent of
> + the shadow memory for that variable is filled with the assigned tag, and
> + every pointer referencing that variable has its top byte set to the same
> + tag. The run-time library redefines malloc so that every allocation returns
> + a tagged pointer and tags the corresponding shadow memory with the same tag.
> +
> + On each pointer dereference the tag found in the pointer is compared to the
> + tag found in the shadow memory corresponding to the accessed memory address.
> + If these tags are found to differ then this memory access is judged to be
> + invalid and a report is generated.
> +
> + This method of bug detection is not perfect -- it can not catch every bad
> + access -- but catches them probabilistically instead. There is always the
> + possibility that an invalid memory access will happen to access memory
> + tagged with the same tag as the pointer that this access used.
> + The chances of this are approx. 0.4% for any two uncorrelated objects.
> +
> + Random tag generation can mitigate this problem by decreasing the
> + probability that an invalid access will be missed in the same manner over
> + multiple runs. i.e. if two objects are tagged the same in one run of the
> + binary they are unlikely to be tagged the same in the next run.
> + Both heap and stack allocated objects have random tags by default.
> +
> + [16 byte granule implications]
> + Since the shadow memory only has a resolution on real memory of 16 bytes,
> + invalid accesses that are within the same 16 byte granule as a valid
> + address will not be caught.
> +
> + There is a "short-granule" feature in the runtime library which does catch
> + such accesses, but this feature is not implemented for stack objects (since
> + stack objects are allocated and tagged by compiler instrumentation, and
> + this feature has not yet been implemented in GCC instrumentation).
> +
> + Another outcome of this 16 byte resolution is that each tagged object must
> + be 16 byte aligned. If two objects were to share any 16 byte granule in
> + memory, then they both would have to be given the same tag, and invalid
> + accesses to one using a pointer to the other would be undetectable.
> +
> + [Compiler instrumentation]
> + Compiler instrumentation ensures that two adjacent buffers on the stack are
> + given different tags, this means an access to one buffer using a pointer
> + generated from the other (e.g. through buffer overrun) will have mismatched
> + tags and be caught by hwasan.
> +
> + We don't randomly tag every object on the stack, since that would require
> + keeping many registers to record each tag. Instead we randomly generate a
> + tag for each function frame, and each new stack object uses a tag offset
> + from that frame tag.
> + i.e. each object is tagged as RFT + offset, where RFT is the "random frame
> + tag" generated for this frame.
So randomisation isn't effective at perturbing the pairs of mismatching
tags within a frame, but that is mitigated by objects with the same tag
within a frame being very far apart. Is that right?
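(FWIW, the “approx. 0.4%” above looks like it's just the chance of two
independent uniformly-random byte tags colliding, i.e. 1/256 ≈ 0.39%.)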
> + As a demonstration, using the same example program as in the asan block
> + comment above:
> +
> + int
> + foo ()
> + {
> + char a[23] = {0};
> + int b[2] = {0};
> +
> + a[5] = 1;
> + b[1] = 2;
> +
> + return a[5] + b[1];
> + }
> +
> + On AArch64 the stack will be ordered as follows for the above function:
> +
> + Slot 1/ [24 bytes for variable 'a']
> + Slot 2/ [8 bytes padding for alignment]
> + Slot 3/ [8 bytes for variable 'b']
> + Slot 4/ [8 bytes padding for alignment]
> +
> + (The padding is there to ensure 16 byte alignment as described in the 16
> + byte granule implications).
> +
> + While the shadow memory will be ordered as follows:
> +
> + - 2 bytes (representing 32 bytes in real memory) tagged with RFT + 1.
> + - 1 byte (representing 16 bytes in real memory) tagged with RFT + 2.
> +
> + And any pointer to "a" will have the tag RFT + 1, and any pointer to "b"
> + will have the tag RFT + 2.
> +
> + [Top Byte Ignore requirements]
> + Hwasan requires the ability to store an 8 bit tag in every pointer. There
> + is no instrumentation done to remove this tag from pointers before
> + dereferencing, which means the hardware must ignore this tag during memory
> + accesses.
> +
> + One architecture that provides this feature is the AArch64 architecture.
> + This is the only architecture that hwasan is currently implemented for.
This could easily get out of date. Maybe it would be better to refer to
the target hook instead.
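E.g. something like targetm.memtag.can_tag_addresses (or whatever the
hook from the earlier patch in the series ends up being called).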
> +
> + [Stack requires cleanup on unwinding]
> + During normal operation of a hwasan sanitized program more space in the
> + shadow memory becomes tagged as the stack grows. As the stack shrinks this
> + shadow memory space must become untagged. If it is not untagged then when
> + the stack grows again (during other function calls later on in the program)
> + objects on the stack that are usually not tagged (e.g. parameters passed on
> + the stack) can be placed in memory whose shadow space is tagged with
> + something else, and accesses can cause false positive reports.
> +
> + Hence we place untagging code on every epilogue of functions which tag some
> + stack objects.
> +
> + Moreover, the run-time library intercepts longjmp & setjmp to uncolour
uncolor
> + when the stack is unwound this way.
> +
> + C++ exceptions are not yet handled, which means this sanitizer can not
> + handle C++ code that throws exceptions -- it will give false positives
> + after an exception has been thrown. */
Is this a limitation in the library as well as the compiler side?
Might be worth clarifying in the comment.
But thanks for this, it's a really nice write-up.
> @@ -2301,7 +2620,8 @@ instrument_derefs (gimple_stmt_iterator *iter, tree t,
> {
> if (DECL_THREAD_LOCAL_P (inner))
> return;
> - if (!param_asan_globals && is_global_var (inner))
> + if ((hwasan_sanitize_p () || !param_asan_globals)
> + && is_global_var (inner))
> return;
> if (!TREE_STATIC (inner))
> {
Could you expand on this?
> @@ -2530,10 +2850,13 @@ maybe_instrument_call (gimple_stmt_iterator *iter)
> break;
> }
> }
> - tree decl = builtin_decl_implicit (BUILT_IN_ASAN_HANDLE_NO_RETURN);
> - gimple *g = gimple_build_call (decl, 0);
> - gimple_set_location (g, gimple_location (stmt));
> - gsi_insert_before (iter, g, GSI_SAME_STMT);
> + if (! hwasan_sanitize_p ())
> + {
> + tree decl = builtin_decl_implicit (BUILT_IN_ASAN_HANDLE_NO_RETURN);
> + gimple *g = gimple_build_call (decl, 0);
> + gimple_set_location (g, gimple_location (stmt));
> + gsi_insert_before (iter, g, GSI_SAME_STMT);
> + }
Why is this only needed for asan? (Just curious.)
> @@ -3255,18 +3581,34 @@ asan_expand_mark_ifn (gimple_stmt_iterator *iter)
>
> gcc_checking_assert (TREE_CODE (decl) == VAR_DECL);
>
> + tree len = gimple_call_arg (g, 2);
> + gcc_assert (tree_fits_shwi_p (len));
> + unsigned HOST_WIDE_INT size_in_bytes = tree_to_shwi (len);
> + gcc_assert (size_in_bytes);
> +
> + if (hwasan_sanitize_p ())
> + {
> + gcc_assert (param_hwasan_instrument_stack);
> + /* Here we swap ASAN_MARK calls for HWASAN_MARK.
> + This is because we are using the approach of using ASAN_MARK as a
> + synonym until here.
> + That approach means we don't yet have to duplicate all the special
> + cases for ASAN_MARK and ASAN_POISON with the exact same handling but
> + called HWASAN_MARK etc. */
Definitely in favour of that :-)
> + gimple *hw_poison_call
> + = gimple_build_call_internal (IFN_HWASAN_MARK, 3,
> + gimple_call_arg (g, 0),
> + base, len);
> + gsi_replace (iter, hw_poison_call, false);
> + return false;
> + }
> +
> if (is_poison)
> {
> if (asan_handled_variables == NULL)
> asan_handled_variables = new hash_set<tree> (16);
> asan_handled_variables->add (decl);
> }
> - tree len = gimple_call_arg (g, 2);
> -
> - gcc_assert (tree_fits_shwi_p (len));
> - unsigned HOST_WIDE_INT size_in_bytes = tree_to_shwi (len);
> - gcc_assert (size_in_bytes);
> -
> g = gimple_build_assign (make_ssa_name (pointer_sized_int_node),
> NOP_EXPR, base);
> gimple_set_location (g, loc);
> @@ -3329,6 +3671,7 @@ asan_expand_mark_ifn (gimple_stmt_iterator *iter)
> bool
> asan_expand_check_ifn (gimple_stmt_iterator *iter, bool use_calls)
> {
> + gcc_assert (!hwasan_sanitize_p ());
> gimple *g = gsi_stmt (*iter);
> location_t loc = gimple_location (g);
> bool recover_p;
> @@ -3602,11 +3945,66 @@ asan_expand_poison_ifn (gimple_stmt_iterator *iter,
>
> int nargs;
> bool store_p = gimple_call_internal_p (use, IFN_ASAN_POISON_USE);
> - tree fun = report_error_func (store_p, recover_p, tree_to_uhwi (size),
> - &nargs);
> -
> - gcall *call = gimple_build_call (fun, 1,
> - build_fold_addr_expr (shadow_var));
> + gcall *call;
> + if (hwasan_sanitize_p ())
> + {
> + tree fun = builtin_decl_implicit (BUILT_IN_HWASAN_TAG_MISMATCH4);
> + /* NOTE: hwasan has no __hwasan_report_* functions like asan does.
> + We use __hwasan_tag_mismatch4 with arguments that tell it the
> + size of access and load to report all tag mismatches.
> +
> + The arguments to this function are:
> + Address of invalid access.
> + Bitfield containing information about the access
> + (access_info)
> + Pointer to a frame of registers
> + (for use in printing the contents of registers in a dump)
> + Not used yet -- to be used by inline instrumentation.
> + Size of access (only if size is not representable in the
> + access_info argument).
> +
> + The access_info bitfield encodes the following pieces of
> + information:
> + - Is this a store or load?
> + access_info & 0x10 => store
> + - Should the program continue after reporting the error?
> + access_info & 0x20 => recover
> + - What size access is this (if size is less than 16 bytes)
> + if (access_info & 0xf == 0xf)
> + size is taken from last argument.
> + else
> + size == 1 << (access_info & 0xf)
> +
> + The last argument contains the size of the access iff the
> + access was of a size greater than or equal to 16 bytes.
> +
> + See the function definition `__hwasan_tag_mismatch4` in
> + libsanitizer/hwasan for the full definition.
> + */
> + unsigned HOST_WIDE_INT size_in_bytes = tree_to_uhwi (size);
> + unsigned size_indicator = (size_in_bytes > 16)
> + ? 0xf
> + : exact_log2 (size_in_bytes);
The parts about “16 bytes” don't seem to match the equation;
seems like it's a power of 2 in the range [1, 16384].
Is there any benefit to using the size field encoded in the access_info here?
AIUI it was retained for compatibility with pre-__hwasan_tag_mismatch4
versions of the library. Seems like we could always use 0xf instead.
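E.g. with the current encoding a recoverable 4-byte store gives
access_info = 0x20 + 0x10 + exact_log2 (4) = 0x32, whereas always using
0xf would give 0x3f and pass the size in the last argument
unconditionally.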
> + unsigned access_info = (0x20 * recover_p)
> + + (0x10 * store_p)
> + + (size_indicator);
> + tree long_pointer_type
> + = build_pointer_type (long_unsigned_type_node);
pointer_sized_int_node seems more appropriate, given that the second
argument and the target of the third argument are uptrs.
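I.e. something like (untested):

  call = gimple_build_call (fun, 4,
                            build_fold_addr_expr (shadow_var),
                            build_int_cst (pointer_sized_int_node,
                                           access_info),
                            build_int_cst (build_pointer_type
                                             (pointer_sized_int_node), 0),
                            size);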
> + call = gimple_build_call (fun, 4,
> + build_fold_addr_expr (shadow_var),
> + build_int_cst (long_unsigned_type_node,
> + access_info),
> + build_int_cst (long_pointer_type,
> + 0),
> + size);
> + }
> + else
> + {
> + tree fun = report_error_func (store_p, recover_p, tree_to_uhwi (size),
> + &nargs);
> + call = gimple_build_call (fun, 1,
> + build_fold_addr_expr (shadow_var));
> + }
> gimple_set_location (call, gimple_location (use));
> gimple *call_to_insert = call;
>
> @@ -4005,4 +4425,216 @@ hwasan_finish_file (void)
> flag_sanitize |= SANITIZE_HWADDRESS;
> }
>
> +/* Construct a function tree for __hwasan_{load,store}{1,2,4,8,16,_n}.
> + IS_STORE is either 1 (for a store) or 0 (for a load). */
> +static tree
> +hwasan_check_func (bool is_store, bool recover_p, HOST_WIDE_INT size_in_bytes,
> + int *nargs)
> +{
> + static enum built_in_function check[2][2][6]
> + = { { { BUILT_IN_HWASAN_LOAD1, BUILT_IN_HWASAN_LOAD2,
> + BUILT_IN_HWASAN_LOAD4, BUILT_IN_HWASAN_LOAD8,
> + BUILT_IN_HWASAN_LOAD16, BUILT_IN_HWASAN_LOADN },
> + { BUILT_IN_HWASAN_STORE1, BUILT_IN_HWASAN_STORE2,
> + BUILT_IN_HWASAN_STORE4, BUILT_IN_HWASAN_STORE8,
> + BUILT_IN_HWASAN_STORE16, BUILT_IN_HWASAN_STOREN } },
> + { { BUILT_IN_HWASAN_LOAD1_NOABORT,
> + BUILT_IN_HWASAN_LOAD2_NOABORT,
> + BUILT_IN_HWASAN_LOAD4_NOABORT,
> + BUILT_IN_HWASAN_LOAD8_NOABORT,
> + BUILT_IN_HWASAN_LOAD16_NOABORT,
> + BUILT_IN_HWASAN_LOADN_NOABORT },
> + { BUILT_IN_HWASAN_STORE1_NOABORT,
> + BUILT_IN_HWASAN_STORE2_NOABORT,
> + BUILT_IN_HWASAN_STORE4_NOABORT,
> + BUILT_IN_HWASAN_STORE8_NOABORT,
> + BUILT_IN_HWASAN_STORE16_NOABORT,
> + BUILT_IN_HWASAN_STOREN_NOABORT } } };
> + if (size_in_bytes == -1)
> + {
> + *nargs = 2;
> + return builtin_decl_implicit (check[recover_p][is_store][5]);
> + }
> + *nargs = 1;
> + int size_log2 = exact_log2 (size_in_bytes);
> + return builtin_decl_implicit (check[recover_p][is_store][size_log2]);
I don't follow the logic here. Why is it the case that we have to
handle size_in_bytes == -1, but are otherwise guaranteed that
size_in_bytes is in [1, 2, 4, 8, 16]? I.e. why is the tree_fits_shwi_p
call needed here:
> + unsigned HOST_WIDE_INT size_in_bytes
> + = is_scalar_access && tree_fits_shwi_p (len) ? tree_to_shwi (len) : -1;
? If is_scalar_access guarantees a valid value then I think we should
drop the tree_fits_shwi_p check above and instead assert that size_log2
is in range. If is_scalar_access doesn't guarantee a valid value then
I think we should check size_log2 and set *nargs to 2 for out-of-range
values.
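For the first alternative, something like (untested):

  *nargs = 1;
  int size_log2 = exact_log2 (size_in_bytes);
  gcc_assert (IN_RANGE (size_log2, 0, 4));
  return builtin_decl_implicit (check[recover_p][is_store][size_log2]);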
> +}
> +
> +/* Expand the HWASAN_{LOAD,STORE} builtins. */
> +bool
> +hwasan_expand_check_ifn (gimple_stmt_iterator *iter, bool)
> +{
> + gimple *g = gsi_stmt (*iter);
> + location_t loc = gimple_location (g);
> + bool recover_p;
> + if (flag_sanitize & SANITIZE_USER_HWADDRESS)
> + recover_p = (flag_sanitize_recover & SANITIZE_USER_HWADDRESS) != 0;
> + else
> + recover_p = (flag_sanitize_recover & SANITIZE_KERNEL_HWADDRESS) != 0;
> +
> + HOST_WIDE_INT flags = tree_to_shwi (gimple_call_arg (g, 0));
> + gcc_assert (flags < ASAN_CHECK_LAST);
> + bool is_scalar_access = (flags & ASAN_CHECK_SCALAR_ACCESS) != 0;
> + bool is_store = (flags & ASAN_CHECK_STORE) != 0;
> + bool is_non_zero_len = (flags & ASAN_CHECK_NON_ZERO_LEN) != 0;
> +
> + tree base = gimple_call_arg (g, 1);
> + tree len = gimple_call_arg (g, 2);
> +
> + /* `align` is unused for HWASAN_CHECK, but I pass the argument anyway
s/I/we/.
> + since that way the arguments match ASAN_CHECK. */
> + /* HOST_WIDE_INT align = tree_to_shwi (gimple_call_arg (g, 3)); */
> +
> + unsigned HOST_WIDE_INT size_in_bytes
> + = is_scalar_access && tree_fits_shwi_p (len) ? tree_to_shwi (len) : -1;
> +
> + gimple_stmt_iterator gsi = *iter;
> +
> + if (!is_non_zero_len)
> + {
> + /* So, the length of the memory area to hwasan-protect is
> + non-constant. Let's guard the generated instrumentation code
> + like:
> +
> + if (len != 0)
> + {
> + // hwasan instrumentation code goes here.
> + }
> + // fallthrough instructions, starting with *ITER. */
> +
> + g = gimple_build_cond (NE_EXPR,
> + len,
> + build_int_cst (TREE_TYPE (len), 0),
> + NULL_TREE, NULL_TREE);
> + gimple_set_location (g, loc);
> +
> + basic_block then_bb, fallthrough_bb;
> + insert_if_then_before_iter (as_a <gcond *> (g), iter,
> + /*then_more_likely_p=*/true,
> + &then_bb, &fallthrough_bb);
> + /* Note that fallthrough_bb starts with the statement that was
> + pointed to by ITER. */
> +
> + /* The 'then block' of the 'if (len != 0)' condition is where
> + we'll generate the hwasan instrumentation code now. */
> + gsi = gsi_last_bb (then_bb);
> + }
> +
> + g = gimple_build_assign (make_ssa_name (pointer_sized_int_node),
> + NOP_EXPR, base);
> + gimple_set_location (g, loc);
> + gsi_insert_after (&gsi, g, GSI_NEW_STMT);
> + tree base_addr = gimple_assign_lhs (g);
> +
> + int nargs = 0;
> + tree fun = hwasan_check_func (is_store, recover_p, size_in_bytes, &nargs);
> + if (nargs == 1)
> + g = gimple_build_call (fun, 1, base_addr);
> + else
> + {
> + gcc_assert (nargs == 2);
> + g = gimple_build_assign (make_ssa_name (pointer_sized_int_node),
> + NOP_EXPR, len);
> + gimple_set_location (g, loc);
> + gsi_insert_after (&gsi, g, GSI_NEW_STMT);
> + tree sz_arg = gimple_assign_lhs (g);
> + g = gimple_build_call (fun, nargs, base_addr, sz_arg);
> + }
> +
> + gimple_set_location (g, loc);
> + gsi_insert_after (&gsi, g, GSI_NEW_STMT);
> + gsi_remove (iter, true);
> + *iter = gsi;
> + return false;
> +}
> +
> +/* For stack tagging:
> + Dummy: the HWASAN_MARK internal function should only ever be in the code
> + after the sanopt pass. */
Same nit as the previous part about this style of comment formatting.
> +bool
> +hwasan_expand_mark_ifn (gimple_stmt_iterator *)
> +{
> + gcc_unreachable ();
> +}
> +
> +bool
> +gate_hwasan ()
> +{
> + return hwasan_sanitize_p ();
> +}
> +
> +namespace {
> +
> +const pass_data pass_data_hwasan =
> +{
> + GIMPLE_PASS, /* type */
> + "hwasan", /* name */
> + OPTGROUP_NONE, /* optinfo_flags */
> + TV_NONE, /* tv_id */
> + ( PROP_ssa | PROP_cfg | PROP_gimple_leh ), /* properties_required */
> + 0, /* properties_provided */
> + 0, /* properties_destroyed */
> + 0, /* todo_flags_start */
> + TODO_update_ssa, /* todo_flags_finish */
> +};
> +
> +class pass_hwasan : public gimple_opt_pass
> +{
> +public:
> + pass_hwasan (gcc::context *ctxt)
> + : gimple_opt_pass (pass_data_hwasan, ctxt)
> + {}
> +
> + /* opt_pass methods: */
> + opt_pass * clone () { return new pass_hwasan (m_ctxt); }
> + virtual bool gate (function *) { return gate_hwasan (); }
> + virtual unsigned int execute (function *) { return hwasan_instrument (); }
> +
> +}; /* class pass_hwasan */
> +
> +} /* anon namespace */
> +
> +gimple_opt_pass *
> +make_pass_hwasan (gcc::context *ctxt)
> +{
> + return new pass_hwasan (ctxt);
> +}
> +
> +namespace {
> +
> +const pass_data pass_data_hwasan_O0 =
> +{
> + GIMPLE_PASS, /* type */
> + "hwasan_O0", /* name */
> + OPTGROUP_NONE, /* optinfo_flags */
> + TV_NONE, /* tv_id */
> + ( PROP_ssa | PROP_cfg | PROP_gimple_leh ), /* properties_required */
> + 0, /* properties_provided */
> + 0, /* properties_destroyed */
> + 0, /* todo_flags_start */
> + TODO_update_ssa, /* todo_flags_finish */
> +};
> +
> +class pass_hwasan_O0 : public gimple_opt_pass
> +{
> +public:
> + pass_hwasan_O0 (gcc::context *ctxt)
> + : gimple_opt_pass (pass_data_hwasan_O0, ctxt)
> + {}
> +
> + /* opt_pass methods: */
> + opt_pass * clone () { return new pass_hwasan_O0 (m_ctxt); }
> + virtual bool gate (function *) { return !optimize && gate_hwasan (); }
> + virtual unsigned int execute (function *) { return hwasan_instrument (); }
> +
> +}; /* class pass_hwasan */
> +
> +} /* anon namespace */
> +
> +gimple_opt_pass *
> +make_pass_hwasan_O0 (gcc::context *ctxt)
> +{
> + return new pass_hwasan_O0 (ctxt);
> +}
Is it worth creating separate passes for this? Seems like we could
just fork the implementation of the asan ones based on whether we're
using hwasan or not.
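E.g. (sketch, untested) pass_asan could gate on either sanitizer:

  virtual bool gate (function *) { return gate_asan () || gate_hwasan (); }

with asan_instrument then dispatching on hwasan_sanitize_p ().
But maybe there's a reason that doesn't work.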
> diff --git a/gcc/gimplify.c b/gcc/gimplify.c
> index 15dfee903ab298f6a02e45d1affcc2260f3c911d..24ebedd490aea4ad634a92aa5742a83b1b0c0bb7 100644
> --- a/gcc/gimplify.c
> +++ b/gcc/gimplify.c
> @@ -1234,8 +1234,11 @@ asan_poison_variable (tree decl, bool poison, gimple_stmt_iterator *it,
>
> /* It's necessary to have all stack variables aligned to ASAN granularity
> bytes. */
> - if (DECL_ALIGN_UNIT (decl) <= ASAN_SHADOW_GRANULARITY)
> - SET_DECL_ALIGN (decl, BITS_PER_UNIT * ASAN_SHADOW_GRANULARITY);
> + gcc_assert (!hwasan_sanitize_p () || hwasan_sanitize_stack_p ());
> + unsigned shadow_granularity
> + = hwasan_sanitize_p () ? HWASAN_TAG_GRANULE_SIZE : ASAN_SHADOW_GRANULARITY;
> + if (DECL_ALIGN_UNIT (decl) <= shadow_granularity)
> + SET_DECL_ALIGN (decl, BITS_PER_UNIT * shadow_granularity);
>
> HOST_WIDE_INT flags = poison ? ASAN_MARK_POISON : ASAN_MARK_UNPOISON;
>
> @@ -15023,7 +15026,7 @@ gimplify_function_tree (tree fndecl)
> if necessary. */
> cfun->curr_properties |= PROP_gimple_lva;
>
> - if (asan_sanitize_use_after_scope () && sanitize_flags_p (SANITIZE_ADDRESS))
> + if (asan_sanitize_use_after_scope ())
> asan_poisoned_variables = new hash_set<tree> ();
> bind = gimplify_body (fndecl, true);
> if (asan_poisoned_variables)
> diff --git a/gcc/internal-fn.c b/gcc/internal-fn.c
> index 8efc77d986b927dc4a37e396e6c710ffeda663ff..3397fa355f373f3ead5a02a07838a467f08f03c4 100644
> --- a/gcc/internal-fn.c
> +++ b/gcc/internal-fn.c
> @@ -467,6 +467,105 @@ expand_UBSAN_OBJECT_SIZE (internal_fn, gcall *)
> /* This should get expanded in the sanopt pass. */
>
> static void
> +expand_HWASAN_CHECK (internal_fn, gcall *)
> +{
> + gcc_unreachable ();
> +}
> +
> +/* For hwasan stack tagging:
> + Clear tags on the dynamically allocated space.
> + For use after an object dynamically allocated on the stack goes out of
> + scope. */
> +static void
> +expand_HWASAN_ALLOCA_UNPOISON (internal_fn, gcall *gc)
> +{
> + tree restored_position = gimple_call_arg (gc, 0);
> + rtx restored_rtx = expand_expr (restored_position, NULL_RTX, VOIDmode,
> + EXPAND_NORMAL);
> + rtx func = init_one_libfunc ("__hwasan_tag_memory");
> + rtx off = expand_simple_binop (ptr_mode, MINUS, restored_rtx,
> + stack_pointer_rtx, NULL_RTX, 0,
> + OPTAB_LIB_WIDEN);
> + rtx dynamic = convert_memory_address (ptr_mode, virtual_stack_dynamic_rtx);
> + emit_library_call_value (func, NULL_RTX, LCT_NORMAL, VOIDmode,
> + dynamic, ptr_mode,
> + const0_rtx, QImode,
> + off, ptr_mode);
> +}
> +
> +/* For hwasan stack tagging:
> + Return a tag to be used for a dynamic allocation. */
> +static void
> +expand_HWASAN_CHOOSE_TAG (internal_fn, gcall *gc)
> +{
> + tree tag = gimple_call_lhs (gc);
> + rtx target = expand_expr (tag, NULL_RTX, VOIDmode, EXPAND_NORMAL);
> + machine_mode mode = GET_MODE (target);
> + gcc_assert (mode == QImode);
> +
> + rtx base_tag = hwasan_extract_tag (hwasan_base ());
> + gcc_assert (base_tag);
> + rtx tag_offset = GEN_INT (hwasan_current_tag ());
> + rtx chosen_tag = expand_simple_binop (QImode, PLUS, base_tag, tag_offset,
> + target, /* unsignedp = */1,
> + OPTAB_WIDEN);
> +
> + gcc_assert (chosen_tag);
> + /* Really need to put the tag into the `target` RTX. */
> + if (chosen_tag != target)
> + {
> + rtx temp = chosen_tag;
> + machine_mode ret_mode = GET_MODE (chosen_tag);
> + if (ret_mode != mode)
> + temp = simplify_gen_unary (TRUNCATE, mode, chosen_tag, ret_mode);
Is this truncation necessary? I'd have expected:
expand_simple_binop (QImode, …)
to always return a QImode value, even for the widening path.
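I.e. just (untested):

  if (chosen_tag != target)
    emit_move_insn (target, chosen_tag);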
> +/* For hwasan stack tagging:
> + Tag a region of space in the shadow stack according to the base pointer of
> + an object on the stack. */
> +static void
> +expand_HWASAN_MARK (internal_fn, gcall *gc)
> +{
> + HOST_WIDE_INT flag = tree_to_shwi (gimple_call_arg (gc, 0));
> + bool is_poison = ((asan_mark_flags)flag) == ASAN_MARK_POISON;
> +
> + tree base = gimple_call_arg (gc, 1);
> + gcc_checking_assert (TREE_CODE (base) == ADDR_EXPR);
> + /* base is a pointer argument, hence in ptr_mode.
> + We convert to Pmode for use in the hwasan_extract_tag and
> + hwasan_create_untagged_base functions.
> + We then convert the result to ptr_mode for the emit_library_call. */
How does this ptr_mode<->Pmode stuff work for ILP32? Wouldn't the tag
get lost by the Pmode->ptr_mode conversion? Or does something ensure
that the tag is moved from bits 24+ to bits 56+ and back?
> + rtx base_rtx = convert_memory_address (Pmode, expand_normal (base));
> +
> + rtx tag = is_poison ? const0_rtx : hwasan_extract_tag (base_rtx);
> + rtx address = hwasan_create_untagged_base (base_rtx);
> + address = convert_memory_address (ptr_mode, address);
> +
> + tree len = gimple_call_arg (gc, 2);
> + gcc_assert (tree_fits_shwi_p (len));
Why's this guaranteed to be a SHWI? For example, I couldn't see
anything that explicitly stopped calls to build_asan_poison_call_expr
with poly_int- or variable-sized decls.
Could we expand this to an aligned size in gimple, when generating
the HWASAN_MARK call? E.g. we could reuse the alloca code for
computing an aligned size.
It'd probably be worth using the gimple_build routines in
gimple-fold.h to build the calculation, instead of calling
gimple_build_assign directly. The operations will then get
folded on the fly.
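E.g. something like the following, where IT stands for whatever
iterator is to hand at the point the HWASAN_MARK call is built
(untested):

  gimple_seq stmts = NULL;
  len = gimple_build (&stmts, PLUS_EXPR, size_type_node, len,
                      build_int_cst (size_type_node,
                                     HWASAN_TAG_GRANULE_SIZE - 1));
  len = gimple_build (&stmts, BIT_AND_EXPR, size_type_node, len,
                      build_int_cst (size_type_node,
                                     -HWASAN_TAG_GRANULE_SIZE));
  gsi_insert_seq_before (it, stmts, GSI_SAME_STMT);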
> + unsigned HOST_WIDE_INT size_in_bytes = tree_to_shwi (len);
> + uint8_t tg_mask = HWASAN_TAG_GRANULE_SIZE - 1;
> + gcc_assert (size_in_bytes);
> + size_in_bytes = (size_in_bytes + tg_mask) & ~tg_mask;
> + rtx size = gen_int_mode (size_in_bytes, ptr_mode);
> +
> + rtx func = init_one_libfunc ("__hwasan_tag_memory");
> + emit_library_call (func,
> + LCT_NORMAL,
> + VOIDmode,
> + address, ptr_mode,
> + tag, QImode,
> + size, ptr_mode);
Formatting nit: should be indented under “func”. Seems at least
LCT_NORMAL and VOIDmode could go on the same line as “func”.
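I.e.:

  emit_library_call (func, LCT_NORMAL, VOIDmode,
                     address, ptr_mode,
                     tag, QImode,
                     size, ptr_mode);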
Generally looks really good though (this and previous patches).
Thanks,
Richard