[PATCH] Add sanopt support for UBSAN_PTR.

Martin Liška mliska@suse.cz
Thu Oct 5 13:52:00 GMT 2017


On 10/04/2017 08:06 PM, Jakub Jelinek wrote:
> On Wed, Oct 04, 2017 at 11:05:23AM +0200, Martin Liška wrote:
>> Following patch adds support for optimizing out unnecessary UBSAN_PTR checks.
>> It handles separately positive and negative offsets, zero offset is ignored.
>> Apart from that, we utilize get_inner_reference for local and global variables,
>> that also helps to reduce some.
> 
> Thanks for working on it.
> 
>> @@ -148,6 +149,61 @@ struct sanopt_tree_triplet_hash : typed_noop_remove <sanopt_tree_triplet>
>>    }
>>  };
>>  
>> +/* Tree couple for ptr_check_map.  */
>> +struct sanopt_tree_couple
>> +{
>> +  tree ptr;
>> +  bool positive;
> 
> Maybe pos_p instead, so that it isn't that long?

Hi.

Yes.

> 
>> +  static inline bool
>> +  equal (const sanopt_tree_couple &ref1, const sanopt_tree_couple &ref2)
>> +  {
>> +    return operand_equal_p (ref1.ptr, ref2.ptr, 0)
>> +	   && ref1.positive == ref2.positive;
> 
> Or this would need to use ()s around the return expression for emacs
> users.

Done.

> 
>> +/* Return true when pointer PTR for a given OFFSET is already sanitized
>> +   in a given sanitization context CTX.  */
>> +
>> +static bool
>> +has_dominating_ubsan_ptr_check (sanopt_ctx *ctx, tree ptr, tree cur_offset)
> 
> Any reason why you pass cur_offset as a tree rather than wide_int & or
> offset_int &?  If the caller makes sure it is sign extended from
> pointer/sizetype's precision, then:
> 
>> +{
>> +  int r = get_range_pos_neg (cur_offset);
> 
> here you can just wi::neg_p or similar.
> 
>> +  /* We already have recorded a UBSAN_PTR check for this pointer.  Perhaps we
>> +     can drop this one.  But only if this check doesn't specify larger offset.
>> +     */
>> +  tree offset = gimple_call_arg (g, 1);
>> +  gcc_assert (TREE_CODE (offset) == INTEGER_CST);
>> +
>> +  if (positive && tree_int_cst_le (cur_offset, offset))
>> +    return true;
>> +  else if (!positive && tree_int_cst_le (offset, cur_offset))
>> +    return true;
> 
> And the comparisons here in wide_int would be cheaper too.
> 
>> +
>> +static void
>> +record_ubsan_ptr_check_stmt (sanopt_ctx *ctx, gimple *stmt, tree ptr,
>> +			     tree offset)
>> +{
>> +  int r = get_range_pos_neg (offset);
> 
> See above.

Good point! I will use offset_int at all places.

> 
>> +  gcc_assert (r != 3);
>> +  bool positive = r == 1;
>> +
>> +  sanopt_tree_couple couple;
>> +  couple.ptr = ptr;
>> +  couple.positive = positive;
>> +
>> +  auto_vec<gimple *> &v = ctx->ptr_check_map.get_or_insert (couple);
>> +  v.safe_push (stmt);
>> +}
>> +
>> +/* Optimize away redundant UBSAN_PTR calls.  */
>> +
>> +static bool
>> +maybe_optimize_ubsan_ptr_ifn (sanopt_ctx *ctx, gimple *stmt)
>> +{
>> +  gcc_assert (gimple_call_num_args (stmt) == 2);
>> +  tree ptr = gimple_call_arg (stmt, 0);
>> +  tree cur_offset = gimple_call_arg (stmt, 1);
>> +
>> +  if (TREE_CODE (cur_offset) != INTEGER_CST)
>> +    return false;
>> +
>> +  if (integer_zerop (cur_offset))
>> +    return true;
>> +
>> +  if (has_dominating_ubsan_ptr_check (ctx, ptr, cur_offset))
>> +    return true;
>> +
>> +  tree base = ptr;
>> +  if (TREE_CODE (base) == ADDR_EXPR)
>> +    {
>> +      base = TREE_OPERAND (base, 0);
>> +
>> +      HOST_WIDE_INT bitsize, bitpos;
>> +      machine_mode mode;
>> +      int volatilep = 0, reversep, unsignedp = 0;
>> +      tree offset;
>> +      base = get_inner_reference (base, &bitsize, &bitpos, &offset, &mode,
>> +				  &unsignedp, &reversep, &volatilep);
>> +      if (DECL_P (base))
>> +	{
>> +	  gcc_assert (!DECL_REGISTER (base));
>> +	  /* If BASE is a fixed size automatic variable or
>> +	     global variable defined in the current TU and bitpos
>> +	     fits, don't instrument anything.  */
>> +	  if ((offset == NULL_TREE || TREE_CODE (offset) == INTEGER_CST)
> 
> Do you really need to handle offset != NULL_TREE?
> If the bit offset is representable in shwi, then it will just be
> in bitpos and offset will be NULL.

For this:
UBSAN_PTR (&MEM[(void *)&b + 9223372036854775807B], 1);

I have offset:
<integer_cst 0x155553ec19f0 type <integer_type 0x155553d81000 sizetype> constant 9223372036854775807>

which is a valid offset.

> 
>> +	      && (VAR_P (base)
>> +		  || TREE_CODE (base) == PARM_DECL
>> +		  || TREE_CODE (base) == RESULT_DECL)
>> +	      && DECL_SIZE (base)
>> +	      && TREE_CODE (DECL_SIZE (base)) == INTEGER_CST
> 
> Why are you testing here DECL_SIZE when you actually then use
> DECL_SIZE_UNIT?

Yep, will fix that.

> 
>> +	      && (!is_global_var (base) || decl_binds_to_current_def_p (base)))
>> +	    {
>> +	      HOST_WIDE_INT bytepos = bitpos / BITS_PER_UNIT;
>> +	      offset_int total_offset = bytepos;
>> +	      if (offset != NULL_TREE)
>> +		total_offset += wi::sext (wi::to_offset (offset), POINTER_SIZE);
>> +	      total_offset += wi::sext (wi::to_offset (cur_offset),
>> +					POINTER_SIZE);
> 
> Why are you sign extending it each time?  Can't it be sign extended after
> all the additions?

I had problem with this:
UBSAN_PTR (&MEM[(void *)&b + -9223372036854775807B], 9223372036854775805);

when not doing sign extension, I was said that fits_shwi_p(to_offset(-9223372036854775807) + to_offset (9223372036854775805)) is false.


> On the other side, is it safe to add the (bytepos + offset) part with
> the cur_offset part unconditionally?
> If the former is "positive" and cur_offset is "negative", they cancel each
> other and we IMHO shouldn't try to optimize it away.  bytepos could be some
> very large integer, outside of bounds of the var, and cur_offset some
> negative constant, where both var + bytepos and (var + bytepos) + cur_offset
> overflow.  Or bytepos could be negative, outside of the bounds of the
> variable.  I think you need to check separately that bytepos is >= 0 and
> <= DECL_SIZE_UNIT and that bytepos + cur_offset is within that range too.
> 
>> +	      /* New total offset can fit in pointer type.  */
>> +	      if (wi::fits_shwi_p (total_offset))
> 
> Why do you need to convert a wide_int to signed HOST_WIDE_INT and back?
> Or do you want to check for some overflows above?

Do I understand it correctly that:

  p = b - 9223372036854775807LL; /* pointer overflow check is needed */
  p2 = p + 9223372036854775805LL;

should not be handled because: &b + -9223372036854775807B is outsize of address of b?
Am I right?

I will rewrite the usage of offset, to support just offset == NULL_TREE.

> 
>> +		{
>> +		  tree toffset = build_int_cst (TREE_TYPE (cur_offset),
>> +						total_offset.to_shwi ());
>> +		  tree ptr2 = build1 (ADDR_EXPR,
>> +				      build_pointer_type (TREE_TYPE (base)),
>> +				      base);
> 
> Why do you need to create these trees here when you actually don't use them
> (or shouldn't need)

Is it really impossible to catch something by has_dominating_ubsan_ptr_check (ctx, ptr2, total_offset)?

> 
>> +
>> +		  /* If total offset is non-negative and smaller or equal
>> +		     to size of declaration, then no check is needed.  */
>> +		  if (get_range_pos_neg (toffset) == 1
>> +		      && tree_int_cst_le (toffset, DECL_SIZE_UNIT (base)))
>> +		    return true;
> 
> above.
> 
>> +		  /* Maybe we already have a dominating check for the BASE.  */
>> +		  else if (has_dominating_ubsan_ptr_check (ctx, ptr2, toffset))
>> +		    return true;
> 
> And, isn't this something that really doesn't depend on whether base is a
> decl (so move it outside of the if (DECL_P ()) if), but rather just on
> whether bytepos and cur_offset have the same "sign"?
> If ptr is say &MEM_REF[whatever + 1024] and 2048 (both positive), then
> it will not overflow if we've checked that UBSAN(whatever, 3072) (or more).
> Similarly for both negative values.  For &MEM_REF[whatever + 3072] and -2048,
> previous UBSAN(whatever, 3072) check would be sufficient too, or for
> &MEM_REF[whatever + 1024] and -3072 a previous UBSAN(whatever, -2048) check
> (or more negative).  But in any case, you need to compare the signs and if
> they aren't the same, reason what is appropriate and what is not.

Good point, I will work on that!

>>        else
>> -	gsi_next (&gsi);
>> +	{
>> +	  if (dump_file && (dump_flags & TDF_DETAILS))
>> +	    {
>> +	      fprintf (dump_file, "Leaving: ");
>> +	      print_gimple_stmt (dump_file, stmt, 0, dump_flags);
>> +	    }
> 
> Why this?  Printing all other stmts?  That doesn't make sense to me.

Ok, it's more debugging dump and I'll remove it.

Thanks for feedback,
Martin

> 
> 	Jakub
> 



More information about the Gcc-patches mailing list