[PATCH 03/10] tree-object-size: Use tree instead of HOST_WIDE_INT

Siddhesh Poyarekar siddhesh@gotplt.org
Fri Nov 19 19:01:19 GMT 2021


On 11/19/21 22:36, Jakub Jelinek wrote:
> On Wed, Nov 10, 2021 at 12:31:29AM +0530, Siddhesh Poyarekar wrote:
>> 	* tree-object-size.h (compute_builtin_object_size): Return tree
>> 	instead of HOST_WIDE_INT.
>> 	* builtins.c (fold_builtin_object_size): Adjust.
>> 	* gimple-fold.c (gimple_fold_builtin_strncat): Likewise.
>> 	* ubsan.c (instrument_object_size): Likewise.
>> 	* tree-object-size.c (object_sizes): Change type to vec<tree>.
>> 	(initval): New function.
>> 	(unknown): Use it.
>> 	(size_unknown_p, size_initval, size_unknown): New functions.
>> 	(object_sizes_unknown_p): Use it.
>> 	(object_sizes_get): Return tree.
>> 	(object_sizes_initialize): Rename from object_sizes_set_force
>> 	and set VAL parameter type as tree.
>> 	(object_sizes_set): Set VAL parameter type as tree and adjust
>> 	implementation.
>> 	(size_for_offset): New function.
>> 	(addr_object_size): Change PSIZE parameter to tree and adjust
>> 	implementation.
>> 	(alloc_object_size): Return tree.
>> 	(compute_builtin_object_size): Return tree in PSIZE.
>> 	(expr_object_size, call_object_size, unknown_object_size):
>> 	Adjust for object_sizes_set change.
>> 	(merge_object_sizes): Set OFFSET type to tree and adjust
>> 	implementation.
>> 	(plus_stmt_object_size, cond_expr_object_size,
>> 	collect_object_sizes_for, check_for_plus_in_loops_1): Adjust for
>> 	change of type from HOST_WIDE_INT to tree.
>> 	(object_sizes_execute): Adjust for type change to tree.
> 
>> --- a/gcc/builtins.c
>> +++ b/gcc/builtins.c
>> @@ -10226,7 +10226,7 @@ maybe_emit_sprintf_chk_warning (tree exp, enum built_in_function fcode)
>>   static tree
>>   fold_builtin_object_size (tree ptr, tree ost)
>>   {
>> -  unsigned HOST_WIDE_INT bytes;
>> +  tree bytes;
>>     int object_size_type;
>>   
>>     if (!validate_arg (ptr, POINTER_TYPE)
>> @@ -10251,17 +10251,15 @@ fold_builtin_object_size (tree ptr, tree ost)
>>     if (TREE_CODE (ptr) == ADDR_EXPR)
>>       {
>>         compute_builtin_object_size (ptr, object_size_type, &bytes);
>> -      if (wi::fits_to_tree_p (bytes, size_type_node))
>> -	return build_int_cstu (size_type_node, bytes);
>> +      return fold_convert (size_type_node, bytes);
>>       }
>>     else if (TREE_CODE (ptr) == SSA_NAME)
>>       {
>>         /* If object size is not known yet, delay folding until
>>          later.  Maybe subsequent passes will help determining
>>          it.  */
>> -      if (compute_builtin_object_size (ptr, object_size_type, &bytes)
>> -	  && wi::fits_to_tree_p (bytes, size_type_node))
>> -	return build_int_cstu (size_type_node, bytes);
>> +      if (compute_builtin_object_size (ptr, object_size_type, &bytes))
>> +	return fold_convert (size_type_node, bytes);
> 
> Neither of these are equivalent to what it used to do before.
> If some target has e.g. pointers wider than size_t, then previously we could
> compute bytes that doesn't fit into size_t and would return NULL which
> eventually would result in object_sizes_execute or expand_builtin_object_size
> resolving it to the unknown value.  But fold_convert will perform modulo
> arithmetics on it, so say if size_t is 24-bit and bytes is 0x1000001, it
> will fold to (size_t) 1.  I think we should still punt if it doesn't fit.
> Or do we ensure that what it returns always has sizetype type?

compute_builtin_object_size should always return sizetype.  I probably 
haven't strictly ensured that but I could do that.  Would it be 
necessary to check for fit in the sizetype -> size_type_node conversion
too?  I've been assuming that they'd have the same precision.
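
Something like this for both branches, perhaps, keeping the punt when
the result doesn't fit (untested sketch):

      if (compute_builtin_object_size (ptr, object_size_type, &bytes)
	  && TREE_CODE (bytes) == INTEGER_CST
	  && int_fits_type_p (bytes, size_type_node))
	return fold_convert (size_type_node, bytes);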

> 
>> @@ -2486,13 +2486,14 @@ gimple_fold_builtin_strncat (gimple_stmt_iterator *gsi)
>>     if (cmpsrc < 0)
>>       return false;
>>   
>> -  unsigned HOST_WIDE_INT dstsize;
>> +  tree dstsize;
>>   
>>     bool nowarn = warning_suppressed_p (stmt, OPT_Wstringop_overflow_);
>>   
>> -  if (!nowarn && compute_builtin_object_size (dst, 1, &dstsize))
>> +  if (!nowarn && compute_builtin_object_size (dst, 1, &dstsize)
>> +      && tree_fits_uhwi_p (dstsize))
>>       {
>> -      int cmpdst = compare_tree_int (len, dstsize);
>> +      int cmpdst = compare_tree_int (len, tree_to_uhwi (dstsize));
> 
> Why jump from tree to UHWI and back?
> Check that TREE_CODE (dstsize) == INTEGER_CST and do
> tree_int_cst_compare instead?
> 
>>   
>>         if (cmpdst >= 0)
>>   	{
>> @@ -2509,7 +2510,7 @@ gimple_fold_builtin_strncat (gimple_stmt_iterator *gsi)
>>   				    "destination size")
>>   			       : G_("%qD specified bound %E exceeds "
>>   				    "destination size %wu"),
>> -			       fndecl, len, dstsize);
>> +			       fndecl, len, tree_to_uhwi (dstsize));
> 
> Similarly, use %E and pass dstsize to it.
> 

I noticed it while doing the gimple fold patches; I'll fix it.  I need 
to rebase this bit anyway.
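
Concretely, something like this (untested sketch):

      if (!nowarn && compute_builtin_object_size (dst, 1, &dstsize)
	  && TREE_CODE (dstsize) == INTEGER_CST)
	{
	  int cmpdst = tree_int_cst_compare (len, dstsize);

and then pass the tree straight to the warning:

			       : G_("%qD specified bound %E exceeds "
				    "destination size %E"),
			       fndecl, len, dstsize);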

>> +/* Initial value of object sizes; zero for maximum and SIZE_MAX for minimum
>> +   object size.  */
>> +
>> +static inline unsigned HOST_WIDE_INT
>> +initval (int object_size_type)
>> +{
>> +  return (unsigned HOST_WIDE_INT) -popcount_hwi (object_size_type
>> +						 & OST_MINIMUM);
> 
> This makes the code quite unreadable and on some arches it will be also
> much worse (popcount is a libcall on many, and even if not, it is inline
> function using a builtin inside of it).  Can't you use a simple readable
>    return (object_size_type & OST_MINIMUM) ? HOST_WIDE_INT_M1U : 0;
> and let the compiler optimize as much as it wants?
> ?

OK, I'll do that; I was trying too hard there I suppose :)
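
i.e. initval simply becomes:

  static inline unsigned HOST_WIDE_INT
  initval (int object_size_type)
  {
    /* SIZE_MAX for minimum object size, zero for maximum.  */
    return (object_size_type & OST_MINIMUM) ? HOST_WIDE_INT_M1U : 0;
  }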

> 
>> +/* Bytes at end of the object with SZ from offset OFFSET. */
>> +
>> +static tree
>> +size_for_offset (tree sz, tree offset)
>> +{
>> +  return size_binop (MINUS_EXPR, size_binop (MAX_EXPR, sz, offset), offset);
>> +}
>>   
>> @@ -314,27 +356,22 @@ addr_object_size (struct object_size_info *osi, const_tree ptr,
>>   			    SSA_NAME_VERSION (var)))
>>   	    sz = object_sizes_get (osi, SSA_NAME_VERSION (var));
>>   	  else
>> -	    sz = unknown (object_size_type);
>> +	    sz = size_unknown (object_size_type);
>>   	}
>> -      if (sz != unknown (object_size_type))
>> +      if (!size_unknown_p (sz, object_size_type))
>>   	{
>> -	  offset_int mem_offset;
>> -	  if (mem_ref_offset (pt_var).is_constant (&mem_offset))
>> -	    {
>> -	      offset_int dsz = wi::sub (sz, mem_offset);
>> -	      if (wi::neg_p (dsz))
>> -		sz = 0;
>> -	      else if (wi::fits_uhwi_p (dsz))
>> -		sz = dsz.to_uhwi ();
>> -	      else
>> -		sz = unknown (object_size_type);
>> -	    }
>> +	  tree offset = TREE_OPERAND (pt_var, 1);
>> +	  if (TREE_CODE (offset) != INTEGER_CST
>> +	      || TREE_CODE (sz) != INTEGER_CST)
>> +	    sz = size_unknown (object_size_type);
>>   	  else
>> -	    sz = unknown (object_size_type);
>> +	    sz = size_for_offset (sz, offset);
> 
> This doesn't match what the code did and I'm surprised if it works at all.
> TREE_OPERAND (pt_var, 1), while it is an INTEGER_CST or POLY_INT_CST,
> has in its type encoded the type for aliasing, so the type is some pointer
> type.  Performing size_binop etc. on such values can misbehave, the code
> assumes that it is sizetype or at least some integral type compatible with
> it.  Also, mem_ref_offset is signed and offset_int has bigger precision
> than pointers on the target such that it can be always signed.  So
> e.g. if MEM_REF's second operand is bigger or equal than half of the
> address space, in the old code it would appear to be negative, wi::sub would
> result in a value bigger than sz.  Not really sure right now if that is
> exactly how we want to treat it, would be nice to try some testcase.

Let me try coming up with a test case for it.
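
Perhaps something along these lines for an ILP32 target (hypothetical,
untested; the idea is a constant offset with the sign bit set, which
mem_ref_offset would read back as negative):

  char buf[4];

  __SIZE_TYPE__
  f (void)
  {
    return __builtin_object_size ((char *) buf + 0x80000000, 0);
  }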

> In any case, size_for_offset should be called with the offset converted
> to sizetype if it does size_binop on it.

I'll fix this.
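
So something like this (untested):

  static tree
  size_for_offset (tree sz, tree offset)
  {
    /* Convert OFFSET to sizetype before using it in size_binop.  */
    offset = fold_convert (sizetype, offset);
    return size_binop (MINUS_EXPR, size_binop (MAX_EXPR, sz, offset), offset);
  }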

>> +    reexamine |= merge_object_sizes (osi, var, then_, size_int (0));
> 
> size_zero_node ?
>>     else
>>       expr_object_size (osi, var, then_);
>>   
>> @@ -952,7 +970,7 @@ cond_expr_object_size (struct object_size_info *osi, tree var, gimple *stmt)
>>       return reexamine;
>>   
>>     if (TREE_CODE (else_) == SSA_NAME)
>> -    reexamine |= merge_object_sizes (osi, var, else_, 0);
>> +    reexamine |= merge_object_sizes (osi, var, else_, size_int (0));
> 
> Likewise.  Many times.

OK.
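
i.e. replace size_int (0) with size_zero_node at all the call sites:

    reexamine |= merge_object_sizes (osi, var, else_, size_zero_node);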

>> -  unsigned HOST_WIDE_INT size;
>> -  if (compute_builtin_object_size (base_addr, 0, &size))
>> -    sizet = build_int_cst (sizetype, size);
>> -  else if (optimize)
> 
> I'd prefer not to reindent it to avoid useless churn, just do
>    if (compute_builtin_object_size (base_addr, 0, &sizet))
>      ;
>    else if (optimize)

OK.

> ...
> 
> 	Jakub
> 

Thanks,
Siddhesh

