This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH][Middle-end]2nd patch of PR78809 and PR83026


On Thu, Dec 14, 2017 at 01:45:21PM -0600, Qing Zhao wrote:
> 2017-12-11  Qing Zhao  <qing.zhao@oracle.com <mailto:qing.zhao@oracle.com>>

No " <mailto:qing.zhao@oracle.com>" in ChangeLog entries please.

> --- a/gcc/tree-ssa-strlen.c
> +++ b/gcc/tree-ssa-strlen.c
> @@ -2541,6 +2541,198 @@ handle_builtin_memcmp (gimple_stmt_iterator *gsi)
>    return false;
>  }
>  
> +/* Given an index to the strinfo vector, compute the string length for the
> +   corresponding string. Return -1 when unknown.  */
> + 
> +static HOST_WIDE_INT 
> +compute_string_length (int idx)
> +{
> +  HOST_WIDE_INT string_leni = -1; 
> +  gcc_assert (idx != 0);
> +
> +  if (idx < 0)
> +    string_leni = ~idx;
> +  else
> +    {
> +      strinfo *si = get_strinfo (idx);
> +      if (si)
> +	{
> +	  tree const_string_len = get_string_length (si);
> +	  string_leni
> +	    = (const_string_len && tree_fits_uhwi_p (const_string_len)
> +	       ? tree_to_uhwi(const_string_len) : -1); 

So, you are returning a signed HWI, then clearly tree_fits_uhwi_p and
tree_to_uhwi are inappropriate, you should have used tree_fits_shwi_p
and tree_to_shwi.  Space after function name is missing too.
And, as you start by initializing string_leni to -1, there is no
point to write it this way rather than
	  if (const_string_len && tree_fits_shwi_p (const_string_len))
	    string_leni = tree_to_shwi (const_string_len);

> +	}
> +    }

Maybe also do
  if (string_leni < 0)
    return -1;

> +  return string_leni;

unless the callers just look for negative value as unusable.

> +      tree len = gimple_call_arg (stmt, 2);
> +      if (tree_fits_uhwi_p (len))
> +        length = tree_to_uhwi (len);

Similarly to above, you are mixing signed and unsigned HWIs too much.

> +      if (gimple_code (ustmt) == GIMPLE_ASSIGN)

      if (is_gimple_assign (ustmt))

Usually we use use_stmt instead of ustmt.

> +	{    
> +	  gassign *asgn = as_a <gassign *> (ustmt);

No need for the gassign and ugly as_a, gimple_assign_rhs_code
as well as gimple_assign_rhs2 can be called on gimple * too.

> +	  tree_code code = gimple_assign_rhs_code (asgn);
> +	  if ((code != EQ_EXPR && code != NE_EXPR)
> +	      || !integer_zerop (gimple_assign_rhs2 (asgn)))
> +	    return true;
> +	}
> +      else if (gimple_code (ustmt) == GIMPLE_COND)
> +	{
> +	  tree_code code = gimple_cond_code (ustmt);
> +	  if ((code != EQ_EXPR && code != NE_EXPR)
> +	      || !integer_zerop (gimple_cond_rhs (ustmt)))
> +	    return true;

There is another case you are missing, assign stmt with
gimple_assign_rhs_code COND_EXPR, where gimple_assign_rhs1 is
tree with TREE_CODE EQ_EXPR or NE_EXPR with TREE_OPERAND (rhs1, 1)
integer_zerop.

> +  /* When both arguments are known, and their strlens are unequal, we can 
> +     safely fold the call to a non-zero value for strcmp;
> +     othewise, do nothing now.  */
> +  if (idx1 != 0 && idx2 != 0)
> +    {
> +      HOST_WIDE_INT const_string_leni1 = -1;
> +      HOST_WIDE_INT const_string_leni2 = -1;
> +      const_string_leni1 = compute_string_length (idx1);
> +      const_string_leni2 = compute_string_length (idx2);

Why do you initialize the vars when you immediately overwrite it?
Just do
      HOST_WIDE_INT const_string_leni1 = compute_string_length (idx1);
etc.

> +  /* When one of args is constant string.  */
> +  tree var_string;
> +  HOST_WIDE_INT const_string_leni = -1;
> +  
> +  if (idx1)
> +    {
> +      const_string_leni = compute_string_length (idx1);
> +      var_string = arg2;
> +    } 
> +  else if (idx2)
> +    {
> +      const_string_leni = compute_string_length (idx2);
> +      var_string = arg1;
> +    } 

Haven't you checked earlier that one of idx1 and idx2 is non-zero?
If so, then the else if (idx2) will just might confuse -Wuninitialized,
if you just use else, you don't need to initialize const_string_leni
either.

> +  /* Try to get the min and max string length for var_string, the max length is
> +     the size of the array - 1, recorded in size[1].  */ 
> +  get_range_strlen (var_string, size);
> +  if (size[1] && tree_fits_uhwi_p (size[1]))
> +    var_sizei = tree_to_uhwi (size[1]) + 1;

This is something that looks problematic to me.  get_range_strlen returns
some conservative upper bound on the string length, which is fine if
var_string points to say a TREE_STATIC variable where you know the allocated
size, or automatic variable.  But if somebody passes you a pointer to a
structure and the source doesn't contain aggregate copying for it, not sure
if you can take for granted that all the bytes are readable after the '\0'
in the string.  Hopefully at least for flexible array members and arrays in
such positions get_range_strlen will not provide the upper bound, but even
in other cases it doesn't feel safe to me.

Furthermore, in the comments you say that you do it only for small strings,
but in the patch I can't see any upper bound, so you could transform strlen
that would happen to return say just 1 or 2 with a function call that
possibly reads megabytes of data (memcmp may read all bytes, not just stop
at the first difference).
> +  unsigned HOST_WIDE_INT final_length 
> +    = is_ncmp ? length : (const_string_leni + 1);

Why the ()s?

	Jakub


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]