[PATCH] Add a character size parameter to c_strlen/get_range_strlen

Wed Aug 15 18:30:00 GMT 2018

On 08/15/18 20:13, Jeff Law wrote:
> On 08/15/2018 10:38 AM, Bernd Edlinger wrote:
>> On 08/15/18 18:25, Jeff Law wrote:
>>> On 08/14/2018 04:25 AM, Bernd Edlinger wrote:
>>>> Hi,
>>>>
>>>> this patch is a follow-up to my patch here:
>>>> https://gcc.gnu.org/ml/gcc-patches/2018-07/msg01800.html
>>>>
>>>> Since most calls of c_strlen and get_range_strlen expect
>>>> a string length in bytes of a zero-terminated string, there is
>>>> a need for a new parameter eltsize, which is per default 1,
>>>> but can be used in gimple-ssa-sprintf.c to specify the
>>>> expected character size.
>>>>
>>>> Bootstrapped and reg-tested on x86_64-pc-linux-gnu.
>>>> Is it OK for trunk?
>>>>
>>>>
>>>> Thanks
>>>> Bernd.
>>>>
>>>>
>>>> patch-fix-c-strlen.diff
>>>>
>>>>
>>>> 2018-08-14  Bernd Edlinger  <bernd.edlinger@hotmail.de>
>>>>
>>>> 	* builtins.c (c_strlen): Add new parameter eltsize.
>>>> 	* builtins.h (c_strlen): Adjust prototype.
>>>> 	* expr.c (string_constant): Add new parameter mem_size.
>>>> 	* expr.h (string_constant): Adjust protoype.
>>>> 	* gimple-fold.c (get_range_strlen): Add new parameter eltsize.
>>>> 	* gimple-fold.h (get_range_strlen): Adjust prototype.
>>>> 	* gimple-ssa-sprintf.c (get_string_length): Add new parameter eltsize.
>>>> 	(format_string): Call get_string_length with eltsize.
>>>>
>>>> 2018-08-14  Bernd Edlinger  <bernd.edlinger@hotmail.de>
>>>>
>>>> 	* gcc.dg/strlenopt-49.c: Adjust test case.
>>>> 	* gcc.dg/tree-ssa/builtin-sprintf-warn-18.c: Likewise.
>>>
>>> Your patch appears to pass in an ELTSIZE which is used to determine how
>>> to count.  This has some advantages in that the routine can be used to
>>> count bytes or other elements such as 2 or 4 byte wide characters.
>>>
>>> Your testsuite change appears to remove the xfails in strlenopt-49 which
>>> is usually OK.  So no concerns there.
>>>
>>> The other testsuite change shows that the issued warning has changed.
>>> But we're still issuing a diagnostic the the construct, so it looks
>>> reasonable to me.
>>>
>>>
>>> This seems conceptually independent from the rest of the stuff we need
>>> to work through.  Though there definitely a potential for
>>> implementation/merge conflicts.
>>>
>>> My concern is the sprintf changes -- for the wide char case we'll be
>>> getting a count of wide chars in the string.  I'm not offhand sure if
>>> the rest of the sprintf warning bits are prepared to count that way.
>>>
>>> Martin?
>>>
>>
>> I think the %S issue is an independet thing, I have seen
>> a patch where %S sets the FMT_LEN_l, so I left that one alone.
>>
>> What I have no solution for, is which wide char-type to assume.
>> I used 4, but that is only correct for some O/S. Others have 2.
> I wasn't very clear.  Sorry.
> 
> My concern is that with your change you're asking format_string to count
> in wide character chunks for FMT_LEN_l.  Yet the rest of the sprintf
> pass counts in bytes.  See for example the computation of the
> destination buffer size.  Your change would introduce the exact kind of
> inconsistency it's supposed to avoid.  In fact everything which relies
> on the underlying builtin_object_size infrastructure counts in bytes.
> 
> AFAICT that's the only place where you pass anything other than byte
> requests to c_strlen.  So unless you have other places where you expect
> to be passing in requests to count by 2 or 4 bytes it seems this patch
> is unnecessary and dangerous.
> 
> Am I missing something?
> 

As I understood, c_strlen has been counting single byte strings in single
byte units since ever, and that was changed for the sprintf pass last year
or so, for use in the sprintf pass.

Consider
  sprintf ("%ls", L"abc")

Here c_strlen uses the type info of the STRING_CST L"abc", counts 3
and the strlen pass computes the range info 3 * target_mb_len_max():

/* The likely worst case value of MB_LEN_MAX for the target, large enough
    for UTF-8.  Ideally, this would be obtained by a target hook if it were
    to be used for optimization but it's good enough as is for warnings.  */
#define target_mb_len_max()   6

so the string length 3 * 6 = 18 single byte characters.
The problem is, instead of L"abc" there could be "abc", which
also counts 3,
or

consider:
sprintf("%s" , L"abc")
now since c_strlen does not know, that it is not supposed to count
bytes, it will see 3 wide characters.
and the sprintf will compute 3 * 1, as this time it is an ordinary %s.

So, right now I am not aware of any other place where wide character
string length are needed, except here.

Or am I missing something?

Bernd.