This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH] Fix not properly nul-terminated string constants in JIT


On 11/2/18 9:40 PM, David Malcolm wrote:
> On Sun, 2018-08-05 at 16:59 +0000, Bernd Edlinger wrote:
>> Hi!
>>
>>
>> My other patch with adds assertions to varasm.c regarding correct
>> nul termination of sting literals did make these incorrect string
>> constants in JIT frontend fail.
>>
>> The string constants are not nul terminated if their length exceeds
>> 200 characters.  The test cases do not use strings of that size where
>> that would make a difference.  But using a fixed index type is
>> clearly
>> wrong.
>>
>> This patch removes the fixed char[200] array type from
>> playback::context,
>> and uses build_string_literal instead of using build_string directly.
>>
>>
>> Bootstrapped and reg-tested on x86_64-pc-linux-gnu.
>> Is it OK for trunk?
> 
> Sorry for the belated response.
> 
> Was this tested with --enable-host-shared and --enable-languages=jit ?
> Note that "jit" is not included in --enable-languages=all.
> 

Yes, of course.  The test suite contains a few string constants, just
all of them are shorter than 200 characters.  But I think removing this
artificial limit enables the existing test cases to test that the
shorter string is in fact zero terminated.

> The patch seems reasonable, but I'm a little confused over the meaning
> of "len" in build_string_literal and build_string: does it refer to the
> length or the size of the string?
> 

build_string_literal:
For languages that use zero-terminated strings, len is strlen(str)+1, and
str is a zero terminated single-byte character string.
For languages that don't use zero-terminated strings, len is the size of
the string and str is not zero terminated.

build_string:
constructs a STRING_CST tree object, which is usable as is in some contexts,
like for asm constraints, but as a string literal it is incomplete, and
needs an index type.  The index type defines the memory size which must
be larger than the string precision.  Excess memory is implicitly cleared.

This means currently all jit strings shorter than 200 characters
are filled with zero up to the limit of 200 chars as imposed by
m_char_array_type_node.  Strings of exactly 200 chars are not zero terminated,
and larger strings should result in an assertion (excess precision was previously
allowed, but no zero termination was appended, when that is not part of
the original string constant).

Previously it was allowed to have memory size less than the string len, which
had complicated the STRING_CST semantics in the middle-end, but with the
string_cst semantic rework I did for gcc-9 this is no longer allowed and
results in (checking) assertions in varasm.c.

>> @@ -617,16 +616,9 @@ playback::rvalue *
>>   playback::context::
>>   new_string_literal (const char *value)
>>   {
>> -  tree t_str = build_string (strlen (value), value);
>> -  gcc_assert (m_char_array_type_node);
>> -  TREE_TYPE (t_str) = m_char_array_type_node;
>> -
>> -  /* Convert to (const char*), loosely based on
>> -     c/c-typeck.c: array_to_pointer_conversion,
>> -     by taking address of start of string.  */
>> -  tree t_addr = build1 (ADDR_EXPR, m_const_char_ptr, t_str);
>> +  tree t_str = build_string_literal (strlen (value) + 1, value);
>>   
>> -  return new rvalue (this, t_addr);
>> +  return new rvalue (this, t_str);
>>   }
> 
> In the above, the call to build_string with strlen is replaced with
> build_string_literal with strlen + 1.
> 
> 
> build_string's comment says:
> 
> "Note that for a C string literal, LEN should include the trailing
> NUL."
> 
> but has:
> 
>    length = len + offsetof (struct tree_string, str) + 1;
> 
> and:
> 
>    TREE_STRING_LENGTH (s) = len;
>    memcpy (s->string.str, str, len);
>    s->string.str[len] = '\0';
> 
> suggesting that the "len" parameter is in fact the length *without* the
> trailing NUL, and that a trailing NUL is added by build_string.
> 

Yes, string constants in tree objects have another zero termiation,
but varasm.c does something different, there the index range takes
precedence.
The index range is built in build_string_literal as follows:

   elem = build_type_variant (char_type_node, 1, 0);
   index = build_index_type (size_int (len - 1));
   type = build_array_type (elem, index);

therefore the string constant hast the type char[0..len-1]
thus only len bytes are significant for code generation, the extra
nul is just for "convenience".

> However build_string_literal has:
> 
>    t = build_string (len, str);
>    elem = build_type_variant (char_type_node, 1, 0);
>    index = build_index_type (size_int (len - 1));
> 
> suggesting that the len is passed directly to build_string (and thus
> ought to be strlen), but the build_index_type uses len - 1 (which
> suggests that len is the size of the string, rather than its length).
> 
> What's the intended meaning of len in these functions?
> 

I hope this helps.


Thanks
Bernd.


> Thanks
> Dave
> 

Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]