This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH] Fix not properly nul-terminated string constants in JIT


Hi Dave,

is the patch OK, or do you still have questions?


Thanks
Bernd.

On 11/2/18 10:48 PM, Bernd Edlinger wrote:
> On 11/2/18 9:40 PM, David Malcolm wrote:
>> On Sun, 2018-08-05 at 16:59 +0000, Bernd Edlinger wrote:
>>> Hi!
>>>
>>>
>>> My other patch with adds assertions to varasm.c regarding correct
>>> nul termination of sting literals did make these incorrect string
>>> constants in JIT frontend fail.
>>>
>>> The string constants are not nul terminated if their length exceeds
>>> 200 characters.  The test cases do not use strings of that size where
>>> that would make a difference.  But using a fixed index type is
>>> clearly
>>> wrong.
>>>
>>> This patch removes the fixed char[200] array type from
>>> playback::context,
>>> and uses build_string_literal instead of using build_string directly.
>>>
>>>
>>> Bootstrapped and reg-tested on x86_64-pc-linux-gnu.
>>> Is it OK for trunk?
>>
>> Sorry for the belated response.
>>
>> Was this tested with --enable-host-shared and --enable-languages=jit ?
>> Note that "jit" is not included in --enable-languages=all.
>>
> 
> Yes, of course.  The test suite contains a few string constants, just
> all of them are shorter than 200 characters.  But I think removing this
> artificial limit enables the existing test cases to test that the
> shorter string is in fact zero terminated.
> 
>> The patch seems reasonable, but I'm a little confused over the meaning
>> of "len" in build_string_literal and build_string: does it refer to the
>> length or the size of the string?
>>
> 
> build_string_literal:
> For languages that use zero-terminated strings, len is strlen(str)+1, and
> str is a zero terminated single-byte character string.
> For languages that don't use zero-terminated strings, len is the size of
> the string and str is not zero terminated.
> 
> build_string:
> constructs a STRING_CST tree object, which is usable as is in some contexts,
> like for asm constraints, but as a string literal it is incomplete, and
> needs an index type.  The index type defines the memory size which must
> be larger than the string precision.  Excess memory is implicitly cleared.
> 
> This means currently all jit strings shorter than 200 characters
> are filled with zero up to the limit of 200 chars as imposed by
> m_char_array_type_node.  Strings of exactly 200 chars are not zero terminated,
> and larger strings should result in an assertion (excess precision was previously
> allowed, but no zero termination was appended, when that is not part of
> the original string constant).
> 
> Previously it was allowed to have memory size less than the string len, which
> had complicated the STRING_CST semantics in the middle-end, but with the
> string_cst semantic rework I did for gcc-9 this is no longer allowed and
> results in (checking) assertions in varasm.c.
> 
>>> @@ -617,16 +616,9 @@ playback::rvalue *
>>>   playback::context::
>>>   new_string_literal (const char *value)
>>>   {
>>> -  tree t_str = build_string (strlen (value), value);
>>> -  gcc_assert (m_char_array_type_node);
>>> -  TREE_TYPE (t_str) = m_char_array_type_node;
>>> -
>>> -  /* Convert to (const char*), loosely based on
>>> -     c/c-typeck.c: array_to_pointer_conversion,
>>> -     by taking address of start of string.  */
>>> -  tree t_addr = build1 (ADDR_EXPR, m_const_char_ptr, t_str);
>>> +  tree t_str = build_string_literal (strlen (value) + 1, value);
>>> -  return new rvalue (this, t_addr);
>>> +  return new rvalue (this, t_str);
>>>   }
>>
>> In the above, the call to build_string with strlen is replaced with
>> build_string_literal with strlen + 1.
>>
>>
>> build_string's comment says:
>>
>> "Note that for a C string literal, LEN should include the trailing
>> NUL."
>>
>> but has:
>>
>>    length = len + offsetof (struct tree_string, str) + 1;
>>
>> and:
>>
>>    TREE_STRING_LENGTH (s) = len;
>>    memcpy (s->string.str, str, len);
>>    s->string.str[len] = '\0';
>>
>> suggesting that the "len" parameter is in fact the length *without* the
>> trailing NUL, and that a trailing NUL is added by build_string.
>>
> 
> Yes, string constants in tree objects have another zero termiation,
> but varasm.c does something different, there the index range takes
> precedence.
> The index range is built in build_string_literal as follows:
> 
>    elem = build_type_variant (char_type_node, 1, 0);
>    index = build_index_type (size_int (len - 1));
>    type = build_array_type (elem, index);
> 
> therefore the string constant hast the type char[0..len-1]
> thus only len bytes are significant for code generation, the extra
> nul is just for "convenience".
> 
>> However build_string_literal has:
>>
>>    t = build_string (len, str);
>>    elem = build_type_variant (char_type_node, 1, 0);
>>    index = build_index_type (size_int (len - 1));
>>
>> suggesting that the len is passed directly to build_string (and thus
>> ought to be strlen), but the build_index_type uses len - 1 (which
>> suggests that len is the size of the string, rather than its length).
>>
>> What's the intended meaning of len in these functions?
>>
> 
> I hope this helps.
> 
> 
> Thanks
> Bernd.
> 
> 
>> Thanks
>> Dave
>>

Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]