This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH] Make GO string literals properly NUL terminated


>> > The change to have all STRING_CSTs NUL terminated (but that NUL
>> > termination not necessarily inclided in STRING_LENGTH) is a good
>> > one.
>> >
>> > I'm not sure how we can reliably verify NUL termination after the
>> > fact though and build_string already makes sure to NUL terminate
>> > STRING_CSTs.  So if GO strings are not NUL terminated then
>> > the STRING_CSTs still are.
>>
>> The advantage is that there are less variations how string literals look
>> like in the middle end.  We will have a simple way to determine if
>> a string literal is NUL terminated or not.  And checking that property
>> in varasm.c is exactly the right thing to do.
>>
>> String literals always have an array_type which may be shorter
>> than TREE_STRING_LENGTH, but that chops off only exactly
>> one wide character nul. Otherwise if the array_type is equal or larger,
>> we know for sure the value is nul terminated. In the middle-end
>> we can easily determine if a string is not NUL terminated by:
>>
>> compare_tree_int (TYPE_SIZE_UNIT (TREE_TYPE (init)),
>>                        TREE_STRING_LENGTH (init)) < 0
>>
>> I did use that already in my patch for pr86711.
>
> Hmm.  How does that tell you whether the string is NUL terminated?
> TREE_STRING_LENGTH includes the NUL termination and if it is
> a significant char then TYPE_SIZE_UNIT should as well.

I debugged that code a lot recently.
static const char x[2] = "ab"
gives a TREE_STRING_LENGTH of 3, the TREE_TYPE of that
beast is an array type for char[2]. and TYPE_SIZE_UNIT = 2.

An ordinary C string "ab" has TYPE_SIZE_UNIT(TREE_TYPE(x)) = 3.

Of course with wide caracher strings the TREE_STING_LENGTH
and TYPE_SIZE_UNIT of the ARRAY_TYPE are multiple of
the used wide character type.

So I would like to be able to assume that the STRING_CST objects
are internally always generated properly by the front end.
And that the ARRAY_TYPE of the string literal either has the
same length than the TREE_STRING_LENGTH or if it is shorter,
this is always exactly one (wide) character size less than TREE_STRING_LENGTH

The idea is to use this property of string literals where needed,
and check rigorously in varasm.c.

Does that make sense?

>
> Isn't a proper test to look at TREE_STRING_POINTER[TREE_STRING_LENGTH - 1]
> (for HOST_CHAR_BITS strings)?
>

There are also wide character strings, and all those test are broken everywhere
for wide characters right now.

Therefore checking the string constants at varasm.c revealed a lot of intersting
things.
I will post several patches in the afternoon.

> Relying on the type here looks somewhat fragile to me.
>
> Abstracting a string_cst_nul_terminated_p () helper would be a good
> idea I guess.

Yes. indeed.

> I realize using strlen(TREE_STRING_POINTER) doesn't work because
> of embedded NULs.
>
>> Additionally not having oversize string constants produced
>> by the front ends, where the extra characters are good for nothing,
>> also helps to improve correctness.
>>


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]