This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



Re: [PATCH] Check the STRING_CSTs in varasm.c


On 08/22/2018 08:27 AM, Bernd Edlinger wrote:
Hi,


this is an updated version of my patch to check the STRING_CSTs in
varasm.c.

It tries to answer the questions that were raised in the GO string literals
thread.

The answers are:
a) strings with TYPE_SIZE_UNIT == NULL do exist, but only for STRING_CSTs
in constructors of flexible array struct members.  Not for string literals.
b) In all cases where the STRING_CSTs do have a TYPE_SIZE_UNIT, the
DECL_SIZE_UNIT has the same value.
c) When STRING_CSTs do not have a TYPE_SIZE_UNIT, that is in the constructor
of a flexible array member.  In that case the TREE_STRING_LENGTH
determines the flexible array size.
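
To make cases (a) and (c) concrete, here is a minimal C sketch of a string
that initializes a flexible array member.  Static initialization of a
flexible array member is a GNU C extension, and the struct, variable, and
function names are made up for illustration; this shows only the source-level
construct, not how GCC represents it internally.

```c
#include <stddef.h>

/* Hypothetical struct for illustration: 'text' is a flexible array
   member, so the array type itself has no fixed size.  */
struct msg
{
  int len;
  char text[];
};

/* GNU C extension: statically initialize the flexible array member.
   The string constant in this initializer is what sizes the array.  */
static struct msg greeting = { 5, "hello" };

/* Bytes of the struct that precede the flexible array member.  */
size_t
msg_header_size (void)
{
  return offsetof (struct msg, text);
}

/* The characters stored through the flexible array member.  */
const char *
msg_text (void)
{
  return greeting.text;
}
```

Note that sizeof (struct msg) says nothing about the six bytes occupied by
"hello" and its trailing NUL; only the initializer does.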


It changes varasm's get_constant_size to return the TYPE_SIZE_UNIT of
a STRING_CST literal as it is, without increasing the storage size
to TREE_STRING_LENGTH.  I added an assertion to make sure that
all STRING_CSTs have a type size; size == 0 can happen for empty Ada
strings.
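
The effect of the get_constant_size change can be sketched with a toy model.
This is not the GCC code: struct string_cst_model and both functions are
hypothetical stand-ins, with type_size playing the role of TYPE_SIZE_UNIT and
string_length the role of TREE_STRING_LENGTH.

```c
#include <stddef.h>

/* Hypothetical stand-in for a STRING_CST; not a GCC data structure.  */
struct string_cst_model
{
  size_t type_size;      /* models TYPE_SIZE_UNIT of the string's type */
  size_t string_length;  /* models TREE_STRING_LENGTH */
};

/* Models the behaviour being removed: the storage size grows to the
   string length when the string is longer than its type.  */
size_t
constant_size_before (const struct string_cst_model *s)
{
  return s->string_length > s->type_size ? s->string_length : s->type_size;
}

/* Models the behaviour after the patch: the type size is used as it is.
   Zero is a valid value here (the empty Ada string case).  */
size_t
constant_size_after (const struct string_cst_model *s)
{
  return s->type_size;
}
```

For a model with type_size 3 and string_length 4, the first function yields
4 and the second yields 3, which is the layout difference described above.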

Furthermore, it adds code to compare_constant to also compare the
TYPE_SIZE_UNIT of the STRING_CSTs, if available.

So I want to remove the increase to TREE_STRING_LENGTH from
get_constant_size in order to not change the memory layout of GO and Ada
strings, while still having them look mostly like C string literals.

Furthermore, I added one more consistency check to check_string_literal
that makes sure that, for all STRING_CSTs that do have a TYPE_SIZE_UNIT,
the size matches the DECL_SIZE_UNIT.

Those newly discovered properties of string literals make the code in
c_strlen and friends a lot simpler.



Bootstrapped and reg-tested on x86_64-pc-linux-gnu.
Is it OK for trunk?


@@ -1162,9 +1162,13 @@ These nodes represent string-constants.
 returns the length of the string, as an @code{int}.  The
 @code{TREE_STRING_POINTER} is a @code{char*} containing the string
 itself.  The string may not be @code{NUL}-terminated, and it may contain
-embedded @code{NUL} characters.  Therefore, the
-@code{TREE_STRING_LENGTH} includes the trailing @code{NUL} if it is
-present.
+embedded @code{NUL} characters.  However, the
+@code{TREE_STRING_LENGTH} always includes a trailing @code{NUL} that
+is not part of the language string literal but appended by the front end.
+If the string shall not be @code{NUL}-terminated the @code{TREE_TYPE}


This should read "If the string is not NUL-terminated" (not "shall not be":
that makes it sound like it must not be NUL-terminated).

+is one character shorter than @code{TREE_STRING_LENGTH}.


Presumably the phrase "the @code{TREE_TYPE} is shorter" means
that the type of the string is an array whose domain is
[0, TREE_STRING_LENGTH - 1], or something like that.  It would
help to make this clear in a less informal way, especially if
not all STRING_CST types have a domain (sounds like some don't
if they have a null TYPE_SIZE_UNIT).
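
At the C source level, the cases this paragraph of the documentation covers
can be illustrated as follows.  The snippet only shows the array sizes the
language defines; how the front end represents each initializer as a
STRING_CST (for example, a TREE_STRING_LENGTH of 4 with a 3-byte array type
in the first case) is a reading of the proposed text, not something the
snippet verifies.  The accessor function names are made up for illustration.

```c
#include <stddef.h>

/* Array of exactly 3 chars: the literal's trailing NUL does not fit,
   so the stored string is not NUL-terminated (valid C, not valid C++).  */
static const char unterminated[3] = "abc";

/* Size taken from the literal: 3 characters plus the trailing NUL.  */
static const char terminated[] = "abc";

/* Embedded NUL: 'a', NUL, 'b', plus the trailing NUL.  */
static const char embedded[] = "a\0b";

size_t size_unterminated (void) { return sizeof unterminated; }
size_t size_terminated (void) { return sizeof terminated; }
size_t size_embedded (void) { return sizeof embedded; }
```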

+Excess characters other than one trailing @code{NUL} character are not
+permitted.



Does this mean that they can be counted on not to exist
because the front ends make sure they don't and the middle
end doesn't create them, or that they should not be created
but might still exist?

You also mentioned a lot of detail in your answers above that
isn't captured here.  Can that be added?  E.g., the part about
TYPE_SIZE_UNIT of a STRING_CST being allowed to be null seems
especially important, as does the bit about flexible array
members.

Martin

