This is the mail archive of the mailing list for the GCC project.

Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH] Check the STRING_CSTs in varasm.c

On Fri, 17 Aug 2018, Bernd Edlinger wrote:

> Richard Biener wrote:
> > Note that I'm a little bit confused here given build_string
> > already appends a '\0' after TREE_STRING_LENGTH.  So it is safe
> > to call strlen() on all STRING_CSTs.  I think I raised this in
> > the review of one of Bernds patches but to be honest all the
> > various threads have become a bit convoluted (esp. if joining
> > in after two weeks of vacation).
> >
> > So -- why is what we do currently not enough?
> >
> > Quoting our single STRING_CST creator:
> >
> > tree
> > build_string (int len, const char *str)
> > {
> >   tree s;
> >   size_t length;
> >
> >  /* Do not waste bytes provided by padding of struct tree_string.  */
> >  length = len + offsetof (struct tree_string, str) + 1;
> >
> >  record_node_allocation_statistics (STRING_CST, length);
> >
> >  s = (tree) ggc_internal_alloc (length);
> >
> >  memset (s, 0, sizeof (struct tree_typed));
> >  TREE_CONSTANT (s) = 1;
> >  TREE_STRING_LENGTH (s) = len;
> >  memcpy (s->string.str, str, len);
> >  s->string.str[len] = '\0';
> >
> >  return s;
> >}
> Thanks for this question.
> I have two issues with that.
> 1. it adds not a wide character nul, only a single byte '\0'.
> 2. the '\0' here is _guaranteed_ not to be assembled
>     by varasm.
>    Since it is at TREE_STRING_LENGTH+1.
> That is fine for some string constants, like ASM constaraints.
> it makes most of the time sense, as long as it is not
> used for folding of memxxx functions.
> Of course the whole patch series is dependent on the
> consensus about how we want to handle string constants
> in the middle-end, so no problem with going back to the
> drawing board, that's the reason why I did not apply the
> already approved bits, and I guess we are not in a hurry.
> What I see, is that string constants are most of the time
> handled like the C language strings, that is there
> are different character wides, and the array_type on the
> string constants, includes another NUL (wide) char which
> is assembled along with the rest of the bytes, but it is
> not written in the language string constant.
> The front end appends this before giving the string
> constant to build_string.
> That means at least most of the time.
> So I thought it might be good to have some more
> easily checkable things regarding the zero termiation,
> or what is allowed for excess precision.
> It's possible that this shoots over the target, but I think
> this hardening in the varasm is of some value.
> This way how this patch works has one advantage:
> That it is easy to check for "there must be one wide char zero",
> if it is someting that can't be checked, like:
> "there may be a nul or not", then it is impossible to be checked.

What I most object to is the fact that this makes the requirement
of TREE_TYPE of a STRING_CST to be an ARRAY_TYPE a hard one
while the ARRAY_TYPE doesn't add any useful information.  Those
array types are quite heavy structures while the only thing
we really need is the type of the elements.  We could improve
memory use greatly here by using charT[] as TREE_TYPE of

Note that not all STRING_CSTs do get a type (probably all that
end up in the IL do).

So, why not add a TREE_STRING_NUL_TERMINATED flag instad that
is true when the string is NUL-terminated within TREE_STRING_LENGTH
and is false when we do not know (to make the transition easy).


> Well yeah, it's an idea.
> Bernd.

Richard Biener <>
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)

Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]