This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



Re: [PATCH] Add a character size parameter to c_strlen/get_range_strlen


On 08/21/2018 03:44 PM, Joseph Myers wrote:
On Tue, 21 Aug 2018, Martin Sebor wrote:

On 08/21/2018 09:44 AM, Joseph Myers wrote:
On Tue, 21 Aug 2018, Martin Sebor wrote:

Sure, but the only valid argument to %ls is wchar_t*.  Passing
it something else is undefined.

Well, (wchar_t *)"something\0\0\0\0" would be OK given
-fno-strict-aliasing and if you know the alignment is OK.  Do we have that
information about the type cast to, as opposed to the type of the string
constant, at this point?

In the simple cases like the one above the cast is gone.  Only
in some more involved cases is the type of the argument preserved.
I responded to Jeff with one such example here:

  https://gcc.gnu.org/ml/gcc-patches/2018-08/msg01296.html

If supporting (wchar_t *)"...\0\0\0\0" with %ls is viewed as
important (despite it being undefined) then the function does

There are different cases of support.  It doesn't need to be highly
optimized or get particularly good diagnostics.  It does need to avoid
being miscompiled or getting actively incorrect diagnostics.

Sure.

Given -fno-strict-aliasing and appropriate alignment, it's not undefined.
(To ensure appropriate alignment one might use a target where
BIGGEST_ALIGNMENT is 8, or one where it is 16 and a char16_t[] string
constant is cast to pointer to 32-bit wchar_t.)
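
The alignment condition mentioned above can be sketched in C11 as follows.
This is only an illustration, not code from the patch: the names
aligned_for_wchar and aligned_bytes are hypothetical, and testing an address
by converting it to uintptr_t relies on the usual flat-address behavior of
mainstream targets rather than on anything the C standard guarantees.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <wchar.h>

/* wchar_t never needs more alignment than the most-aligned scalar type.  */
static_assert (_Alignof (wchar_t) <= _Alignof (max_align_t),
               "unexpected wchar_t alignment");

/* Hypothetical helper: nonzero if P is suitably aligned for wchar_t
   accesses.  Assumes the integer value of a pointer reflects its
   address, which holds on mainstream targets.  */
static int
aligned_for_wchar (const void *p)
{
  return (uintptr_t) p % _Alignof (wchar_t) == 0;
}

/* A narrow array whose alignment is requested explicitly, instead of
   relying on how the target happens to place string constants.  */
static _Alignas (wchar_t) const char aligned_bytes[] = "something\0\0\0\0";
```

An ordinary string constant carries no such guarantee, which is why the
alignment has to be known some other way, for example from the target.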

I was referring to the undefined behavior according to the letter
of the C standard which says about %ls:

  If an l length modifier is present, the argument shall be
  a pointer to the initial element of an array of wchar_t type.

Despite the cast, (wchar_t*)"...\0\0\0" is an array of char,
not one of wchar_t type.  The same would be true if the cast
were to void* or any other object pointer type other than
wchar_t.
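
As a small sketch of that distinction (not code from the patch; format_wide
and wide are hypothetical names), the following passes %ls a genuine wchar_t
array, which is the case the quoted text permits.  The cast form is shown
only in a comment because, read by the letter of the standard, it still
designates an array of char.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: format the wide string WS into BUF using %ls.
   Returns what snprintf returns.  */
static int
format_wide (char *buf, size_t size, const wchar_t *ws)
{
  assert (buf != NULL && ws != NULL);
  return snprintf (buf, size, "%ls", ws);
}

/* Valid: points to the initial element of an array of wchar_t.  */
static const wchar_t wide[] = L"abc";

/* Not valid by the letter of the standard: the cast changes the pointer
   type, but the object is still an array of char.
     const wchar_t *bad = (const wchar_t *) "abc\0\0\0\0";  */
```

With ASCII-only contents, format_wide (buf, sizeof buf, wide) stores "abc"
in buf and returns 3 in the default "C" locale.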

But now we're in language-lawyer territory and I'm pretty
sure you were making a point about the library call having
the intuitive semantics of interpreting the narrow string
representation as an array of wchar_t.

In any case, I have posted a prototype that issues a warning
for these mismatches, as has been suggested.  It should have
these same semantics, or could easily be tweaked to achieve
them if or where it doesn't.  The goal is to show the approach
I had in mind to make good diagnostics possible.  I'm open to
fine-tuning the details.

Martin

