[Fwd: Patch for Recovery from Bad Multibyte Characters]
Wed Jun 2 08:23:00 GMT 1999
Jason Merrill wrote:
> > Perhaps the test needs to be reclassified. The warnings are legitimate
> > on the platforms on which they occur.
> Legitimate? How is it legitimate to complain about using a high-ASCII
> value in normal code when LANG is unset? What's wrong with
> char *b = "a\352";
What the standard requires is that the compiler detect multibyte characters in
string literals, char literals and comments "as if" mblen were called as
defined by the host locale. The use of LANG is one way in which the host locale
is commonly defined, but it is not the exclusive method. Other variables are
also used and some locales may support multibyte characters by default. This
would appear to be the case on the platform on which you are testing. Perhaps I
should debug the testcase on that platform to make sure all is behaving as
expected.
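
As a rough illustration of how a C program picks up the host locale, here is a minimal sketch, assuming a POSIX-style C library. `setlocale(LC_CTYPE, "")` takes the character classification locale from the environment (under POSIX, `LC_ALL` overrides `LC_CTYPE`, which overrides `LANG`). After that call, `mblen` and `MB_CUR_MAX` reflect that locale. The helper names `host_locale_is_multibyte` and `char_length_in_locale` are hypothetical and are not part of GCC.

```c
#include <locale.h>
#include <stdlib.h>

/* Hypothetical helper: adopt the character-classification locale from the
   environment. Under POSIX, LC_ALL overrides LC_CTYPE, which overrides LANG.
   If the environment names an unknown locale, setlocale() returns NULL and
   the current locale is left unchanged. Returns non-zero if the locale in
   effect afterwards can contain multibyte characters. */
int host_locale_is_multibyte(void)
{
    setlocale(LC_CTYPE, "");
    return MB_CUR_MAX > 1;
}

/* Hypothetical helper: number of bytes mblen() assigns to the character
   starting at s in the current locale. Returns 0 for a NUL byte and -1 if
   the bytes do not form a valid, complete multibyte character. */
int char_length_in_locale(const char *s, size_t avail)
{
    mblen(NULL, 0);   /* reset any shift state before the query */
    return mblen(s, avail);
}
```

`MB_CUR_MAX` is evaluated at run time, so the check has to happen after `setlocale` has been called.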
So, if the compiler is configured to support multibyte characters and your
testcase is compiled in a locale in which the byte \352 is the first byte of a
multibyte character, then either the terminating quote will be (correctly)
consumed as part of a valid multibyte character, or (as is the case here) the
terminating quote is an invalid second byte and a warning is generated. My
patch fixes the problem where the terminating quote was (incorrectly) consumed
in the second scenario, causing further errors in parsing.
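
The recovery behaviour described above can be sketched roughly as follows. This is a minimal illustration, not the actual patch. The function name `scan_string_body` is hypothetical, escape handling is simplified (a backslash just skips the next byte), and warnings are only counted. The point of interest is the error branch. When `mblen` rejects the bytes at the current position, the scanner consumes only the offending lead byte, so a `"` that follows is still recognised as the terminator instead of being swallowed.

```c
#include <stdlib.h>

/* Hypothetical sketch, not GCC's code: scan the body of a string literal,
   starting just after the opening '"'. Returns the offset of the closing
   '"', or -1 if none is found within n bytes. Each invalid or incomplete
   multibyte sequence increments *warnings. */
long scan_string_body(const char *s, size_t n, int *warnings)
{
    size_t i = 0;

    mblen(NULL, 0);                     /* reset shift state */
    while (i < n) {
        unsigned char c = (unsigned char) s[i];

        if (c == '"')
            return (long) i;            /* terminating quote */
        if (c == '\\' && i + 1 < n) {   /* simplified: skip the escaped byte */
            i += 2;
            continue;
        }

        int len = mblen(s + i, n - i);
        if (len > 0) {
            /* Valid character of len bytes; in some multibyte locales this
               may legitimately include a '"' byte as a trailing byte. */
            i += (size_t) len;
        } else if (len == 0) {
            i += 1;                     /* embedded NUL byte */
        } else {
            /* Invalid sequence: warn and consume only the lead byte, so a
               following '"' is still seen as the terminator rather than
               being swallowed along with the bad character. */
            ++*warnings;
            mblen(NULL, 0);             /* reset state after the error */
            i += 1;
        }
    }
    return -1;                          /* unterminated literal */
}
```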
Note that we're talking about the raw byte \352 coded directly into the source,
as is the case in the testsuite. To encode the byte \352 portably in all
locales, a program should use the octal (or hex) escape sequence. Each escape sequence
specifies a single character in the target character set.
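
For comparison, here is a small sketch of the portable spelling. The octal escape `\352` and the hex escape `\xea` each denote exactly one char with value 234, whatever locale the source is compiled in. The array names are illustrative only.

```c
/* Each escape sequence denotes a single char in the execution character
   set, so both arrays hold 'a', the byte 0352 (== 0xEA == 234), and a NUL. */
const char octal_form[] = "a\352";
const char hex_form[]   = "a\xea";

/* Compile-time check (C11): one char per escape, plus the terminating NUL. */
_Static_assert(sizeof octal_form == 3, "octal escape is one char");
_Static_assert(sizeof hex_form == 3, "hex escape is one char");
```

An octal escape stops after at most three digits, but a hex escape keeps consuming hex digits. As a result, `"\xeab"` is read as a single out-of-range escape rather than `\xea` followed by `b`. That is one reason to prefer the octal form, or to split the literal as `"\xea" "b"`.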
> > What's the proper way to classify a test that may or may not emit
> > warnings depending on the platform and/or configuration?
> I'd say mark it as conditional XFAIL and note in the comments that it's not
> actually a failure.
Sounds right to me.
More information about the Gcc-patches mailing list