This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



Re: [PATCH] avoid non-printable characters in diagnostics (c/77620, c/77521)


On 09/09/2016 07:59 AM, Joseph Myers wrote:
> On Thu, 8 Sep 2016, Martin Sebor wrote:
>
>> PS I used hexadecimal based on what c-format.c does but now that
>> I checked more carefully how %qE formats string literals I see it
>> uses octal.  I think hexadecimal is preferable because it avoids
>> ambiguity but I'm open to changing it to octal if there's a strong
>
> I'm not clear what you mean about ambiguity.  In C strings, an octal
> escape sequence has up to three characters, so if it has three characters
> it's unambiguous, whereas a hex escape sequence can have any number of
> characters, so if the unprintable character is followed by a valid hex
> digit then in C you need to represent that as an escape (or use string
> constant concatenation, etc.).  The patch doesn't try to do that as far as
> I can see.
>
> Now, presumably the output isn't intended to be interpreted as C strings
> anyway (if it was, you'd need to escape " and \ as well), so the patch is
> OK, but I don't think it avoids ambiguity (and there's a clear case that
> it shouldn't - that if the string passed to %qs is printable, it should be
> printed as-is even if it contains escape sequences that could also result
> from a non-printable string passed to %qs).
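To make the C rules described above concrete, here is a small illustrative C fragment. It is not taken from the patch, and the names octal_example, hex_example and dump_chars are made up for this sketch. It shows that an octal escape ends after at most three digits, while a hex escape has to be cut short with string constant concatenation.

```c
#include <stdio.h>
#include <string.h>

/* An octal escape takes at most three digits, so "\1234" is '\123'
   (the letter 'S' in ASCII) followed by the character '4'.  */
static const char octal_example[] = "\1234";

/* A hex escape consumes every hex digit that follows it, so "\x1234"
   would be a single, out-of-range escape.  Splitting the literal ends
   the escape after "\x12": adjacent string literals are concatenated
   only after escape sequences have been converted.  */
static const char hex_example[] = "\x12" "34";

/* Prints the value of each character in S as two hex digits,
   separated by spaces.  */
static void dump_chars (const char *s)
{
  size_t n = strlen (s);
  for (size_t i = 0; i < n; ++i)
    printf ("%s%02x", i ? " " : "", (unsigned char) s[i]);
  putchar ('\n');
}
```

With an ASCII execution character set, dump_chars (octal_example) prints "53 34" and dump_chars (hex_example) prints "12 33 34".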

Thank you.

I tried to be clear about it in the description of the changes
but I see the PS caused some confusion.  Let me clarify that
the patch has nothing to do with ambiguity (perceived or
real) in the representation of the escape sequences.  The only
purpose of the change is to avoid printing non-printable
characters or excessively large escape sequences in GCC
diagnostics.
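As a rough illustration of what the change is meant to do, here is a minimal C sketch that copies a string and replaces each non-printable character with a two-digit hex escape. It is not GCC's pretty-printer code: the function escape_nonprintable and its truncation behavior are assumptions made for this example. It relies on isprint, so what counts as printable depends on the current locale.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: copies SRC into DST (of SIZE bytes), replacing
   each non-printable character with a two-digit hex escape such as
   "\x07".  Printable characters, including '\\' and '"', are copied
   unchanged because the result is meant for a human reader, not a C
   parser.  Returns the length of the complete escaped string.  If that
   does not fit, DST holds the longest prefix that ends on a whole
   character or escape, NUL-terminated.  */
static size_t escape_nonprintable (const char *src, char *dst, size_t size)
{
  size_t len = 0;   /* length of the complete escaped string */
  size_t used = 0;  /* bytes actually stored in DST */

  for (const unsigned char *p = (const unsigned char *) src; *p; ++p)
    {
      char buf[5];  /* one character, or "\xNN" plus the NUL */
      if (isprint (*p))
        {
          buf[0] = (char) *p;
          buf[1] = '\0';
        }
      else
        snprintf (buf, sizeof buf, "\\x%02x", *p);

      size_t n = strlen (buf);
      /* Store a piece only if nothing before it was dropped and it
         fits together with the terminating NUL.  */
      if (used == len && len + n < size)
        {
          memcpy (dst + used, buf, n);
          used += n;
        }
      len += n;
    }

  if (size)
    dst[used] = '\0';
  return len;
}
```

For example, "a\tb" comes out as a\x09b.  As Joseph points out, that text would not round-trip through a C parser, which would read \x09b as a single hex escape.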

I mentioned the hex vs octal notation to invite input on which
of the two notations people would prefer to see used by the %qc
and %qs directives, and whether it's worth considering changing
the %qE directive to use the same notation as well, for consistency
(and to help with readability if there is consensus that one is
clearer than the other).

What I meant by ambiguity is, for example, a string like "\1234"
where it's not obvious where the octal sequence ends.  Is it '\1'
followed by "234", '\12' followed by "34", or '\123' followed
by "4"?  (It's only possible to tell if one knows that GCC always
uses three digits in octal escapes, but not everyone knows
that.)  To be clear: I'm talking about the GCC output and not
necessarily about what the standard has to say about it.
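A sketch of the fixed-width octal form mentioned above, in C.  The helper octal_escape is hypothetical and only illustrates the notation.  Because it always emits exactly three digits, '\1' followed by "234" is written as \001234, which is also how a C compiler would split that sequence.

```c
#include <stdio.h>

/* Hypothetical helper: writes C as a backslash followed by exactly
   three octal digits, so 1 becomes "\001" and 255 becomes "\377".
   OUT must hold at least 5 bytes.  With the width fixed, a digit that
   follows the escape can never be mistaken for part of it, and the
   text also reads back correctly under C's rule that an octal escape
   has at most three digits.  */
static void octal_escape (unsigned char c, char out[5])
{
  snprintf (out, 5, "\\%03o", (unsigned) c);
}
```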

In contrast to the octal notation, I find the string "\x1234"
clearer.  It can only mean '\x1' followed by "234" or '\x12'
followed by "34" and I think more people will expect it to be
the latter because representing characters using two hex digits
is more common.  But this is just my own perception and YMMV.

Martin

