This is the mail archive of the
mailing list for the GCC project.
Re: Get rid of -trigraphs?
Zack Weinberg wrote:-
> I don't think this is a good idea in general. I think there are lots
> of people out there counting on GCC not converting trigraphs - mainly,
> possibly exclusively - in string constants, but still. Yeah, their
> code is unportable, but I am not convinced we gain much of anything by
> breaking it.
What kind of code would this be? I'd be very surprised if any code
really cared, other than wacky stuff like the messages that used to be
in the kernel, with excess punctuation at the end of a sentence: "(WTF??)" etc.
In such cases outputting a different character is not going to hurt;
it may even make it more effective 8-)
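For anyone who hasn't met them: translation phase 1 replaces each of the nine ISO C trigraphs with a single character, so under -trigraphs (or a strict -std= mode, where GCC converts them by default) the string "(WTF??)" becomes "(WTF]". Below is a minimal sketch of that replacement, not cpplib's code; trigraph_map and replace_trigraphs are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Return the character that "??C" stands for, or 0 if "??C" is not one
   of the nine ISO C trigraphs.  The two tables are parallel. */
static char trigraph_map(char c)
{
    static const char from[] = "=(/)'<!>-";
    static const char to[]   = "#[\\]^{|}~";
    const char *p;

    if (c == '\0')            /* strchr would match the terminator */
        return 0;
    p = strchr(from, c);
    return p != NULL ? to[p - from] : 0;
}

/* Translation phase 1, trigraphs only: copy SRC to DST, replacing each
   trigraph with its single character.  The output is never longer than
   the input, so DST needs room for strlen(SRC) + 1 bytes.  Returns the
   number of trigraphs replaced. */
static size_t replace_trigraphs(char *dst, const char *src)
{
    size_t count = 0;

    assert(dst != NULL && src != NULL);
    while (*src != '\0') {
        char r;
        if (src[0] == '?' && src[1] == '?'
            && (r = trigraph_map(src[2])) != 0) {
            *dst++ = r;
            src += 3;
            count++;
        } else {
            *dst++ = *src++;   /* includes a '?' that starts no trigraph */
        }
    }
    *dst = '\0';
    return count;
}
```

Note that "???)" comes out as "?]": when "???" is not a trigraph, the scan moves on one character and finds "??)".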
The only case where it might hurt is people doing some kind of
clever shorthand for byte lists with ASCII characters, IMO. And
they're asking for it anyway...
> Now, if we can *prove* that there is no C program that contains a
> trigraph other than ??/ outside a string constant, which has a valid
> stage-7 parse when the trigraph is not converted, and additionally if
> it causes *major* improvements to the simplicity of your planned new
> lexer, then I wouldn't have a problem with silently converting
> those trigraphs outside string constants. I think ??/ outside a
> string constant should still be warned about and ignored by default.
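The proposal quoted above treats a trigraph differently depending on whether it falls inside a string constant. A rough sketch of that classification follows. classify_trigraphs and struct trigraph_counts are hypothetical names; comments are not recognised, and a "??/" inside a literal is assumed to be converted, so it acts as a backslash and escapes the next character.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

struct trigraph_counts {
    size_t in_literal;  /* inside a "..." or '...' constant */
    size_t outside;     /* everywhere else */
};

/* Count the trigraphs in SRC, split by whether they fall inside a string
   or character constant.  Simplifications: comments are not recognised,
   and the scan assumes trigraphs will be converted, so a "??/" inside a
   literal acts as a backslash and escapes the next character. */
static struct trigraph_counts classify_trigraphs(const char *src)
{
    struct trigraph_counts n = {0, 0};
    char quote = 0;     /* '"' or '\'' while inside a constant, else 0 */

    assert(src != NULL);
    while (*src != '\0') {
        if (src[0] == '?' && src[1] == '?' && src[2] != '\0'
            && strchr("=(/)'<!>-", src[2]) != NULL) {
            bool escape = quote != 0 && src[2] == '/';
            if (quote != 0)
                n.in_literal++;
            else
                n.outside++;
            src += 3;
            if (escape && *src != '\0')
                src++;  /* the converted backslash escapes this character */
            continue;
        }
        if (quote != 0) {
            if (src[0] == '\\' && src[1] != '\0')
                src++;  /* skip the escaped character */
            else if (src[0] == quote)
                quote = 0;
        } else if (src[0] == '"' || src[0] == '\'') {
            quote = src[0];
        }
        src++;
    }
    return n;
}
```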
Two consecutive '?' have no meaning in C or C++ outside string and
character constants. That's obvious. The G++ C++ extension has the
<? and >? min/max operators, but even there "??" would be two
operators in a row, and so a syntax error.
This was the whole reason the committee chose the syntax they did:
it couldn't really break anything.
> I think it's not going to be much of a win though. You're still stuck
> with all the complexity of \-newline and ??/-newline.
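The ??/-newline case persists because phase 1 turns "??/" into a backslash before phase 2 splices lines, so a // comment ending in "??/" silently swallows the next line when trigraphs are on. The sketch below handles only the two splice forms; splice_lines is a hypothetical name, and other trigraphs (including a "??/" not followed by a newline) are copied unchanged.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Translation phase 2 in miniature: copy SRC to DST, deleting each
   backslash-newline pair.  When TRIGRAPHS is true, "??/" followed by a
   newline is deleted as well, since phase 1 would already have turned
   that "??/" into a backslash.  Other trigraphs, including a "??/" that
   is not followed by a newline, are copied unchanged in this sketch.
   DST needs room for strlen(SRC) + 1 bytes.  Returns the number of
   splices made. */
static size_t splice_lines(char *dst, const char *src, bool trigraphs)
{
    size_t splices = 0;

    assert(dst != NULL && src != NULL);
    while (*src != '\0') {
        if (src[0] == '\\' && src[1] == '\n') {
            src += 2;
            splices++;
        } else if (trigraphs && strncmp(src, "?\?/\n", 4) == 0) {
            /* "\?" keeps this literal itself from being read as a trigraph. */
            src += 4;
            splices++;
        } else {
            *dst++ = *src++;
        }
    }
    *dst = '\0';
    return splices;
}
```

With trigraphs enabled, "// right??/" followed by "int x;" on the next line becomes the single comment line "// rightint x;", so the declaration vanishes.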
The win is not having to worry about emitting diagnostics twice if
you're re-scanning for some reason, and also not having to defer
diagnostics until the characters in question are used by the lexer (I'm
thinking of a logical line pre-scan pass). You might want to defer for
many reasons: e.g. to keep diagnostics correctly ordered in the output,
and because at pre-scan time you don't know whether you're in a comment
or a skipped directive.
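To make that cost concrete, here is roughly the bookkeeping that deferred trigraph warnings imply: the pre-scan only records where each trigraph is, and the lexer reports those in text it actually tokenizes, marking each one so a re-scan stays quiet. This is a hypothetical sketch (struct diag_queue, queue_trigraph, flush_diags and the fixed MAX_PENDING capacity are all invented here), not how cpplib does it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

#define MAX_PENDING 64  /* hypothetical fixed capacity for the sketch */

/* One trigraph found by the pre-scan but not yet reported. */
struct pending_diag {
    size_t offset;      /* byte offset of the trigraph in the buffer */
    char replacement;   /* the character the trigraph stands for */
    bool emitted;       /* set once reported, so a re-scan stays quiet */
};

struct diag_queue {
    struct pending_diag items[MAX_PENDING];
    size_t count;       /* entries are appended in source order */
};

/* Pre-scan: record the trigraph, but do not warn yet. */
static void queue_trigraph(struct diag_queue *q, size_t offset, char replacement)
{
    assert(q->count < MAX_PENDING);
    q->items[q->count].offset = offset;
    q->items[q->count].replacement = replacement;
    q->items[q->count].emitted = false;
    q->count++;
}

/* Lexer: called for a byte range [start, end) it has actually turned
   into tokens, i.e. not a comment or skipped text.  Reports each queued
   trigraph in the range at most once, in source order, and returns how
   many were newly reported.  OUT may be NULL to count without printing. */
static size_t flush_diags(struct diag_queue *q, size_t start, size_t end,
                          FILE *out)
{
    size_t i, reported = 0;

    for (i = 0; i < q->count; i++) {
        struct pending_diag *d = &q->items[i];
        if (d->emitted || d->offset < start || d->offset >= end)
            continue;
        if (out != NULL)
            fprintf(out, "offset %lu: warning: trigraph converted to '%c'\n",
                    (unsigned long) d->offset, d->replacement);
        d->emitted = true;
        reported++;
    }
    return reported;
}
```

A trigraph that sits in a comment is simply never flushed, and calling flush_diags twice over the same range reports nothing the second time.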
It's all or nothing really; if you don't want to switch completely,
then there's no point changing anything. Let's leave all this until
someone (not me; I'm fed up with it) does multibyte characters. Whoever
does that may see the issues I came up against.
It's not really worth improving the lexer until there is a concrete
plan in place for multibyte issues; otherwise improvements that work
now may turn out to be impractical for multibyte input.