Bug 48740 - Raw C++0x strings and trigraphs mix badly
Summary: Raw C++0x strings and trigraphs mix badly
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: preprocessor
Version: 4.6.1
Importance: P3 normal
Target Milestone: 4.6.1
Assignee: Jakub Jelinek
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2011-04-23 09:16 UTC by Kay Hayen
Modified: 2011-04-26 10:39 UTC
CC List: 2 users

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed: 2011-04-23 10:29:36


Attachments
One-liner with constant demonstrating the error. (56 bytes, text/x-c++src)
2011-04-23 09:16 UTC, Kay Hayen
Sorry for the typo, that's what you get for extracting a test case. This one is better. (57 bytes, text/x-c++src)
2011-04-23 11:05 UTC, Kay Hayen
gcc46-pr48740.patch (632 bytes, patch)
2011-04-23 12:34 UTC, Jakub Jelinek

Description Kay Hayen 2011-04-23 09:16:03 UTC
Created attachment 24078
One-liner with constant demonstrating the error.

Hello there,

my file has just one line:

char const *R = "raw(foo%sbar%sfred%sbob?????)raw";

This is supposed to work, but gives:

g++-4.6 -c -std=c++0x -Wall -Werror something.build/__constants.cpp 
something.build/__constants.cpp:2:44: error: trigraph ??) converted to ] [-Werror=trigraphs]
cc1plus: all warnings being treated as errors

Adding a space before the closing ")" of the raw string makes the problem go away. I have also had the compiler complain about an unterminated string constant after shuffling raw strings around. With several raw strings in one file, each delimited by raw( and )raw, it seems I hit a case where one raw constant swallowed everything up to the next one, which is very unpleasant.

Question: Are trigraphs still part of C++0x, and are they supposed to be applied to raw strings too? How can I disable trigraphs? I only found an option to enable them, and a reference saying that -std would enable them as well.

Note: I really want to use raw strings to encode arbitrary data for my Python compiler. I truly need a way to avoid trigraphs. So please remove trigraphs from the C++0x case, or add an option to disable them.

Yours,
Kay Hayen
Comment 1 Kay Hayen 2011-04-23 09:17:04 UTC
Just to note: I found this bug with 4.5.2 originally. I then used 4.6 only to check that it hadn't been fixed in the meantime.
Comment 2 Daniel Krügler 2011-04-23 10:22:27 UTC
(In reply to comment #1)
> Just to note: I found this bug with 4.5.2 originally. I then used 4.6 only to
> check that it wasn't solved in the mean time.

Please always ensure that the defect still exists in the most recent compiler version, which is 4.7. I verified that 4.7.0 20110422 (experimental) still contains this warning.

In regard to your questions: standard C++0x still contains trigraphs, so a conforming compiler has to handle them correctly. But any trigraph transformations that happened within a raw string shall be reverted afterwards, as clearly expressed in [lex.pptoken] p. 3 b. 1:

"If the next character begins a sequence of characters that could be the prefix and initial double quote of a raw string literal, such as R", the next preprocessing token shall be a raw string literal. Between the initial and final double quote characters of the raw string, any transformations performed in phases 1 and 2 (trigraphs, universal-character-names, and line splicing) are reverted; this reversion shall apply before any d-char, r-char, or delimiting parenthesis is identified."

The most important aspect is to test that this is not just a warning, but that trigraph transformations have actually been performed. I can confirm that they are with gcc 4.7 20110409 (the 4.7.0 20110422 build cannot produce executable programs), so this is clearly not just a false warning.
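A small C++ sketch of what the quoted rule implies for the literal in this report, assuming a compiler that implements the reversion correctly (the variable names raw_form and escaped_form are illustrative, not from the thread). Because phase 1 and 2 transformations are reverted inside a raw string, the raw literal should hold exactly the same characters as an ordinary literal that writes each repeated question mark with the \? escape, which can never form a trigraph.

```cpp
#include <string>

// The literal from the report. Between the opening raw( and the closing )raw
// every character is kept as written: any trigraph replacement done in
// translation phase 1 is reverted before the closing delimiter is looked for.
const std::string raw_form = R"raw(foo%sbar%sfred%sbob?????)raw";

// The same five question marks in an ordinary literal. Each question mark
// after the first is written as the escape sequence \? so the source text
// never contains two adjacent question marks and cannot form a trigraph.
const std::string escaped_form = "foo%sbar%sfred%sbob?\?\?\?\?";
```

If the reversion is implemented correctly, raw_form and escaped_form compare equal and both are 24 characters long.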
Comment 3 Jonathan Wakely 2011-04-23 10:29:36 UTC
confirmed
Comment 4 Jakub Jelinek 2011-04-23 10:55:24 UTC
"raw(foo%sbar%sfred%sbob?????)raw" is not a raw string literal,
R"raw(foo%sbar%sfred%sbob?????)raw" is, and that one doesn't warn or error and is handled correctly.
Comment 5 Daniel Krügler 2011-04-23 10:58:44 UTC
(In reply to comment #4)
> "raw(foo%sbar%sfred%sbob?????)raw" is not a raw string literal,
> R"raw(foo%sbar%sfred%sbob?????)raw" is, and that one doesn't warn or error and
> is handled correctly.

<blush> you are right, I completely missed the lack of the R prefix here
Comment 6 Jonathan Wakely 2011-04-23 11:05:18 UTC
Oops! thanks, Jakub!
Comment 7 Kay Hayen 2011-04-23 11:05:52 UTC
Created attachment 24079
Sorry for the typo, that's what you get for extracting a test case. This one is better.

With this one I can reproduce my original worry indeed:

[~]> g++ -c --std=c++0x testcase.cpp -Wall -Werror
testcase.cpp:2:17: error: unterminated raw string
testcase.cpp:2:15: error: expected primary-expression at end of input
testcase.cpp:2:15: error: expected ‘,’ or ‘;’ at end of input

Since the trigraph ??) swallows the ")" that terminates the raw string, the literal is never terminated. Now imagine a second constant in the same file: you get corrupted output.
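To illustrate the mechanism described above, here is a minimal C++ sketch that applies phase 1 trigraph replacement to the test case's source text without reverting it inside the raw string. The helpers replace_trigraphs and has_raw_terminator are hypothetical and are not GCC's implementation. The final ??) becomes ], so the )raw" terminator no longer appears in the text and the literal runs on.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not GCC's code): applies the nine trigraph
// replacements of translation phase 1 to source text, scanning left to right.
std::string replace_trigraphs(const std::string& src) {
    std::string out;
    std::size_t i = 0;
    while (i < src.size()) {
        if (i + 2 < src.size() && src[i] == '?' && src[i + 1] == '?') {
            char repl = '\0';
            switch (src[i + 2]) {
                case '=':  repl = '#';  break;
                case '(':  repl = '[';  break;
                case '/':  repl = '\\'; break;
                case ')':  repl = ']';  break;
                case '\'': repl = '^';  break;
                case '<':  repl = '{';  break;
                case '!':  repl = '|';  break;
                case '>':  repl = '}';  break;
                case '-':  repl = '~';  break;
                default:   break;
            }
            if (repl != '\0') {
                out += repl;  // the three-character trigraph becomes one char
                i += 3;
                continue;
            }
        }
        out += src[i];
        ++i;
    }
    return out;
}

// Hypothetical helper: true if the text contains the raw string
// terminator )delim" for the given delimiter.
bool has_raw_terminator(const std::string& text, const std::string& delim) {
    return text.find(")" + delim + "\"") != std::string::npos;
}

// The test case from the report, spelled with \? escapes so that this
// file itself never contains two adjacent question marks.
const std::string source_text = "R\"raw(foo%sbar%sfred%sbob?\?\?\?\?)raw\"";
```

With this input, has_raw_terminator(source_text, "raw") is true, while the same check on replace_trigraphs(source_text) is false, because the transformed text ends in ???]raw" instead.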
Comment 8 Daniel Krügler 2011-04-23 11:14:01 UTC
(In reply to comment #7)
> Created attachment 24079
> Sorry for the typo, that's what you get for extracting a test case. This one is
> better.

I cannot reproduce this warning with gcc 4.7. If I create a program with observable behaviour like this:

#include <cstdio>

char const *R = R"raw(foo%sbar%sfred%sbob?????)raw";

int main()
{
  std::printf("%s\n", R);
}

the output is loss-free, without any remaining trigraph transformations:

foo%sbar%sfred%sbob?????
Comment 9 Kay Hayen 2011-04-23 11:24:54 UTC
It does happen on 4.6, though. I pasted your code into a new file:

-----

[~]> g++-4.6 -c --std=c++0x testcase.cpp -Wall -Werror
testcase.cpp:3:17: error: unterminated raw string
testcase.cpp:3:15: error: expected primary-expression at end of input
testcase.cpp:3:15: error: expected ‘,’ or ‘;’ at end of input

[~]> cat testcase.cpp 
#include <cstdio>

char const *R = R"raw(foo%sbar%sfred%sbob?????)raw";

int main()
{
  std::printf("%s\n", R);
}

[~]> g++-4.6 -v
Using built-in specs.
COLLECT_GCC=/usr/bin/g++-4.6
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-linux-gnu/4.6.1/lto-wrapper
Target: x86_64-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Debian 4.6.0-2' --with-bugurl=file:///usr/share/doc/gcc-4.6/README.Bugs --enable-languages=c,c++,fortran,objc,obj-c++,go --prefix=/usr --program-suffix=-4.6 --enable-shared --enable-multiarch --enable-linker-build-id --with-system-zlib --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --with-gxx-include-dir=/usr/include/c++/4.6 --libdir=/usr/lib --enable-nls --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-plugin --enable-gold --enable-ld=default --with-plugin-ld --enable-objc-gc --with-arch-32=i586 --with-tune=generic --enable-checking=release --build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=x86_64-linux-gnu
Thread model: posix
gcc version 4.6.1 20110329 (prerelease) (Debian 4.6.0-2) 

----------

Yours,
Kay
Comment 10 Jonathan Wakely 2011-04-23 11:36:03 UTC
I get no error with gcc version 4.6.0 20110419 (Red Hat 4.6.0-5)
(I can't test vanilla 4.6 until I finish reinstalling my pc)
Comment 11 Jakub Jelinek 2011-04-23 11:55:24 UTC
I get no error with 4.6.1 20110415 (prerelease), nor with current trunk, nor with 4.6.0 20101005 (experimental), and so on all the way back to
http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=157804
What doesn't work, though, is -save-temps: apparently
it is preprocessed as
# 1 "q.C"
# 1 "<built-in>"
# 1 "<command-line>"
# 1 "q.C"
char const *R = R"raw(foo%sbar%sfred%sbob?????]raw";
with ] instead of ), so that is something we should fix.  Probably you are using ccache and failed to mention that important thing...
Comment 12 Jakub Jelinek 2011-04-23 12:34:04 UTC
Created attachment 24080
gcc46-pr48740.patch

Untested fix.
Comment 13 Kay Hayen 2011-04-23 15:10:32 UTC
Oh, something occurred to me that I could test:

[~]> g++-4.6 -c --std=c++0x testcase.cpp -Wall -Werror
testcase.cpp:3:17: error: unterminated raw string
testcase.cpp:3:15: error: expected primary-expression at end of input
testcase.cpp:3:15: error: expected ‘,’ or ‘;’ at end of input
[~]> /usr/bin/g++-4.6 -c --std=c++0x testcase.cpp -Wall -Werror
[~]> which g++-4.6
/usr/local/bin//g++-4.6
[~]> ls -l /usr/local/bin//g++-4.6
lrwxrwxrwx 1 root staff 15  7. Jan 15:12 /usr/local/bin//g++-4.6 -> /usr/bin/ccache

So it's a bug in ccache, or in the way it interacts with g++.

How on earth do I apologize? I never thought it possible that ccache could cause a problem, nor was I aware, or even nearly sure, that it was being used at all. If you close this bug as invalid, I will take it to ccache upstream or Debian.

Yours,
Kay Hayen
Comment 14 Jakub Jelinek 2011-04-23 23:32:12 UTC
Author: jakub
Date: Sat Apr 23 23:32:09 2011
New Revision: 172903

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=172903
Log:
	PR preprocessor/48740
	* lex.c (lex_raw_string): When raw string ends with
	??) followed by raw prefix and ", ensure it is preprocessed
	with ??) rather than ??].

	* c-c++-common/raw-string-11.c: New test.

Added:
    trunk/gcc/testsuite/c-c++-common/raw-string-11.c
Modified:
    trunk/gcc/testsuite/ChangeLog
    trunk/libcpp/ChangeLog
    trunk/libcpp/lex.c
Comment 15 Jakub Jelinek 2011-04-23 23:33:42 UTC
Author: jakub
Date: Sat Apr 23 23:33:39 2011
New Revision: 172904

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=172904
Log:
	PR preprocessor/48740
	* lex.c (lex_raw_string): When raw string ends with
	??) followed by raw prefix and ", ensure it is preprocessed
	with ??) rather than ??].

	* c-c++-common/raw-string-11.c: New test.

Added:
    branches/gcc-4_6-branch/gcc/testsuite/c-c++-common/raw-string-11.c
Modified:
    branches/gcc-4_6-branch/gcc/testsuite/ChangeLog
    branches/gcc-4_6-branch/libcpp/ChangeLog
    branches/gcc-4_6-branch/libcpp/lex.c
Comment 16 Jakub Jelinek 2011-04-26 10:04:22 UTC
Author: jakub
Date: Tue Apr 26 10:04:18 2011
New Revision: 172956

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=172956
Log:
2011-04-26  Jakub Jelinek  <jakub@redhat.com>

	Backported from mainline
	2011-04-24  Jakub Jelinek  <jakub@redhat.com>

	PR preprocessor/48740
	* lex.c (lex_raw_string): When raw string ends with
	??) followed by raw prefix and ", ensure it is preprocessed
	with ??) rather than ??].

	* c-c++-common/raw-string-11.c: New test.

Added:
    branches/gcc-4_5-branch/gcc/testsuite/c-c++-common/raw-string-11.c
Modified:
    branches/gcc-4_5-branch/gcc/testsuite/ChangeLog
    branches/gcc-4_5-branch/libcpp/ChangeLog
    branches/gcc-4_5-branch/libcpp/lex.c
Comment 17 Jakub Jelinek 2011-04-26 10:39:43 UTC
Fixed.