This is the mail archive of the
gcc-bugs@gcc.gnu.org
mailing list for the GCC project.
[Bug c++/77573] bogus wide string literals in diagnostics
- From: "dmalcolm at gcc dot gnu.org" <gcc-bugzilla at gcc dot gnu dot org>
- To: gcc-bugs at gcc dot gnu dot org
- Date: Thu, 15 Dec 2016 21:17:29 +0000
- Subject: [Bug c++/77573] bogus wide string literals in diagnostics
- Auto-submitted: auto-generated
- References: <bug-77573-4@http.gcc.gnu.org/bugzilla/>
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77573
David Malcolm <dmalcolm at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |dmalcolm at gcc dot gnu.org
--- Comment #1 from David Malcolm <dmalcolm at gcc dot gnu.org> ---
http://en.cppreference.com/w/cpp/language/escape says:
"Hexadecimal escape sequences have no length limit and terminate at the first
character that is not a valid hexadecimal digit."
wchar_t is 4 bytes wide here, so the value 0x12345678 fits in a single character.
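As a minimal sketch of the point above (the name kWide is purely illustrative, and a target with a wchar_t of at least 32 bits, such as x86_64 GNU/Linux, is assumed), the whole run of hex digits forms a single escape, so the literal holds one character plus the implicit terminator:

```cpp
#include <cstddef>

// Assumption: wchar_t is at least 32 bits wide (true on x86_64 GNU/Linux,
// not on targets with a 16-bit wchar_t, where the escape is out of range).
static_assert(sizeof(wchar_t) >= 4, "this sketch assumes a 4-byte wchar_t");

// All eight hex digits belong to ONE escape sequence: there is no length limit.
constexpr wchar_t kWide[] = L"\x12345678";

// One wide character plus the implicit terminator.
static_assert(sizeof(kWide) / sizeof(kWide[0]) == 2,
              "expected a single character and a terminator");
static_assert(kWide[0] == 0x12345678, "the escape yields the full 32-bit value");
static_assert(kWide[1] == 0, "implicit terminator");
```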
emit_numeric_escape is called twice, once with 0x12345678, then with 0 for the
implicit terminator.
(gdb) p tbuf
$45 = {text = 0x23e77f0 "xV4\022", asize = 256, len = 8}
(gdb) p tbuf->text[0]
$37 = 120 'x'
(gdb) p tbuf->text[1]
$38 = 86 'V'
(gdb) p tbuf->text[2]
$39 = 52 '4'
(gdb) p tbuf->text[3]
$40 = 18 '\022'
Note that "xV4\022" is 0x12345678 stored in little-endian byte order:
(gdb) p /x tbuf->text[0]
$46 = 0x78
(gdb) p /x tbuf->text[1]
$47 = 0x56
(gdb) p /x tbuf->text[2]
$48 = 0x34
(gdb) p /x tbuf->text[3]
$49 = 0x12
...and then the terminator:
(gdb) p tbuf->text[4]
$41 = 0 '\000'
(gdb) p tbuf->text[5]
$42 = 0 '\000'
(gdb) p tbuf->text[6]
$43 = 0 '\000'
(gdb) p tbuf->text[7]
$44 = 0 '\000'
So I think that the sequence that's printed is valid.
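The byte layout seen in gdb can be reproduced outside the compiler. The following is only a sketch: raw_bytes and is_little_endian are hypothetical helpers, not GCC functions, and a 4-byte wchar_t is assumed. It copies the object representation of a wide string into a narrow byte string, which on a little-endian host should give the same "xV4\022" followed by four zero bytes:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical helper: copy the object representation of `len` wchar_t
// elements (terminator included, if the caller counts it) into a byte string.
std::string raw_bytes(const wchar_t *data, std::size_t len)
{
  assert(data != nullptr || len == 0);
  std::string bytes(len * sizeof(wchar_t), '\0');
  if (len != 0)
    std::memcpy(&bytes[0], data, bytes.size());
  return bytes;
}

// Hypothetical helper: true if the least significant byte is stored first.
bool is_little_endian()
{
  const unsigned int probe = 1;
  unsigned char first = 0;
  std::memcpy(&first, &probe, 1);
  return first == 1;
}
```

Calling raw_bytes on the two elements {0x12345678, 0} gives 8 bytes on such a host: 0x78 0x56 0x34 0x12, then four zeros for the terminator.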
If I'm reading the following right, internally it's stored as a
one-byte-per-char string that is then converted to a wchar_t:
(gdb) call debug_tree(t)
<convert_expr 0x7ffff1a2b5c0
type <integer_type 0x7ffff18d5690 wchar_t type_6 SI
size <integer_cst 0x7ffff18cd0d8 constant 32>
unit size <integer_cst 0x7ffff18cd0f0 constant 4>
align 32 symtab 0 alias set -1 canonical type 0x7ffff18d5690 precision
32 min <integer_cst 0x7ffff18cd468 -2147483648> max <integer_cst 0x7ffff18cd480
2147483647>>
readonly constant
arg 0 <nop_expr 0x7ffff1a2b5a0
type <pointer_type 0x7ffff1a17f18 type <integer_type 0x7ffff1a17bd0
wchar_t>
unsigned DI
size <integer_cst 0x7ffff18abe88 constant 64>
unit size <integer_cst 0x7ffff18abea0 constant 8>
align 64 symtab 0 alias set -1 canonical type 0x7ffff1a17f18>
readonly constant
arg 0 <addr_expr 0x7ffff1a2b580 type <pointer_type 0x7ffff1a17a80>
readonly constant
arg 0 <string_cst 0x7ffff1a2b560 type <array_type 0x7ffff1a17e70>
readonly constant static "xV4\022\000\000\000\000">>>>
(gdb) call debug_tree((tree)0x7ffff1a2b560)
<string_cst 0x7ffff1a2b560
type <array_type 0x7ffff1a17e70
type <integer_type 0x7ffff1a17bd0 wchar_t readonly type_6 SI
size <integer_cst 0x7ffff18cd0d8 constant 32>
unit size <integer_cst 0x7ffff18cd0f0 constant 4>
align 32 symtab 0 alias set -1 canonical type 0x7ffff1a17bd0
precision 32 min <integer_cst 0x7ffff18cd468 -2147483648> max <integer_cst
0x7ffff18cd480 2147483647>
pointer_to_this <pointer_type 0x7ffff1a17f18>>
DI
size <integer_cst 0x7ffff18abe88 constant 64>
unit size <integer_cst 0x7ffff18abea0 constant 8>
align 32 symtab 0 alias set -1 canonical type 0x7ffff1a17e70
domain <integer_type 0x7ffff1a17c78 type <integer_type 0x7ffff18ca000
sizetype>
type_6 DI size <integer_cst 0x7ffff18abe88 64> unit size
<integer_cst 0x7ffff18abea0 8>
align 64 symtab 0 alias set -1 canonical type 0x7ffff1a17c78
precision 64 min <integer_cst 0x7ffff18abeb8 0> max <integer_cst 0x7ffff18abf90
1>>
pointer_to_this <pointer_type 0x7ffff1a17a80>>
readonly constant static "xV4\022\000\000\000\000">
The title of this bug is "bogus wide string literals in diagnostics", but the
diagnostic contains a regular string literal, not a wide string literal.
Perhaps we should be printing it as something like:
L"\x12345678\x00"
or somesuch, for such cases.
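To make the suggested output concrete, here is a rough sketch of such a printer. It is not GCC's pretty-printer code: format_wide_literal is a hypothetical standalone helper that renders wchar_t elements as a wide literal, using \x escapes for anything that is not plain printable ASCII. Because hex escapes have no length limit, a hex-digit character that directly follows an escape is escaped as well, so that it is not absorbed into the previous sequence.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>
#include <type_traits>

// Hypothetical helper, not part of GCC: render `len` wchar_t elements
// (including the terminator, if the caller wants it shown) as L"...".
std::string format_wide_literal(const wchar_t *data, std::size_t len)
{
  assert(data != nullptr || len == 0);
  typedef std::make_unsigned<wchar_t>::type uwchar;

  std::string out = "L\"";
  bool prev_was_hex_escape = false;
  for (std::size_t i = 0; i < len; ++i)
    {
      // Go through the unsigned type so negative values do not sign-extend.
      const unsigned long c = static_cast<uwchar>(data[i]);
      const bool printable = c >= 0x20 && c < 0x7f && c != '"' && c != '\\';
      const bool hex_digit = (c >= '0' && c <= '9') || (c >= 'a' && c <= 'f')
                             || (c >= 'A' && c <= 'F');
      if (printable && !(prev_was_hex_escape && hex_digit))
        {
          out += static_cast<char>(c);
          prev_was_hex_escape = false;
        }
      else
        {
          char buf[32];
          std::snprintf(buf, sizeof buf, "\\x%02lx", c);
          out += buf;
          prev_was_hex_escape = true;
        }
    }
  out += '"';
  return out;
}
```

Passing the two elements {0x12345678, 0} with len == 2 yields L"\x12345678\x00", the form proposed above. Whether the terminator should be shown at all is a separate design question that this sketch leaves to the caller.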
FWIW, compare with this:
z.C:1:23: error: invalid conversion from ‘const wchar_t*’ to ‘wchar_t’ [-fpermissive]
constexpr wchar_t s = L"pqrstuvw";
^~~~~~~~~~~
z.C:1:23: error: ‘(wchar_t)((const wchar_t*)"p\000\000\000q\000\000\000r\000\000\000s\000\000\000t\000\000\000u\000\000\000v\000\000\000w\000\000\000\000\000\000")’ is not a constant expression