This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.



[Bug c++/77573] bogus wide string literals in diagnostics


https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77573

David Malcolm <dmalcolm at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |dmalcolm at gcc dot gnu.org

--- Comment #1 from David Malcolm <dmalcolm at gcc dot gnu.org> ---
http://en.cppreference.com/w/cpp/language/escape says:
"Hexadecimal escape sequences have no length limit and terminate at the first
character that is not a valid hexadecimal digit."

These are 4-byte wchars, so the value fits.

emit_numeric_escape is called twice, once with 0x12345678, then with 0 for the
implicit terminator.

(gdb) p tbuf
$45 = {text = 0x23e77f0 "xV4\022", asize = 256, len = 8}

(gdb) p tbuf->text[0]
$37 = 120 'x'
(gdb) p tbuf->text[1]
$38 = 86 'V'
(gdb) p tbuf->text[2]
$39 = 52 '4'
(gdb) p tbuf->text[3]
$40 = 18 '\022'

Note that "xV4\022" is 0x12345678 stored in little-endian byte order:

(gdb) p /x tbuf->text[0]
$46 = 0x78
(gdb) p /x tbuf->text[1]
$47 = 0x56
(gdb) p /x tbuf->text[2]
$48 = 0x34
(gdb) p /x tbuf->text[3]
$49 = 0x12

...and then the terminator:

(gdb) p tbuf->text[4]
$41 = 0 '\000'
(gdb) p tbuf->text[5]
$42 = 0 '\000'
(gdb) p tbuf->text[6]
$43 = 0 '\000'
(gdb) p tbuf->text[7]
$44 = 0 '\000'

So I think that the sequence that's printed is valid.
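
To double-check the byte layout outside of gdb, here is a minimal sketch. The
helper name little_endian_bytes is made up for illustration and is not part of
GCC; it assumes a little-endian target with a 32-bit wchar_t, as in the session
above.

```cpp
#include <cstdint>
#include <string>

// Hypothetical helper (not part of GCC): lay out a 32-bit value as four
// bytes, least significant byte first, the way a little-endian target
// stores one 4-byte wchar_t element.
std::string little_endian_bytes(std::uint32_t value)
{
    std::string out;
    for (int i = 0; i < 4; ++i)
        out.push_back(static_cast<char>((value >> (8 * i)) & 0xff));
    return out;
}
```

Calling it with 0x12345678 and then with 0 (the implicit terminator) and
concatenating the results should reproduce the eight bytes seen in tbuf->text.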

If I'm reading the following right, internally it's stored as a conversion of a
one-byte-per-char array string to a wchar_t:

(gdb) call debug_tree(t)
 <convert_expr 0x7ffff1a2b5c0
    type <integer_type 0x7ffff18d5690 wchar_t type_6 SI
        size <integer_cst 0x7ffff18cd0d8 constant 32>
        unit size <integer_cst 0x7ffff18cd0f0 constant 4>
        align 32 symtab 0 alias set -1 canonical type 0x7ffff18d5690 precision
32 min <integer_cst 0x7ffff18cd468 -2147483648> max <integer_cst 0x7ffff18cd480
2147483647>>
    readonly constant
    arg 0 <nop_expr 0x7ffff1a2b5a0
        type <pointer_type 0x7ffff1a17f18 type <integer_type 0x7ffff1a17bd0
wchar_t>
            unsigned DI
            size <integer_cst 0x7ffff18abe88 constant 64>
            unit size <integer_cst 0x7ffff18abea0 constant 8>
            align 64 symtab 0 alias set -1 canonical type 0x7ffff1a17f18>
        readonly constant
        arg 0 <addr_expr 0x7ffff1a2b580 type <pointer_type 0x7ffff1a17a80>
            readonly constant
            arg 0 <string_cst 0x7ffff1a2b560 type <array_type 0x7ffff1a17e70>
                readonly constant static "xV4\022\000\000\000\000">>>>


(gdb) call debug_tree((tree)0x7ffff1a2b560)
 <string_cst 0x7ffff1a2b560
    type <array_type 0x7ffff1a17e70
        type <integer_type 0x7ffff1a17bd0 wchar_t readonly type_6 SI
            size <integer_cst 0x7ffff18cd0d8 constant 32>
            unit size <integer_cst 0x7ffff18cd0f0 constant 4>
            align 32 symtab 0 alias set -1 canonical type 0x7ffff1a17bd0
precision 32 min <integer_cst 0x7ffff18cd468 -2147483648> max <integer_cst
0x7ffff18cd480 2147483647>
            pointer_to_this <pointer_type 0x7ffff1a17f18>>
        DI
        size <integer_cst 0x7ffff18abe88 constant 64>
        unit size <integer_cst 0x7ffff18abea0 constant 8>
        align 32 symtab 0 alias set -1 canonical type 0x7ffff1a17e70
        domain <integer_type 0x7ffff1a17c78 type <integer_type 0x7ffff18ca000
sizetype>
            type_6 DI size <integer_cst 0x7ffff18abe88 64> unit size
<integer_cst 0x7ffff18abea0 8>
            align 64 symtab 0 alias set -1 canonical type 0x7ffff1a17c78
precision 64 min <integer_cst 0x7ffff18abeb8 0> max <integer_cst 0x7ffff18abf90
1>>
        pointer_to_this <pointer_type 0x7ffff1a17a80>>
    readonly constant static "xV4\022\000\000\000\000">

The title of this bug is "bogus wide string literals in diagnostics", but the
diagnostic contains a regular string literal, not a wide string literal.

Perhaps we should be printing it as something like:

L"\x12345678\x00"

or somesuch, for such cases.
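
A rough sketch of what such printing could look like, purely as an
illustration: format_wide_literal is a hypothetical standalone function, not
the actual pretty-printer code, and it assumes little-endian element storage
and an element size of at most 4 bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical helper (not GCC's pretty-printer): render the raw bytes of a
// wide string constant as an L"..." literal, one \x escape per element.
// Assumes little-endian byte order and that bytes.size() is a multiple of
// unit (the element size in bytes, e.g. 4 for a 32-bit wchar_t).
std::string format_wide_literal(const std::string &bytes, std::size_t unit)
{
    assert(unit > 0 && unit <= 4 && bytes.size() % unit == 0);
    std::string out = "L\"";
    for (std::size_t i = 0; i < bytes.size(); i += unit) {
        // Reassemble one element from its bytes, least significant first.
        std::uint32_t value = 0;
        for (std::size_t j = 0; j < unit; ++j)
            value |= static_cast<std::uint32_t>(
                         static_cast<unsigned char>(bytes[i + j])) << (8 * j);
        char buf[16];
        std::snprintf(buf, sizeof buf, "\\x%02x", static_cast<unsigned>(value));
        out += buf;
    }
    out += '"';
    return out;
}
```

Given the eight bytes "xV4\022\000\000\000\000" and an element size of 4, this
sketch produces the L"\x12345678\x00" form suggested above. It escapes every
element, including printable ones, which avoids a hex escape accidentally
absorbing a following hex-digit character.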

FWIW, compare with this:

z.C:1:23: error: invalid conversion from ‘const wchar_t*’ to ‘wchar_t’
[-fpermissive]
 constexpr wchar_t s = L"pqrstuvw";
                       ^~~~~~~~~~~
z.C:1:23: error: ‘(wchar_t)((const
wchar_t*)"p\000\000\000q\000\000\000r\000\000\000s\000\000\000t\000\000\000u\000\000\000v\000\000\000w\000\000\000\000\000\000")’
is not a constant expression
