Bug 110341 - [C++26] P1854R4 - Making non-encodable string literals ill-formed
Summary: [C++26] P1854R4 - Making non-encodable string literals ill-formed
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: c++
Version: 14.0
Importance: P3 normal
Target Milestone: ---
Assignee: Jakub Jelinek
URL:
Keywords:
Depends on:
Blocks: c++26-core
Reported: 2023-06-21 16:08 UTC by Marek Polacek
Modified: 2023-11-14 17:33 UTC

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed: 2023-06-21 00:00:00


Attachments
gcc14-pr110341.patch (2.37 KB, patch)
2023-08-25 15:49 UTC, Jakub Jelinek

Description Marek Polacek 2023-06-21 16:08:14 UTC
See <https://wg21.link/P1854R4>.
Comment 1 Andrew Pinski 2023-06-21 22:37:25 UTC
I don't think there is anything to do for this paper:
`GCC exposes the same behavior (the one proposed by this paper) in all language modes.`
Comment 2 Marek Polacek 2023-06-21 22:48:53 UTC
It's possible/likely; I haven't actually read the paper yet.  We want to add any possible testcases.  I should be able to address this by the end of next week if nobody beats me to it.
Comment 3 Jakub Jelinek 2023-08-25 15:49:45 UTC
Created attachment 55795
gcc14-pr110341.patch

Untested implementation.
Comment 4 GCC Commits 2023-11-14 17:32:00 UTC
The master branch has been updated by Jakub Jelinek <jakub@gcc.gnu.org>:

https://gcc.gnu.org/g:194825f20619a1c4b51eaea84f20432fefc0db03

commit r14-5454-g194825f20619a1c4b51eaea84f20432fefc0db03
Author: Jakub Jelinek <jakub@redhat.com>
Date:   Tue Nov 14 18:28:34 2023 +0100

    c++: Implement C++26 P1854R4 - Making non-encodable string literals ill-formed [PR110341]
    
    This paper, voted in as a DR, makes some multi-character literals ill-formed.
    'abcd' stays valid, but e.g. 'á' is newly invalid in UTF-8 exec charset
    while valid e.g. in ISO-8859-1, because it is a single character which needs
    2 bytes to be encoded.
    
    The following patch does that by checking (only pedantically, especially
    because it is a DR), whenever we'd emit a -Wmultichar warning because the
    character constant has more than one byte in it, whether the number of
    source characters is equal to the number of bytes in the multichar string.
    If it is, it is a normal multi-character literal constant and is diagnosed
    normally with -Wmultichar; otherwise at least one of the c-chars in the
    sequence was encoded as 2+ bytes.
    
    2023-11-14  Jakub Jelinek  <jakub@redhat.com>
    
            PR c++/110341
    libcpp/
            * charset.cc: Implement C++26 P1854R4 - Making non-encodable string
            literals ill-formed.
            (one_count_chars, convert_count_chars, count_source_chars): New
            functions.
            (narrow_str_to_charconst): Change last arg type from cpp_ttype to
            const cpp_token *.  For C++ if pedantic and i > 1 in CPP_CHAR
            interpret token also as CPP_STRING32 and if number of characters
            in the CPP_STRING32 is larger than number of bytes in CPP_CHAR,
            pedwarn on it.  Make the diagnostics more detailed.
            (wide_str_to_charconst): Change last arg type from cpp_ttype to
            const cpp_token *.  Make the diagnostics more detailed.
            (cpp_interpret_charconst): Adjust narrow_str_to_charconst and
            wide_str_to_charconst callers.
    gcc/testsuite/
            * g++.dg/cpp26/literals1.C: New test.
            * g++.dg/cpp26/literals2.C: New test.
            * g++.dg/cpp23/wchar-multi1.C: Adjust expected diagnostic wordings.
            * g++.dg/cpp23/wchar-multi2.C: Likewise.
            * gcc.dg/c23-utf8char-3.c: Likewise.
            * gcc.dg/cpp/charconst-4.c: Likewise.
            * gcc.dg/cpp/charconst.c: Likewise.
            * gcc.dg/cpp/if-2.c: Likewise.
            * gcc.dg/utf16-4.c: Likewise.
            * gcc.dg/utf32-4.c: Likewise.
            * g++.dg/cpp1z/utf8-neg.C: Likewise.
            * g++.dg/cpp2a/ucn2.C: Likewise.
            * g++.dg/ext/utf16-4.C: Likewise.
            * g++.dg/ext/utf32-4.C: Likewise.
Comment 5 Jakub Jelinek 2023-11-14 17:33:47 UTC
Implemented for GCC 14 now.
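
For readers who want the idea without reading libcpp, the check described in the commit message of comment 4 can be sketched roughly as follows. This is only an illustration, not the code in charset.cc: the names count_code_points, classify and CharConstKind are hypothetical, the execution character set is assumed to be UTF-8, and source characters are counted by skipping UTF-8 continuation bytes rather than by reinterpreting the token as CPP_STRING32 as the patch does.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical helper (not a libcpp function): count the code points in a
// UTF-8 byte sequence by counting every byte that is not a continuation
// byte of the form 10xxxxxx.
std::size_t count_code_points(std::string_view utf8) {
    std::size_t n = 0;
    for (unsigned char c : utf8)
        if ((c & 0xC0) != 0x80)
            ++n;
    return n;
}

// Hypothetical classification of a narrow character constant.
enum class CharConstKind {
    Single,       // one byte: an ordinary character literal
    MultiChar,    // several bytes, one per character: -Wmultichar territory
    NonEncodable  // some character needed 2+ bytes: ill-formed per P1854R4
};

// exec_bytes: the contents between the quotes, already converted to the
// execution character set (assumed to be UTF-8 in this sketch).
CharConstKind classify(std::string_view exec_bytes) {
    if (exec_bytes.size() <= 1)
        return CharConstKind::Single;
    // Same number of characters as bytes: every c-char fit in one byte.
    if (count_code_points(exec_bytes) == exec_bytes.size())
        return CharConstKind::MultiChar;
    // Fewer characters than bytes: at least one c-char took 2+ bytes.
    return CharConstKind::NonEncodable;
}
```

Under these assumptions, classify("abcd") yields MultiChar, while classify("\xC3\xA1") (the UTF-8 bytes of 'á') yields NonEncodable, matching the two examples in the commit message.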