Bug 110348 - [C++26] P2741R3 - User-generated static_assert messages
Summary: [C++26] P2741R3 - User-generated static_assert messages
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: c++
Version: 14.0
Importance: P3 normal
Target Milestone: ---
Assignee: Jakub Jelinek
URL:
Keywords:
Depends on:
Blocks: c++26-core
Reported: 2023-06-21 16:25 UTC by Marek Polacek
Modified: 2023-11-23 09:58 UTC
CC List: 5 users

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed: 2023-06-21 00:00:00


Attachments
gcc14-pr110348-wip.patch (2.39 KB, patch)
2023-08-23 16:46 UTC, Jakub Jelinek
gcc14-pr110348.patch (5.17 KB, patch)
2023-08-24 11:22 UTC, Jakub Jelinek

Description Marek Polacek 2023-06-21 16:25:10 UTC
See <https://wg21.link/P2741R3>.
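For context, a minimal sketch of what the paper permits: the second operand of static_assert may be an object M whose M.size() and M.data() are usable in constant expressions. The names `Msg`, `length` and `too_small` below are hypothetical and are not taken from the paper or from the GCC testsuite. The C++26 form is guarded by the `__cpp_static_assert` feature-test macro so that the snippet also compiles in earlier language modes.

```cpp
#include <climits>
#include <cstddef>

// Hypothetical helper: length of a NUL-terminated string, usable at compile time.
constexpr std::size_t length(const char *s) {
  return *s ? 1 + length(s + 1) : 0;
}

// Hypothetical message type: anything with suitable size() and data() members.
struct Msg {
  const char *text;
  constexpr std::size_t size() const { return length(text); }
  constexpr const char *data() const { return text; }
};

constexpr Msg too_small{"int is narrower than 16 bits"};

#if defined(__cpp_static_assert) && __cpp_static_assert >= 202306L
// C++26: the message is an object; it is only evaluated if the condition fails.
static_assert(sizeof(int) * CHAR_BIT >= 16, too_small);
#else
// Earlier modes: only a string literal is allowed as the message.
static_assert(sizeof(int) * CHAR_BIT >= 16, "int is narrower than 16 bits");
#endif
```

The condition holds on every conforming implementation, so the message is never printed here; the point is only the shape of the second operand.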
Comment 1 Andrew Pinski 2023-06-21 22:47:11 UTC
Confirmed.
Comment 2 Jakub Jelinek 2023-08-23 16:46:01 UTC
Created attachment 55780
gcc14-pr110348-wip.patch

Untested WIP on this.  I still need to figure out which common cases to handle the easy way.  For a start I think: STRIP_NOPS of an ADDR_EXPR of a STRING_CST; POINTER_PLUS of that plus an INTEGER_CST; ADDR_EXPR of a VAR_DECL with STRING_CST DECL_INITIAL; and ADDR_EXPR of an ARRAY_REF of either a VAR_DECL with STRING_CST DECL_INITIAL or a STRING_CST.  For the others I'll need to build an ARRAY_REF for each char and evaluate it.
Plus testcase coverage.
It isn't clear what will or should happen if one uses some special execution
character set, because then strings and literals will be translated into that
execution character set and writing that out as the message might be weird.
Comment 3 Jakub Jelinek 2023-08-23 20:23:06 UTC
I wonder if the paper wording isn't incorrect, or at least comparing the clang++
implementation vs. the paper gives some differences.

One (minor) is that they emit errors when the size () and/or data () members aren't constexpr, while the paper as voted in only requires that
"- the expression M.size() is implicitly convertible to std::size_t, and
- the expression M.data() is implicitly convertible to ”pointer to const char”."
unless the static assertion fails.  The WIP patch doesn't do that, only effectively diagnoses it during constant evaluation of those when the static assertion fails.

More important, they have in the testcase something similar to what I filed in PR111122, but let's use what works also in GCC:
struct T {
  const char *d = init ();
  constexpr int size () const { return 2; }
  constexpr const char *data () const { return d; }
  constexpr const char *init () const { return new char[2] { 'o', 'k' }; }
  constexpr ~T () { delete[] d; }
};
constexpr int a = T{}.size (); // Ok, a = 2
constexpr int b = T{}.data ()[0]; // Ok, b = 'o'
constexpr const char *c = T{}.data (); // Not constant expression, because it returns
// address from new which is later in the dtor deleted.
static_assert (false, T{}); // Valid?

"- M.data(), implicitly converted to the type ”pointer to const char”, shall be a core
constant expression and let D denote the converted expression,

– for each i where 0 ≤ i < N , D[i] shall be an integral constant expression, and"

Now, I believe T{}.data () is not a core constant expression exactly because it returns address of later deleted heap object, but sure, both T{}.data ()[0] and T{}.data ()[1]
are integral constant expressions.

I don't know how std::format (if it is constexpr; is it?) or string_view etc. work.  Do they
need M.data () to not be an actual constant expression, with only M.data ()[0] through M.data ()[M.size () - 1] being constant expressions?  In the patch I can certainly try to constant-evaluate M.data () quietly and, if it isn't a constant expression, fall back to a slower way which asks for each character individually.  The more important question is what the standard intends...
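To illustrate the distinction, here is a sketch in ordinary C++ rather than the patch's actual code. `spells` and `Lit` are hypothetical names. `spells` reads a message object one character at a time, in the spirit of the slower path, without ever keeping the pointer as a result. The guarded part reuses T from above and assumes C++20 constexpr allocation is available.

```cpp
#include <cstddef>

// Hypothetical helper: compares M's text with `expected` by reading
// m.data()[i] for each i in [0, n); the pointer itself never escapes.
template <class M>
constexpr bool spells(const M &m, const char *expected, std::size_t n) {
  if (static_cast<std::size_t>(m.size()) != n)
    return false;
  for (std::size_t i = 0; i < n; ++i)
    if (m.data()[i] != expected[i])
      return false;
  return true;
}

// Message backed by a string literal: data() itself is a constant expression.
struct Lit {
  constexpr std::size_t size() const { return 2; }
  constexpr const char *data() const { return "ok"; }
};
static_assert(spells(Lit{}, "ok", 2), "literal-backed message reads as ok");

#if defined(__cpp_constexpr_dynamic_alloc) && __cpp_constexpr_dynamic_alloc >= 201907L
// T as in the comment above: data() points into storage the destructor frees.
struct T {
  const char *d = init();
  constexpr int size() const { return 2; }
  constexpr const char *data() const { return d; }
  constexpr const char *init() const { return new char[2]{'o', 'k'}; }
  constexpr ~T() { delete[] d; }
};
// Each character is readable while the temporary is alive, even though
// `constexpr const char *c = T{}.data();` is not a constant expression.
static_assert(spells(T{}, "ok", 2), "heap-backed message reads as ok");
#endif
```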
Comment 4 Jakub Jelinek 2023-08-24 11:22:23 UTC
Created attachment 55785
gcc14-pr110348.patch

Full untested patch.
Comment 5 Jonathan Wakely 2023-08-24 11:56:31 UTC
(In reply to Jakub Jelinek from comment #3)
> I wonder if the paper wording isn't incorrect, or at least comparing the
> clang++
> implementation vs. the paper gives some differences.
> 
> One (minor) is that they emit errors when the size () and/or data () members
> aren't constexpr, while the paper as voted in only requires that
> "- the expression M.size() is implicitly convertible to std::size_t, and
> - the expression M.data() is implicitly convertible to ”pointer to const
> char”."
> unless the static assertion fails.  The WIP patch doesn't do that, only
> effectively diagnoses it during constant evaluation of those when the static
> assertion fails.

I agree with your WIP patch. The requirements for data() and size() to be constant expressions are in p11 (11.2) which only apply if the static assertion fails.

 
> More important, they have in the testcase something similar to what I filed
> in PR111122, but let's use what works also in GCC:
> struct T {
>   const char *d = init ();
>   constexpr int size () const { return 2; }
>   constexpr const char *data () const { return d; }
>   constexpr const char *init () const { return new char[2] { 'o', 'k' }; }
>   constexpr ~T () { delete[] d; }
> };
> constexpr int a = T{}.size (); // Ok, a = 2
> constexpr int b = T{}.data ()[0]; // Ok, b = 'o'
> constexpr const char *c = T{}.data (); // Not constant expression, because it returns
> // address from new which is later in the dtor deleted.
> static_assert (false, T{}); // Valid?

Interesting one. The call to .data() occurs before the destructor, so I would naively expect it to be valid. I think it should be equivalent to:

constexpr int i = (T{}.data(), 0);

The data() pointer is valid during the lifetime of the T prvalue. I would expect that a failed static assertion creates some kind of scope for the constant expression M, and evaluation of the data() string occurs before the destruction of any objects created in that scope.

> I don't know how std::format if constexpr (is it?) or string_view etc. work,

std::format isn't a constexpr function. The "with this proposal" example in P2741R3 doesn't actually work with that proposal; it would require changes to std::format that haven't been proposed for the standard.

> do they
> need M.data () not be actual constant expression and only M.data ()[0]
> through M.data ()[M.size () - 1] constant expressions?

string_view doesn't require a data() function in any way, constexpr or not. It just takes a pointer and optional length.

> In the patch I can
> surely try to constant expr evaluate M.data () quietly and if it isn't
> constant expression, just use a slower way which will ask for each character
> individually.  More important question is what is the intention for the
> standard...

I think it has to be a constant expression; the slower fallback shouldn't be needed IMHO.
Comment 6 Jonathan Wakely 2023-08-24 12:10:38 UTC
(In reply to Jonathan Wakely from comment #5)
> I agree with your WIP patch. The requirements for data() and size() to be
> constant expressions are in p11 (11.2) which only apply if the static
> assertion fails.

In other words, I think the paper is clear and clang is wrong here.

Although arguably what clang does is more useful. I'm not sure why you'd want to use a non-constexpr size() or data() that only compiles as long as the static assertion passes. It means you won't find out that your user-generated message can't actually be printed until you run on a target where the assertion fails.

I suppose there could be a case where those functions are only constexpr sometimes, and that happens to coincide with exactly the conditions where the assertion fails. That seems unlikely though. It seems more likely that you would use an assertion to terminate compilation when your code is *missing* features, not make it fail only when features are present.

The wording voted into the draft also seems counter to the design expressed in the paper that says "The message-producing expression is intended to be always instantiated, but only evaluated if the assertion failed." Does instantiating it require it to be a valid constant expression, even if not evaluated?
Comment 7 Jonathan Wakely 2023-08-24 12:19:14 UTC
(In reply to Jonathan Wakely from comment #6)
> Although arguably what clang does is more useful. I'm not sure why you'd
> want to use a non-constexpr size() or data() that only compiles as long as
> the static assertion passes. It means you won't find out that your
> user-generated message can't actually be printed until you run on a target
> where the assertion fails.

In the general case, the user-generated string might only be a constant expression for some inputs, which is why it's unevaluated unless the assertion fails. And the member functions being non-constexpr is only one of many ways in which those functions could fail to be usable in constant expressions. So just checking for constexpr members isn't sufficient to avoid the problem of a message that can never be printed.
Comment 8 Jakub Jelinek 2023-08-24 12:28:04 UTC
(In reply to Jonathan Wakely from comment #6)
> (In reply to Jonathan Wakely from comment #5)
> > I agree with your WIP patch. The requirements for data() and size() to be
> > constant expressions are in p11 (11.2) which only apply if the static
> > assertion fails.
> 
> In other words, I think the paper is clear and clang is wrong here.
> 
> Although arguably what clang does is more useful. I'm not sure why you'd
> want to use a non-constexpr size() or data() that only compiles as long as
> the static assertion passes. It means you won't find out that your
> user-generated message can't actually be printed until you run on a target
> where the assertion fails.

Sure, if the standard is changed such that size() and data() must be constexpr,
it would be nice to check it.  Without full evaluation one can't guarantee it will
be a constant expression, and there could be tons of other reasons why it isn't a constant expression (say, it returns some class whose conversion operator isn't constexpr, or it throws, or it contains asm, and many others).

As for M.data () not being a constant expression, that depends on the exact wording as well.  Either the standard could drop the requirement that it is a core constant expression altogether, in which case nothing will require e.g. that (M.data (), 0) is a constant expression when M.size () is 0.  Or it could say that (M.data (), 0) is an integral constant expression; then one can verify that, and then quietly try whether M.data () is a constant expression.  If it is, one can grab the message from it using, say, GCC's c_getstr if that works.  If it isn't, one would need to evaluate it character by character.

BTW, there is a third difference between my latest patch and clang++ implementation.
They reject static_assert (false, "foo"_myd); which I have in the testcase.  IMHO
"foo"_myd doesn't match the syntactic requirements of unevaluated-string as the https://eel.is/c++draft/dcl.pre#10 wording says, because unevaluated-string non-terminal is string-literal with some extra rules, while user-defined-literal is some other non-terminal.  And as I've tried to show in the testcase, a constexpr operator ""
can return something on which .size () / .data () can be called and can satisfy the requirements.
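As an illustration of that last point, a sketch of a literal operator whose result has usable size () and data () members. The suffix `_myd` matches the one mentioned above, but the definition of `MyMsg` and of the operator is a guess for illustration only, not the testcase's actual code. The C++26 line is guarded by the feature-test macro.

```cpp
#include <cstddef>

// Hypothetical result type of the literal operator.
struct MyMsg {
  const char *p;
  std::size_t n;
  constexpr std::size_t size() const { return n; }
  constexpr const char *data() const { return p; }
};

// Hypothetical definition of the _myd suffix.
constexpr MyMsg operator""_myd(const char *s, std::size_t n) {
  return MyMsg{s, n};
}

// "foo"_myd is a user-defined-literal, not a string-literal, yet the object
// it yields satisfies the size()/data() requirements.
static_assert("foo"_myd.size() == 3, "three characters");
static_assert("foo"_myd.data()[2] == 'o', "last character");

#if defined(__cpp_static_assert) && __cpp_static_assert >= 202306L
// Under the reading argued for in this comment, this is a valid declaration.
static_assert(true, "foo"_myd);
#endif
```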
Comment 9 corentinjabot 2023-09-12 11:30:08 UTC
(In reply to Jakub Jelinek from comment #3)
> I wonder if the paper wording isn't incorrect, or at least comparing the
> clang++
> implementation vs. the paper gives some differences.
> 
> One (minor) is that they emit errors when the size () and/or data () members
> aren't constexpr, while the paper as voted in only requires that
> "- the expression M.size() is implicitly convertible to std::size_t, and
> - the expression M.data() is implicitly convertible to ”pointer to const
> char”."
> unless the static assertion fails.  The WIP patch doesn't do that, only
> effectively diagnoses it during constant evaluation of those when the static
> assertion fails.

During review in clang we felt that diagnosing it in all cases
would be preferable to our users, as otherwise errors only manifest when the static assertion fails,
likely at a point where the person getting the diagnostic would not be able to act on it.
So we made it a warning that defaults to an error.

Note that core felt strongly we should not check for constant expressions at the time, 
but maybe opinions changed?

> 
> More important, they have in the testcase something similar to what I filed
> in PR111122, but let's use what works also in GCC:
> struct T {
>   const char *d = init ();
>   constexpr int size () const { return 2; }
>   constexpr const char *data () const { return d; }
>   constexpr const char *init () const { return new char[2] { 'o', 'k' }; }
>   constexpr ~T () { delete[] d; }
> };
> constexpr int a = T{}.size (); // Ok, a = 2
> constexpr int b = T{}.data ()[0]; // Ok, b = 'o'
> constexpr const char *c = T{}.data (); // Not constant expression, because it returns
> // address from new which is later in the dtor deleted.
> static_assert (false, T{}); // Valid?


See https://github.com/cplusplus/CWG/issues/350, because I was confused too.
`data()` is a core constant expression. The implementation should behave _as if_ `T{}.data ()[N]` is evaluated for each `N`,
even if that would be a pretty bad implementation strategy.

Comment 10 Jakub Jelinek 2023-09-12 12:03:11 UTC
(In reply to corentinjabot from comment #9)
> During review in clang we felt that diagnosing it in all cases
> would be preferable to our users, as otherwise errors only manifest when the
> static assertion fails,
> likely at a point where the person getting the diagnostic would not be able
> to act on it.
> So we made it a warning that defaults to an error.

Then it should be a warning rather than error IMHO.  Because it isn't invalid, just
likely unintended.

> See https://github.com/cplusplus/CWG/issues/350, because I was confused too.
> `data()` is a core constant expression. The implementation should behave _as
> if_ `T{}.data ()[N]` is evaluated for each `N`,
> even if that would be a pretty bad implementation strategy.

Jason, do we have a way to test whether something is a core constant expression in the FE?  Seems the https://eel.is/c++draft/expr.const#13 checks are done in cxx_eval_outermost_constant_expression and I don't see a way to ignore them.

Because for N > 0, I think we can as well check it just by evaluating T{}.data ()[0]
etc., but for N == 0 we can't.
Comment 11 Jakub Jelinek 2023-09-12 12:23:18 UTC
(In reply to Jakub Jelinek from comment #10)
> Jason, do we have a way to test whether something is a core constant
> expression in the FE?  Seems the https://eel.is/c++draft/expr.const#13
> checks are done in cxx_eval_outermost_constant_expression and I don't see a
> way to ignore them.
> 
> Because for N > 0, I think we can as well check it just by evaluating
> T{}.data ()[0]
> etc., but for N == 0 we can't.

Maybe that is what Jonathan was suggesting, see if (T{}.data (), 0) is a constant expression.
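A sketch of that probe in user-level C++, for illustration only. The names `EmptyOk`, `EmptyBad` and `data_is_constant` are hypothetical. The comma expression forces M.data () to be evaluated during constant evaluation even when M.size () is 0, while its int result avoids any restriction on the pointer value itself. The cast to void is added here only to silence unused-value warnings. Note this tests for a constant expression of type int, which is how the thread proposes to approximate the core-constant-expression check.

```cpp
#include <cstddef>

// Hypothetical empty message whose data() is usable in constant expressions.
struct EmptyOk {
  constexpr std::size_t size() const { return 0; }
  constexpr const char *data() const { return ""; }
};

// Hypothetical empty message whose data() is not constexpr.
struct EmptyBad {
  constexpr std::size_t size() const { return 0; }
  const char *data() const { return ""; }
};

// Chosen when ((void)M{}.data(), 0) is a constant expression.
template <class M, int = ((void)M{}.data(), 0)>
constexpr bool data_is_constant(int) { return true; }

// Fallback when substitution into the overload above fails.
template <class M>
constexpr bool data_is_constant(long) { return false; }

static_assert(data_is_constant<EmptyOk>(0), "constexpr data() passes the probe");
static_assert(!data_is_constant<EmptyBad>(0), "non-constexpr data() fails it");
```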
Comment 12 Jakub Jelinek 2023-09-12 13:14:35 UTC
BTW, shall size() and data() be manifestly constant-evaluated?
I think it doesn't satisfy any of the https://eel.is/c++draft/expr.const#19
bullets (unlike first static_assert argument).
Comment 13 Jason Merrill 2023-09-12 15:20:41 UTC
(In reply to Jakub Jelinek from comment #12)
> BTW, shall size() and data() be manifestly constant-evaluated?
> I think it doesn't satisfy any of the https://eel.is/c++draft/expr.const#19
> bullets (unlike first static_assert argument).

Good point, I think we're missing some wording to make that all manifestly constant-evaluated; it absolutely should be.

(In reply to Jakub Jelinek from comment #10)
> Then it should be a warning rather than error IMHO.  Because it isn't
> invalid, just likely unintended.

Agreed.
Comment 14 GCC Commits 2023-11-23 08:27:12 UTC
The master branch has been updated by Jakub Jelinek <jakub@gcc.gnu.org>:

https://gcc.gnu.org/g:6ce952188ab39e303e4f63e474b5cba83b5b12fd

commit r14-5771-g6ce952188ab39e303e4f63e474b5cba83b5b12fd
Author: Jakub Jelinek <jakub@redhat.com>
Date:   Thu Nov 23 09:13:37 2023 +0100

    c++: Implement C++26 P2741R3 - user-generated static_assert messages [PR110348]
    
    The following patch implements the user generated static_assert messages next
    to string literals.
    
    As I wrote already in the PR, in addition to looking through the paper
    I looked at the clang++ testcase for this feature implemented there from
    paper's author and on godbolt played with various parts of the testcase
    coverage below, and there are some differences between what the patch
    implements and what clang++ implements.
    
    The first is that clang++ diagnoses if M.size () or M.data () methods
    are present, but aren't constexpr; while the paper introduction talks about
    that, the standard wording changes don't seem to require that, all they say
    is that those methods need to exist (assuming accessible and the like)
    and be implicitly convertible to std::size_t or const char *, but rest is
    only if the static assertion fails.  If there is intent to change that
    wording, the question is how far to go, e.g. while M.size () could be
    constexpr, they could e.g. return some class object which wouldn't have
    constexpr conversion operator to size_t/const char * and tons of other
    reasons why the constant evaluation could fail.  Without actually evaluating
    it I don't see how we could guarantee anything for non-failed static_assert.
    
    The second difference is that
    static_assert (false, "foo"_myd);
    in the testcase is normal failed static assertion and
    static_assert (true, "foo"_myd);
    would be accepted, while clang++ rejects it.  IMHO
    "foo"_myd doesn't match the syntactic requirements of unevaluated-string
    as mentioned in http://eel.is/c++draft/dcl.pre#10 , and because
    a constexpr udlit operator can return something which is valid, it shouldn't
    be rejected just in case.
    Last is clang++ ICEs on non-static data members size/data.
    
    The first version of this support had a difference where M.data () was not
    a constant expression but a core constant expression, but if M.size () != 0
    M.data ()[0] ... M.data ()[M.size () - 1] were integer constant expressions.
    We don't have any routine to test whether an expression is a core constant
    expression, so what the code does is try silently whether M.data () is
    a constant expression (maybe_constant_value), if it is, nice, we can use
    that result to attempt to optimize the extraction of the message from it
    if it is some recognized form involving a STRING_CST and just to double-check
    try to constant evaluate M.data ()[0] and M.data ()[M.size () - 1] expressions
    as boundaries but not anything in between.  If M.data () is not a constant
    expression, we don't fail, but use a slower method of evaluating M.data ()[i]
    for i 0, 1, ... M.size () - 1.  And if M.size () == 0, the above wouldn't
    evaluate anything, so we try to constant evaluate (M.data (), 0) as constant
    expression, which should succeed if M.data () is a core constant expression
    and fail otherwise.
    
    The patch assumes that these expressions are manifestly constant evaluated.
    
    The patch implements what I see in the paper, because it is unclear what
    further changes will be voted in (and the changes can be done at that
    point).
    The initial patch used tf_none in 6 spots so that just the static_assert
    specific errors were emitted and not others, but during review this has been
    changed, so that we emit both the more detailed errors why something wasn't
    found or wasn't callable or wasn't convertible and diagnostics that
    static_assert second argument needs to satisfy some of the needed properties.
    
    2023-11-23  Jakub Jelinek  <jakub@redhat.com>
    
            PR c++/110348
    gcc/
            * doc/invoke.texi (-Wno-c++26-extensions): Document.
    gcc/c-family/
            * c.opt (Wc++26-extensions): New option.
            * c-cppbuiltin.cc (c_cpp_builtins): For C++26 predefine
            __cpp_static_assert to 202306L rather than 201411L.
    gcc/cp/
            * parser.cc: Implement C++26 P2741R3 - user-generated static_assert
            messages.
            (cp_parser_static_assert): Parse message argument as
            conditional-expression if it is not a pure string literal or
            several of them concatenated followed by closing paren.
            * semantics.cc (finish_static_assert): Handle message which is not
            STRING_CST.  For condition with bare parameter packs return early.
            * pt.cc (tsubst_expr) <case STATIC_ASSERT>: Also tsubst_expr
            message and make sure that if it wasn't originally STRING_CST, it
            isn't after tsubst_expr either.
    gcc/testsuite/
            * g++.dg/cpp26/static_assert1.C: New test.
            * g++.dg/cpp26/feat-cxx26.C (__cpp_static_assert): Expect
            202306L rather than 201411L.
            * g++.dg/cpp0x/udlit-error1.C: Expect different diagnostics for
            static_assert with user-defined literal.
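As a usage note, code that wants to rely on the feature can key on the feature-test macro value from the ChangeLog above. In this sketch only `__cpp_static_assert` and the value 202306L come from the commit; the macro `HAVE_USER_GENERATED_STATIC_ASSERT_MSG` and the type `Why` are hypothetical.

```cpp
// Hypothetical configuration macro derived from the feature-test macro.
#if defined(__cpp_static_assert) && __cpp_static_assert >= 202306L
#define HAVE_USER_GENERATED_STATIC_ASSERT_MSG 1
#else
#define HAVE_USER_GENERATED_STATIC_ASSERT_MSG 0
#endif

// Hypothetical message type with constexpr size() and data().
struct Why {
  constexpr int size() const { return 9; }
  constexpr const char *data() const { return "too small"; }
};

#if HAVE_USER_GENERATED_STATIC_ASSERT_MSG
static_assert(sizeof(long) >= sizeof(int), Why{});
#else
// Fall back to a plain string literal in older language modes.
static_assert(sizeof(long) >= sizeof(int), "too small");
#endif
```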
Comment 15 Jakub Jelinek 2023-11-23 08:42:08 UTC
Now implemented for GCC 14.
Comment 16 corentinjabot 2023-11-23 09:38:17 UTC
(In reply to Jakub Jelinek from comment #15)
> Now implemented for GCC 14.

Thanks for working on this
    
>     As I wrote already in the PR, in addition to looking through the paper
>     I looked at the clang++ testcase for this feature implemented there from
>     paper's author and on godbolt played with various parts of the testcase
>     coverage below, and there are some differences between what the patch
>     implements and what clang++ implements.
>     
>     The first is that clang++ diagnoses if M.size () or M.data () methods
>     are present, but aren't constexpr; while the paper introduction talks
> about
>     that, the standard wording changes don't seem to require that, all they
> say
>     is that those methods need to exist (assuming accessible and the like)
>     and be implicitly convertible to std::size_t or const char *, but rest is
>     only if the static assertion fails.  If there is intent to change that
>     wording, the question is how far to go, e.g. while M.size () could be
>     constexpr, they could e.g. return some class object which wouldn't have
>     constexpr conversion operator to size_t/const char * and tons of other
>     reasons why the constant evaluation could fail.  Without actually
> evaluating
>     it I don't see how we could guarantee anything for non-failed
> static_assert.


Clang always evaluates, if I recall; the error is actually a warning that defaults to an error.

>     
>     The second difference is that
>     static_assert (false, "foo"_myd);
>     in the testcase is normal failed static assertion and
>     static_assert (true, "foo"_myd);
>     would be accepted, while clang++ rejects it.  IMHO
>     "foo"_myd doesn't match the syntactic requirements of unevaluated-string
>     as mentioned in http://eel.is/c++draft/dcl.pre#10 , and because
>     a constexpr udlit operator can return something which is valid, it
> shouldn't
>     be rejected just in case.

Seems like a bug in Clang indeed, I will investigate. 

>     Last is clang++ ICEs on non-static data members size/data.

Do you have a reproducer for that?
Comment 17 Jakub Jelinek 2023-11-23 09:58:06 UTC
(In reply to corentinjabot from comment #16)
> Clang always evaluates, if I recall; the error is actually a warning that
> defaults to an error.

I initially had a warning for it, but Jason preferred not to warn at all.
Non-constexpr methods/functions are just one of many reasons why it could fail
to constant evaluate.

> >     Last is clang++ ICEs on non-static data members size/data.
> 
> > Do you have a reproducer for that?

The static_assert1.C test above.
https://godbolt.org/z/s3fffdsEq