Bug 111351 - constexpr std::string objects permitted to escape constant evaluation when SSO
Summary: constexpr std::string objects permitted to escape constant evaluation when SSO
Status: UNCONFIRMED
Alias: None
Product: gcc
Classification: Unclassified
Component: libstdc++ (show other bugs)
Version: 14.0
: P3 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2023-09-08 22:52 UTC by James Y Knight
Modified: 2023-09-27 02:09 UTC (History)
4 users (show)

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description James Y Knight 2023-09-08 22:52:28 UTC
In C++20, libstdc++ currently allows an std::string instance to escape from constant evaluation to runtime, as long as the string fit within the SSO length.

E.g., as a global variable, compiled with -std=c++20:
constexpr std::string s1; // OK
constexpr std::string s1 = ""; // OK
constexpr std::string s1 = "0123456789abcde"; // OK
constexpr std::string s2 = "0123456789abcdef"; // FAIL

I believe all of the above ought to fail to compile.

This will result in user code which can be built or not based on whether their string happens to fit within the SSO string length. I find that quite unfortunate, since it is supposed to be an internal implementation detail/optimization, and this makes it effectively part of the API that code will grow to depend on.

As comparison, libc++ rejects all the above examples, by forcing the SSO-size to zero in constant evaluation context, so that a pointer to an external allocation is always used.

This was brought to my attention from 
https://quuxplusone.github.io/blog/2023/09/08/constexpr-string-firewall/
Comment 1 Arthur O'Dwyer 2023-09-09 03:28:43 UTC
(Author of the blog post here.)
In contrast to James' view, I think the libstdc++/MSVC behavior is relatively easy to explain; I think libc++'s `if consteval` approach is baroque and confusing. [That is, _both_ behaviors are confusing to the newbie and need expert explanation, but libc++'s choice is confusing even for the experts, who have to maintain its split-brain SSO logic forever because Hyrum's Law. If you have to maintain something forever, you should at least choose to make it _simple_! As I say in the blog post, in hindsight I think libc++ screwed up here.]

IMHO it is a feature, not a bug, that I can write these lines:

    constinit std::string s1;
    constinit std::vector<char> v1;

libstdc++ would be within its rights, paper-Standard-wise, to reject both of these lines; but I don't think libstdc++ _should_ reject either of them. They're both fine code as far as I'm concerned. I think libc++ is the user-hostile/broken implementation here, not libstdc++.

Anyone who thinks libstdc++ ought to reject `s1` above should at least be forced to explain what libstdc++ ought to do about `v1`. From the user-programmer's POV, there's no difference between a default-initialized string and a default-initialized vector. Users don't care about these SSO details; they just want the code to work. That's what libstdc++ currently does. Good, IMO.
Comment 2 Jiang An 2023-09-11 12:31:09 UTC
> This will result in user code which can be built or not based on whether their string happens to fit within the SSO string length. I find that quite unfortunate, since it is supposed to be an internal implementation detail/optimization, and this makes it effectively part of the API that code will grow to depend on.

This comes from the nature of SSO and constexpr variables. I don't think it worthwhile to revert this PR (https://github.com/microsoft/STL/pull/1735) for MSVC STL - which makes the codes harder to understand and slows down the compilation.

> As comparison, libc++ rejects all the above examples, by forcing the SSO-size to zero in constant evaluation context, so that a pointer to an external allocation is always used.


libc++ is, unfortunately, currently unable to implement constexpr SSO, IIUC. I failed to find a constexpr-compatible way to determine libc++'s std::string is in short mode if the short mode were enabled in constant evaluation. (As a result, libc++'s string always uses long mode during constant evaluation.)

https://github.com/llvm/llvm-project/blob/0954dc3fb9214b994623f5306473de075f8e3593/libcxx/include/string#L829-L837

Note that there's no tag outside of the union, so we can't know which variant member is active without causing constant evaluation failure - unless the mechanism of std::is_within_lifetime (https://wg21.link/p2641r4) gets implemented.
Comment 3 Jonathan Wakely 2023-09-11 13:23:20 UTC
Does using __builtin_is_constant_p on the union member not work?
Comment 4 James Y Knight 2023-09-11 20:28:06 UTC
vector and string are different in one key way: a zero-sized vector has no accessible storage, while a zero-sized string has 1 byte of readable storage -- the NUL terminator. Because of that, I don't think it's unreasonable for a zero-length vector to be constinit'able, while a zero-length string is not.

But, certainly the _more_ concerning issue is with non-zero-sized strings where the validity of the program depends on what exact SSO size was chosen by the implementation.

> libc++'s choice is confusing even for the experts, who have to maintain its split-brain SSO logic forever because Hyrum's Law

"Hyrum's Law" is exactly why I think it's a mistake to permit SSO-allocated strings. It makes the SSO length a critical part of the interface. People will start writing code which assumes that a constexpr global string of size "N" works, and that will cause problems for other standard libraries which use a different SSO size "M", if M < N. E.g. if libc++ starts allowing this, then people who first target libc++ will find that strings up to 22 characters work, and be surprised/annoyed by libstdc++ failing to build their program.

It is much simpler to say to users "you cannot make a constexpr std::string unless it lives fully within constant-evaluation-time." then to also add "...OR unless the size is short enough, where that size depends on your implementation."
Comment 5 James Y Knight 2023-09-12 21:43:21 UTC
> Does using __builtin_is_constant_p on the union member not work?

I've created a proof-of-concept patch for libc++ to support SSO strings during constant evaluation. It works.

If everyone disagrees with me and believes that this is a really awesome foot-gun to give to users, I will go ahead and propose that patch to libc++ maintainers. (As mentioned, that'll cause more code to be compilable under libc++ than is possible to permit under libstdc++/MSVC implementations).

However, I continue to believe the opposite outcome, prohibiting this everywhere, would be preferable.
Comment 6 Arthur O'Dwyer 2023-09-12 23:14:58 UTC
(In reply to James Y Knight from comment #5)
> > Does using __builtin_is_constant_p on the union member not work?
> 
> I've created a proof-of-concept patch for libc++ to support SSO strings
> during constant evaluation. It works.
> 
> If everyone disagrees with me and believes that this is a really awesome
> foot-gun to give to users, I will go ahead and propose that patch to libc++
> maintainers. (As mentioned, that'll cause more code to be compilable under
> libc++ than is possible to permit under libstdc++/MSVC implementations).

FWIW #1: Personally I would be weakly in favor of that patch, but I would also be pessimistic about its chances of getting accepted in the current libc++ climate.

FWIW #2: A worst-of-both-worlds option ;) would be for your patch to `if consteval` the SSO buffer size so that it would be 24 at runtime (matching libc++'s current behavior) but 16 at compile time (matching libstdc++ and Microsoft if I'm not mistaken, so you'd get your cross-vendor portability at compile time). *I* would still consider that an unnecessary-and-thus-bad crippling of libc++ string's cool 24-byte-SSO feature; but I could imagine someone else finding it more palatable than any other alternative.  ["Worst-of-both-worlds" in the sense that you're paying to change the code at all, but the end result still has two codepaths that both need to be maintained, and divergence between compile-time and runtime SSO behavior.]
Comment 7 James Y Knight 2023-09-27 02:09:00 UTC
On the libc++ side, a suggestion was given that instead of making this an _error_, we could instead emit a warning if "a constexpr or constinit object is a basic_string or contains a basic_string subobject, or the definition of a constexpr or constinit variable extends the lifetime of a temporary object that meets the previous condition."

I think that was a really great suggestion -- diagnosing via a warning is a nicer solution than putting is_constant_evaluated hacks into the library (as MSVC had, and libc++ currently has but will likely remove).

One could either hardcode std::basic_string for the diagnostic, or add a new type-attribute to permit any type to opt-in to such a warning. You'd want to use it if you have a type where you don't _intend_ to support constant-initialization, but where it may sometimes be possible as an implementation detail, and you want to tell users not to rely on that implementation detail.