gcc -x c++ -Og -std=c++20 yields inconsistent compile time evaluation for following example: struct Str { constexpr Str(){ } constexpr Str(const char *instr) { str = instr; length = 0; for(auto index = 0;instr[index];++index) { ++length; } } const char *str = nullptr; int length = 0; }; extern void callback(Str str); void func1() { callback(Str{"Test"}); // Does not get constexpr optimized by gcc } void func2() { Str str{"Test"}; callback(str); // Does get constexpr optimized by gcc } func2's lvalue str initializer gets compile time evaluated and the call yields plain mov-s in assembly, but prvalue Str{"Test"} does not - instead it inlines the constructor and performs the counting each time func1 is invoked. constexpr does not guarantee compile time evaluation, however this specific usecase is extremely important to our code base for not counting the lengths of strings each time performance-wise, and it seems to be some small semantic difference? Please note that the "callback" function call is not important, same happens without it, it is in the example to illustrate that it is unrelated to dead code removal. That optimization is something we could try and look into in the gcc codebase to propose a patch, but we would appreciate any advice/pointers as to where to look. Godbolt: https://godbolt.org/z/W1KK3hqao
For user vars we just try to silently non-manifestly-constant-evaluated constant evaluate initializers. Would this request be just try to do the same on TARGET_EXPRs, or try to constant evaluate any subexpression of any expression? E.g. shall constexpr int foo (int x) { return x; } int bar (); int a = foo (42) + bar (); be simplified to 42 + bar (); already in the FE (rather than say through inlining)? The whole initializer (but could be any expression inside of a function body too) isn't a constant expression, just parts of it...
Though, we clearly do that already for CALL_EXPRs to constexpr functions during cp_fold. So guess it is just TARGET_EXPRs or what exactly this testcase needs.
(In reply to Jakub Jelinek from comment #2) > Though, we clearly do that already for CALL_EXPRs to constexpr functions > during cp_fold. So guess it is just TARGET_EXPRs or what exactly this > testcase needs. Yes, I think the point here is to turn e.g. this: callback (TARGET_EXPR <D.2890, <<< Unknown tree: aggr_init_expr 5 __ct_comp D.2890 (struct Str *) <<< Unknown tree: void_cst >>> (const char *) "Test" >>>>) into: callback (TARGET_EXPR <D.2890, {.str=(const char *) "Test", .length=4}>)
The master branch has been updated by Jakub Jelinek <jakub@gcc.gnu.org>: https://gcc.gnu.org/g:0547dbb725b6d8e878a79e28a2e171eafcfbc1aa commit r15-5746-g0547dbb725b6d8e878a79e28a2e171eafcfbc1aa Author: Jakub Jelinek <jakub@redhat.com> Date: Thu Nov 28 11:18:07 2024 +0100 expr, c: Don't clear whole unions [PR116416] As discussed earlier, we currently clear padding bits even when we don't have to and that causes pessimization of emitted code, e.g. for union U { int a; long b[64]; }; void bar (union U *); void foo (void) { union U u = { 0 }; bar (&u); } we need to clear just u.a, not the whole union, but on the other side in cases where the standard requires padding bits to be zeroed, like for C23 {} initializers of aggregates with padding bits, or for C++11 zero initialization we don't do that. This patch a) moves some of the stuff into complete_ctor_at_level_p (but not all the *p_complete = 0; case, for that it would need to change so that it passes around the ctor rather than just its type) and changes the handling of unions b) introduces a new option, so that users can either get the new behavior (only what is guaranteed by the standards, the default), or previous behavior (union padding zero initialization, no such guarantees in structures) or also a guarantee in structures c) introduces a new CONSTRUCTOR flag which says that the padding bits (if any) should be zero initialized (and sets it for now in the C FE for C23 {} initializers). Am not sure the CONSTRUCTOR_ZERO_PADDING_BITS flag is really needed for C23, if there is just empty initializer, I think we already mark it as incomplete if there are any missing initializers. Maybe with some designated initializer games, say void foo () { struct S { char a; long long b; }; struct T { struct S c; } t = { .c = {}, .c.a = 1, .c.b = 2 }; ... } Is this supposed to initialize padding bits in C23 and then the .c.a = 1 and .c.b = 2 stores preserve those padding bits, so is that supposed to be different from struct T t2 = { .c = { 1, 2 } }; ? What about just struct T t3 = { .c.a = 1, .c.b = 2 }; ? And I haven't touched the C++ FE for the flag, because I'm afraid I'm lost on where exactly is zero-initialization done (vs. other types of initialization) and where is e.g. zero-initialization of a temporary then (member-wise) copied. Say struct S { char a; long long b; }; struct T { constexpr T (int a, int b) : c () { c.a = a; c.b = b; } S c; }; void bar (T *); void foo () { T t (1, 2); bar (&t); } Is the c () value-initialization of t.c followed by c.a and c.b updates which preserve the zero initialized padding bits? Or is there some copy construction involved which does member-wise copying and makes the padding bits undefined? Looking at (older) clang++ with -O2, it initializes also the padding bits when c () is used and doesn't with c {}. For GCC, note that there is that optimization from Alex to zero padding bits for optimization purposes for small aggregates, so either one needs to look at -O0 -fdump-tree-gimple dumps, or use larger structures which aren't optimized that way. 2024-11-28 Jakub Jelinek <jakub@redhat.com> PR c++/116416 gcc/ * flag-types.h (enum zero_init_padding_bits_kind): New type. * tree.h (CONSTRUCTOR_ZERO_PADDING_BITS): Define. * common.opt (fzero-init-padding-bits=): New option. * expr.cc (categorize_ctor_elements_1): Handle CONSTRUCTOR_ZERO_PADDING_BITS or flag_zero_init_padding_bits == ZERO_INIT_PADDING_BITS_ALL. Fix up *p_complete = -1; setting for unions. (complete_ctor_at_level_p): Handle unions differently for flag_zero_init_padding_bits == ZERO_INIT_PADDING_BITS_STANDARD. * gimple-fold.cc (type_has_padding_at_level_p): Fix up UNION_TYPE handling, return also true for UNION_TYPE with no FIELD_DECLs and non-zero size, handle QUAL_UNION_TYPE like UNION_TYPE. * doc/invoke.texi (-fzero-init-padding-bits=@var{value}): Document. gcc/c/ * c-parser.cc (c_parser_braced_init): Set CONSTRUCTOR_ZERO_PADDING_BITS for flag_isoc23 empty initializers. * c-typeck.cc (constructor_zero_padding_bits): New variable. (struct constructor_stack): Add zero_padding_bits member. (really_start_incremental_init): Save and clear constructor_zero_padding_bits. (push_init_level): Save constructor_zero_padding_bits. Or into it CONSTRUCTOR_ZERO_PADDING_BITS from previous value if implicit. (pop_init_level): Set CONSTRUCTOR_ZERO_PADDING_BITS if constructor_zero_padding_bits and restore constructor_zero_padding_bits. gcc/testsuite/ * gcc.dg/plugin/infoleak-1.c (test_union_2b, test_union_4b): Expect diagnostics. * gcc.dg/c23-empty-init-5.c: New test. * gcc.dg/gnu11-empty-init-1.c: New test. * gcc.dg/gnu11-empty-init-2.c: New test. * gcc.dg/gnu11-empty-init-3.c: New test. * gcc.dg/gnu11-empty-init-4.c: New test.
Marek, feel free to commit your patch now (I think it has been approved and was just waiting for my side, right?).
Thanks. Yeah, it was approved. But I still see FAIL: g++.dg/tree-ssa/pr78687.C scan-tree-dump sra "Removing load:.*ptr;" even once I've rebased my patch. Weird, because the FAIL *was* gone when I tried your patch a few weeks ago.
(In reply to Marek Polacek from comment #6) > Thanks. Yeah, it was approved. But I still see > > FAIL: g++.dg/tree-ssa/pr78687.C scan-tree-dump sra "Removing load:.*ptr;" > > even once I've rebased my patch. Weird, because the FAIL *was* gone when I > tried your patch a few weeks ago. Guess it would pass with just r15-5746 + your https://gcc.gnu.org/pipermail/gcc-patches/2024-October/664680.html patch, but not with r15-5747 . inline ref_proxy<option_2, option_ref > make_object_1() { return ref_proxy<option_2, option_ref >(option_2()); } I think option_2() above is value initialization and zero initialization in that case given that it is struct option_2 { }; and so we are required to zero initialize even the padding bits (sizeof (option_2) == 1, so that exact byte). So, can we perhaps commit with xfailing that testcase? Martin, could you please have a look what could be done on the SRA side? I'm afraid with the r15-5746 + r15-5747 changes there might be more zero initialization of the padding bits than in the past (ditto for the optimization Alexandre added for small aggregates with padding bits) and it would be nice if SRA was still able to optimize it assuming it can prove nothing really attempts to read the padding bits etc., basically just add some smart handling of = {}; initialization.
Of course we could also try to get smarter at gimplification time. We have {._storage={.D.9582={.D.9163={._tail={.D.9221={._tail={.D.9280={._head={}}}}}}}, ._which=2}} ctor and the only CONSTRUCTOR_ZERO_PADDING_BITS ctor is that {}, for 1-byte empty structure, while the whole structure has 48 bytes, so we could also gimplify it as D.10137._storage..D.9582.D.9163._tail.D.9221._tail.D.9280._head = {}; D.10137._storage._which = 2; rather than D.10137 = {}; D.10137._storage._which = 2; But we'd need to figure out not just that there is padding, but also how much padding we need to clear, how consecutive or non-consecutive it is, etc. Maybe as a hack just let the code handle one special case, if there is just one sub-field which needs zero padding, only zero initialize that, if there are 2+, zero it all. Changing - return ref_proxy<option_2, option_ref >(option_2()); + return ref_proxy<option_2, option_ref >(option_2{}); in the testcase also achieves the same effect.
The trunk branch has been updated by Marek Polacek <mpolacek@gcc.gnu.org>: https://gcc.gnu.org/g:12de1942a0a673f9f2f1c2bfce4279a666061ffc commit r15-6052-g12de1942a0a673f9f2f1c2bfce4279a666061ffc Author: Marek Polacek <polacek@redhat.com> Date: Thu Aug 29 12:58:41 2024 -0400 c++: compile time evaluation of prvalues [PR116416] This PR reports a missed optimization. When we have: Str str{"Test"}; callback(str); as in the test, we're able to evaluate the Str::Str() call at compile time. But when we have: callback(Str{"Test"}); we are not. With this patch (in fact, it's Patrick's patch with a little tweak), we turn callback (TARGET_EXPR <D.2890, <<< Unknown tree: aggr_init_expr 5 __ct_comp D.2890 (struct Str *) <<< Unknown tree: void_cst >>> (const char *) "Test" >>>>) into callback (TARGET_EXPR <D.2890, {.str=(const char *) "Test", .length=4}>) I explored the idea of calling maybe_constant_value for the whole TARGET_EXPR in cp_fold. That has three problems: - we can't always elide a TARGET_EXPR, so we'd have to make sure the result is also a TARGET_EXPR; - the resulting TARGET_EXPR must have the same flags, otherwise Bad Things happen; - getting a new slot is also problematic. I've seen a test where we had "TARGET_EXPR<D.2680, ...>, D.2680", and folding the whole TARGET_EXPR would get us "TARGET_EXPR<D.2681, ...>", but since we don't see the outer D.2680, we can't replace it with D.2681, and things break. With this patch, two tree-ssa tests regressed: pr78687.C and pr90883.C. FAIL: g++.dg/tree-ssa/pr90883.C scan-tree-dump dse1 "Deleted redundant store: .*.a = {}" is easy. Previously, we would call C::C, so .gimple has: D.2590 = {}; C::C (&D.2590); D.2597 = D.2590; return D.2597; Then .einline inlines the C::C call: D.2590 = {}; D.2590.a = {}; // #1 D.2590.b = 0; // #2 D.2597 = D.2590; D.2590 ={v} {CLOBBER(eos)}; return D.2597; then #2 is removed in .fre1, and #1 is removed in .dse1. So the test passes. But with the patch, .gimple won't have that C::C call, so the IL is of course going to look different. The .optimized dump looks the same though so there's no problem. pr78687.C is XFAILed because the test passes with r15-5746 but not with r15-5747 as well. I opened <https://gcc.gnu.org/PR117971>. PR c++/116416 gcc/cp/ChangeLog: * cp-gimplify.cc (cp_fold_r) <case TARGET_EXPR>: Try to fold TARGET_EXPR_INITIAL and replace it with the folded result if it's TREE_CONSTANT. gcc/testsuite/ChangeLog: * g++.dg/analyzer/pr97116.C: Adjust dg-message. * g++.dg/tree-ssa/pr78687.C: Add XFAIL. * g++.dg/tree-ssa/pr90883.C: Adjust dg-final. * g++.dg/cpp0x/constexpr-prvalue1.C: New test. * g++.dg/cpp1y/constexpr-prvalue1.C: New test. Co-authored-by: Patrick Palka <ppalka@redhat.com> Reviewed-by: Jason Merrill <jason@redhat.com>
Implemented in GCC 15.
Created attachment 59834 [details] gcc15-pr116416-opt.patch I've tried to implement a simple optimization (on top of the PR118002 fix), where we clear just a subobject that needs padding or other uninitialized bits cleared rather than the whole object. On the pr78687.C testcase, this results (in gimple dump) in: - D.10177 = {}; + D.10177._storage.D.9582.D.9163._tail.D.9221._tail.D.9280._head = {}; because as I wrote, the single byte option_2() is the only part of the object that actually needs to have padding bits cleared, rather than the rather large whole object. Though, SRA still punts even on that.
struct S { int a; }; struct T { struct S b; int c; struct S d; }; struct U { struct T e; int f; struct T g; }; void f0 (struct U *); void f1 (void) { struct U u = { { {}, 1, {} }, 2, { {}, 3, {} } }; f0 (&u); } void f2 (void) { struct U u = { { { 1 }, 2, { 3 } }, 4, { { 5 }, 6, {} } }; f0 (&u); } void f3 (void) { struct U u = { { { }, 2, { } }, 4, { { 5 }, 6, { 7 } } }; f0 (&u); } is a C23 testcase showing what it can do, for f1 there is no difference, for f2 - u = {}; u.e.b.a = 1; u.e.c = 2; u.e.d.a = 3; u.f = 4; u.g.b.a = 5; u.g.c = 6; + u.g.d = {}; f0 (&u); and for f3: - u = {}; + u.e = {}; u.e.c = 2; u.f = 4; u.g.b.a = 5; ...
The master branch has been updated by Patrick Palka <ppalka@gcc.gnu.org>: https://gcc.gnu.org/g:e7bccec33beece4a46bc1b20ed375e803e97aa88 commit r15-9400-ge7bccec33beece4a46bc1b20ed375e803e97aa88 Author: Patrick Palka <ppalka@redhat.com> Date: Sat Apr 12 14:06:56 2025 -0400 c++: improve constexpr prvalue folding [PR116416] This patch improves upon r15-6052-g12de1942a0a673 by performing prvalue folding with mce_false rather than mce_unknown when it's safe to do so (i.e. ff_mce_false is set), so that we can also fold temporary initializers that call is_constant_evaluated etc. In passing I noticed constexpr-prvalue1.C could more precisely verify the optimization happened by inspecting what the front end spits out instead of inspecting the optimized assembly -- that there's no constructor call doesn't necessarily imply the constructor has been completely folded away, only that its body has been inlined. PR c++/116416 gcc/cp/ChangeLog: * constexpr.cc (maybe_constant_init_1): Generalize type of of manifestly_const_eval parameter from bool to mce_value. (maybe_constant_init): Define 3-parameter version taking a manifestly_const_eval instead of bool parameter. (cxx_constant_init): Adjust. * cp-gimplify.cc (cp_fold_r) <case TARGET_EXPR>: Pass mce_false to maybe_constant_init during prvalue folding if ff_mce_false is set. * cp-tree.h (maybe_constant_init): Declare new overload. gcc/testsuite/ChangeLog: * g++.dg/cpp1y/constexpr-prvalue1.C: Adjust to instead inspect the 'original' dump. * g++.dg/cpp1y/constexpr-prvalue1a.C: New test. Reviewed-by: Jason Merrill <jason@redhat.com>