Bug 116416 - Missing optimization: compile time evaluation of prvalue
Summary: Missing optimization: compile time evaluation of prvalue
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: c++ (show other bugs)
Version: 14.2.0
: P3 normal
Target Milestone: 15.0
Assignee: Marek Polacek
URL:
Keywords: missed-optimization
Depends on:
Blocks:
 
Reported: 2024-08-19 11:18 UTC by Aleksandr
Modified: 2025-08-28 04:49 UTC (History)
8 users (show)

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed: 2024-09-18 00:00:00


Attachments
gcc15-pr116416-opt.patch (4.12 KB, patch)
2024-12-11 16:21 UTC, Jakub Jelinek
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Aleksandr 2024-08-19 11:18:45 UTC
gcc -x c++ -Og -std=c++20 yields inconsistent compile time evaluation for following example:

struct Str {
  constexpr Str(){
  }
  constexpr Str(const char *instr) {
      str = instr; length = 0;
      for(auto index = 0;instr[index];++index) {
        ++length; 
      }
  }
  const char *str = nullptr;
  int length = 0;
};
extern void callback(Str str);
void func1() {
    callback(Str{"Test"}); // Does not get constexpr optimized by gcc
}
void func2() {
    Str str{"Test"};
    callback(str); // Does get constexpr optimized by gcc
}

func2's lvalue str initializer gets compile time evaluated and the call yields plain mov-s in assembly,
but prvalue Str{"Test"} does not - instead it inlines the constructor and performs the counting each time func1 is invoked.

constexpr does not guarantee compile time evaluation, however this specific usecase is extremely important to our code base for not counting the lengths of strings each time performance-wise, and it seems to be some small semantic difference? 
Please note that the "callback" function call is not important, same happens without it, it is in the example to illustrate that it is unrelated to dead code removal.

That optimization is something we could try and look into in the gcc codebase to propose a patch, but we would appreciate any advice/pointers as to where to look.

Godbolt: https://godbolt.org/z/W1KK3hqao
Comment 1 Jakub Jelinek 2024-09-19 16:06:01 UTC
For user vars we just try to silently non-manifestly-constant-evaluated constant evaluate initializers.
Would this request be just try to do the same on TARGET_EXPRs, or try to constant evaluate any subexpression of any expression?
E.g. shall
constexpr int foo (int x) { return x; }
int bar ();
int a = foo (42) + bar ();
be simplified to 42 + bar (); already in the FE (rather than say through inlining)?
The whole initializer (but could be any expression inside of a function body too) isn't a constant expression, just parts of it...
Comment 2 Jakub Jelinek 2024-09-19 16:10:30 UTC
Though, we clearly do that already for CALL_EXPRs to constexpr functions during cp_fold.  So guess it is just TARGET_EXPRs or what exactly this testcase needs.
Comment 3 Marek Polacek 2024-09-19 18:22:08 UTC
(In reply to Jakub Jelinek from comment #2)
> Though, we clearly do that already for CALL_EXPRs to constexpr functions
> during cp_fold.  So guess it is just TARGET_EXPRs or what exactly this
> testcase needs.

Yes, I think the point here is to turn e.g. this:

  callback (TARGET_EXPR <D.2890, <<< Unknown tree: aggr_init_expr
    5   
    __ct_comp 
    D.2890
    (struct Str *) <<< Unknown tree: void_cst >>> 
    (const char *) "Test" >>>>)

into:

  callback (TARGET_EXPR <D.2890, {.str=(const char *) "Test", .length=4}>)
Comment 4 GCC Commits 2024-11-28 10:38:30 UTC
The master branch has been updated by Jakub Jelinek <jakub@gcc.gnu.org>:

https://gcc.gnu.org/g:0547dbb725b6d8e878a79e28a2e171eafcfbc1aa

commit r15-5746-g0547dbb725b6d8e878a79e28a2e171eafcfbc1aa
Author: Jakub Jelinek <jakub@redhat.com>
Date:   Thu Nov 28 11:18:07 2024 +0100

    expr, c: Don't clear whole unions [PR116416]
    
    As discussed earlier, we currently clear padding bits even when we
    don't have to and that causes pessimization of emitted code,
    e.g. for
    union U { int a; long b[64]; };
    void bar (union U *);
    void
    foo (void)
    {
      union U u = { 0 };
      bar (&u);
    }
    we need to clear just u.a, not the whole union, but on the other side
    in cases where the standard requires padding bits to be zeroed, like for
    C23 {} initializers of aggregates with padding bits, or for C++11 zero
    initialization we don't do that.
    
    This patch
    a) moves some of the stuff into complete_ctor_at_level_p (but not
       all the *p_complete = 0; case, for that it would need to change
       so that it passes around the ctor rather than just its type) and
       changes the handling of unions
    b) introduces a new option, so that users can either get the new
       behavior (only what is guaranteed by the standards, the default),
       or previous behavior (union padding zero initialization, no such
       guarantees in structures) or also a guarantee in structures
    c) introduces a new CONSTRUCTOR flag which says that the padding bits
       (if any) should be zero initialized (and sets it for now in the C
       FE for C23 {} initializers).
    
    Am not sure the CONSTRUCTOR_ZERO_PADDING_BITS flag is really needed
    for C23, if there is just empty initializer, I think we already mark
    it as incomplete if there are any missing initializers.  Maybe with
    some designated initializer games, say
    void foo () {
      struct S { char a; long long b; };
      struct T { struct S c; } t = { .c = {}, .c.a = 1, .c.b = 2 };
    ...
    }
    Is this supposed to initialize padding bits in C23 and then the .c.a = 1
    and .c.b = 2 stores preserve those padding bits, so is that supposed
    to be different from struct T t2 = { .c = { 1, 2 } };
    ?  What about just struct T t3 = { .c.a = 1, .c.b = 2 }; ?
    
    And I haven't touched the C++ FE for the flag, because I'm afraid I'm lost
    on where exactly is zero-initialization done (vs. other types of
    initialization) and where is e.g. zero-initialization of a temporary then
    (member-wise) copied.
    Say
    struct S { char a; long long b; };
    struct T { constexpr T (int a, int b) : c () { c.a = a; c.b = b; } S c; };
    void bar (T *);
    
    void
    foo ()
    {
      T t (1, 2);
      bar (&t);
    }
    Is the c () value-initialization of t.c followed by c.a and c.b updates
    which preserve the zero initialized padding bits?  Or is there some
    copy construction involved which does member-wise copying and makes the
    padding bits undefined?
    Looking at (older) clang++ with -O2, it initializes also the padding bits
    when c () is used and doesn't with c {}.
    For GCC, note that there is that optimization from Alex to zero padding bits
    for optimization purposes for small aggregates, so either one needs to look
    at -O0 -fdump-tree-gimple dumps, or use larger structures which aren't
    optimized that way.
    
    2024-11-28  Jakub Jelinek  <jakub@redhat.com>
    
            PR c++/116416
    gcc/
            * flag-types.h (enum zero_init_padding_bits_kind): New type.
            * tree.h (CONSTRUCTOR_ZERO_PADDING_BITS): Define.
            * common.opt (fzero-init-padding-bits=): New option.
            * expr.cc (categorize_ctor_elements_1): Handle
            CONSTRUCTOR_ZERO_PADDING_BITS or
            flag_zero_init_padding_bits == ZERO_INIT_PADDING_BITS_ALL.  Fix
            up *p_complete = -1; setting for unions.
            (complete_ctor_at_level_p): Handle unions differently for
            flag_zero_init_padding_bits == ZERO_INIT_PADDING_BITS_STANDARD.
            * gimple-fold.cc (type_has_padding_at_level_p): Fix up UNION_TYPE
            handling, return also true for UNION_TYPE with no FIELD_DECLs
            and non-zero size, handle QUAL_UNION_TYPE like UNION_TYPE.
            * doc/invoke.texi (-fzero-init-padding-bits=@var{value}): Document.
    gcc/c/
            * c-parser.cc (c_parser_braced_init): Set CONSTRUCTOR_ZERO_PADDING_BITS
            for flag_isoc23 empty initializers.
            * c-typeck.cc (constructor_zero_padding_bits): New variable.
            (struct constructor_stack): Add zero_padding_bits member.
            (really_start_incremental_init): Save and clear
            constructor_zero_padding_bits.
            (push_init_level): Save constructor_zero_padding_bits.  Or into it
            CONSTRUCTOR_ZERO_PADDING_BITS from previous value if implicit.
            (pop_init_level): Set CONSTRUCTOR_ZERO_PADDING_BITS if
            constructor_zero_padding_bits and restore
            constructor_zero_padding_bits.
    gcc/testsuite/
            * gcc.dg/plugin/infoleak-1.c (test_union_2b, test_union_4b): Expect
            diagnostics.
            * gcc.dg/c23-empty-init-5.c: New test.
            * gcc.dg/gnu11-empty-init-1.c: New test.
            * gcc.dg/gnu11-empty-init-2.c: New test.
            * gcc.dg/gnu11-empty-init-3.c: New test.
            * gcc.dg/gnu11-empty-init-4.c: New test.
Comment 5 Jakub Jelinek 2024-11-28 10:54:55 UTC
Marek, feel free to commit your patch now (I think it has been approved and was just waiting for my side, right?).
Comment 6 Marek Polacek 2024-11-28 16:13:41 UTC
Thanks.  Yeah, it was approved.  But I still see

FAIL: g++.dg/tree-ssa/pr78687.C   scan-tree-dump sra "Removing load:.*ptr;"

even once I've rebased my patch.  Weird, because the FAIL *was* gone when I tried your patch a few weeks ago.
Comment 7 Jakub Jelinek 2024-11-28 17:38:43 UTC
(In reply to Marek Polacek from comment #6)
> Thanks.  Yeah, it was approved.  But I still see
> 
> FAIL: g++.dg/tree-ssa/pr78687.C   scan-tree-dump sra "Removing load:.*ptr;"
> 
> even once I've rebased my patch.  Weird, because the FAIL *was* gone when I
> tried your patch a few weeks ago.

Guess it would pass with just r15-5746 + your
https://gcc.gnu.org/pipermail/gcc-patches/2024-October/664680.html
patch, but not with r15-5747 .
inline ref_proxy<option_2, option_ref > make_object_1()
{
    return ref_proxy<option_2, option_ref >(option_2());
}
I think option_2() above is value initialization and zero initialization in that case given that it is
struct option_2
{
};
and so we are required to zero initialize even the padding bits (sizeof (option_2) == 1, so that exact byte).

So, can we perhaps commit with xfailing that testcase?
Martin, could you please have a look what could be done on the SRA side?
I'm afraid with the r15-5746 + r15-5747 changes there might be more zero initialization of the padding bits than in the past (ditto for the optimization Alexandre added for small aggregates with padding bits) and it would be nice if SRA was still able to optimize it assuming it can prove nothing really attempts to read the padding bits etc., basically just add some smart handling of = {}; initialization.
Comment 8 Jakub Jelinek 2024-11-28 18:00:23 UTC
Of course we could also try to get smarter at gimplification time.
We have
{._storage={.D.9582={.D.9163={._tail={.D.9221={._tail={.D.9280={._head={}}}}}}}, ._which=2}}
ctor and the only CONSTRUCTOR_ZERO_PADDING_BITS ctor is that {}, for 1-byte empty structure, while the whole structure has 48 bytes, so we could also gimplify it as
  D.10137._storage..D.9582.D.9163._tail.D.9221._tail.D.9280._head = {};
  D.10137._storage._which = 2;
rather than
  D.10137 = {};
  D.10137._storage._which = 2;
But we'd need to figure out not just that there is padding, but also how much padding we need to clear, how consecutive or non-consecutive it is, etc.
Maybe as a hack just let the code handle one special case, if there is just one sub-field which needs zero padding, only zero initialize that, if there are 2+, zero it all.

Changing
-    return ref_proxy<option_2, option_ref >(option_2());
+    return ref_proxy<option_2, option_ref >(option_2{});
in the testcase also achieves the same effect.
Comment 9 GCC Commits 2024-12-09 17:17:04 UTC
The trunk branch has been updated by Marek Polacek <mpolacek@gcc.gnu.org>:

https://gcc.gnu.org/g:12de1942a0a673f9f2f1c2bfce4279a666061ffc

commit r15-6052-g12de1942a0a673f9f2f1c2bfce4279a666061ffc
Author: Marek Polacek <polacek@redhat.com>
Date:   Thu Aug 29 12:58:41 2024 -0400

    c++: compile time evaluation of prvalues [PR116416]
    
    This PR reports a missed optimization.  When we have:
    
      Str str{"Test"};
      callback(str);
    
    as in the test, we're able to evaluate the Str::Str() call at compile
    time.  But when we have:
    
      callback(Str{"Test"});
    
    we are not.  With this patch (in fact, it's Patrick's patch with a little
    tweak), we turn
    
      callback (TARGET_EXPR <D.2890, <<< Unknown tree: aggr_init_expr
        5
        __ct_comp
        D.2890
        (struct Str *) <<< Unknown tree: void_cst >>>
        (const char *) "Test" >>>>)
    
    into
    
      callback (TARGET_EXPR <D.2890, {.str=(const char *) "Test", .length=4}>)
    
    I explored the idea of calling maybe_constant_value for the whole
    TARGET_EXPR in cp_fold.  That has three problems:
    - we can't always elide a TARGET_EXPR, so we'd have to make sure the
      result is also a TARGET_EXPR;
    - the resulting TARGET_EXPR must have the same flags, otherwise Bad
      Things happen;
    - getting a new slot is also problematic.  I've seen a test where we
      had "TARGET_EXPR<D.2680, ...>, D.2680", and folding the whole TARGET_EXPR
      would get us "TARGET_EXPR<D.2681, ...>", but since we don't see the outer
      D.2680, we can't replace it with D.2681, and things break.
    
    With this patch, two tree-ssa tests regressed: pr78687.C and pr90883.C.
    
    FAIL: g++.dg/tree-ssa/pr90883.C   scan-tree-dump dse1 "Deleted redundant store: .*.a = {}"
    is easy.  Previously, we would call C::C, so .gimple has:
    
      D.2590 = {};
      C::C (&D.2590);
      D.2597 = D.2590;
      return D.2597;
    
    Then .einline inlines the C::C call:
    
      D.2590 = {};
      D.2590.a = {}; // #1
      D.2590.b = 0;  // #2
      D.2597 = D.2590;
      D.2590 ={v} {CLOBBER(eos)};
      return D.2597;
    
    then #2 is removed in .fre1, and #1 is removed in .dse1.  So the test
    passes.  But with the patch, .gimple won't have that C::C call, so the
    IL is of course going to look different.  The .optimized dump looks the
    same though so there's no problem.
    
    pr78687.C is XFAILed because the test passes with r15-5746 but not with
    r15-5747 as well.  I opened <https://gcc.gnu.org/PR117971>.
    
            PR c++/116416
    
    gcc/cp/ChangeLog:
    
            * cp-gimplify.cc (cp_fold_r) <case TARGET_EXPR>: Try to fold
            TARGET_EXPR_INITIAL and replace it with the folded result if
            it's TREE_CONSTANT.
    
    gcc/testsuite/ChangeLog:
    
            * g++.dg/analyzer/pr97116.C: Adjust dg-message.
            * g++.dg/tree-ssa/pr78687.C: Add XFAIL.
            * g++.dg/tree-ssa/pr90883.C: Adjust dg-final.
            * g++.dg/cpp0x/constexpr-prvalue1.C: New test.
            * g++.dg/cpp1y/constexpr-prvalue1.C: New test.
    
    Co-authored-by: Patrick Palka <ppalka@redhat.com>
    Reviewed-by: Jason Merrill <jason@redhat.com>
Comment 10 Marek Polacek 2024-12-09 17:20:24 UTC
Implemented in GCC 15.
Comment 11 Jakub Jelinek 2024-12-11 16:21:26 UTC
Created attachment 59834 [details]
gcc15-pr116416-opt.patch

I've tried to implement a simple optimization (on top of the PR118002 fix), where we clear just a subobject that needs padding or other uninitialized bits
cleared rather than the whole object.
On the pr78687.C testcase, this results (in gimple dump) in:
-      D.10177 = {};
+      D.10177._storage.D.9582.D.9163._tail.D.9221._tail.D.9280._head = {};
because as I wrote, the single byte option_2() is the only part of the object
that actually needs to have padding bits cleared, rather than the rather large whole object.
Though, SRA still punts even on that.
Comment 12 Jakub Jelinek 2024-12-11 16:24:00 UTC
struct S { int a; };
struct T { struct S b; int c; struct S d; };
struct U { struct T e; int f; struct T g; };
void f0 (struct U *);

void
f1 (void)
{
  struct U u = { { {}, 1, {} }, 2, { {}, 3, {} } };
  f0 (&u);
}

void
f2 (void)
{
  struct U u = { { { 1 }, 2, { 3 } }, 4, { { 5 }, 6, {} } };
  f0 (&u);
}

void
f3 (void)
{
  struct U u = { { { }, 2, { } }, 4, { { 5 }, 6, { 7 } } };
  f0 (&u);
}

is a C23 testcase showing what it can do, for f1 there is no difference, for f2
-      u = {};
       u.e.b.a = 1;
       u.e.c = 2;
       u.e.d.a = 3;
       u.f = 4;
       u.g.b.a = 5;
       u.g.c = 6;
+      u.g.d = {};
       f0 (&u);
and for f3:
-      u = {};
+      u.e = {};
       u.e.c = 2;
       u.f = 4;
       u.g.b.a = 5;
...
Comment 13 GCC Commits 2025-04-12 18:07:13 UTC
The master branch has been updated by Patrick Palka <ppalka@gcc.gnu.org>:

https://gcc.gnu.org/g:e7bccec33beece4a46bc1b20ed375e803e97aa88

commit r15-9400-ge7bccec33beece4a46bc1b20ed375e803e97aa88
Author: Patrick Palka <ppalka@redhat.com>
Date:   Sat Apr 12 14:06:56 2025 -0400

    c++: improve constexpr prvalue folding [PR116416]
    
    This patch improves upon r15-6052-g12de1942a0a673 by performing prvalue
    folding with mce_false rather than mce_unknown when it's safe to do so
    (i.e. ff_mce_false is set), so that we can also fold temporary initializers
    that call is_constant_evaluated etc.
    
    In passing I noticed constexpr-prvalue1.C could more precisely verify the
    optimization happened by inspecting what the front end spits out instead
    of inspecting the optimized assembly -- that there's no constructor call
    doesn't necessarily imply the constructor has been completely folded away,
    only that its body has been inlined.
    
            PR c++/116416
    
    gcc/cp/ChangeLog:
    
            * constexpr.cc (maybe_constant_init_1): Generalize type of
            of manifestly_const_eval parameter from bool to mce_value.
            (maybe_constant_init): Define 3-parameter version taking a
            manifestly_const_eval instead of bool parameter.
            (cxx_constant_init): Adjust.
            * cp-gimplify.cc (cp_fold_r) <case TARGET_EXPR>: Pass mce_false
            to maybe_constant_init during prvalue folding if ff_mce_false is
            set.
            * cp-tree.h (maybe_constant_init): Declare new overload.
    
    gcc/testsuite/ChangeLog:
    
            * g++.dg/cpp1y/constexpr-prvalue1.C: Adjust to instead inspect
            the 'original' dump.
            * g++.dg/cpp1y/constexpr-prvalue1a.C: New test.
    
    Reviewed-by: Jason Merrill <jason@redhat.com>