Bug 106922 - [12 Regression] Bogus uninitialized warning on boost::optional<<std::vector<std::string>>>, missed FRE
Summary: [12 Regression] Bogus uninitialized warning on boost::optional<<std::vector<s...
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: tree-optimization (show other bugs)
Version: 12.2.1
: P2 normal
Target Milestone: 12.3
Assignee: Richard Biener
URL:
Keywords: diagnostic, missed-optimization
Depends on:
Blocks: Wuninitialized
  Show dependency treegraph
 
Reported: 2022-09-13 08:18 UTC by Jan Žižka
Modified: 2023-08-18 02:26 UTC (History)
1 user (show)

See Also:
Host:
Target:
Build:
Known to work: 12.2.1, 13.0
Known to fail: 12.2.0
Last reconfirmed: 2022-09-13 00:00:00


Attachments
Reproducer for maybe uninitialized warning with gcc 12 (473 bytes, text/x-csrc)
2022-09-13 08:18 UTC, Jan Žižka
Details
testcase for missed PRE (784 bytes, text/plain)
2022-09-15 08:41 UTC, Richard Biener
Details
testcase for missed PRE on trunk (842 bytes, text/plain)
2022-09-15 11:26 UTC, Richard Biener
Details
CFG of the second testcase (13.41 KB, application/pdf)
2022-09-15 11:56 UTC, Richard Biener
Details
Second reproducer failing with 5edf02ed2b6de024f83a023d046a6a18f645bc83 (507 bytes, text/x-csrc)
2022-09-15 22:13 UTC, Jan Žižka
Details
candidate patch (2.36 KB, patch)
2022-09-21 06:55 UTC, Richard Biener
Details | Diff
candidate patch 2 (2.90 KB, patch)
2022-09-21 12:52 UTC, Richard Biener
Details | Diff
Third reproducer failing with 9baee6181b4e427e0b5ba417e51424c15858dce7 (559 bytes, text/x-csrc)
2022-09-23 11:31 UTC, Jan Žižka
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Jan Žižka 2022-09-13 08:18:53 UTC
Created attachment 53568 [details]
Reproducer for maybe uninitialized warning with gcc 12

In our production code when upgrading to gcc 12 from gcc 11 we have faced bogus warning
"may be used uninitialized" on bit complex use of boost::optional vector of strings used
in a loop with unrelated string stream redirect.

> In file included from <..>/include/c++/12.0.0/vector:67,
>                  from reproduce.cpp:13:
> In destructor ‘std::vector<_Tp, _Alloc>::~vector() [with _Tp = std::__cxx11::basic_string<char>; _Alloc = std::allocator<std::__cxx11::basic_string<char> >]’,
>     inlined from ‘void boost::optional_detail::optional_base<T>::destroy_impl() [with T = std::vector<std::__cxx11::basic_string<char> >]’ at /usr/include/boost/optional/optional.hpp:771:50,
>     inlined from ‘void boost::optional_detail::optional_base<T>::destroy() [with T = std::vector<std::__cxx11::basic_string<char> >]’ at /usr/include/boost/optional/optional.hpp:757:21,
>     inlined from ‘void boost::optional_detail::optional_base<T>::assign(const boost::optional_detail::optional_base<T>&) [with T = std::vector<std::__cxx11::basic_string<char> >]’ at /usr/include/boost/optional/optional.hpp:264:21,
>     inlined from ‘boost::optional_detail::optional_base<T>& boost::optional_detail::optional_base<T>::operator=(const boost::optional_detail::optional_base<T>&) [with T = std::vector<std::__cxx11::basic_string<char> >]’ at /usr/include/boost/optional/optional.hpp:241:19,
>     inlined from ‘boost::optional<T>& boost::optional<T>::operator=(const boost::optional<T>&) [with T = std::vector<std::__cxx11::basic_string<char> >]’ at /usr/include/boost/optional/optional.hpp:1040:15, 
>     inlined from ‘void test()’ at reproduce.cpp:55:13:
> <..>/include/c++/12.0.0/bits/stl_vector.h:680:22: error: ‘*(std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >*)((char*)&external + offsetof(boost::Optional<std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > >,boost::optional<std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > >::<unnamed>.boost::optional_detail::optional_base<std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > >::m_storage.boost::optional_detail::aligned_storage<std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > >::dummy_)).std::vector<std::__cxx11::basic_string<char> >::<anonymous>.std::_Vector_base<std::__cxx11::basic_string<char>, std::allocator<std::__cxx11::basic_string<char> > >::_M_impl.std::_Vector_base<std::__cxx11::basic_string<char>, std::allocator<std::__cxx11::basic_string<char> > >::_Vector_impl::<anonymous>.std::_Vector_base<std::__cxx11::basic_string<char>, std::allocator<std::__cxx11::basic_string<char> > >::_Vector_impl_data::_M_finish’ may be used uninitialized [-Werror=maybe-uninitialized]
>   680 |         std::_Destroy(this->_M_impl._M_start, this->_M_impl._M_finish,
>       |         ~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>   681 |                       _M_get_Tp_allocator());
>       |                       ~~~~~~~~~~~~~~~~~~~~~~
> reproduce.cpp: In function ‘void test()’:
> reproduce.cpp:45:40: note: ‘external’ declared here
>    45 |     Optional<std::vector<std::string>> external;
>       |                                        ^~~~~~~~

I have tested this with gcc 12 (12.1.1, 12.2.1 and release/gcc-12 commit
https://gcc.gnu.org/git/?p=gcc.git&a=commit;h=4ce316ca54c863cf0fd4257ba0ab768ab83c62e5") 
with boost 1.69.0, 1.75.0, 1.76.0 and 1.80.0. All fails.

With any boost version and gcc 11 (11.3.1 and release/gcc-11 tip commit
https://gcc.gnu.org/git/?p=gcc.git&a=commit;h=7e356c3083c79473c941bc92d61f755e923bc86c)
the warning doesn't appear.

To reproduce:

> $ g++ -Werror -std=c++20 -O2 -Wall -o reproduce.o -c reproduce.cpp

I was not able to bump the reproducer to simpliear case. In all modification where loop
is removed the warning disappears.

The problem doesn't occur when std::optional is used.

Only workaround I found is to add following pragma at top of the module:

> #pragma GCC diagnostic push
> #pragma GCC diagnostic ignored "-Wmaybe-uninitialized"
> #include <vector>
> #pragma GCC diagnostic pop

I have bisected the problem to start with 
https://gcc.gnu.org/git/?p=gcc.git&a=commit;h=5b8b1522e04adc20980f396571be1929a32d148a

commit 5b8b1522e04adc20980f396571be1929a32d148a (HEAD, refs/bisect/bad)
Author: Richard Biener <rguenther@suse.de>
Date:   Mon Sep 27 12:01:38 2021 +0200

    tree-optimization/100112 - VN last_vuse and redundant store elimination
    
    This avoids the last_vuse optimization hindering redundant store
    elimination by always also recording the original VUSE that was
    in effect on the load.
    
    In stage3 gcc/*.o we have 3182752 times recorded a single
    entry and 903409 times two entries (that's ~20% overhead).
    
    With just recording a single entry the number of hashtable lookups
    done when walking the vuse->vdef links to find an earlier access
    is 28961618.  When recording the second entry this makes us find
    that earlier for donwnstream redundant accesses, reducing the number
    of hashtable lookups to 25401052 (that's a ~10% reduction).
    
    2021-09-27  Richard Biener  <rguenther@suse.de>
    
            PR tree-optimization/100112
            * tree-ssa-sccvn.c (visit_reference_op_load): Record the
            referece into the hashtable twice in case last_vuse is
            different from the original vuse on the stmt.
    
            * gcc.dg/tree-ssa/ssa-fre-95.c: New testcase.
Comment 1 Richard Biener 2022-09-13 09:13:50 UTC
Confirmed also on trunk and also with -O1 and the usual suspects -fno-ivopts -fno-thread-jumps.

The question is whether

        external = internal;

is problematic.  For example

struct MyStruct { };

std::vector<MyStruct> getMyStructs();

void test()
{
    Optional<std::vector<std::string>> external;
    Optional<std::vector<std::string>> internal;
    external = internal;
}

is diagnosed with -std=c++20 -O -Wuninitialized -fno-tree-fre -fno-tree-dominator-opts

that's because we then no longer forward the uninit state of the Optionals
and thus diagnose the conditional (on initialized)

  <bb 6> [local count: 118702158]:
  _34 = MEM[(struct vector *)&external + 8B].D.88471._M_impl.D.87778._M_finish;
  _35 = MEM[(struct vector *)&external + 8B].D.88471._M_impl.D.87778._M_start;
  if (_34 != _35)
    goto <bb 41>; [89.00%]
...

the same happens with the unreduced testcase, there we have similar code
conditional on external.m_initialized where we are not able to forward
the m_initialized state.

It might be the bisected rev. causes a missed optimization here, I will have
to check.  At least I don't see a good reason why we don't forward it here.
Comment 2 Richard Biener 2022-09-15 07:55:37 UTC
The rev. in question shows changes in what PRE does.  Before the change disabling PRE also leads to the diagnostic so there's a missed optimization there.  I see less fully redundant values discovered.

I'm trying to reduce a testcase.
Comment 3 Richard Biener 2022-09-15 08:41:55 UTC
Created attachment 53576 [details]
testcase for missed PRE

here's a reduction that works around r12-3918-g5b8b1522e04adc but not on the
tip of the GCC 12 branch or trunk.  I'm going to reduce another for current trunk.
Comment 4 Richard Biener 2022-09-15 11:26:28 UTC
Created attachment 53577 [details]
testcase for missed PRE on trunk

Here's one.

So the issue is that PRE EXP_GEN now contains

exp_gen[10] := { {component_ref<m_initialized>,mem_ref<0B>,addr_expr<&external>}@.MEM_31 (0009) }

for the block

<bb 10> [local count: 97603128]:
# .MEM_31 = VDEF <.MEM_55>
internal ={v} {CLOBBER};
# VUSE <.MEM_32>
_10 = MEM[(struct optional_base *)&external].m_initialized;
if (_10 != 0)

instead of the one with last_vuse .MEM_45.  That in turn causes prune_clobbered_mems to prune the expression from ANTIC_OUT of BB 9 which
has a 9->10 edge removing the incentive to perform FRE on this value inside
the loop.  prune_clobbered_mems does this because

          if (ref->vuse)
            {
              gimple *def_stmt = SSA_NAME_DEF_STMT (ref->vuse);
              if (!gimple_nop_p (def_stmt)
                  && ((gimple_bb (def_stmt) != block
                       && !dominated_by_p (CDI_DOMINATORS,
                                           block, gimple_bb (def_stmt)))
                      || (gimple_bb (def_stmt) == block
                          && value_dies_in_block_x (expr, block))))
                to_remove = i;

where block == BB9 so the dominance test fails.  The dominance test should
be redundant (unless we have translated something over a backedge!?) but
it really shows the issue that we fail to record the memory expression
with the correct VUSE into EXP_GEN and fail to adjust the VUSE when
translating from ANTIC_OUT to ANTIC_IN as well.  When we'd do all this
"correctly", then we'd get a) a ton more hashtable entries with questionable
value, b) the dominance test will never fail.

By that reasoning a cheaper fix should be to instead perform the
value_dies_in_block_x when a dominance check cannot elide it.

There's a possible latent issue with virtual operands PHI translation
over backedges then though.  The safest thing would be to eventually
force a set of the virtual operand when any virtual PHI is on the
edge we are translating over - we are actually doing this but the
no-PHI-nodes optimization in phi_translate_set defeats the intent of
adjusting to BB_LIVE_VOP_ON_EXIT that translate_vuse_through_block
performs and its simple check of BB equality means we don't fixup later
either.

Fixing that spot in the simplistic way as well (rather than consistently
handling the virtual operand everywhere) fixes the observed bogus
diagnostic.
Comment 5 Richard Biener 2022-09-15 11:56:29 UTC
Created attachment 53578 [details]
CFG of the second testcase

With -m32 one can see the full redundancy discovery is a bit fragile since it requires an out-of-loop use to make the cross loop redundancy anticipated.
See the CFG graph for illustration.  It should be possible to create a
smaller testcase from this, when the use of external.m_initialized is
removed then the PRE does no longer work.  That somehow happens with -m32
but not -m64 ...
Comment 6 Richard Biener 2022-09-15 11:57:35 UTC
Anyhow - I'm testing a (half-way) fix for the issue in PRE, the testcase proved quite useful.

Thanks a lot for opening this bugreport!
Comment 7 Jan Žižka 2022-09-15 12:12:06 UTC
> Thanks a lot for opening this bugreport!

You are welcome! :-) Unfortunately I'm not familiar with gcc internals that much in order to comment on the analysis :-(. But once there will be patch for this issue I can quickly try that on our production code.
Comment 8 GCC Commits 2022-09-15 12:36:07 UTC
The master branch has been updated by Richard Biener <rguenth@gcc.gnu.org>:

https://gcc.gnu.org/g:5edf02ed2b6de024f83a023d046a6a18f645bc83

commit r13-2683-g5edf02ed2b6de024f83a023d046a6a18f645bc83
Author: Richard Biener <rguenther@suse.de>
Date:   Thu Sep 15 13:33:23 2022 +0200

    tree-optimization/106922 - PRE and virtual operand translation
    
    PRE implicitely keeps virtual operands at the blocks incoming version
    but the explicit updating point during PHI translation fails to trigger
    when there are no PHIs at all in a block.  Later lazy updating then
    fails because of a too lose block check.  A similar issues plagues
    reference invalidation when checking the ANTIC_OUT to ANTIC_IN
    translation.  The following fixes both and makes the lazy updating
    work.
    
    The diagnostic testcase unfortunately requires boost so the
    testcase is the one I reduced for a missed optimization in PRE.
    The testcase fails with -m32 on x86_64 because we optimize too
    much before PRE which causes PRE to not trigger so we fail to
    eliminate a full redundancy.  I'm going to open a separate bug
    for this.  Hopefully the !lp64 selector is good enough.
    
            PR tree-optimization/106922
            * tree-ssa-pre.cc (translate_vuse_through_block): Only
            keep the VUSE if its def dominates PHIBLOCK.
            (prune_clobbered_mems): Rewrite logic so we check whether
            a value dies in a block when the VUSE def doesn't dominate it.
    
            * g++.dg/tree-ssa/pr106922.C: New testcase.
Comment 9 Richard Biener 2022-09-15 12:38:55 UTC
I verified the fix cherry-picks to GCC 12 and fixes the testcase there as well
(on x86_64-linux, that is).
Comment 10 Jan Žižka 2022-09-15 22:13:27 UTC
Created attachment 53581 [details]
Second reproducer failing with 5edf02ed2b6de024f83a023d046a6a18f645bc83

I have cherry-picked the fix on top of gcc 12 and I confirm that my original reproducer passes. But our production code still fails. I have created new reproducer, the change is that the optional is inside a struct and seems there need to be two elements. With first reproducer I have eliminated this additional struct as that was also failing.

Please try with:

g++ -Werror -std=c++20 -O2 -Wall -o reproduce1.o -c reproduce1.cpp
Comment 11 Richard Biener 2022-09-19 14:16:21 UTC
So there's a similar missed optimization but it's not caused by the bisected revision.  The situation is like

float bar, baz;
void foo (int *p, int n)
{
  *p = 0;
  do
    {
      bar = 1.;
      if (*p)
        {
          baz = 2.;
          *p = 0;
        }
    }
  while (--n);
}

where we fail to realize that in the first if (*p), *p stays zero.  I have
a patch that fixes the above but that miscompiles genrecog.cc merge_into_decision (but no testcase in the testsuite ...).
Comment 12 Jan Žižka 2022-09-19 15:03:34 UTC
(In reply to Richard Biener from comment #11)
> So there's a similar missed optimization but it's not caused by the bisected
> revision.  

Ah I see. I didn't try to bisect this again. I can do that if that would help anything, let me know if I should try. I have bisect script so I can run it overnight ...

Also is it better then to open new BZ for this case then?

> I have
> a patch that fixes the above but that miscompiles genrecog.cc
> merge_into_decision (but no testcase in the testsuite ...).

I could at least try to test the patch with our production code. Unfortunately can't help with genrecog.cc ..
Comment 13 Richard Biener 2022-09-21 06:55:19 UTC
Created attachment 53597 [details]
candidate patch

For reference this is the patch I was talking about.  I'm sure I've made a mistake in reasoning but I've not yet nailed it ;)

Let's leave the issue in this PR.

If you have spare cycles to do another bisection that might help as well.  If your real code doesn't use LTO then bisecting preprocessed source from the affected TU might be also interesting (but I suppose the issue might affect multiple TUs?)
Comment 14 Richard Biener 2022-09-21 12:52:28 UTC
Created attachment 53599 [details]
candidate patch 2

OK, so I think I know what's going wrong eventually.  The following implements
the trick differently.  We might eventually be able to use dominance of the
virtual operands to check for the cross-iteration case as further improvement.
Comment 15 GCC Commits 2022-09-22 11:10:31 UTC
The master branch has been updated by Richard Biener <rguenth@gcc.gnu.org>:

https://gcc.gnu.org/g:9baee6181b4e427e0b5ba417e51424c15858dce7

commit r13-2772-g9baee6181b4e427e0b5ba417e51424c15858dce7
Author: Richard Biener <rguenther@suse.de>
Date:   Wed Sep 21 13:52:56 2022 +0200

    tree-optimization/106922 - missed FRE/PRE
    
    The following enhances the store-with-same-value trick in
    vn_reference_lookup_3 by not only looking for
    
      a = val;
      *ptr = val;
      .. = a;
    
    but also
    
      *ptr = val;
      other = x;
      .. = a;
    
    where the earlier store is more than one hop away.  It does this
    by queueing the actual value to compare until after the walk but
    as disadvantage only allows a single such skipped store from a
    constant value.
    
    Unfortunately we cannot handle defs from non-constants this way
    since we're prone to pick up values from the past loop iteration
    this way and we have no good way to identify values that are
    invariant in the currently iterated cycle.  That's why we keep
    the single-hop lookup for those cases.  gcc.dg/tree-ssa/pr87126.c
    would be a testcase that's un-XFAILed when we'd handle those
    as well.
    
            PR tree-optimization/106922
            * tree-ssa-sccvn.cc (vn_walk_cb_data::same_val): New member.
            (vn_walk_cb_data::finish): Perform delayed verification of
            a skipped may-alias.
            (vn_reference_lookup_pieces): Likewise.
            (vn_reference_lookup): Likewise.
            (vn_reference_lookup_3): When skipping stores of the same
            value also handle constant stores that are more than a
            single VDEF away by delaying the verification.
    
            * gcc.dg/tree-ssa/ssa-fre-100.c: New testcase.
            * g++.dg/tree-ssa/pr106922.C: Adjust.
Comment 16 Richard Biener 2022-09-22 11:11:37 UTC
This addressed the other missed optimization (which isn't a regression), it should make optimizing the m_initialized flag status more consistent.
Comment 17 GCC Commits 2022-09-23 07:48:02 UTC
The master branch has been updated by Jakub Jelinek <jakub@gcc.gnu.org>:

https://gcc.gnu.org/g:a0de11d0d22054b6fd76a0730a3ec807542379d0

commit r13-2806-ga0de11d0d22054b6fd76a0730a3ec807542379d0
Author: Jakub Jelinek <jakub@redhat.com>
Date:   Fri Sep 23 09:46:59 2022 +0200

    testsuite: Fix up pr106922.C test
    
    On Thu, Sep 22, 2022 at 01:10:08PM +0200, Richard Biener via Gcc-patches wrote:
    >       * g++.dg/tree-ssa/pr106922.C: Adjust.
    
    > --- a/gcc/testsuite/g++.dg/tree-ssa/pr106922.C
    > +++ b/gcc/testsuite/g++.dg/tree-ssa/pr106922.C
    > @@ -87,5 +87,4 @@ void testfunctionfoo() {
    >    }
    >  }
    >
    > -// { dg-final { scan-tree-dump-times "Found fully redundant value" 4 "pre" { xfail { ! lp64 } } } }
    > -// { dg-final { scan-tree-dump-not "m_initialized" "cddce3" { xfail { ! lp64 } } } }
    > +// { dg-final { scan-tree-dump-not "m_initialized" "dce3" } }
    
    I've noticed
    +UNRESOLVED: g++.dg/tree-ssa/pr106922.C  -std=gnu++20  scan-tree-dump-not dce3 "m_initialized"
    +UNRESOLVED: g++.dg/tree-ssa/pr106922.C  -std=gnu++2b  scan-tree-dump-not dce3 "m_initialized"
    with this change, both on x86_64 and i686.
    The dump is still cddce3, additionally as the last reference to the pre
    dump is gone, not sure it is worth creating that dump.
    
    With the following patch, there aren't FAILs nor UNRESOLVED tests with
    GXX_TESTSUITE_STDS=98,11,14,17,20,2b make check-g++ RUNTESTFLAGS="--target_board=unix\{-m32,-m64\} dg.exp='pr106922.C'"
    
    2022-09-23  Jakub Jelinek  <jakub@redhat.com>
    
            PR tree-optimization/106922
            * g++.dg/tree-ssa/pr106922.C: Scan in cddce3 dump rather than
            dce3.  Remove -fdump-tree-pre-details from dg-options.
Comment 18 Jan Žižka 2022-09-23 11:31:49 UTC
Created attachment 53617 [details]
Third reproducer failing with 9baee6181b4e427e0b5ba417e51424c15858dce7

I did cherry-pick 9baee6181b4e427e0b5ba417e51424c15858dce7 on top of 5edf02ed2b6de024f83a023d046a6a18f645bc83 and gcc 12 and I confirm that the second reproducer works with this. But unfortunately our production code still fails.

Sorry that I cannot provide better more reliable reproducer, but our production code is complex and I try to bump it to smallest snippet which still fails. I can try to refactor our production code somehow to remove all proprietary stuff but that might take some time.

I have created new reproducer, this time it fails both with boost::optional and with std::optional. In the reproducer I have commented lines which will cause the warning to disappear when removed. There is additional int in the structure which needs to be there and also the assignment needs to be repeated in the loop.
Comment 19 Richard Biener 2022-09-23 11:59:23 UTC
(In reply to Jan Žižka from comment #18)
> Created attachment 53617 [details]
> Third reproducer failing with 9baee6181b4e427e0b5ba417e51424c15858dce7
> 
> I did cherry-pick 9baee6181b4e427e0b5ba417e51424c15858dce7 on top of
> 5edf02ed2b6de024f83a023d046a6a18f645bc83 and gcc 12 and I confirm that the
> second reproducer works with this. But unfortunately our production code
> still fails.
> 
> Sorry that I cannot provide better more reliable reproducer, but our
> production code is complex and I try to bump it to smallest snippet which
> still fails. I can try to refactor our production code somehow to remove all
> proprietary stuff but that might take some time.
> 
> I have created new reproducer, this time it fails both with boost::optional
> and with std::optional. In the reproducer I have commented lines which will
> cause the warning to disappear when removed. There is additional int in the
> structure which needs to be there and also the assignment needs to be
> repeated in the loop.

Ah, thanks - a testcase that breaks with std::optional is good to have.

Aha, the new trick fails because

Value numbering stmt = _271 = MEM[(struct _Optional_payload_base *)&externals + 48B]._M_engaged;
Skipping possible redundant definition MEM[(struct _Optional_payload_base *)&externals + 48B]._M_engaged = 0;
Cannot skip second possible redundant definition MEM[(struct _Optional_payload_base *)&externals + 48B]._M_engaged = 0;
Setting value number of _271 to _271 (changed)

of course a simple change can handle this - stupid me of not thinking of that!
Comment 20 GCC Commits 2022-09-23 13:11:17 UTC
The master branch has been updated by Richard Biener <rguenth@gcc.gnu.org>:

https://gcc.gnu.org/g:af611afe5fcc908a6678b5b205fb5af7d64fbcb2

commit r13-2817-gaf611afe5fcc908a6678b5b205fb5af7d64fbcb2
Author: Richard Biener <rguenther@suse.de>
Date:   Fri Sep 23 14:28:52 2022 +0200

    tree-optimization/106922 - extend same-val clobber FRE
    
    The following extends the skipping of same valued stores to
    handle an arbitrary number of them as long as they are from the
    same value (which we now record).  That's an obvious extension
    which allows to optimize the m_engaged member of std::optional
    more reliably.
    
            PR tree-optimization/106922
            * tree-ssa-sccvn.cc (vn_reference_lookup_3): Allow
            an arbitrary number of same valued skipped stores.
    
            * g++.dg/torture/pr106922.C: New testcase.
Comment 21 Richard Biener 2022-09-23 13:14:19 UTC
try again!
Comment 22 Jan Žižka 2022-09-23 14:22:20 UTC
Great, our production code builds just fine with af611afe5fcc908a6678b5b205fb5af7d64fbcb2 :-) thanks a lot.
Comment 23 GCC Commits 2022-10-11 12:06:55 UTC
The releases/gcc-12 branch has been updated by Richard Biener <rguenth@gcc.gnu.org>:

https://gcc.gnu.org/g:e364e27b6636ba09755790358910f199d07194b3

commit r12-8820-ge364e27b6636ba09755790358910f199d07194b3
Author: Richard Biener <rguenther@suse.de>
Date:   Thu Sep 15 13:33:23 2022 +0200

    tree-optimization/106922 - PRE and virtual operand translation
    
    PRE implicitely keeps virtual operands at the blocks incoming version
    but the explicit updating point during PHI translation fails to trigger
    when there are no PHIs at all in a block.  Later lazy updating then
    fails because of a too lose block check.  A similar issues plagues
    reference invalidation when checking the ANTIC_OUT to ANTIC_IN
    translation.  The following fixes both and makes the lazy updating
    work.
    
    The diagnostic testcase unfortunately requires boost so the
    testcase is the one I reduced for a missed optimization in PRE.
    The testcase fails with -m32 on x86_64 because we optimize too
    much before PRE which causes PRE to not trigger so we fail to
    eliminate a full redundancy.  I'm going to open a separate bug
    for this.  Hopefully the !lp64 selector is good enough.
    
            PR tree-optimization/106922
            * tree-ssa-pre.cc (translate_vuse_through_block): Only
            keep the VUSE if its def dominates PHIBLOCK.
            (prune_clobbered_mems): Rewrite logic so we check whether
            a value dies in a block when the VUSE def doesn't dominate it.
    
            * g++.dg/tree-ssa/pr106922.C: New testcase.
    
    (cherry picked from commit 5edf02ed2b6de024f83a023d046a6a18f645bc83)
Comment 24 Richard Biener 2022-10-11 12:08:06 UTC
Note I'm still pondering whether to backport the VN enhancement, for now I've backported the VN/PRE optimization regression fix.
Comment 25 Jan Žižka 2022-10-11 13:13:34 UTC
I have backported all three patches but true that I didn't try to test without VN enhancement. Would it help if I'd try that with our production code and the reproducers? Or anything else I could try so that you'd know if the VM enhancement should be backported also?
Comment 26 rguenther@suse.de 2022-10-13 07:44:08 UTC
On Tue, 11 Oct 2022, jan.zizka at nokia dot com wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106922
> 
> --- Comment #25 from Jan ?i?ka <jan.zizka at nokia dot com> ---
> I have backported all three patches but true that I didn't try to test without
> VN enhancement. Would it help if I'd try that with our production code and the
> reproducers? Or anything else I could try so that you'd know if the VM
> enhancement should be backported also?

They are clearly necessary to fix this bug.  What I'm unsure yet about
is the risk of generally enhancing VN for this diagnostic regression.
The enhancement itself is quite small and self-contained which is
why I'm still considering it ;)
Comment 27 Jan Žižka 2022-10-13 10:50:39 UTC
(In reply to rguenther@suse.de from comment #26)
> 
> They are clearly necessary to fix this bug.  What I'm unsure yet about
> is the risk of generally enhancing VN for this diagnostic regression.
> The enhancement itself is quite small and self-contained which is
> why I'm still considering it ;)

Understood. And I didn't report this, maybe I should have, with these
fixes we had another uninitialized value warning ... it was actually bug
so I did fix it ... but the interesting was that same/similar bug was
couple lines below and this one the compiler didn't complain about.
I'm not sure if all that could be related to your considerations but
if it would help I could create a reproducer for this particular case.
I guess it is false negative :-) of some sorts.
Comment 28 GCC Commits 2022-10-17 13:10:29 UTC
The releases/gcc-12 branch has been updated by Richard Biener <rguenth@gcc.gnu.org>:

https://gcc.gnu.org/g:e8d5f3a1b5a5839cb1db57d6f6766469cc4f8747

commit r12-8835-ge8d5f3a1b5a5839cb1db57d6f6766469cc4f8747
Author: Richard Biener <rguenther@suse.de>
Date:   Wed Sep 21 13:52:56 2022 +0200

    tree-optimization/106922 - missed FRE/PRE
    
    The following enhances the store-with-same-value trick in
    vn_reference_lookup_3 by not only looking for
    
      a = val;
      *ptr = val;
      .. = a;
    
    but also
    
      *ptr = val;
      other = x;
      .. = a;
    
    where the earlier store is more than one hop away.  It does this
    by queueing the actual value to compare until after the walk but
    as disadvantage only allows a single such skipped store from a
    constant value.
    
    Unfortunately we cannot handle defs from non-constants this way
    since we're prone to pick up values from the past loop iteration
    this way and we have no good way to identify values that are
    invariant in the currently iterated cycle.  That's why we keep
    the single-hop lookup for those cases.  gcc.dg/tree-ssa/pr87126.c
    would be a testcase that's un-XFAILed when we'd handle those
    as well.
    
            PR tree-optimization/106922
            * tree-ssa-sccvn.cc (vn_walk_cb_data::same_val): New member.
            (vn_walk_cb_data::finish): Perform delayed verification of
            a skipped may-alias.
            (vn_reference_lookup_pieces): Likewise.
            (vn_reference_lookup): Likewise.
            (vn_reference_lookup_3): When skipping stores of the same
            value also handle constant stores that are more than a
            single VDEF away by delaying the verification.
    
            * gcc.dg/tree-ssa/ssa-fre-100.c: New testcase.
            * g++.dg/tree-ssa/pr106922.C: Adjust.
    
    (cherry picked from commit 9baee6181b4e427e0b5ba417e51424c15858dce7)
Comment 29 GCC Commits 2022-10-17 13:10:35 UTC
The releases/gcc-12 branch has been updated by Richard Biener <rguenth@gcc.gnu.org>:

https://gcc.gnu.org/g:641369e29f57c508e6316d5d221c1a92900163f9

commit r12-8836-g641369e29f57c508e6316d5d221c1a92900163f9
Author: Jakub Jelinek <jakub@redhat.com>
Date:   Fri Sep 23 09:46:59 2022 +0200

    testsuite: Fix up pr106922.C test
    
    On Thu, Sep 22, 2022 at 01:10:08PM +0200, Richard Biener via Gcc-patches wrote:
    >       * g++.dg/tree-ssa/pr106922.C: Adjust.
    
    > --- a/gcc/testsuite/g++.dg/tree-ssa/pr106922.C
    > +++ b/gcc/testsuite/g++.dg/tree-ssa/pr106922.C
    > @@ -87,5 +87,4 @@ void testfunctionfoo() {
    >    }
    >  }
    >
    > -// { dg-final { scan-tree-dump-times "Found fully redundant value" 4 "pre" { xfail { ! lp64 } } } }
    > -// { dg-final { scan-tree-dump-not "m_initialized" "cddce3" { xfail { ! lp64 } } } }
    > +// { dg-final { scan-tree-dump-not "m_initialized" "dce3" } }
    
    I've noticed
    +UNRESOLVED: g++.dg/tree-ssa/pr106922.C  -std=gnu++20  scan-tree-dump-not dce3 "m_initialized"
    +UNRESOLVED: g++.dg/tree-ssa/pr106922.C  -std=gnu++2b  scan-tree-dump-not dce3 "m_initialized"
    with this change, both on x86_64 and i686.
    The dump is still cddce3, additionally as the last reference to the pre
    dump is gone, not sure it is worth creating that dump.
    
    With the following patch, there aren't FAILs nor UNRESOLVED tests with
    GXX_TESTSUITE_STDS=98,11,14,17,20,2b make check-g++ RUNTESTFLAGS="--target_board=unix\{-m32,-m64\} dg.exp='pr106922.C'"
    
    2022-09-23  Jakub Jelinek  <jakub@redhat.com>
    
            PR tree-optimization/106922
            * g++.dg/tree-ssa/pr106922.C: Scan in cddce3 dump rather than
            dce3.  Remove -fdump-tree-pre-details from dg-options.
    
    (cherry picked from commit a0de11d0d22054b6fd76a0730a3ec807542379d0)
Comment 30 GCC Commits 2022-10-17 13:10:40 UTC
The releases/gcc-12 branch has been updated by Richard Biener <rguenth@gcc.gnu.org>:

https://gcc.gnu.org/g:b9f58edfc2ccb0fb3840751a2fb4268ce5dd9b3d

commit r12-8837-gb9f58edfc2ccb0fb3840751a2fb4268ce5dd9b3d
Author: Richard Biener <rguenther@suse.de>
Date:   Fri Sep 23 14:28:52 2022 +0200

    tree-optimization/106922 - extend same-val clobber FRE
    
    The following extends the skipping of same valued stores to
    handle an arbitrary number of them as long as they are from the
    same value (which we now record).  That's an obvious extension
    which allows to optimize the m_engaged member of std::optional
    more reliably.
    
            PR tree-optimization/106922
            * tree-ssa-sccvn.cc (vn_reference_lookup_3): Allow
            an arbitrary number of same valued skipped stores.
    
            * g++.dg/torture/pr106922.C: New testcase.
    
    (cherry picked from commit af611afe5fcc908a6678b5b205fb5af7d64fbcb2)
Comment 31 Richard Biener 2022-10-17 13:14:30 UTC
Fixed.