Bug 105577 - [12 Regression] ICE in delete_unmarked_insns, at dce.cc:653 since r12-248-gb58dc0b803057c0e
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: rtl-optimization
Version: 12.1.0
Importance: P2 normal
Target Milestone: 12.2
Assignee: Richard Biener
URL:
Keywords: EH, ice-on-valid-code
Depends on:
Blocks:
 
Reported: 2022-05-12 08:52 UTC by Curdeius Curdeius
Modified: 2022-06-01 15:18 UTC

See Also:
Host:
Target: X86_64
Build:
Known to work: 12.1.1, 13.0
Known to fail: 12.1.0
Last reconfirmed: 2022-05-12 00:00:00


Attachments
Preprocessed source of the full reproducer (494.36 KB, application/x-gzip)
2022-05-12 08:53 UTC, Curdeius Curdeius
Preprocessed source of the minimal reproducer (467.52 KB, application/x-gzip)
2022-05-12 08:54 UTC, Curdeius Curdeius
A slightly reduced case (468 bytes, text/plain)
2022-05-12 09:47 UTC, Curdeius Curdeius

Description Curdeius Curdeius 2022-05-12 08:52:13 UTC
Similar to bug https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90082.

GCC 12.1.0 ICEs when compiling this code with:
```
g++ -DROCKSDB_PLATFORM_POSIX -isystem rocksdb-cloud -isystem rocksdb-cloud/include -O3 -march=haswell -fnon-call-exceptions -c rocksdb-cloud/db/db_impl/db_impl_compaction_flush.cc
```
All three flags matter: the ICE doesn't happen with -O2, without -march, or with -march=skylake, but it does happen with microarchitectures older than haswell.
The ICE doesn't happen without -fnon-call-exceptions either.
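
As background on why -fnon-call-exceptions is part of the trigger: with that flag GCC allows trapping instructions such as memory references to throw, so a plain store inside a try block keeps an exception edge to its handler. The function below is a hypothetical illustration of that situation (the name `store_or_fallback` is made up and it is not taken from the reproducer):
```cpp
// Hypothetical illustration, not part of the rocksdb reproducer.
// When built with -fnon-call-exceptions, the store through p counts as a
// potentially throwing instruction, so the compiler has to keep an
// exception edge from it to the catch handler.
int store_or_fallback(int *p) {
  try {
    *p = 0;     // potentially trapping memory reference
    return 0;   // normal path
  } catch (...) {
    return -1;  // handler reached only if the store throws
  }
}
```
Without the flag, the store cannot throw as far as the compiler is concerned and the handler is trivially unreachable.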

Version:
```
$ g++ --version
g++ (GCC) 12.1.0
Copyright (C) 2022 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
```

The minimized repro (couldn't do better for the moment; preprocessed sources of both the minimal repro and the full file are attached):
```
#include "db/db_impl/db_impl.h"

namespace ROCKSDB_NAMESPACE {

void DBImpl::InstallSuperVersionAndScheduleWork(
    ColumnFamilyData* cfd, SuperVersionContext* sv_context,
    const MutableCFOptions& mutable_cf_options) {
  if (UNLIKELY(sv_context->new_superversion == nullptr)) {
    sv_context->NewSuperVersion();
  }

  bottommost_files_mark_threshold_ = kMaxSequenceNumber;
  for (auto* my_cfd : *versions_->GetColumnFamilySet()) {
    bottommost_files_mark_threshold_ = std::min(
        bottommost_files_mark_threshold_,
        my_cfd->current()->storage_info()->bottommost_files_mark_threshold());
  }
}

}  // namespace ROCKSDB_NAMESPACE
```
Comment 1 Curdeius Curdeius 2022-05-12 08:53:33 UTC
Created attachment 52965
Preprocessed source of the full reproducer
Comment 2 Curdeius Curdeius 2022-05-12 08:54:23 UTC
Created attachment 52966
Preprocessed source of the minimal reproducer
Comment 3 Martin Liška 2022-05-12 09:04:32 UTC
Confirmed, reducing right now..
Comment 4 Curdeius Curdeius 2022-05-12 09:47:41 UTC
Created attachment 52967
A slightly reduced case

A bit more reduced reproducer.
Not sure it helps.
Comment 5 Martin Liška 2022-05-12 10:18:32 UTC
(In reply to Curdeius Curdeius from comment #4)
> Created attachment 52967
> A slightly reduced case
> 
> A bit more reduced reproducer.
> Not sure it helps.

No, we would need a pre-processed source file reproducer.
Comment 6 Martin Liška 2022-05-12 10:19:12 UTC
With -fdelete-dead-exceptions, it started with r12-248-gb58dc0b803057c0e.
The reduction is pretty slow..
Comment 7 Andrew Pinski 2022-05-12 10:25:43 UTC
(In reply to Martin Liška from comment #6)
> With -fdelete-dead-exceptions, it started with r12-248-gb58dc0b803057c0e.
> The reduction is pretty slow..

I think that just exposed the issue, since the failure is at the RTL level while that change affects things much earlier, in GIMPLE.
Comment 8 Richard Biener 2022-05-12 11:53:29 UTC
(In reply to Andrew Pinski from comment #7)
> (In reply to Martin Liška from comment #6)
> > With -fdelete-dead-exceptions, it started with r12-248-gb58dc0b803057c0e.
> > The reduction is pretty slow..
> 
> I think that just exposed the issue, since the failure is at the RTL level
> while that change affects things much earlier, in GIMPLE.

So the insn removed that triggers the must_clean is

(insn/v 27 23 30 3 (set (reg:V2DI 107)
        (const_vector:V2DI [
                (const_int 0 [0]) repeated x2
            ])) "/usr/local/include/c++/12.1.0/bits/shared_ptr_base.h":1463:9 1700 {movv2di_internal}
     (nil))

we first remove that and then call purge_dead_edges which then runs into
the newly(!) last insn:

(call_insn 23 22 30 3 (set (reg:DI 0 ax)
        (call (mem:QI (symbol_ref:DI ("memset") [flags 0x41] <function_decl 0x7ffff65ebc00 __builtin_memset>) [0 __builtin_memset S1 A8])
            (const_int 0 [0]))) "../../thirdparty/rocksdb-cloud/db/job_context.h":49:29 909 {*call_value}
     (expr_list:REG_DEAD (reg:DI 5 di)
        (expr_list:REG_DEAD (reg:SI 4 si)
            (expr_list:REG_DEAD (reg:DI 1 dx)
                (expr_list:REG_UNUSED (reg:DI 0 ax)
                    (expr_list:REG_CALL_DECL (symbol_ref:DI ("memset") [flags 0x41] <function_decl 0x7ffff65ebc00 __builtin_memset>)
                        (expr_list:REG_EH_REGION (const_int 0 [0])
                            (nil)))))))
    (expr_list:DI (set (reg:DI 0 ax)
            (reg:DI 5 di))
        (expr_list:DI (use (reg:DI 5 di))
            (expr_list:SI (use (reg:SI 4 si))
                (expr_list:DI (use (reg:DI 1 dx))
                    (nil))))))

which cannot throw.  But we still have an EH edge out of this block which
is the real issue here.  Somebody forgot to clean the EH edge earlier.
In fact before DSE we have

(insn 27 23 28 3 (set (reg:V2DI 107)
        (const_vector:V2DI [
                (const_int 0 [0]) repeated x2
            ])) "/usr/local/include/c++/12.1.0/bits/shared_ptr_base.h":1463:9 1700 {movv2di_internal}
     (nil))
(insn 28 27 29 3 (set (mem:V2DI (plus:DI (reg/f:DI 94 [ _34 ])
                (const_int 96 [0x60])) [0 MEM <vector(2) long unsigned int> [(void *)_34 + 96B]+0 S16 A64])
        (reg:V2DI 107)) "/usr/local/include/c++/12.1.0/bits/shared_ptr_base.h":1463:9 1700 {movv2di_internal}
     (expr_list:REG_DEAD (reg:V2DI 107)
        (expr_list:REG_EH_REGION (const_int -15 [0xfffffffffffffff1])
            (nil))))
(insn 29 28 30 3 (set (mem:QI (plus:DI (reg/f:DI 94 [ _34 ])
                (const_int 112 [0x70])) [26 MEM[(struct MutableCFOptions *)_34 + 32B].disable_auto_compactions+0 S1 A64])
        (const_int 0 [0])) "../../thirdparty/rocksdb-cloud/options/cf_options.h":173:9 83 {*movqi_internal}
     (expr_list:REG_EH_REGION (const_int 3 [0x3])
        (nil)))
;;  succ:       4 [always]  count:1459806 (estimated locally) (FALLTHRU)
;;              49 [never]  count:0 (precise) (ABNORMAL,EH)

so DSE removes insn 28 and insn 29 but forgets to clean EH.
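
The stale-edge state described in this comment can be sketched with a toy CFG model. Everything below is hypothetical (`ToyInsn`, `ToyEdge`, `ToyBlock` and the helper functions are not GCC types or functions), and it simplifies the dump by modelling only insn 29 as throwing:
```cpp
#include <algorithm>
#include <vector>

// Toy stand-ins for RTL insns and CFG edges. None of these types exist in GCC.
struct ToyInsn {
  int uid;
  bool can_throw;  // true for an insn that has an EH landing pad
};

struct ToyEdge {
  int dest;    // index of the successor block
  bool is_eh;  // true for an exception-handling edge
};

struct ToyBlock {
  std::vector<ToyInsn> insns;
  std::vector<ToyEdge> succs;
};

// The invariant discussed above: an EH edge is only justified while the
// block's last insn can throw.
bool eh_edges_consistent(const ToyBlock &bb) {
  bool has_eh = std::any_of(bb.succs.begin(), bb.succs.end(),
                            [](const ToyEdge &e) { return e.is_eh; });
  bool last_throws = !bb.insns.empty() && bb.insns.back().can_throw;
  return !has_eh || last_throws;
}

// Delete one insn by uid and leave the edges alone, the way a pass that
// forgets EH cleanup would.
void delete_insn_only(ToyBlock &bb, int uid) {
  bb.insns.erase(
      std::remove_if(bb.insns.begin(), bb.insns.end(),
                     [uid](const ToyInsn &i) { return i.uid == uid; }),
      bb.insns.end());
}

// Drop EH edges that the last insn no longer justifies.
// Returns true if an edge was removed.
bool purge_stale_eh_edges(ToyBlock &bb) {
  if (eh_edges_consistent(bb))
    return false;
  bb.succs.erase(std::remove_if(bb.succs.begin(), bb.succs.end(),
                                [](const ToyEdge &e) { return e.is_eh; }),
                 bb.succs.end());
  return true;
}

// Block 3 from the dump, simplified: only insn 29 is modelled as throwing.
ToyBlock make_block_3() {
  return ToyBlock{{{23, false}, {27, false}, {28, false}, {29, true}},
                  {{4, false}, {49, true}}};
}
```
Calling `delete_insn_only` for uids 28 and 29 on `make_block_3()` leaves a block for which `eh_edges_consistent` returns false, which is the state a later pass trips over; `purge_stale_eh_edges` restores the invariant.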
Comment 9 Richard Biener 2022-05-12 12:03:07 UTC
So DSE does

  /* DSE can eliminate potentially-trapping MEMs.
     Remove any EH edges associated with them.  */
  if ((locally_deleted || globally_deleted)
      && cfun->can_throw_non_call_exceptions
      && purge_all_dead_edges ())
    {
      free_dominance_info (CDI_DOMINATORS);
      cleanup_cfg (0);

which should do the trick, but the fast-DCE is invoked via

  dse_step0 ();
  dse_step1 ();
  dse_step2_init ();
  if (dse_step2 ())
    {
      df_set_flags (DF_LR_RUN_DCE);
      df_analyze ();

and dse_step0/1 already removed the stores, exposing the bad IL.  One
way to fix this might be to run cleanup_cfg right after dse_step1,
or just delete_unreachable_blocks.

I'm going to test

diff --git a/gcc/dse.cc b/gcc/dse.cc
index b8914a3ae24..bb658a85959 100644
--- a/gcc/dse.cc
+++ b/gcc/dse.cc
@@ -3682,6 +3682,16 @@ rest_of_handle_dse (void)
 
   dse_step0 ();
   dse_step1 ();
+  /* DSE can eliminate potentially-trapping MEMs.
+     Remove any EH edges associated with them, since otherwise
+     DF_LR_RUN_DCE will complain later.  */
+  if ((locally_deleted || globally_deleted)
+      && cfun->can_throw_non_call_exceptions
+      && purge_all_dead_edges ())
+    {
+      free_dominance_info (CDI_DOMINATORS);
+      delete_unreachable_blocks ();
+    }
   dse_step2_init ();
   if (dse_step2 ())
     {
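
Why the position of the purge matters can be sketched with a toy pipeline. All names here are hypothetical stand-ins rather than GCC code, and the way the toy DCE "fails" is an assumption standing in for "cannot cope with a stale EH edge":
```cpp
#include <vector>

// Toy model only: these names are illustrative, not GCC data structures.
struct MiniBlock {
  std::vector<bool> insn_can_throw;  // one flag per remaining insn, in order
  bool has_eh_edge;
};

static bool last_insn_throws(const MiniBlock &bb) {
  return !bb.insn_can_throw.empty() && bb.insn_can_throw.back();
}

// Stand-in for the early DSE steps: drop trailing throwing stores and
// leave the edges alone. Returns true if anything was deleted.
bool toy_dse_delete_stores(MiniBlock &bb) {
  bool deleted = false;
  while (last_insn_throws(bb)) {
    bb.insn_can_throw.pop_back();
    deleted = true;
  }
  return deleted;
}

// Stand-in for purging: remove the EH edge once the last insn cannot throw.
bool toy_purge_dead_eh_edge(MiniBlock &bb) {
  if (bb.has_eh_edge && !last_insn_throws(bb)) {
    bb.has_eh_edge = false;
    return true;
  }
  return false;
}

// Stand-in for a DCE that may not change the CFG: it reports failure when it
// meets an EH edge that the last insn no longer justifies (assumption).
bool toy_fast_dce_ok(const MiniBlock &bb) {
  return !bb.has_eh_edge || last_insn_throws(bb);
}

// Run the toy pipeline with or without the early purge.
bool run_toy_dse(MiniBlock bb, bool purge_early) {
  bool deleted = toy_dse_delete_stores(bb);
  if (purge_early && deleted)
    toy_purge_dead_eh_edge(bb);
  return toy_fast_dce_ok(bb);
}
```
With a block whose only throwing insn is the trailing store, `run_toy_dse(bb, false)` reports failure and `run_toy_dse(bb, true)` succeeds, mirroring the idea of purging right after dse_step1 instead of at the end of the pass.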
Comment 10 GCC Commits 2022-05-12 13:05:53 UTC
The master branch has been updated by Richard Biener <rguenth@gcc.gnu.org>:

https://gcc.gnu.org/g:dfda40f8147412328f699628a54b0aaa584776e7

commit r13-373-gdfda40f8147412328f699628a54b0aaa584776e7
Author: Richard Biener <rguenther@suse.de>
Date:   Thu May 12 14:03:32 2022 +0200

    rtl-optimization/105577 - RTL DSE and non-call EH
    
    When one of the first two stages of DSE removes a throwing stmt
    we have to purge dead EH edges before the DF re-analyze fires off
    a fast DCE since that cannot cope with the situation.
    
    2022-05-12  Richard Biener  <rguenther@suse.de>
    
            PR rtl-optimization/105577
            * dse.cc (rest_of_handle_dse): Make sure to purge dead EH
            edges before running fast DCE via df_analyze.
Comment 11 Richard Biener 2022-05-12 13:06:15 UTC
Fixed on trunk so far.
Comment 12 Martin Liška 2022-05-12 14:25:55 UTC
Here's a reduced test case; can you please include it in the testsuite?

namespace {
typedef long size_t;
}
typedef char uint8_t;
typedef long uint64_t;
namespace {
template <typename _Tp, _Tp __v> struct integral_constant {
  static constexpr _Tp value = __v;
};
template <bool __v> using __bool_constant = integral_constant<bool, __v>;
template <bool> struct __conditional {
  template <typename _Tp, typename> using type = _Tp;
};
template <bool _Cond, typename _If, typename _Else>
using __conditional_t = typename __conditional<_Cond>::type<_If, _Else>;
template <typename...> struct __and_;
template <typename _B1, typename _B2>
struct __and_<_B1, _B2> : __conditional_t<_B1::value, _B2, _B1> {};
template <typename> struct __not_ : __bool_constant<!bool()> {};
template <typename _Tp>
struct __is_constructible_impl : __bool_constant<__is_constructible(_Tp)> {};
template <typename _Tp>
struct is_default_constructible : __is_constructible_impl<_Tp> {};
template <typename _Tp> struct remove_extent { typedef _Tp type; };
template <bool> struct enable_if;
} // namespace
namespace std {
template <typename _Tp> struct allocator_traits { using pointer = _Tp; };
template <typename _Alloc> struct __alloc_traits : allocator_traits<_Alloc> {};
template <typename, typename _Alloc> struct _Vector_base {
  typedef typename __alloc_traits<_Alloc>::pointer pointer;
  struct {
    pointer _M_finish;
    pointer _M_end_of_storage;
  };
};
template <typename _Tp, typename _Alloc = _Tp>
class vector : _Vector_base<_Tp, _Alloc> {
public:
  _Tp value_type;
  typedef size_t size_type;
};
template <typename _Tp, typename _Dp> class __uniq_ptr_impl {
  template <typename _Up, typename> struct _Ptr { using type = _Up *; };

public:
  using _DeleterConstraint =
      enable_if<__and_<__not_<_Dp>, is_default_constructible<_Dp>>::value>;
  using pointer = typename _Ptr<_Tp, _Dp>::type;
};
template <typename _Tp, typename _Dp = _Tp> class unique_ptr {
public:
  using pointer = typename __uniq_ptr_impl<_Tp, _Dp>::pointer;
  pointer operator->();
};
enum _Lock_policy { _S_atomic } const __default_lock_policy = _S_atomic;
template <_Lock_policy = __default_lock_policy> class _Sp_counted_base;
template <typename, _Lock_policy = __default_lock_policy> class __shared_ptr;
template <_Lock_policy> class __shared_count { _Sp_counted_base<> *_M_pi; };
template <typename _Tp, _Lock_policy _Lp> class __shared_ptr {
  using element_type = typename remove_extent<_Tp>::type;
  element_type *_M_ptr;
  __shared_count<_Lp> _M_refcount;
};
template <typename _Tp> class shared_ptr : __shared_ptr<_Tp> {
public:
  shared_ptr() noexcept : __shared_ptr<_Tp>() {}
};
enum CompressionType : char;
class SliceTransform;
enum Temperature : uint8_t;
struct MutableCFOptions {
  MutableCFOptions()
      : soft_pending_compaction_bytes_limit(),
        hard_pending_compaction_bytes_limit(level0_file_num_compaction_trigger),
        level0_slowdown_writes_trigger(level0_stop_writes_trigger),
        max_compaction_bytes(target_file_size_base),
        target_file_size_multiplier(max_bytes_for_level_base),
        max_bytes_for_level_multiplier(ttl), compaction_options_fifo(),
        min_blob_size(blob_file_size), blob_compression_type(),
        enable_blob_garbage_collection(blob_garbage_collection_age_cutoff),
        max_sequential_skip_in_iterations(check_flush_compaction_key_order),
        paranoid_file_checks(bottommost_compression), bottommost_temperature(),
        sample_for_compression() {}
  shared_ptr<SliceTransform> prefix_extractor;
  uint64_t soft_pending_compaction_bytes_limit;
  uint64_t hard_pending_compaction_bytes_limit;
  int level0_file_num_compaction_trigger;
  int level0_slowdown_writes_trigger;
  int level0_stop_writes_trigger;
  uint64_t max_compaction_bytes;
  uint64_t target_file_size_base;
  int target_file_size_multiplier;
  uint64_t max_bytes_for_level_base;
  double max_bytes_for_level_multiplier;
  uint64_t ttl;
  vector<int> compaction_options_fifo;
  uint64_t min_blob_size;
  uint64_t blob_file_size;
  CompressionType blob_compression_type;
  bool enable_blob_garbage_collection;
  double blob_garbage_collection_age_cutoff;
  uint64_t max_sequential_skip_in_iterations;
  bool check_flush_compaction_key_order;
  bool paranoid_file_checks;
  CompressionType bottommost_compression;
  Temperature bottommost_temperature;
  uint64_t sample_for_compression;
};
template <class T, size_t kSize = 8> class autovector {
  using value_type = T;
  using size_type = typename vector<T>::size_type;
  size_type buf_[kSize * sizeof(value_type)];
};
class MemTable;
class ColumnFamilyData;
struct SuperVersion {
  MutableCFOptions write_stall_condition;
  autovector<MemTable *> to_delete;
};
class ColumnFamilySet {
public:
  class iterator {
  public:
    iterator operator++();
    bool operator!=(iterator);
    ColumnFamilyData *operator*();
    ColumnFamilyData *current_;
  };
  iterator begin();
  iterator end();
};
class VersionSet {
public:
  ColumnFamilySet *GetColumnFamilySet();
};
struct SuperVersionContext {
  void NewSuperVersion() { new SuperVersion(); }
};
class DBImpl {
  unique_ptr<VersionSet> versions_;
  void InstallSuperVersionAndScheduleWork(ColumnFamilyData *,
                                          SuperVersionContext *,
                                          const MutableCFOptions &);
};
void DBImpl::InstallSuperVersionAndScheduleWork(ColumnFamilyData *,
                                                SuperVersionContext *sv_context,
                                                const MutableCFOptions &) {
  sv_context->NewSuperVersion();
  for (auto my_cfd : *versions_->GetColumnFamilySet())
    ;
}
} // namespace std
Comment 13 GCC Commits 2022-05-16 10:08:56 UTC
The master branch has been updated by Richard Biener <rguenth@gcc.gnu.org>:

https://gcc.gnu.org/g:ef7b8976b9143aa78dd9cf5cfdaa02552d6e18a0

commit r13-506-gef7b8976b9143aa78dd9cf5cfdaa02552d6e18a0
Author: Richard Biener <rguenther@suse.de>
Date:   Mon May 16 12:07:31 2022 +0200

    rtl-optimization/105577 - testcase for the PR
    
    2022-05-16  Richard Biener  <rguenther@suse.de>
    
            PR rtl-optimization/105577
            * g++.dg/torture/pr105577.C: New testcase.
Comment 14 GCC Commits 2022-05-19 12:47:30 UTC
The releases/gcc-12 branch has been updated by Richard Biener <rguenth@gcc.gnu.org>:

https://gcc.gnu.org/g:b251f8be6b018966edad5daeb45c42fd193b24b4

commit r12-8401-gb251f8be6b018966edad5daeb45c42fd193b24b4
Author: Richard Biener <rguenther@suse.de>
Date:   Thu May 12 14:03:32 2022 +0200

    rtl-optimization/105577 - RTL DSE and non-call EH
    
    When one of the first two stages of DSE removes a throwing stmt
    we have to purge dead EH edges before the DF re-analyze fires off
    a fast DCE since that cannot cope with the situation.
    
    2022-05-12  Richard Biener  <rguenther@suse.de>
    
            PR rtl-optimization/105577
            * dse.cc (rest_of_handle_dse): Make sure to purge dead EH
            edges before running fast DCE via df_analyze.
    
    (cherry picked from commit dfda40f8147412328f699628a54b0aaa584776e7)
Comment 15 GCC Commits 2022-05-19 12:47:36 UTC
The releases/gcc-12 branch has been updated by Richard Biener <rguenth@gcc.gnu.org>:

https://gcc.gnu.org/g:25d7a7381099b46b6554c5e20b00b19d460c2123

commit r12-8402-g25d7a7381099b46b6554c5e20b00b19d460c2123
Author: Richard Biener <rguenther@suse.de>
Date:   Mon May 16 12:07:31 2022 +0200

    rtl-optimization/105577 - testcase for the PR
    
    2022-05-16  Richard Biener  <rguenther@suse.de>
    
            PR rtl-optimization/105577
            * g++.dg/torture/pr105577.C: New testcase.
    
    (cherry picked from commit ef7b8976b9143aa78dd9cf5cfdaa02552d6e18a0)
Comment 16 Richard Biener 2022-05-19 12:49:45 UTC
Fixed.
Comment 17 Curdeius Curdeius 2022-06-01 15:18:04 UTC
Thanks a lot for fixing this quickly!