struct S { S (); S (int i); int s; operator bool () { return s != 0; } };
int bar ();
int foo (bool x)
{
  S a;
  try
    {
      x ?
#define A(n) (a = S (0)),
#define B(n) A(n##0) A(n##1) A(n##2) A(n##3) A(n##4) \
             A(n##5) A(n##6) A(n##7) A(n##8) A(n##9)
#define C(n) B(n##0) B(n##1) B(n##2) B(n##3) B(n##4) \
             B(n##5) B(n##6) B(n##7) B(n##8) B(n##9)
#define D(n) C(n##0) C(n##1) C(n##2) C(n##3) C(n##4) \
             C(n##5) C(n##6) C(n##7) C(n##8) C(n##9)
#define E(n) D(n##0) D(n##1) D(n##2) D(n##3) D(n##4) \
             D(n##5) D(n##6) D(n##7) D(n##8) D(n##9)
#define F(n) E(n##0) E(n##1) E(n##2) E(n##3) E(n##4) \
             E(n##5) E(n##6) E(n##7) E(n##8) E(n##9)
      E(1) E(2) E(3)
      0 : 1;
    }
  catch (int)
    {
      return 1;
    }
  return 0;
}

hits quadratic behavior in sink_clobbers at -O0.  g++ 4.4 compiled this almost
instantly, as did 4.6; 4.7/4.8/4.9/5 eat a lot of RAM on this already during
the into-ssa pass, while 6+ just hog compile time (but not memory) in
sink_clobbers.
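For reference, here is a hand-expanded fragment of what the macro tower
produces (an illustrative sketch only, reusing the S above; the real expansion
of E(1) E(2) E(3) yields 30000 such comma-expression terms).  Roughly
speaking, every temporary S ends up with its own EH cleanup region holding a
CLOBBER for its storage, and those are what sink_clobbers later has to
process:

int foo_expanded (bool x)
{
  S a;
  try
    {
      x ? (a = S (0)), (a = S (0)), (a = S (0)), /* ... 30000 terms ... */
          0 : 1;
    }
  catch (int)
    {
      return 1;
    }
  return 0;
}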
Confirmed. I have a cleanup patch and an idea for fixing the quadraticness as well.
Created attachment 47608 [details]
cleanup patch

Testing this first; it reliably catches secondary opportunities and
micro-optimizes the virtual operand update.
The 4.7 behavior started with r181332.  Then in r182283 the quadratic
sink_clobbers behavior was added.  And finally r246314 got rid of the
excessive memory use and compile time during the into-ssa pass, so the
compiler now only hangs in sink_clobbers.
BTW, if we want to put the testcase into the testsuite, maybe we need to tune
the exact resx/CLOBBER count so that even after the fix it doesn't take way
too long, but on the other hand with an unfixed compiler it takes long enough
to FAIL at least on slower machines.
As for the patch, I wonder if an internal resx doesn't also occur without
clobbers to move, or in places where sink_clobbers would give up.  So, perhaps
add a dry_run bool to sink_clobbers and, in the first loop, if we haven't yet
determined we need the second one, run sink_clobbers with dry_run=true; that
would perform the first half of the function and, only if it found clobbers in
a bb with EH preds and a single successor, tell the caller to set the bool
that triggers the second loop.
(In reply to Jakub Jelinek from comment #5)

Hmm, not sure if it's worth that, but yeah, could do that easily I guess.
Consider it done.
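Schematically, the dry-run split suggested in comment #5 looks like this (a
minimal sketch with made-up placeholder types and helpers, not the actual
tree-eh.c code): the analysis half of sink_clobbers runs during the first loop
with dry_run set, and the second loop is only entered if that dry run reported
an opportunity.

struct bb_info { int n_clobbers; int n_eh_preds; int n_succs; };

/* Analysis half: a block qualifies if it has clobbers to sink, EH
   predecessors and a single successor.  With dry_run set we stop here.  */
static bool
sink_clobbers (bb_info &bb, bool dry_run)
{
  bool found = bb.n_clobbers > 0 && bb.n_eh_preds > 0 && bb.n_succs == 1;
  if (dry_run || !found)
    return found;
  /* Transformation half: actually move the clobbers (elided here).  */
  return true;
}

static void
lower_eh_dispatch_blocks (bb_info *bbs, int n)
{
  bool any_clobber_sinking = false;
  for (int i = 0; i < n; ++i)           /* first loop: cheap dry run */
    if (!any_clobber_sinking && sink_clobbers (bbs[i], /*dry_run=*/true))
      any_clobber_sinking = true;
  if (any_clobber_sinking)
    for (int i = 0; i < n; ++i)         /* second loop: do the real work */
      sink_clobbers (bbs[i], /*dry_run=*/false);
}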
Author: rguenth
Date: Wed Jan  8 12:49:14 2020
New Revision: 280000

URL: https://gcc.gnu.org/viewcvs?rev=280000&root=gcc&view=rev
Log:
2019-01-08  Richard Biener  <rguenther@suse.de>

        PR middle-end/93199
c/
        * gimple-parser.c (c_parser_parse_gimple_body): Remove __PHI IFN
        permanently.

        * gimple-fold.c (rewrite_to_defined_overflow): Mark stmt modified.
        * tree-ssa-loop-im.c (move_computations_worker): Properly adjust
        virtual operand, also updating SSA use.
        * gimple-loop-interchange.cc (loop_cand::undo_simple_reduction):
        Update stmt after resetting virtual operand.
        (tree_loop_interchange::move_code_to_inner_loop): Likewise.
        * gimple-iterator.c (gsi_remove): When not removing the stmt
        permanently do not delink immediate uses or mark the stmt modified.

Modified:
        trunk/gcc/ChangeLog
        trunk/gcc/c/ChangeLog
        trunk/gcc/c/gimple-parser.c
        trunk/gcc/gimple-fold.c
        trunk/gcc/gimple-iterator.c
        trunk/gcc/gimple-loop-interchange.cc
        trunk/gcc/tree-ssa-loop-im.c
Author: rguenth
Date: Wed Jan  8 14:30:44 2020
New Revision: 280006

URL: https://gcc.gnu.org/viewcvs?rev=280006&root=gcc&view=rev
Log:
2020-01-08  Richard Biener  <rguenther@suse.de>

        PR middle-end/93199
        * tree-eh.c (sink_clobbers): Update virtual operands for
        the first and last stmt only.  Add a dry-run capability.
        (pass_lower_eh_dispatch::execute): Perform clobber sinking
        after CFG manipulations and in RPO order to catch all
        secondary opportunities reliably.

Modified:
        trunk/gcc/ChangeLog
        trunk/gcc/tree-eh.c
Created attachment 47619 [details]
patch fixing the quadraticness

Like this for the quadraticness.  It still runs into other slowness, but
pass_lower_eh_dispatch::execute takes less than 10 seconds now.
Still

 tree eh                            : 509.70 ( 97%)   1.58 ( 69%) 511.32 ( 97%) 9776324 kB ( 98%)

bah.  Something else ruins things.  Will figure it out tomorrow.
- 77.83%  0.45%  16118  cc1plus  cc1plus  [.] (anonymous namespace)::pass_cleanup_eh::execute
   - 77.38% (anonymous namespace)::pass_cleanup_eh::execute
      - 77.29% cleanup_empty_eh_merge_phis
         - 44.55% redirect_eh_edge_1
              30.45% last_stmt
            + 4.01% lookup_stmt_eh_lp_fn
            + 2.96% remove_stmt_from_eh_lp_fn
              2.77% gimple_block_label
              0.55% get_eh_landing_pad_from_number
         + 16.68% add_stmt_to_eh_lp_fn
           5.34% find_edge
         + 4.58% redirect_edge_succ
           0.59% gimple_execute_on_growing_pred

That last_stmt figure looks odd though (I blame perf for this).  This is on
the original Red Hat bugzilla testcase btw; will check your reduced one.
(In reply to Richard Biener from comment #11)

Apart from structural quadraticnesses involving edges, the main hog seems to
be the quadratic updating of the RESX stmt moving here:

static void
redirect_eh_edge_1 (edge edge_in, basic_block new_bb, bool change_region)
{
...
  /* Maybe move the throwing statement to the new region.  */
  if (old_lp != new_lp)
    {
      remove_stmt_from_eh_lp (throw_stmt);
      add_stmt_to_eh_lp (throw_stmt, new_lp->index);
    }

which boils down to the very same issue.  We're also getting a very big
in-degree for the EH redirection, probably because we're walking the EH LP
array when optimizing empty EH.  Thus code like

  /* Notice when we redirect the last EH edge away from OLD_BB.  */
  FOR_EACH_EDGE (e, ei, old_bb->preds)
    if (e != edge_in && (e->flags & EDGE_EH))
      break;

ends up expensive as well (we will move all EH edges anyway, so the above at
least could be avoided with some care).  Not to mention

  FOR_EACH_EDGE (e, ei, old_bb->preds)
    if (find_edge (e->src, new_bb))
      return false;

which we can short-cut when new_bb has a single predecessor.

So I have some micro-optimizing things here (only).  But I wonder whether
walking the landing pads in some better order in cleanup_all_empty_eh would
fix things.  Simply walking the array in reverse already helps a tremendous
amount:

 tree eh                            :   4.75 ( 35%)   0.01 (  3%)   4.75 ( 34%)   16911 kB (  9%)

vs.

 tree eh                            : 182.21 ( 95%)   0.84 ( 65%) 183.07 ( 95%) 4653260 kB ( 97%)

on your testcase, and

 tree eh                            :  29.56 ( 30%)   0.05 (  1%)  29.60 ( 28%)  246315 kB (  8%)

vs.

 tree eh                            : 626.00 ( 89%)   5.75 ( 45%) 631.88 ( 88%) 38736930 kB ( 93%)

on the Red Hat bugzilla one.
The recursive lower_eh_constructs & collect_finally_tree are also prone to
eventually blowing the stack with these kinds of testcases.
There is always the -fstack-reuse=named_vars workaround, or one can bump
ulimit -s.
So lower_eh_constructs is what is remaining of the EH time, and there it's
just cleanup_is_dead_in that ends up costly:

  while (reg && reg->type == ERT_CLEANUP)
    reg = reg->outer;
  return (reg && reg->type == ERT_MUST_NOT_THROW);

It looks like we could easily track that in the leh_state (cache the innermost
non-cleanup region).  I won't pursue this, but a quick check making the above
simply return false shows

 tree eh                            :   1.44 (  2%)   0.05 (  1%)   1.48 (  2%)  246315 kB (  8%)

on the RH bugzilla testcase.  The reduced testcase can now be conveniently
analyzed using callgrind (even with an -O0-built cc1plus).  The RH one is
still bigger.
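A minimal sketch of that caching idea (the type and field names here are
illustrative, not the exact tree-eh.c declarations; the fix committed later
for this PR records a similar outer_non_cleanup member in leh_state): the
copied child state keeps the innermost non-cleanup region up to date, so
cleanup_is_dead_in no longer walks the ->outer chain on every call.

enum eh_region_type { ERT_CLEANUP, ERT_TRY, ERT_ALLOWED_EXCEPTIONS, ERT_MUST_NOT_THROW };

struct eh_region_d { eh_region_type type; eh_region_d *outer; };

struct leh_state_d
{
  eh_region_d *cur_region;
  /* Innermost enclosing region that is not ERT_CLEANUP; maintained whenever
     cur_region changes so the query below needs no walk.  */
  eh_region_d *outer_non_cleanup;
};

/* O(1) replacement for the loop that walked reg->outer on every call.  */
static bool
cleanup_is_dead_in (const leh_state_d *state)
{
  eh_region_d *reg = state->outer_non_cleanup;
  return reg && reg->type == ERT_MUST_NOT_THROW;
}

/* When a lowering function copies its state for a nested construct, it
   updates the cache once instead of deferring an O(depth) walk to every
   query.  */
static leh_state_d
enter_region (const leh_state_d &outer, eh_region_d *new_region)
{
  leh_state_d s = outer;
  s.cur_region = new_region;
  if (new_region && new_region->type != ERT_CLEANUP)
    s.outer_non_cleanup = new_region;
  return s;
}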
Author: rguenth
Date: Fri Jan 10 10:49:57 2020
New Revision: 280101

URL: https://gcc.gnu.org/viewcvs?rev=280101&root=gcc&view=rev
Log:
2020-01-10  Richard Biener  <rguenther@suse.de>

        PR middle-end/93199
        * tree-eh.c (redirect_eh_edge_1): Avoid some work if possible.
        (cleanup_all_empty_eh): Walk landing pads in reverse order to
        avoid quadraticness.

Modified:
        trunk/gcc/ChangeLog
        trunk/gcc/tree-eh.c
Author: rguenth
Date: Fri Jan 10 11:23:53 2020
New Revision: 280102

URL: https://gcc.gnu.org/viewcvs?rev=280102&root=gcc&view=rev
Log:
2020-01-10  Richard Biener  <rguenther@suse.de>

        PR middle-end/93199
        * tree-eh.c (sink_clobbers): Move clobbers to out-of-IL
        sequences to avoid walking them again for secondary
        opportunities.
        (pass_lower_eh_dispatch::execute): Instead actually insert
        them here.

Modified:
        trunk/gcc/ChangeLog
        trunk/gcc/tree-eh.c
At -O2 I see, with just E(1),

 expand vars                        :  61.55 ( 23%)   0.01 (  3%)  61.56 ( 23%)    1267 kB (  1%)
 store merging                      : 185.44 ( 69%)   0.00 (  0%) 185.44 ( 69%)     625 kB (  1%)

The store-merging time is spent in terminate_all_aliasing_chains, where it
seems the m_stores_head chain is quite long.  With D(1) D(2) only we have
4000 calls to this function, but then the inner loop iterates 8 million
times.  Guess we miss some limiting there; a testcase might be just a large
BB with many (non-aliasing) stores.  Micro-optimizing the function is also
possible (testing a patch for that).
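To illustrate the kind of input meant here, a hypothetical reproducer sketch
(not taken from this PR and not verified against the store-merging pass): one
large basic block that stores once to each of a few hundred distinct,
non-aliasing globals, so every new store gets checked against all the chains
still tracked via m_stores_head.

#define DECL1(n)    int g##n;
#define DECL10(n)   DECL1(n##0) DECL1(n##1) DECL1(n##2) DECL1(n##3) DECL1(n##4) \
                    DECL1(n##5) DECL1(n##6) DECL1(n##7) DECL1(n##8) DECL1(n##9)
#define DECL100(n)  DECL10(n##0) DECL10(n##1) DECL10(n##2) DECL10(n##3) DECL10(n##4) \
                    DECL10(n##5) DECL10(n##6) DECL10(n##7) DECL10(n##8) DECL10(n##9)
#define STORE1(n)   g##n = n;
#define STORE10(n)  STORE1(n##0) STORE1(n##1) STORE1(n##2) STORE1(n##3) STORE1(n##4) \
                    STORE1(n##5) STORE1(n##6) STORE1(n##7) STORE1(n##8) STORE1(n##9)
#define STORE100(n) STORE10(n##0) STORE10(n##1) STORE10(n##2) STORE10(n##3) STORE10(n##4) \
                    STORE10(n##5) STORE10(n##6) STORE10(n##7) STORE10(n##8) STORE10(n##9)

DECL100(1) DECL100(2) DECL100(3)        /* g100 .. g399 */

void
many_stores (void)
{
  STORE100(1) STORE100(2) STORE100(3)   /* 300 stores, one per global */
}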
Created attachment 47631 [details]
Patch candidate for #c15

I've got a patch for #c15.
@Richi: Is it something you expected?

I see the following speed-up:
w/ verification: 9.9s -> 8.7s
w/o verification: 4.8s -> 3.7s

Now the perf report looks like:

   5.41%  cc1plus  cc1plus  [.] hash_table<hash_map<gimple*, int, simple_hashmap_traits<default_hash_traits<gimple*>, int> >::hash_entry, false, xcallocator>::find_with_hash
   3.79%  cc1plus  cc1plus  [.] mark_used_flags
   3.33%  cc1plus  cc1plus  [.] (anonymous namespace)::dom_info::calc_idoms
   3.06%  cc1plus  cc1plus  [.] (anonymous namespace)::dom_info::calc_dfs_tree_nonrec
   2.68%  cc1plus  cc1plus  [.] rtl_verify_flow_info_1
   2.52%  cc1plus  cc1plus  [.] verify_ssa
   2.08%  cc1plus  cc1plus  [.] rtl_verify_flow_info
(In reply to Martin Liška from comment #19)
> I've got a patch for #c15.
> @Richi: Is it something you expected?

Well - there's the leh_state passed to both callers of the function, so I
expected a patch to amend that rather than adding an on-the-side caching
hash-map.  So basically whenever we push a non-CLEANUP region, update
leh_state->xyz, and when backtracking, update it back (the whole process
looked recursive from a quick look).
> Well - there's the leh_state passed to both callers of the function
> so I expected a patch to amend that rather than adding an on-the-side
> caching hash-map.

Yes, it's recursive, but the leh_state instances are different:

#0  cleanup_is_dead_in (reg=0x7ffff5e94478) at /home/marxin/Programming/gcc/gcc/tree-eh.c:1640
#1  0x00000000010c0ca6 in lower_try_finally (state=0x7fffffffc060, tp=0x7ffff2f0f4d0) at /home/marxin/Programming/gcc/gcc/tree-eh.c:1676
#2  0x00000000010c1cdf in lower_eh_constructs_2 (state=0x7fffffffc060, gsi=0x7fffffffc020) at /home/marxin/Programming/gcc/gcc/tree-eh.c:2099
#3  0x00000000010c1e8a in lower_eh_constructs_1 (state=0x7fffffffc060, pseq=0x7ffff2f0f4c0) at /home/marxin/Programming/gcc/gcc/tree-eh.c:2158
#4  0x00000000010c0d53 in lower_try_finally (state=0x7fffffffc220, tp=0x7ffff2f0f498) at /home/marxin/Programming/gcc/gcc/tree-eh.c:1693
#5  0x00000000010c1cdf in lower_eh_constructs_2 (state=0x7fffffffc220, gsi=0x7fffffffc1e0) at /home/marxin/Programming/gcc/gcc/tree-eh.c:2099
#6  0x00000000010c1e8a in lower_eh_constructs_1 (state=0x7fffffffc220, pseq=0x7ffff2f0f488) at /home/marxin/Programming/gcc/gcc/tree-eh.c:2158
#7  0x00000000010c0d53 in lower_try_finally (state=0x7fffffffc3e0, tp=0x7ffff2f0f460) at /home/marxin/Programming/gcc/gcc/tree-eh.c:1693
#8  0x00000000010c1cdf in lower_eh_constructs_2 (state=0x7fffffffc3e0, gsi=0x7fffffffc3a0) at /home/marxin/Programming/gcc/gcc/tree-eh.c:2099
#9  0x00000000010c1e8a in lower_eh_constructs_1 (state=0x7fffffffc3e0, pseq=0x7ffff2f0f450) at /home/marxin/Programming/gcc/gcc/tree-eh.c:2158
#10 0x00000000010c0d53 in lower_try_finally (state=0x7fffffffc5a0, tp=0x7ffff2f0f428) at /home/marxin/Programming/gcc/gcc/tree-eh.c:1693
#11 0x00000000010c1cdf in lower_eh_constructs_2 (state=0x7fffffffc5a0, gsi=0x7fffffffc560) at /home/marxin/Programming/gcc/gcc/tree-eh.c:2099
#12 0x00000000010c1e8a in lower_eh_constructs_1 (state=0x7fffffffc5a0, pseq=0x7ffff2f0f418) at /home/marxin/Programming/gcc/gcc/tree-eh.c:2158
#13 0x00000000010c0d53 in lower_try_finally (state=0x7fffffffc760, tp=0x7ffff2f0f3f0) at /home/marxin/Programming/gcc/gcc/tree-eh.c:1693
#14 0x00000000010c1cdf in lower_eh_constructs_2 (state=0x7fffffffc760, gsi=0x7fffffffc720) at /home/marxin/Programming/gcc/gcc/tree-eh.c:2099
#15 0x00000000010c1e8a in lower_eh_constructs_1 (state=0x7fffffffc760, pseq=0x7ffff2f0f3e0) at /home/marxin/Programming/gcc/gcc/tree-eh.c:2158
#16 0x00000000010c0d53 in lower_try_finally (state=0x7fffffffc920, tp=0x7ffff2f0f3b8) at /home/marxin/Programming/gcc/gcc/tree-eh.c:1693
#17 0x00000000010c1cdf in lower_eh_constructs_2 (state=0x7fffffffc920, gsi=0x7fffffffc8e0) at /home/marxin/Programming/gcc/gcc/tree-eh.c:2099
#18 0x00000000010c1e8a in lower_eh_constructs_1 (state=0x7fffffffc920, pseq=0x7ffff2f0f3a8) at /home/marxin/Programming/gcc/gcc/tree-eh.c:2158
#19 0x00000000010c0d53 in lower_try_finally (state=0x7fffffffcae0, tp=0x7ffff2f0f380) at /home/marxin/Programming/gcc/gcc/tree-eh.c:1693
#20 0x00000000010c1cdf in lower_eh_constructs_2 (state=0x7fffffffcae0, gsi=0x7fffffffcaa0) at /home/marxin/Programming/gcc/gcc/tree-eh.c:2099

where a new 'state' is always created here:

1769 static gimple_seq
1770 lower_catch (struct leh_state *state, gtry *tp)
1771 {
1772   eh_region try_region = NULL;
1773   struct leh_state this_state = *state;
...
On Mon, 13 Jan 2020, marxin at gcc dot gnu.org wrote:
> Yes, it's recursive, but the leh_state instances are different:
> [...]
> where a new 'state' is always created here:
>
> 1773   struct leh_state this_state = *state;

But it's copied.  There's a reason why I didn't tackle it
(because of this intertwined stuff).  But I don't like the
simple cache-map.
> But it's copied.  There's a reason why I didn't tackle it
> (because of this intertwined stuff).  But I don't like the
> simple cache-map.

Sure, I've done that in:
https://gcc.gnu.org/ml/gcc-patches/2020-01/msg00862.html
Fixed for GCC 10.  Note the testcase(s) expose other slownesses to be
categorized and filed separately (mainly -O1 [-g] is interesting here; for
-O2+ we don't provide any guarantees with these kinds of testcases).  But I
don't want to make this bug more complicated, since the sink_clobbers stuff
might be backportable.
The master branch has been updated by Martin Liska <marxin@gcc.gnu.org>:

https://gcc.gnu.org/g:92ce93c743b3c81f6911bc3d06056099369e9191

commit r10-6084-g92ce93c743b3c81f6911bc3d06056099369e9191
Author: Martin Liska <mliska@suse.cz>
Date:   Mon Jan 20 11:10:30 2020 +0100

    Record outer non-cleanup region in TREE EH.

        PR tree-optimization/93199
        * tree-eh.c (struct leh_state): Add new field outer_non_cleanup.
        (cleanup_is_dead_in): Pass leh_state instead of eh_region.  Add
        a check that state->outer_non_cleanup points to the outer
        non-cleanup region.
        (lower_try_finally): Record outer_non_cleanup for this_state.
        (lower_catch): Likewise.
        (lower_eh_filter): Likewise.
        (lower_eh_must_not_throw): Likewise.
        (lower_cleanup): Likewise.
GCC 8.4.0 has been released, adjusting target milestone.
GCC 8 branch is being closed.
GCC 9.4 is being released, retargeting bugs to GCC 9.5.
Fixed in GCC 10.