Created attachment 53454 [details] fc_exch.c.c.orig.gz Initially I observed slowness on ia64 trying to compile linux-4.19 using this week's gcc snapshot. x86_64 also seems to be affected. Original (ia64-specific) example is attached as fc_exch.c.c.orig.gz. It shows the origin of this construct. I reduced the example with 4s timeout and got the following reproducer: // $ cat fc_exch.c.c int nr_cpu_ids; void fc_setup_exch_mgr() { (((((((1UL << (((0, 0) ? ((1) ? (((nr_cpu_ids)) ? 0 : ((nr_cpu_ids)) & (21) ? 21 : ((nr_cpu_ids)) ? 20 : ((nr_cpu_ids)) & (19) ? 19 : ((nr_cpu_ids)) ? 18 : ((nr_cpu_ids)) & (17) ? 17 : ((nr_cpu_ids)) ? 16 : ((nr_cpu_ids)) & (15) ? 15 : ((nr_cpu_ids)) ? 14 : ((nr_cpu_ids)) & (13) ? 13 : ((nr_cpu_ids)) ? 12 : ((nr_cpu_ids)) & (11) ? 11 : ((nr_cpu_ids)) ? 10 : ((nr_cpu_ids)) & (9) ? 9 : ((nr_cpu_ids)) ? 8 : ((nr_cpu_ids)) & (7) ? 7 : ((nr_cpu_ids)) ? 6 : ((nr_cpu_ids)) & (5) ? 5 : ((nr_cpu_ids)) ? 4 : ((nr_cpu_ids)) & (3) ? 3 : ((nr_cpu_ids)-1) & 1) : 1) : 0) + 1))))) & (1ULL << 2) ? 2 : 1)) ); } $ time gcc-13.0.0 -Wno-address-of-packed-member -c fc_exch.c.c -o bug.o -O1 real 0m4,821s user 0m4,726s sys 0m0,077s Almost 5s! $ time gcc-12.1.0 -Wno-address-of-packed-member -c fc_exch.c.c -o bug.o -O1 real 0m0,019s user 0m0,013s sys 0m0,006s $ gcc -v Using built-in specs. COLLECT_GCC=/<<NIX>>/gcc-13.0.0/bin/gcc COLLECT_LTO_WRAPPER=/<<NIX>>/gcc-13.0.0/libexec/gcc/x86_64-unknown-linux-gnu/13.0.0/lto-wrapper Target: x86_64-unknown-linux-gnu Configured with: Thread model: posix Supported LTO compression algorithms: zlib gcc version 13.0.0 20220807 (experimental) (GCC) On this example perf shows the following output: 8.63% cc1 cc1 [.] operand_compare::operand_equal_p 4.39% cc1 cc1 [.] operand_compare::verify_hash_value 4.29% cc1 cc1 [.] wide_int_to_tree_1 3.67% cc1 cc1 [.] fold_binary_loc 3.17% cc1 cc1 [.] generic_simplify_NE_EXPR 2.67% cc1 cc1 [.] tree_strip_nop_conversions 2.64% cc1 cc1 [.] generic_simplify_EQ_EXPR 2.47% cc1 cc1 [.] tree_operand_check 2.39% cc1 cc1 [.] get_inner_reference 2.37% cc1 cc1 [.] integer_zerop 2.25% cc1 cc1 [.] wide_int_binop 2.11% cc1 cc1 [.] tree_nop_convert 1.84% cc1 cc1 [.] int_const_binop 1.81% cc1 cc1 [.] int_cst_hasher::hash 1.78% cc1 cc1 [.] ggc_internal_alloc 1.70% cc1 cc1 [.] element_precision 1.44% cc1 cc1 [.] wi::fits_to_tree_p<poly_int<1u, generic_wide_int<wide_int_ref_storage<false, true> > > > 1.36% cc1 cc1 [.] contains_struct_check 1.08% cc1 cc1 [.] build2 1.05% cc1 cc1 [.] hash_table<int_cst_hasher, false, xcallocator>::find_slot_with_hash 1.04% cc1 cc1 [.] generic_simplify 1.03% cc1 cc1 [.] cache_wide_int_in_type_cache 0.97% cc1 cc1 [.] get_int_cst_ext_nunits 0.91% cc1 cc1 [.] bitmask_inv_cst_vector_p 0.87% cc1 cc1 [.] tree_strip_sign_nop_conversions 0.86% cc1 libc.so.6 [.] __memset_avx2_unaligned_erms
I doubt this is a regression. I suspect if you compile the trunk with --enable-checking=release you would see the same compile time as 12.
(In reply to Andrew Pinski from comment #1) > I doubt this is a regression. I suspect if you compile the trunk with > --enable-checking=release you would see the same compile time as 12. I think both my gcc-12.1.0 and gcc-13.0.0 are compiled with the same options (both are without --enable-checking= option). My unsubstantiated guess would be on recently added rules to fold & (or similar) which does not scale too well on long strings of ternary operators.
(In reply to Sergei Trofimovich from comment #2) > (In reply to Andrew Pinski from comment #1) > > I doubt this is a regression. I suspect if you compile the trunk with > > --enable-checking=release you would see the same compile time as 12. > > I think both my gcc-12.1.0 and gcc-13.0.0 are compiled with the same options > (both are without --enable-checking= option). My unsubstantiated guess would > be on recently added rules to fold & (or similar) which does not scale too > well on long strings of ternary operators. Oh, only now I realized snapshots have a different default compared to releases proper. I'll add --enable-checking=release and re-test.
Gcc releases are always defaults to --enable-checking=release while the git trunk is =yes.
--enable-checking=release did not help much. Attached example still compiles 3.5 s. verify parts disappeared from profile, but generic_simplify are still there: 11.20% cc1 cc1 [.] operand_compare::operand_equal_p 4.32% cc1 cc1 [.] wide_int_to_tree_1 4.23% cc1 cc1 [.] fold_binary_loc 4.02% cc1 cc1 [.] generic_simplify_NE_EXPR 3.64% cc1 cc1 [.] generic_simplify_EQ_EXPR 3.62% cc1 cc1 [.] tree_strip_nop_conversions 2.78% cc1 cc1 [.] get_inner_reference 2.22% cc1 cc1 [.] wide_int_binop 2.17% cc1 cc1 [.] int_const_binop 2.00% cc1 cc1 [.] tree_nop_convert 1.90% cc1 cc1 [.] operand_equal_p 1.77% cc1 cc1 [.] wi::fits_to_tree_p<poly_int<1u, generic_wide_int<wide_int_ref_storage<false, true> > > > 1.75% cc1 cc1 [.] integer_zerop 1.72% cc1 cc1 [.] ggc_internal_alloc 1.66% cc1 cc1 [.] element_precision 1.38% cc1 cc1 [.] hash_table<int_cst_hasher, false, xcallocator>::find_slot_with_hash 1.33% cc1 cc1 [.] build2 1.30% cc1 cc1 [.] tree_with_possible_nonzero_bits2 1.29% cc1 cc1 [.] fold_ternary_loc 1.24% cc1 cc1 [.] wi::popcount 1.19% cc1 cc1 [.] tree_truth_valued_p 1.14% cc1 cc1 [.] tree_strip_sign_nop_conversions 1.11% cc1 cc1 [.] generic_simplify 1.11% cc1 cc1 [.] fold_truth_andor 1.10% cc1 cc1 [.] bitmask_inv_cst_vector_p 1.08% cc1 cc1 [.] __popcountdi2 1.02% cc1 cc1 [.] get_nonzero_bits And I also see hangups when building linux-4.19, this time on x86_64 as well. Will try to extract non-minimized example.
Created attachment 53455 [details] ring_buffer.c.gz ring_buffer.c.gz is a preprocessed file from linux-4.19. $ time gcc-12.1.0 -O2 -c ring_buffer.c -o bug.o real 0m0,308s user 0m0,285s sys 0m0,017s $ gcc -O2 -c ring_buffer.c -o bug.o # Runs for minutes, increases RAM usage, never finishes. perf stats after 3 minutes' run: 10,16% cc1 [.] operand_compare::operand_equal_p 7,38% cc1 [.] fold_binary_loc 7,01% cc1 [.] generic_simplify_453 5,63% cc1 [.] generic_simplify_GT_EXPR 3,72% cc1 [.] wide_int_binop 3,42% cc1 [.] generic_simplify_COND_EXPR 3,38% cc1 [.] hash_table<int_cst_hasher, false, xcallocator>::find_slot_with_hash 3,04% cc1 [.] wide_int_to_tree_1 3,00% cc1 [.] tree_strip_nop_conversions 2,82% cc1 [.] int_const_binop 2,47% cc1 [.] get_inner_reference 2,45% cc1 [.] fold_build2_loc 2,23% cc1 [.] fold_ternary_loc 2,07% cc1 [.] tree_strip_sign_nop_conversions 1,96% cc1 [.] int_cst_hasher::hash 1,83% cc1 [.] ggc_internal_alloc 1,81% cc1 [.] fold_relational_const 1,61% cc1 [.] wi::fits_to_tree_p<poly_int<1u, generic_wide_int<wide_int_ref_storage<false, true> > > > 1,57% cc1 [.] generic_simplify_155 1,51% cc1 [.] operand_equal_p 1,22% cc1 [.] dbg_cnt 1,20% cc1 [.] element_precision 1,10% cc1 [.] ggc_free
Applying pattern match.pd:5769, generic-match.cc:14472 Applying pattern match.pd:5769, generic-match.cc:14472 Applying pattern match.pd:5769, generic-match.cc:14472 Applying pattern match.pd:5769, generic-match.cc:14472 Applying pattern match.pd:5769, generic-match.cc:14472 Applying pattern match.pd:5769, generic-match.cc:14472 Applying pattern match.pd:5769, generic-match.cc:14472 Applying pattern match.pd:5769, generic-match.cc:14472 Applying pattern match.pd:5769, generic-match.cc:14472 Applying pattern match.pd:5769, generic-match.cc:14472 Applying pattern match.pd:5769, generic-match.cc:14472 Applying pattern match.pd:5769, generic-match.cc:14472 Applying pattern match.pd:5769, generic-match.cc:14472 Applying pattern match.pd:5769, generic-match.cc:14472 ... that's the following pattern which GCC 12 doesn't have /* From fold_binary_op_with_conditional_arg handle the case of rewriting (a ? b : c) > d to a ? (b > d) : (c > d) when the compares simplify. */ (for cmp (simple_comparison) (simplify (cmp:c (cond @0 @1 @2) @3) /* Do not move possibly trapping operations into the conditional as this pessimizes code and causes gimplification issues when applied late. */ (if (!FLOAT_TYPE_P (TREE_TYPE (@3)) || operation_could_trap_p (cmp, true, false, @3)) (cond @0 (cmp! @1 @3) (cmp! @2 @3))))) the testcase is essentially a degenerate case of this, applying the pattern recursively. The testcase is a bit too large to easily follow what happens, cutting some lines in the middle shows we are eventually producing 0, 0 ? nr_cpu_ids == 0 && ((nr_cpu_ids & 7) == 0 && (nr_cpu_ids == 0 && ((nr_cpu_ids & 5) == 0 && (nr_cpu_ids == 0 && ((nr_cpu_ids & 3) == 0 && (nr_cpu_ids + -1 & 1) != 0))))) : 0 ? 2 : 1; not sure why we don't fold the (0, 0) ? ..., that seems to be a "bug" of fully folding in the C frontend? We do simplify 0 ? ... just fine. Supposedly COMPOUND_EXPRs are never constant folded?
We don't fold because case COMPOUND_EXPR: /* When pedantic, a compound expression can be neither an lvalue nor an integer constant expression. */ if (TREE_SIDE_EFFECTS (arg0) || TREE_CONSTANT (arg1)) return NULL_TREE; /* Don't let (0, 0) be null pointer constant. */ tem = integer_zerop (arg1) ? build1_loc (loc, NOP_EXPR, type, arg1) : fold_convert_loc (loc, type, arg1); return tem; where we chicken out because of the fear of early folding. At least the C frontend should get this correct now. Ideally the above would just return arg1 if !TREE_SIDE_EFFECTS (arg0). OTOH it's really sth for language specific folding. Of course the bug then would be still present when replacing (0, 0) with something non-constant.
Started with r13-322-g7f04b0d786e13ff5.
*** Bug 106642 has been marked as a duplicate of this bug. ***
(In reply to Martin Liška from comment #9) > Started with r13-322-g7f04b0d786e13ff5. I do confirm that after reverting this commit there is no more hanging when compiling (bug 106642)
Btw, just for reference the original looks like fc_cpu_order = ( __builtin_constant_p(( __builtin_constant_p(nr_cpu_ids) ? ( ((nr_cpu_ids) == 1) ? 1 : (1UL << (( __builtin_constant_p((nr_cpu_ids) - 1) ? ( __builtin_constant_p((nr_cpu_ids) - 1) ? ( ((nr_cpu_ids) - 1) < 2 ? 0 : ((nr_cpu_ids) - 1) & (1ULL << 63) ? 63 : ((nr_cpu_ids) - 1) & (1ULL << 62) ? 62 : ((nr_cpu_ids) - 1) & (1ULL << 61) ? 61 : ((nr_cpu_ids) - 1) & (1ULL << 60) ? 60 : ((nr_cpu_ids) - 1) & (1ULL << 59) ? 59 : ((nr_cpu_ids) - 1) & (1ULL << 58) ? 58 : ((nr_cpu_ids) - 1) & (1ULL << 57) ? ... so some fancy ceil_log2. If you take a simplified unsigned long fc_setup_exch_mgr(int nr_cpu_ids_m1) { return ((((nr_cpu_ids_m1)) & (1<<21) ? 21 : ((nr_cpu_ids_m1)) & (1<<20) ? 20 : ((nr_cpu_ids_m1)) & (1<<19) ? 19 : ((nr_cpu_ids_m1)) & (1<<18) ? 18 : ((nr_cpu_ids_m1)) & (1<<17) ? 17 : ((nr_cpu_ids_m1)) & (1<<16) ? 16 : ((nr_cpu_ids_m1)) & (1<<15) ? 15 : ((nr_cpu_ids_m1)) & (1<<14) ? 14 : ((nr_cpu_ids_m1)) & (1<<13) ? 13 : ((nr_cpu_ids_m1)) & (1<<12) ? 12 : ((nr_cpu_ids_m1)) & (1<<11) ? 11 /* THIS */ : ((nr_cpu_ids_m1)) & (1<<9) ? 9 : ((nr_cpu_ids_m1)) & (1<<8) ? 8 : ((nr_cpu_ids_m1)) & (1<<7) ? 7 : ((nr_cpu_ids_m1)) & (1<<6) ? 6 : ((nr_cpu_ids_m1)) & (1<<5) ? 5 : ((nr_cpu_ids_m1)) & (1<<4) ? 4 : ((nr_cpu_ids_m1)) & (1<<3) ? 3 : 0)) != 11; } I see ./cc1 -quiet t2.c -fdump-tree-original-folding; grep 'Applying' t2.c.005t.original | sort -u ; grep 'Applying' t2.c.005t.original | wc -l Applying pattern match.pd:4778, generic-match.cc:95676 Applying pattern match.pd:5809, generic-match.cc:24025 16383 but when I just remove the marked line it becomes ./cc1 -quiet t2.c -fdump-tree-original-folding; grep 'Applying' t2.c.005t.original | sort -u ; grep 'Applying' t2.c.005t.original | wc -l Applying pattern match.pd:4778, generic-match.cc:95676 Applying pattern match.pd:5809, generic-match.cc:24025 34 hmm, and the issue is probably that we have this pattern both in fold-const.cc and match.pd and thus when the pattern applies recursively, like for (a ? (b ? d : e) : f) > g and the original fold implementation would simplify because one of the braches simplifes: /* Check that we have simplified at least one of the branches. */ if (!TREE_CONSTANT (arg) && !TREE_CONSTANT (lhs) && !TREE_CONSTANT (rhs)) return NULL_TREE; so it goes all the way, recursively to "simple". But then the match.pd variant rejects the result because it is EXPR_P for the lhs (but not for the rhs). We always first try the match.pd path but then will try fold_binary_op_with_conditional_arg anyway which makes this effectively at least quadratic. This was all done to avoid regressing the COND_EXPR gimple IL refactoring. One possibility to avoid the recursion with fold-const is to disable the match.pd pattern for GENERIC.
The master branch has been updated by Richard Biener <rguenth@gcc.gnu.org>: https://gcc.gnu.org/g:ac68f904fe31baf80fa53218f1d8ee033bd8c79b commit r13-2113-gac68f904fe31baf80fa53218f1d8ee033bd8c79b Author: Richard Biener <rguenther@suse.de> Date: Thu Aug 18 11:10:30 2022 +0200 middle-end/106617 - fix fold_binary_op_with_conditional_arg pattern issue Now that we have parts of fold_binary_op_with_conditional_arg duplicated in match.pd and are using ! to take or throw away the result we have to be careful to not have both implementations play games which each other, causing quadratic behavior. In particular the match.pd implementation requires both arms to simplify while the fold-const.cc is happy with just one arm simplifying (something we cannot express in match.pd). The fix is to simply not enable the match.pd pattern for GENERIC. PR middle-end/106617 * match.pd ((a ? b : c) > d -> a ? (b > d) : (c > d)): Fix guard, disable on GENERIC to not cause quadratic behavior with the fold-const.cc implementation and the use of ! * gcc.dg/pr106617.c: New testcase.
Should be fixed now.