For gcc.c-torture/execute/builtins/strlen-2.c compilation, -O1, with target arc-elf, I see an ICE in config/arc/arc.c:gen_compare_reg, as it has been passed a comparison for two VOIDmode constants: (ne:SI (const_int 3 [0x3]) (const_int 3 [0x3])) We should not generate such comparisons. [amylaar@rowan gcc]$ ./cc1 -fpreprocessed strlen-2.i -quiet -dumpbase strlen-2.c -auxbase strlen-2 -O1 -w -version -fno-diagnostics-show-caret -fdiagnostics-color=never -fno-tree-loop-distribute-patterns -o strlen-2.s GNU C (GCC) version 4.9.0 20131024 (experimental) (arc-elf) compiled by GNU C version 4.8.1 20130603 (Red Hat 4.8.1-1), GMP version 5.1.1, MPFR version 3.1.1, MPC version 1.0.1 GGC heuristics: --param ggc-min-expand=30 --param ggc-min-heapsize=4096 GNU C (GCC) version 4.9.0 20131024 (experimental) (arc-elf) compiled by GNU C version 4.8.1 20130603 (Red Hat 4.8.1-1), GMP version 5.1.1, MPFR version 3.1.1, MPC version 1.0.1 GGC heuristics: --param ggc-min-expand=30 --param ggc-min-heapsize=4096 Compiler executable checksum: a28c59382a8f5b3c721a450c1591fc5d /home/amylaar/synopsys/synopsys-gcc-mainline/unisrc-4.8/gcc/testsuite/gcc.c-torture/execute/builtins/strlen-2.c: In function ‘main_test’: /home/amylaar/synopsys/synopsys-gcc-mainline/unisrc-4.8/gcc/testsuite/gcc.c-torture/execute/builtins/strlen-2.c:25:1: internal compiler error: in gen_compare_reg, at config/arc/arc.c:1430 0x899316f gen_compare_reg(rtx_def*, machine_mode) ../../gcc/gcc/config/arc/arc.c:1430 0x89cfa80 gen_cstoresi4(rtx_def*, rtx_def*, rtx_def*, rtx_def*) ../../gcc/gcc/config/arc/arc.md:3208 0x85c8a6d insn_gen_fn::operator()(rtx_def*, rtx_def*, rtx_def*, rtx_def*) const ../../gcc/gcc/recog.h:286 0x85c83b9 maybe_gen_insn(insn_code, unsigned int, expand_operand*) ../../gcc/gcc/optabs.c:8217 0x85c8604 maybe_expand_insn(insn_code, unsigned int, expand_operand*) ../../gcc/gcc/optabs.c:8243 0x83942ef emit_cstore ../../gcc/gcc/expmed.c:5121 0x8394a85 emit_store_flag_1 ../../gcc/gcc/expmed.c:5362 0x8394b7c emit_store_flag(rtx_def*, rtx_code, rtx_def*, rtx_def*, machine_mode, int, int) ../../gcc/gcc/expmed.c:5405 0x83959ec emit_store_flag_force(rtx_def*, rtx_code, rtx_def*, rtx_def*, machine_mode, int, int) ../../gcc/gcc/expmed.c:5727 0x83ba5dd do_store_flag ../../gcc/gcc/expr.c:10832 0x83b1fb5 expand_expr_real_2(separate_ops*, rtx_def*, machine_mode, expand_modifier) ../../gcc/gcc/expr.c:8787 0x83b9191 expand_expr_real_1(tree_node*, rtx_def*, machine_mode, expand_modifier, rtx_def**) ../../gcc/gcc/expr.c:10444 0x83ad831 expand_expr_real(tree_node*, rtx_def*, machine_mode, expand_modifier, rtx_def**) ../../gcc/gcc/expr.c:7796 0x83b3ab6 expand_expr_real_1(tree_node*, rtx_def*, machine_mode, expand_modifier, rtx_def**) ../../gcc/gcc/expr.c:9253 0x83ad831 expand_expr_real(tree_node*, rtx_def*, machine_mode, expand_modifier, rtx_def**) ../../gcc/gcc/expr.c:7796 0x831e807 expand_normal ../../gcc/gcc/expr.h:450 0x8320856 do_jump(tree_node*, rtx_def*, rtx_def*, int) ../../gcc/gcc/dojump.c:608 0x831fa22 do_jump_1(tree_code, tree_node*, tree_node*, rtx_def*, rtx_def*, int) ../../gcc/gcc/dojump.c:364 0x831e918 jumpifnot_1(tree_code, tree_node*, tree_node*, rtx_def*, int) ../../gcc/gcc/dojump.c:114 0x82af4bd expand_gimple_cond ../../gcc/gcc/cfgexpand.c:2030 Please submit a full bug report, with preprocessed source if appropriate. Please include the complete backtrace with any bug report. See <http://gcc.gnu.org/bugs.html> for instructions. emit_cstore has generated this comparison. I propose to use copy_to_mode_reg in this case to preserve the information about the mode. I see that for the testcase, the unnecessary instructions are removed in the ce1 pass.
Frame 12 shows that at the tree level, one of the comparison operands is an initialized variable. (gdb) frame 12 #12 0x083bd9fa in expand_expr_real_1 (exp=<ne_expr 0xb7b55b28>, target=0x0, tmode=VOIDmode, modifier=EXPAND_NORMAL, alt_rtl=0x0) at ../../gcc/gcc/expr.c:10444 10444 return expand_expr_real_2 (&ops, target, tmode, modifier); (gdb) call debug_tree(exp) <ne_expr 0xb7b55b28 type <boolean_type 0xb7abd5a0 _Bool public unsigned QI size <integer_cst 0xb7aaa258 constant 8> unit size <integer_cst 0xb7aaa26c constant 1> align 8 symtab 0 alias set -1 canonical type 0xb7abd5a0 precision 1 min <integer_cst 0xb7aaa438 0> max <integer_cst 0xb7aaa460 1>> arg 0 <ssa_name 0xb7b0ac58 type <integer_type 0xb7abd480 long unsigned int public unsigned SI size <integer_cst 0xb7aaa140 constant 32> unit size <integer_cst 0xb7aaa154 constant 4> align 32 symtab 0 alias set -1 canonical type 0xb7abd480 precision 32 min <integer_cst 0xb7aaa3c0 0> max <integer_cst 0xb7aaa3ac 4294967295> context <translation_unit_decl 0xb7ac3dec D.1415> pointer_to_this <pointer_type 0xb7ac44e0>> visited var <var_decl 0xb7b4db24 D.1483>def_stmt _10 = 3; version 10> arg 1 <integer_cst 0xb7b3ce4c type <integer_type 0xb7abd480 long unsigned int> constant 3> /home/amylaar/synopsys/synopsys-gcc-mainline/unisrc-4.8/gcc/testsuite/gcc.c-torture/execute/builtins/strlen-2.c:29:6>
The regression shows up at r204194 : http://gcc.gnu.org/viewcvs/gcc?view=revision&revision=204194 The 104t.copyprop6 looks indeed more cumbersome for r204194 than for r204193: --- ../204193/strlen-2.i.104t.copyprop6 2013-11-08 16:03:58.272739028 +0000 +++ ./strlen-2.i.104t.copyprop6 2013-11-08 16:04:18.355773658 +0000 @@ -27,11 +27,15 @@ main_test () { + char[4] * iftmp.2_1; + const char * iftmp.6_2; const char * iftmp.12_3; long unsigned int g.3_7; long unsigned int g.5_8; + long unsigned int _10; long unsigned int h.7_11; long unsigned int h.9_12; + long unsigned int _14; long unsigned int i.10_15; long unsigned int i.11_16; long unsigned int j.13_19; @@ -44,24 +48,31 @@ long unsigned int _28; long unsigned int k.14_29; long unsigned int l.20_33; + _Bool _34; + _Bool _35; + _Bool _36; + _Bool _37; + _Bool _38; + _Bool _39; <bb 2>: g.3_7 = g; g.5_8 = g.3_7 + 1; g = g.5_8; - if (g.5_8 != 1) + if (g.3_7 != 0) goto <bb 3>; else goto <bb 4>; <bb 3>: - abort (); <bb 4>: - h.7_11 = h; - h.9_12 = h.7_11 + 1; - h = h.9_12; - if (h.9_12 != 1) + # iftmp.2_1 = PHI <"foo"(3), "bar"(2)> + _10 = strlen (iftmp.2_1); + _34 = g.5_8 != 1; + _35 = _10 != 3; + _36 = _35 | _34; + if (_36 != 0) goto <bb 5>; else goto <bb 6>; @@ -70,68 +81,93 @@ abort (); <bb 6>: + h.7_11 = h; + h.9_12 = h.7_11 + 1; + h = h.9_12; + if (h.7_11 != 0) + goto <bb 7>; + else + goto <bb 8>; + + <bb 7>: + + <bb 8>: + # iftmp.6_2 = PHI <&MEM[(void *)"xfoo" + 1B](7), "bar"(6)> + _14 = strlen (iftmp.6_2); + _37 = h.9_12 != 1; + _38 = _14 != 3; + _39 = _38 | _37; + if (_39 != 0) + goto <bb 9>; + else + goto <bb 10>; + + <bb 9>: + abort (); + + <bb 10>: i.10_15 = i; i.11_16 = i.10_15 + 1; i = i.11_16; if (i.11_16 != 1) - goto <bb 7>; + goto <bb 11>; else - goto <bb 8>; + goto <bb 12>; - <bb 7>: + <bb 11>: abort (); - <bb 8>: + <bb 12>: inside_main = 0; j.13_19 = j; if (j.13_19 != 0) - goto <bb 9>; + goto <bb 13>; else - goto <bb 10>; + goto <bb 14>; - <bb 9>: + <bb 13>: k.14_20 = k; k.16_21 = k.14_20 + 1; k = k.16_21; iftmp.12_23 = "foo" + k.14_20; - goto <bb 11>; + goto <bb 15>; - <bb 10>: + <bb 14>: k.14_24 = k; k.18_25 = k.14_24 + 1; k = k.18_25; iftmp.12_27 = "bar" + k.14_24; - <bb 11>: - # iftmp.12_3 = PHI <iftmp.12_23(9), iftmp.12_27(10)> + <bb 15>: + # iftmp.12_3 = PHI <iftmp.12_23(13), iftmp.12_27(14)> _28 = strlen (iftmp.12_3); if (_28 != 3) - goto <bb 13>; + goto <bb 17>; else - goto <bb 12>; + goto <bb 16>; - <bb 12>: + <bb 16>: k.14_29 = k; if (k.14_29 != 1) - goto <bb 13>; + goto <bb 17>; else - goto <bb 14>; + goto <bb 18>; - <bb 13>: + <bb 17>: abort (); - <bb 14>: + <bb 18>: foo (); l.20_33 = l; if (l.20_33 != 1) - goto <bb 15>; + goto <bb 19>; else - goto <bb 16>; + goto <bb 20>; - <bb 15>: + <bb 19>: abort (); - <bb 16>: + <bb 20>: return; }
Making emit_store_flag return 0 in the case of const-const comparison gives simpler rtl generation: (insn 14 13 15 (set (reg:QI 175) (const_int 1 [0x1])) .../strlen-2.c:29 -1 (nil)) (insn 15 14 16 (set (reg:QI 175) (const_int 0 [0])) .../strlen-2.c:29 -1 (nil)) (code_label 16 15 0 8 "" [0 uses])
We do have a missed optimization due to fab being done so late.
A patch is here: http://gcc.gnu.org/ml/gcc-patches/2013-11/msg00931.html
We have <bb 4>: # iftmp.2_1 = PHI <"foo"(2), "bar"(3)> _10 = 3; _34 = g.5_8 != 1; _35 = _10 != 3; in .optimized, as fold-all-builtins manages to produce that from <bb 4>: # iftmp.2_1 = PHI <"foo"(3), "bar"(2)> _10 = strlen (iftmp.2_1); _34 = g.5_8 != 1; _35 = _10 != 3; we don't re-fold the strlen call earlier as it doesn't change (only the PHI node feeding it does!). The fact that we don't propagate string-lengths separately in CCP makes this a somewhat ill-fit for handling during the CCP propagation stage. At -O2 jump threading and the strlen pass handle it well. So indeed an oddity - probably TER simply should not forward constants to not confuse RTL expansion.
That is, sth like Index: gcc/tree-ssa-ter.c =================================================================== --- gcc/tree-ssa-ter.c (revision 204664) +++ gcc/tree-ssa-ter.c (working copy) @@ -438,6 +439,12 @@ ter_is_replaceable_p (gimple stmt) && !is_gimple_val (gimple_assign_rhs1 (stmt))) return false; + /* Do not propagate "modeless" constants - we may end up confusing the RTL + expanders. Leave the optimization to RTL CCP. */ + if (gimple_assign_single_p (stmt) + && CONSTANT_CLASS_P (gimple_assign_rhs1 (stmt))) + return false; + return true; } return false;
(In reply to Richard Biener from comment #7) > That is, sth like > > Index: gcc/tree-ssa-ter.c > =================================================================== > --- gcc/tree-ssa-ter.c (revision 204664) > +++ gcc/tree-ssa-ter.c (working copy) > @@ -438,6 +439,12 @@ ter_is_replaceable_p (gimple stmt) > && !is_gimple_val (gimple_assign_rhs1 (stmt))) > return false; > > + /* Do not propagate "modeless" constants - we may end up confusing > the RTL > + expanders. Leave the optimization to RTL CCP. */ > + if (gimple_assign_single_p (stmt) > + && CONSTANT_CLASS_P (gimple_assign_rhs1 (stmt))) > + return false; > + > return true; > } > return false; Constants are often very valuable for rtl expansion, allowing to use cheaper patterns. And some constant propagations are impossible in rtl because of mode oddities. E.g. when you have a have a mulsidi3 pattern, you generally have a sign_extend - you can't have a VOIDmode constant inside that. Therefore, I would rather have the middle-end move the constants to registers only when necessary to preserve the mode, and preferrably fold instead in the first place when optimizing.
Author: amylaar Date: Mon Nov 11 18:57:25 2013 New Revision: 204682 URL: http://gcc.gnu.org/viewcvs?rev=204682&root=gcc&view=rev Log: PR middle-end/59049 * expmed.c (emit_store_flag): Fail for const-const comparison. Modified: trunk/gcc/ChangeLog trunk/gcc/expmed.c
Problem has been fixed for 4.9 with the commit shown in comment #9.