This test previously passed; now it fails. A patch in the revision range (last_known_working:first_known_failing) 208159:208165 exposed or caused this regression. Since then it fails as follows:

Running /tmp/hpautotest-gcc1/gcc/gcc/testsuite/gcc.dg/tree-ssa/tree-ssa.exp ...
...
FAIL: gcc.dg/tree-ssa/ssa-dom-thread-4.c scan-tree-dump-times dom1 "Threaded" 4

This is a logical_op_short_circuit target, and the FAIL seems common to such targets, cf. <http://gcc.gnu.org/ml/gcc-testresults/2014-02/msg01894.html> (AVR being the only such target with test results properly marked). Another target that may be of interest is "ARM Cortex-M". The author of the suspect patch in the revision range is CC:ed. ssa-dom-thread-4.c.078t.dom1 from r208207 is attached. (There are 6 "Threaded".)
Created attachment 32230 [details] ssa-dom-thread-4.c.078t.dom1 for cris-elf.
Created attachment 32315 [details] tar of dump files.
With r208165 applied, there are two more jump threading opportunities for the dom1 pass. Jump threading itself is doing all right; the interesting thing is why there are no such opportunities without the patch. I attached the related dump files with/without the patch. The dumps before the vrp1 pass are pretty similar, while after vrp1 the dump with the patch shows the two additional jump threading opportunities. In other words, they are somehow already fixed (not introduced) in pass vrp1 without the patch. For now I can just change ssa-dom-thread-4.c to account for the two jump threadings, or should I look into VRP to find the difference in the first place?
Although it may be irrelevant: I found that the loop's latch doesn't get updated after removing the forwarder latch basic block. The previous patch only handles function remove_forwarder_block, but remove_forwarder_block_with_phi should be handled too. I will send a patch picking this up.
VRP1 generates the code below:

  <bb 3>:
  if (b_elt_11(D) != 0B)
    goto <bb 4>;
  else
    goto <bb 8>;

  <bb 4>:
  # kill_elt_10 = PHI <kill_elt_4(3)>
  goto <bb 6>;

  <bb 5>:
  kill_elt_14 = kill_elt_2->next;

  <bb 6>:
  # kill_elt_2 = PHI <kill_elt_10(4), kill_elt_14(5)>
  if (kill_elt_2 != 0B)
    goto <bb 7>;
  else
    goto <bb 19>;

  <bb 7>:
  _12 = kill_elt_2->indx;
  _13 = b_elt_11(D)->indx;
  if (_12 < _13)
    goto <bb 5>;
  else
    goto <bb 20>;
  ...
  <bb 18>:
  goto <bb 14>;

  <bb 19>:
  # kill_elt_41 = PHI <0B(6)>
  if (b_elt_11(D) != 0B)
    goto <bb 18>;
  else
    goto <bb 14>;

The whole bb 19 is unnecessary, since we know "b_elt_11(D) != 0B" holds there.
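The redundant shape is easier to see at the source level. Below is a minimal C reduction of the pattern in the dump (not the actual bitmap.c source; names are made up): once control reaches the duplicated test, the dominating "b_elt != NULL" test has already passed, so the else edge of the copy is dead.

```c
#include <stddef.h>

/* Minimal reduction (hypothetical, not the real test source): the
   innermost test corresponds to the duplicated condition in bb 19 and
   can never be false, because it is dominated by the same test.  */
int
redundant_test (int *kill_elt, int *b_elt)
{
  if (b_elt != NULL)            /* the test in bb 3 */
    {
      if (kill_elt == NULL)     /* the exit edge into bb 19 */
        {
          if (b_elt != NULL)    /* duplicated test in bb 19: always true */
            return 1;
          return -1;            /* dead code */
        }
    }
  return 0;
}
```

Compiled with optimization, the duplicated test (and the dead return) should fold away; in the dump above it survives as bb 19.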
After investigation, I think the root cause is as follows. In the mergephi2 dump (before VRP), bb13 and bb14 are the latch and header of a loop:

  <bb 13>:
  # changed_18 = PHI <changed_26(12), changed_1(7), changed_1(8), changed_1(9), changed_1(11)>

  <bb 14>:
  # changed_1 = PHI <changed_18(13), 0(2)>
  # kill_elt_4 = PHI <kill_elt_3(13), kill_elt_7(D)(2)>
  if (a_elt_9(D) != 0B)
    goto <bb 3>;
  else
    goto <bb 15>;

The latch is removed by the bogus patch, and the code turns into:

  <bb 14>:
  # changed_1 = PHI <changed_1(7), 0(2), changed_26(12), changed_1(11), changed_1(9), changed_1(8)>
  # kill_elt_4 = PHI <kill_elt_3(7), kill_elt_7(D)(2), kill_elt_3(12), kill_elt_3(11), kill_elt_3(9), kill_elt_3(8)>
  if (a_elt_9(D) != 0B)
    goto <bb 3>;
  else
    goto <bb 15>;

Since VRP requires LOOPS_HAVE_SIMPLE_LATCHES, the loop is re-initialized into:

  <bb 14>:
  # changed_19 = PHI <changed_1(7), changed_1(22), changed_26(12), changed_1(11), changed_1(9)>
  # kill_elt_18 = PHI <kill_elt_3(7), kill_elt_33(22), kill_elt_32(12), kill_elt_32(11), kill_elt_32(9)>

  <bb 17>:
  # changed_1 = PHI <changed_19(14), 0(2)>
  # kill_elt_4 = PHI <kill_elt_18(14), kill_elt_7(D)(2)>
  if (a_elt_9(D) != 0B)
    goto <bb 24>;
  else
    goto <bb 15>;

After all the work in VRP (including jump threading), the code is transformed into the form below:

  <bb 18>:
  # kill_elt_2 = PHI <kill_elt_10(5), kill_elt_14(4)>
  if (kill_elt_2 != 0B)
    goto <bb 6>;
  else
    goto <bb 21>;

  <bb 21>:
  goto <bb 27>;
  ...
  <bb 14>:
  # changed_19 = PHI <changed_1(7), changed_1(22), changed_26(12), changed_1(11), changed_1(9), changed_1(27), changed_1(29)>
  # kill_elt_18 = PHI <kill_elt_3(7), 0B(26), kill_elt_3(12), kill_elt_3(11), kill_elt_3(9), kill_elt_3(27), kill_elt_3(29)>

  <bb 17>:
  # changed_1 = PHI <changed_19(14), 0(2)>
  # kill_elt_4 = PHI <kill_elt_18(14), kill_elt_7(D)(2)>
  if (a_elt_9(D) != 0B)
    goto <bb 24>;
  else
    goto <bb 15>;
  ...
  <bb 26>:
  goto <bb 14>;

  <bb 27>:
  # kill_elt_41 = PHI <0B(21)>
  if (b_elt_11(D) != 0B)
    goto <bb 26>;
  else
    goto <bb 14>;

When trying to remove bb26, cfgcleanup finds that bb27 is a predecessor of both bb26 and bb14, then checks whether the PHI arguments are consistent for both edges (<26, 14> and <27, 14>). Apparently "0B(26)" != "kill_elt_3(27)", resulting in keeping bb27. Actually kill_elt_3(27) equals "0", because of the condition checked in bb18. So this might be a missed opportunity in VRP?
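The check that blocks the removal can be modeled roughly as below (a simplified sketch with invented names, not GCC's actual tree-cfgcleanup.c code): a forwarder block can only be folded into its destination if every PHI in the destination gets the same argument on the forwarder's edge as on the edge from the shared predecessor. PHI arguments are modeled as strings here.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical model of cfgcleanup's PHI-argument compatibility check.
   fwd_args[i] is the argument PHI i receives on the forwarder's edge
   (here <26, 14>); pred_args[i] is the argument it receives on the edge
   from the other shared predecessor (here <27, 14>).  */
bool
can_remove_forwarder (const char **fwd_args, const char **pred_args,
                      int num_phis)
{
  for (int i = 0; i < num_phis; i++)
    if (strcmp (fwd_args[i], pred_args[i]) != 0)
      return false;   /* mismatching PHI argument: keep the forwarder */
  return true;
}
```

For the dump above, kill_elt_18 gets "0B" on edge <26, 14> but "kill_elt_3" on edge <27, 14>, so the textual comparison fails and bb27 keeps its conditional, even though the values are actually equal.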
The problem has nothing to do with VRP, and might be a missed opportunity in jump threading. The dump after VRP's main work, but before the jump threading done at the end of VRP, looks like:

  ...
  <bb 21>:
  goto <bb 7>;
  ...
  <bb 7>:
  # kill_elt_3 = PHI <kill_elt_4(20), 0B(21), kill_elt_2(6)>
  if (b_elt_11(D) != 0B)
    goto <bb 8>;
  else
    goto <bb 14>;

  <bb 8>:
  if (kill_elt_3 != 0B)
    goto <bb 9>;
  else
    goto <bb 22>;

  <bb 22>:
  goto <bb 14>;
  ...
  <bb 14>:
  # changed_19 = PHI <changed_1(7), changed_1(22), changed_26(12), changed_1(11), changed_1(9)>
  # kill_elt_18 = PHI <kill_elt_3(7), 0B(22), kill_elt_3(12), kill_elt_3(11), kill_elt_3(9)>

  <bb 17>:
  # changed_1 = PHI <changed_19(14), 0(2)>
  # kill_elt_4 = PHI <kill_elt_18(14), kill_elt_7(D)(2)>
  if (a_elt_9(D) != 0B)
    goto <bb 24>;
  else
    goto <bb 15>;

There is a jump threading path like:

  Registering jump thread: (21, 7) incoming edge;  (7, 8) joiner;  (8, 22) normal;

After jump threading, the dump looks like:

  <bb 18>:
  # kill_elt_2 = PHI <kill_elt_10(5), kill_elt_14(4)>
  if (kill_elt_2 != 0B)
    goto <bb 6>;
  else
    goto <bb 21>;

  <bb 21>:
  goto <bb 27>;
  ...
  <bb 14>:
  # changed_19 = PHI <changed_1(7), changed_1(22), changed_26(12), changed_1(11), changed_1(9), changed_1(27), changed_1(29)>
  # kill_elt_18 = PHI <kill_elt_3(7), 0B(26), kill_elt_3(12), kill_elt_3(11), kill_elt_3(9), kill_elt_3(27), kill_elt_3(29)>

  <bb 17>:
  # changed_1 = PHI <changed_19(14), 0(2)>
  # kill_elt_4 = PHI <kill_elt_18(14), kill_elt_7(D)(2)>
  if (a_elt_9(D) != 0B)
    goto <bb 24>;
  else
    goto <bb 15>;
  ...
  <bb 26>:
  goto <bb 14>;

  <bb 27>:
  # kill_elt_41 = PHI <0B(21)>
  if (b_elt_11(D) != 0B)
    goto <bb 26>;
  else
    goto <bb 14>;

During the jump threading process, GCC updates the PHI nodes in the destination basic block, like "kill_elt_18 = PHI <...>". Because bb27 is duplicated from bb7, the PHI argument "kill_elt_3(27)" is copied directly from "kill_elt_3(7)". Although "kill_elt_3 == 0" holds when coming from edge <bb27, bb14>, GCC doesn't know this fact.
At last, cfgcleanup tries to remove the forwarder block bb26, but it fails because the PHI arguments from edge <bb27, bb14> and edge <bb26, bb14> don't match. I am thinking maybe jump threading can trace back the threading path and add the PHI argument for edge <bb27, bb14> with the value (0) of kill_elt_3, i.e., "0(27)". The value is available along the jump threading path.
Here are the findings.

1) With r208165 applied, there are two more jump threading opportunities for the dom1 pass. Jump threading itself is doing all right; the interesting thing is why there are no such opportunities without the patch. The two additional opportunities appear after vrp1.

VRP1 dump without the patch:

  <bb 2>:
  goto <bb 15>;

  <bb 3>:
  if (b_elt_11(D) != 0B)
    goto <bb 4>;
  else
    goto <bb 8>;

  <bb 4>:
  goto <bb 6>;

  <bb 5>:
  kill_elt_14 = kill_elt_2->next;

  <bb 6>:
  # kill_elt_2 = PHI <kill_elt_4(4), kill_elt_14(5)>
  if (kill_elt_2 != 0B)
    goto <bb 7>;
  else
    goto <bb 19>;

  <bb 7>:
  _12 = kill_elt_2->indx;
  _13 = b_elt_11(D)->indx;
  if (_12 < _13)
    goto <bb 5>;
  else
    goto <bb 18>;

  <bb 8>:
  # kill_elt_3 = PHI <kill_elt_4(3)>
  if (b_elt_11(D) != 0B)
    goto <bb 9>;
  else
    goto <bb 14>;

  <bb 9>:
  if (kill_elt_3 != 0B)
    goto <bb 10>;
  else
    goto <bb 14>;

  <bb 10>:
  # kill_elt_32 = PHI <kill_elt_3(9), kill_elt_39(18)>
  _15 = kill_elt_32->indx;
  _16 = b_elt_11(D)->indx;
  if (_15 == _16)
    goto <bb 11>;
  else
    goto <bb 14>;

  <bb 11>:
  if (a_elt_9(D) == 0B)
    goto <bb 13>;
  else
    goto <bb 12>;

  <bb 12>:
  _17 = a_elt_9(D)->indx;
  if (_16 <= _17)
    goto <bb 13>;
  else
    goto <bb 14>;

  <bb 13>:
  _20 = (int) changed_1;
  _25 = bitmap_elt_ior (dst_21(D), dst_elt_22(D), dst_prev_23(D), a_elt_9(D), &tmp_elt, _20);
  changed_26 = (unsigned char) _25;
  tmp_elt ={v} {CLOBBER};

  <bb 14>:
  # changed_18 = PHI <changed_26(13), changed_1(8), changed_1(9), changed_1(10), changed_1(12), changed_1(18), changed_1(19)>
  # kill_elt_33 = PHI <kill_elt_32(13), kill_elt_3(8), kill_elt_3(9), kill_elt_32(10), kill_elt_32(12), kill_elt_39(18), kill_elt_35(19)>

  <bb 15>:
  # changed_1 = PHI <changed_18(14), 0(2)>
  # kill_elt_4 = PHI <kill_elt_33(14), kill_elt_7(D)(2)>
  if (a_elt_9(D) != 0B)
    goto <bb 3>;
  else
    goto <bb 16>;

  <bb 16>:
  if (b_elt_11(D) != 0B)
    goto <bb 3>;
  else
    goto <bb 17>;

  <bb 17>:
  # changed_6 = PHI <changed_1(16)>
  changed_28 = changed_6;
  return changed_28;

  <bb 18>:
  # kill_elt_39 = PHI <kill_elt_2(7)>
  if (b_elt_11(D) != 0B)
    goto <bb 10>;
  else
    goto <bb 14>;

  <bb 19>:
  # kill_elt_35 = PHI <0B(6)>
  goto <bb 14>;

VRP1 dump with the patch:

  <bb 2>:
  goto <bb 15>;

  <bb 3>:
  if (b_elt_11(D) != 0B)
    goto <bb 4>;
  else
    goto <bb 8>;

  <bb 4>:
  # kill_elt_10 = PHI <kill_elt_4(3)>
  goto <bb 6>;

  <bb 5>:
  kill_elt_14 = kill_elt_2->next;

  <bb 6>:
  # kill_elt_2 = PHI <kill_elt_10(4), kill_elt_14(5)>
  if (kill_elt_2 != 0B)
    goto <bb 7>;
  else
    goto <bb 19>;

  <bb 7>:
  _12 = kill_elt_2->indx;
  _13 = b_elt_11(D)->indx;
  if (_12 < _13)
    goto <bb 5>;
  else
    goto <bb 20>;

  <bb 8>:
  # kill_elt_3 = PHI <kill_elt_4(3)>
  if (b_elt_11(D) != 0B)
    goto <bb 9>;
  else
    goto <bb 14>;

  <bb 9>:
  if (kill_elt_3 != 0B)
    goto <bb 10>;
  else
    goto <bb 14>;

  <bb 10>:
  # kill_elt_34 = PHI <kill_elt_3(9), kill_elt_37(20)>
  _15 = kill_elt_34->indx;
  _16 = b_elt_11(D)->indx;
  if (_15 == _16)
    goto <bb 11>;
  else
    goto <bb 14>;

  <bb 11>:
  if (a_elt_9(D) == 0B)
    goto <bb 13>;
  else
    goto <bb 12>;

  <bb 12>:
  _17 = a_elt_9(D)->indx;
  if (_16 <= _17)
    goto <bb 13>;
  else
    goto <bb 14>;

  <bb 13>:
  _20 = (int) changed_1;
  _25 = bitmap_elt_ior (dst_21(D), dst_elt_22(D), dst_prev_23(D), a_elt_9(D), &tmp_elt, _20);
  changed_26 = (unsigned char) _25;
  tmp_elt ={v} {CLOBBER};

  <bb 14>:
  # changed_19 = PHI <changed_1(8), changed_1(18), changed_26(13), changed_1(12), changed_1(10), changed_1(19), changed_1(20), changed_1(9)>
  # kill_elt_18 = PHI <kill_elt_3(8), 0B(18), kill_elt_34(13), kill_elt_34(12), kill_elt_34(10), kill_elt_41(19), kill_elt_37(20), 0B(9)>

  <bb 15>:
  # changed_1 = PHI <changed_19(14), 0(2)>
  # kill_elt_4 = PHI <kill_elt_18(14), kill_elt_7(D)(2)>
  if (a_elt_9(D) != 0B)
    goto <bb 3>;
  else
    goto <bb 16>;

  <bb 16>:
  if (b_elt_11(D) != 0B)
    goto <bb 3>;
  else
    goto <bb 17>;

  <bb 17>:
  # changed_29 = PHI <changed_1(16)>
  changed_28 = changed_29;
  return changed_28;

  <bb 18>:
  goto <bb 14>;

  <bb 19>:
  # kill_elt_41 = PHI <0B(6)>
  if (b_elt_11(D) != 0B)
    goto <bb 18>;
  else
    goto <bb 14>;

  <bb 20>:
  # kill_elt_37 = PHI <kill_elt_2(7)>
  if (b_elt_11(D) != 0B)
    goto <bb 10>;
  else
    goto <bb 14>;

The problem part is as below:

  <bb 14>:
  # changed_19 = PHI <changed_1(8), changed_1(18), changed_26(13), changed_1(12), changed_1(10), changed_1(19), changed_1(20), changed_1(9)>
  # kill_elt_18 = PHI <kill_elt_3(8), 0B(18), kill_elt_34(13), kill_elt_34(12), kill_elt_34(10), kill_elt_41(19), kill_elt_37(20), 0B(9)>
  ...
  <bb 18>:
  goto <bb 14>;

  <bb 19>:
  # kill_elt_41 = PHI <0B(6)>
  if (b_elt_11(D) != 0B)
    goto <bb 18>;
  else
    goto <bb 14>;

Since PHI argument "kill_elt_41(19)" equals "0B(18)", bb18 is a removable forwarder basic block. By removing bb18, bb19 can be simplified into a single jump to bb14 too. This is exactly what GCC does without the patch.

When cfgcleanup tries to remove bb18, it has to check that the PHI arguments "kill_elt_41(19)" and "0B(18)" are compatible. Unfortunately, cfgcleanup doesn't evaluate the value of an ssa_name, and it doesn't know that "kill_elt_41(19)" equals "0" along the jump threading path "19->14". Since jump threading is a flow-sensitive optimization, it knows that fact very well, so I came to the conclusion below.

Jump threading adds PHI arguments by copying existing ones when duplicating basic blocks along a jump threading path. It's very possible that a PHI argument has a constant value along the threading path, but jump threading doesn't do any value evaluation and just copies the ssa_name. If we copy the constant value instead of the ssa_name, cfgcleanup after jump threading can see more removable forwarder basic blocks, and the issue is fixed. For this specific case, jump threading should be able to add the argument "0(19)" rather than "kill_elt_41(19)" to bb14, like:

  <bb 14>:
  # changed_19 = PHI <changed_1(8), changed_1(18), changed_26(13), changed_1(12), changed_1(10), changed_1(19), changed_1(20), changed_1(9)>
  # kill_elt_18 = PHI <kill_elt_3(8), 0B(18), kill_elt_34(13), kill_elt_34(12), kill_elt_34(10), 0(19), kill_elt_37(20), 0B(9)>

I worked out a patch fixing the issue and will send it for review.
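The proposed substitution can be sketched as below (hypothetical names; this is only a model of the idea, not the actual tree-ssa-threadupdate.c change): when copying a PHI argument onto the duplicated block's outgoing edge, first look the SSA name up among the values known along the threading path and use the recorded constant if one is found.

```c
#include <string.h>

/* A threading path carries known values for SSA names, e.g.
   "kill_elt_41" -> "0B" on path 19->14.  Names are modeled as
   strings here.  */
struct path_value
{
  const char *ssa_name;
  const char *known_const;
};

/* Return the argument to record on the duplicated block's edge:
   the constant known along the path if there is one, otherwise a
   plain copy of the SSA name (the current behavior).  */
const char *
copy_phi_arg_along_path (const char *arg,
                         const struct path_value *path_values, int n)
{
  for (int i = 0; i < n; i++)
    if (strcmp (path_values[i].ssa_name, arg) == 0)
      return path_values[i].known_const;  /* constant valid on this path */
  return arg;                             /* fall back: copy the name */
}
```

With this, the argument copied for edge <19, 14> becomes "0" instead of "kill_elt_41", it then matches "0B(18)", and cfgcleanup can remove the forwarder bb18 as it does without the patch.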
Fails on mips[el]-unknown-linux-gnu (and thus likely mipsisa64-elf, a primary target) as well. Consider fixing it by xfail-ing for logical_op_short_circuit targets if a fix isn't suitable at this stage.
Patch sent at http://gcc.gnu.org/ml/gcc-patches/2014-03/msg00857.html , but it needs to wait for stage 1. I will xfail the test for now.
Agreed, let's xfail the test on the affected targets for now and look to fix the missed threading optimization during the next stage1.
Author: amker
Date: Tue Apr 1 09:56:29 2014
New Revision: 208980

URL: http://gcc.gnu.org/viewcvs?rev=208980&root=gcc&view=rev
Log:
	PR target/60363
	* gcc.dg/tree-ssa/ssa-dom-thread-4.c: Xfail for
	logical_op_short_circuit targets.

Modified:
	trunk/gcc/testsuite/ChangeLog
	trunk/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-4.c
GCC 4.9.0 has been released
Author: amker
Date: Mon May 5 07:36:30 2014
New Revision: 210059

URL: http://gcc.gnu.org/viewcvs?rev=210059&root=gcc&view=rev
Log:
	PR tree-optimization/60363
	* gcc/tree-ssa-threadupdate.c (get_value_locus_in_path): New.
	(copy_phi_args): New parameters.  Call get_value_locus_in_path.
	(update_destination_phis): New parameter.
	(create_edge_and_update_destination_phis): Ditto.
	(ssa_fix_duplicate_block_edges): Pass new arguments.
	(thread_single_edge): Ditto.

	PR tree-optimization/60363
	* gcc.dg/tree-ssa/ssa-dom-thread-4.c: Revert XFAIL test.

Modified:
	trunk/gcc/ChangeLog
	trunk/gcc/testsuite/ChangeLog
	trunk/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-4.c
	trunk/gcc/tree-ssa-threadupdate.c
Should be fixed now.
Fixed for 4.10.