Bug 60363 - [4.9/4.10 Regression]: logical_op_short_circuit, gcc.dg/tree-ssa/ssa-dom-thread-4.c scan-tree-dump-times dom1 "Threaded" 4
Summary: [4.9/4.10 Regression]: logical_op_short_circuit, gcc.dg/tree-ssa/ssa-dom-thre...
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: tree-optimization (show other bugs)
Version: 4.9.0
: P3 normal
Target Milestone: 5.0
Assignee: Not yet assigned to anyone
URL:
Keywords: xfail
Depends on:
Blocks:
 
Reported: 2014-02-28 01:31 UTC by Hans-Peter Nilsson
Modified: 2014-05-06 08:25 UTC (History)
3 users (show)

See Also:
Host: x86_64-unknown-linux-gnu
Target: cris-axis-elf, mipsel-unknown-linux-gnu, mips-unknown-linux-gnu
Build:
Known to work: 4.10.0
Known to fail: 4.9.1
Last reconfirmed: 2014-03-31 00:00:00


Attachments
ssa-dom-thread-4.c.078t.dom1 for cris-elf. (3.32 KB, text/plain)
2014-02-28 01:32 UTC, Hans-Peter Nilsson
Details
tar of dump files. (7.35 KB, application/x-bzip)
2014-03-09 13:31 UTC, bin.cheng
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Hans-Peter Nilsson 2014-02-28 01:31:24 UTC
This test previously passed, now it fails.
A patch in the revision range (last_known_working:first_known_failing) 208159:208165 exposed or caused this regression.  Since then it fails as follows:

Running /tmp/hpautotest-gcc1/gcc/gcc/testsuite/gcc.dg/tree-ssa/tree-ssa.exp ...
...
FAIL: gcc.dg/tree-ssa/ssa-dom-thread-4.c scan-tree-dump-times dom1 "Threaded" 4

This is a logical_op_short_circuit target, and the FAIL seems common to such targets, cf: <http://gcc.gnu.org/ml/gcc-testresults/2014-02/msg01894.html>
(AVR being the only target with test-results properly marked as such).
A target that may be of interest is too; "ARM cortex-M".

Author of suspect patch in revision range CC:ed.
ssa-dom-thread-4.c.078t.dom1 from r208207 attached. (There are 6 "Threaded")
Comment 1 Hans-Peter Nilsson 2014-02-28 01:32:17 UTC
Created attachment 32230 [details]
ssa-dom-thread-4.c.078t.dom1 for cris-elf.
Comment 2 bin.cheng 2014-03-09 13:31:29 UTC
Created attachment 32315 [details]
tar of dump files.
Comment 3 bin.cheng 2014-03-09 13:36:19 UTC
After patching 208165, there are two more jump threading opportunities for dom1 pass.  Jump threading is doing alright, the interesting thing is why there is no such opportunities before patching.

I attatched related dump files with/without patch.  It seems dumps before vrp1 pass are pretty similar, while after vrp1, dump with patch shows the two additional jump threading opportunities.  In other words, they are somehow already fixed (not introduced) in pass vrp1 without patching.

For now I can just change ssa-dom-thread-4.c to handle the two jump threadings, or should I look into vrp to find the difference in the first place?
Comment 4 bin.cheng 2014-03-09 15:00:31 UTC
Although may be irrelavant.  I found loop's latch doesn't get updated after removing the forwarder latch basic block.  Previous patch only catches function remove_forwarder_block, but remove_forwarder_block_with_phi should be handled too.

I will send a patch picking this up.
Comment 5 bin.cheng 2014-03-11 09:16:10 UTC
Vrp1 generates below code:


  <bb 3>:
  if (b_elt_11(D) != 0B)
    goto <bb 4>;
  else
    goto <bb 8>;

  <bb 4>:
  # kill_elt_10 = PHI <kill_elt_4(3)>
  goto <bb 6>;

  <bb 5>:
  kill_elt_14 = kill_elt_2->next;

  <bb 6>:
  # kill_elt_2 = PHI <kill_elt_10(4), kill_elt_14(5)>
  if (kill_elt_2 != 0B)
    goto <bb 7>;
  else
    goto <bb 19>;

  <bb 7>:
  _12 = kill_elt_2->indx;
  _13 = b_elt_11(D)->indx;
  if (_12 < _13)
    goto <bb 5>;
  else
    goto <bb 20>;

...


  <bb 18>:
  goto <bb 14>;

  <bb 19>:
  # kill_elt_41 = PHI <0B(6)>
  if (b_elt_11(D) != 0B)
    goto <bb 18>;
  else
    goto <bb 14>;

The whole bb 19 is unnecessary since we know "b_elt_11(D) != 0" holds.
Comment 6 bin.cheng 2014-03-11 11:33:57 UTC
After investigation, I think the root cause is:

For given mergephi2 dump (before vrp), there are latch(bb13) and header(bb14) of a loop:

  <bb 13>:
  # changed_18 = PHI <changed_26(12), changed_1(7), changed_1(8), changed_1(9), changed_1(11)>

  <bb 14>:
  # changed_1 = PHI <changed_18(13), 0(2)>
  # kill_elt_4 = PHI <kill_elt_3(13), kill_elt_7(D)(2)>
  if (a_elt_9(D) != 0B)
    goto <bb 3>;
  else
    goto <bb 15>;

The latch is removed by the bogus patch, and the code turns into:

  <bb 14>:
  # changed_1 = PHI <changed_1(7), 0(2), changed_26(12), changed_1(11), changed_1(9), changed_1(8)>
  # kill_elt_4 = PHI <kill_elt_3(7), kill_elt_7(D)(2), kill_elt_3(12), kill_elt_3(11), kill_elt_3(9), kill_elt_3(8)>
  if (a_elt_9(D) != 0B)
    goto <bb 3>;
  else
    goto <bb 15>;

Since VRP requires LOOP_HAVE_SIMPLE_LATCH, it will be initialized into:

  <bb 14>:
  # changed_19 = PHI <changed_1(7), changed_1(22), changed_26(12), changed_1(11), changed_1(9)>
  # kill_elt_18 = PHI <kill_elt_3(7), kill_elt_33(22), kill_elt_32(12), kill_elt_32(11), kill_elt_32(9)>

  <bb 17>:
  # changed_1 = PHI <changed_19(14), 0(2)>
  # kill_elt_4 = PHI <kill_elt_18(14), kill_elt_7(D)(2)>
  if (a_elt_9(D) != 0B)
    goto <bb 24>;
  else
    goto <bb 15>;

After all works in vrp (including jump threading), the code is transformed into below form:


  <bb 18>:
  # kill_elt_2 = PHI <kill_elt_10(5), kill_elt_14(4)>
  if (kill_elt_2 != 0B)
    goto <bb 6>;
  else
    goto <bb 21>;

  <bb 21>:
  goto <bb 27>;

...

  <bb 14>:
  # changed_19 = PHI <changed_1(7), changed_1(22), changed_26(12), changed_1(11), changed_1(9), changed_1(27), changed_1(29)>
  # kill_elt_18 = PHI <kill_elt_3(7), 0B(26), kill_elt_3(12), kill_elt_3(11), kill_elt_3(9), kill_elt_3(27), kill_elt_3(29)>

  <bb 17>:
  # changed_1 = PHI <changed_19(14), 0(2)>
  # kill_elt_4 = PHI <kill_elt_18(14), kill_elt_7(D)(2)>
  if (a_elt_9(D) != 0B)
    goto <bb 24>;
  else
    goto <bb 15>;

...

  <bb 26>:
  goto <bb 14>;

  <bb 27>:
  # kill_elt_41 = PHI <0B(21)>
  if (b_elt_11(D) != 0B)
    goto <bb 26>;
  else
    goto <bb 14>;

When trying to remove bb26, it finds out bb27 is pred of both bb26 and bb14, then checks whether the phi args are consistent for both edges (<26, 14> and <27, 14>).  Apparently "0B(26)" != "kill_elt_3(27)", resulting in keeping bb27.

Actually, kill_elt_3(27) equals to "0", because of check condition in bb18.

So this might be a missed opportunity in vrp?
Comment 7 bin.cheng 2014-03-12 10:27:23 UTC
The problem has nothing to do with VRP, and might be a missed opportunity of jump threading.

The dump after VRP but before jump threading at the end of VRP is like:

...

  <bb 21>:
  goto <bb 7>;

...

  <bb 7>:
  # kill_elt_3 = PHI <kill_elt_4(20), 0B(21), kill_elt_2(6)>
  if (b_elt_11(D) != 0B)
    goto <bb 8>;
  else
    goto <bb 14>;

  <bb 8>:
  if (kill_elt_3 != 0B)
    goto <bb 9>;
  else
    goto <bb 22>;

  <bb 22>:
  goto <bb 14>;

...

  <bb 14>:
  # changed_19 = PHI <changed_1(7), changed_1(22), changed_26(12), changed_1(11), changed_1(9)>
  # kill_elt_18 = PHI <kill_elt_3(7), 0B(22), kill_elt_3(12), kill_elt_3(11), kill_elt_3(9)>

  <bb 17>:
  # changed_1 = PHI <changed_19(14), 0(2)>
  # kill_elt_4 = PHI <kill_elt_18(14), kill_elt_7(D)(2)>
  if (a_elt_9(D) != 0B)
    goto <bb 24>;
  else
    goto <bb 15>;

There is jump threading path like:
  Registering jump thread: (21, 7) incoming edge;  (7, 8) joiner;  (8, 22) normal;

After jump threading, the dump is like:
  <bb 18>:
  # kill_elt_2 = PHI <kill_elt_10(5), kill_elt_14(4)>
  if (kill_elt_2 != 0B)
    goto <bb 6>;
  else
    goto <bb 21>;

  <bb 21>:
  goto <bb 27>;

...

  <bb 14>:
  # changed_19 = PHI <changed_1(7), changed_1(22), changed_26(12), changed_1(11), changed_1(9), changed_1(27), changed_1(29)>
  # kill_elt_18 = PHI <kill_elt_3(7), 0B(26), kill_elt_3(12), kill_elt_3(11), kill_elt_3(9), kill_elt_3(27), kill_elt_3(29)>

  <bb 17>:
  # changed_1 = PHI <changed_19(14), 0(2)>
  # kill_elt_4 = PHI <kill_elt_18(14), kill_elt_7(D)(2)>
  if (a_elt_9(D) != 0B)
    goto <bb 24>;
  else
    goto <bb 15>;

...

  <bb 26>:
  goto <bb 14>;

  <bb 27>:
  # kill_elt_41 = PHI <0B(21)>
  if (b_elt_11(D) != 0B)
    goto <bb 26>;
  else
    goto <bb 14>;

During jump threading process, GCC updates phi node in destination basic block like "kill_elt_18 = PHI <".  Because bb27 is duplicated from bb7, the phi argument "kill_elt_3(27)" is copied directly from "kill_elt_3(7)".  Although "kill_elt_3 == 0" holds when coming from edge <bb27, bb14>, gcc doesn't know this fact.
At last, cfgcleanup tries to remove forwarder block bb26, but it can't make it because the phi arguments from edge<bb27, bb14> and edge<bb26, bb14> doesn't match.

I am thinking maybe jump threading can back trace the threading path and try to add phi argument for edge<bb27, bb14> with the value(0) of kill_elt_3 (i.e., "0(27)").  The value is available along the jump threading path.
Comment 8 bin.cheng 2014-03-18 10:08:06 UTC
Here is the findings.
1) After patching 208165, there are two more jump threading opportunities for dom1 pass. Jump threading is doing alright, the interesting thing is why there is no such opportunities before patching. The two additional opportunities are introduced after vrp1.
VRP1 dump without patch:
  <bb 2>:
  goto <bb 15>;

  <bb 3>:
  if (b_elt_11(D) != 0B)
    goto <bb 4>;
  else
    goto <bb 8>;

  <bb 4>:
  goto <bb 6>;

  <bb 5>:
  kill_elt_14 = kill_elt_2->next;

  <bb 6>:
  # kill_elt_2 = PHI <kill_elt_4(4), kill_elt_14(5)>
  if (kill_elt_2 != 0B)
    goto <bb 7>;
  else
    goto <bb 19>;

  <bb 7>:
  _12 = kill_elt_2->indx;
  _13 = b_elt_11(D)->indx;
  if (_12 < _13)
    goto <bb 5>;
  else
    goto <bb 18>;

  <bb 8>:
  # kill_elt_3 = PHI <kill_elt_4(3)>
  if (b_elt_11(D) != 0B)
    goto <bb 9>;
  else
    goto <bb 14>;

  <bb 9>:
  if (kill_elt_3 != 0B)
    goto <bb 10>;
  else
    goto <bb 14>;

  <bb 10>:
  # kill_elt_32 = PHI <kill_elt_3(9), kill_elt_39(18)>
  _15 = kill_elt_32->indx;
  _16 = b_elt_11(D)->indx;
  if (_15 == _16)
    goto <bb 11>;
  else
    goto <bb 14>;

  <bb 11>:
  if (a_elt_9(D) == 0B)
    goto <bb 13>;
  else
    goto <bb 12>;

  <bb 12>:
  _17 = a_elt_9(D)->indx;
  if (_16 <= _17)
    goto <bb 13>;
  else
    goto <bb 14>;

  <bb 13>:
  _20 = (int) changed_1;
  _25 = bitmap_elt_ior (dst_21(D), dst_elt_22(D), dst_prev_23(D), a_elt_9(D), &tmp_elt, _20);
  changed_26 = (unsigned char) _25;
  tmp_elt ={v} {CLOBBER};

  <bb 14>:
  # changed_18 = PHI <changed_26(13), changed_1(8), changed_1(9), changed_1(10), changed_1(12), changed_1(18), changed_1(19)>
  # kill_elt_33 = PHI <kill_elt_32(13), kill_elt_3(8), kill_elt_3(9), kill_elt_32(10), kill_elt_32(12), kill_elt_39(18), kill_elt_35(19)>

  <bb 15>:
  # changed_1 = PHI <changed_18(14), 0(2)>
  # kill_elt_4 = PHI <kill_elt_33(14), kill_elt_7(D)(2)>
  if (a_elt_9(D) != 0B)
    goto <bb 3>;
  else
    goto <bb 16>;

  <bb 16>:
  if (b_elt_11(D) != 0B)
    goto <bb 3>;
  else
    goto <bb 17>;

  <bb 17>:
  # changed_6 = PHI <changed_1(16)>
  changed_28 = changed_6;
  return changed_28;

  <bb 18>:
  # kill_elt_39 = PHI <kill_elt_2(7)>
  if (b_elt_11(D) != 0B)
    goto <bb 10>;
  else
    goto <bb 14>;

  <bb 19>:
  # kill_elt_35 = PHI <0B(6)>
  goto <bb 14>;

VRP1 dump with patch:
   <bb 2>:
  goto <bb 15>;

  <bb 3>:
  if (b_elt_11(D) != 0B)
    goto <bb 4>;
  else
    goto <bb 8>;

  <bb 4>:
  # kill_elt_10 = PHI <kill_elt_4(3)>
  goto <bb 6>;

  <bb 5>:
  kill_elt_14 = kill_elt_2->next;

  <bb 6>:
  # kill_elt_2 = PHI <kill_elt_10(4), kill_elt_14(5)>
  if (kill_elt_2 != 0B)
    goto <bb 7>;
  else
    goto <bb 19>;

  <bb 7>:
  _12 = kill_elt_2->indx;
  _13 = b_elt_11(D)->indx;
  if (_12 < _13)
    goto <bb 5>;
  else
    goto <bb 20>;

  <bb 8>:
  # kill_elt_3 = PHI <kill_elt_4(3)>
  if (b_elt_11(D) != 0B)
    goto <bb 9>;
  else
    goto <bb 14>;

  <bb 9>:
  if (kill_elt_3 != 0B)
    goto <bb 10>;
  else
    goto <bb 14>;

  <bb 10>:
  # kill_elt_34 = PHI <kill_elt_3(9), kill_elt_37(20)>
  _15 = kill_elt_34->indx;
  _16 = b_elt_11(D)->indx;
  if (_15 == _16)
    goto <bb 11>;
  else
    goto <bb 14>;

  <bb 11>:
  if (a_elt_9(D) == 0B)
    goto <bb 13>;
  else
    goto <bb 12>;

  <bb 12>:
  _17 = a_elt_9(D)->indx;
  if (_16 <= _17)
    goto <bb 13>;
  else
    goto <bb 14>;

  <bb 13>:
  _20 = (int) changed_1;
  _25 = bitmap_elt_ior (dst_21(D), dst_elt_22(D), dst_prev_23(D), a_elt_9(D), &tmp_elt, _20);
  changed_26 = (unsigned char) _25;
  tmp_elt ={v} {CLOBBER};

  <bb 14>:
  # changed_19 = PHI <changed_1(8), changed_1(18), changed_26(13), changed_1(12), changed_1(10), changed_1(19), changed_1(20), changed_1(9)>
  # kill_elt_18 = PHI <kill_elt_3(8), 0B(18), kill_elt_34(13), kill_elt_34(12), kill_elt_34(10), kill_elt_41(19), kill_elt_37(20), 0B(9)>

  <bb 15>:
  # changed_1 = PHI <changed_19(14), 0(2)>
  # kill_elt_4 = PHI <kill_elt_18(14), kill_elt_7(D)(2)>
  if (a_elt_9(D) != 0B)
    goto <bb 3>;
  else
    goto <bb 16>;

  <bb 16>:
  if (b_elt_11(D) != 0B)
    goto <bb 3>;
  else
    goto <bb 17>;

  <bb 17>:
  # changed_29 = PHI <changed_1(16)>
  changed_28 = changed_29;
  return changed_28;

  <bb 18>:
  goto <bb 14>;

  <bb 19>:
  # kill_elt_41 = PHI <0B(6)>
  if (b_elt_11(D) != 0B)
    goto <bb 18>;
  else
    goto <bb 14>;

  <bb 20>:
  # kill_elt_37 = PHI <kill_elt_2(7)>
  if (b_elt_11(D) != 0B)
    goto <bb 10>;
  else
    goto <bb 14>;

The problem part is as below:
  <bb 14>:
  # changed_19 = PHI <changed_1(8), changed_1(18), changed_26(13), changed_1(12), changed_1(10), changed_1(19), changed_1(20), changed_1(9)>
  # kill_elt_18 = PHI <kill_elt_3(8), 0B(18), kill_elt_34(13), kill_elt_34(12), kill_elt_34(10), kill_elt_41(19), kill_elt_37(20), 0B(9)>

...

  <bb 18>:
  goto <bb 14>;

  <bb 19>:
  # kill_elt_41 = PHI <0B(6)>
  if (b_elt_11(D) != 0B)
    goto <bb 18>;
  else
    goto <bb 14>;

Since PHI argument "kill_elt_41(19)" equals to "0B(18)", the bb18 is a removable forward basic block. By removing bb18, bb19 can be simplified into a single jump to bb14 too. This is exactly what GCC does without patch.

When CFGcleanup tries to remove bb18, it has to check that PHI arguments "kill_elt_41(19)" and "0B(18)" are compatible. Unfortunately, CFGcleanup doesn't evaluate value of ssa_name and it doesn't know that "kill_elt_41(19)" equals to "0" along jump threading path "19->14".

Since jump threading is now a flow sensitive optimization, it knows very well about that truth, so I come up below conclusion:
Jump threading adds PHI argument by copying existing one when duplicating basic blocks along jump threading path. It’s very possible that a PHI argument has a constant value along threading path, but jump threading doesn’t do any value evaluation and just copies the ssa_name. If we copy the constant value instead of ssa_name, cfgcleanup after jump threading can see more removable forward basic blocks thus the issue is fixed.

For this specific case, jump threading should be able to add argument "0(19)" rather than "kill_elt_41(19)" to bb14, like:
  <bb 14>:
  # changed_19 = PHI <changed_1(8), changed_1(18), changed_26(13), changed_1(12), changed_1(10), changed_1(19), changed_1(20), changed_1(9)>
  # kill_elt_18 = PHI <kill_elt_3(8), 0B(18), kill_elt_34(13), kill_elt_34(12), kill_elt_34(10), 0(19), kill_elt_37(20), 0B(9)>


I worked out a patch fixing the issue, will send it for review.
Comment 9 Richard Biener 2014-03-31 09:29:45 UTC
Fails on mips[el]-unknown-linux-gnu (and thus likely mipsisa64-elf, a primary target) as well.

Consider fixing it by xfail-ing for logical_op_short_circuit targets if a fix
isn't suitable at this stage.
Comment 10 bin.cheng 2014-03-31 09:39:23 UTC
Patch sent at http://gcc.gnu.org/ml/gcc-patches/2014-03/msg00857.html , but it need to wait for stage 1.

I will xfail it for now.
Comment 11 Jeffrey A. Law 2014-03-31 13:26:33 UTC
Agreed, let's xfail the test on the affected targets for now and look to fix the missed threading optimization during the next stage1.
Comment 12 bin cheng 2014-04-01 09:57:01 UTC
Author: amker
Date: Tue Apr  1 09:56:29 2014
New Revision: 208980

URL: http://gcc.gnu.org/viewcvs?rev=208980&root=gcc&view=rev
Log:

	PR target/60363
	* gcc.target/tree-ssa/ssa-dom-thread-4.c: Xfail for
	logical_op_short_circuit targets.


Modified:
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-4.c
Comment 13 Jakub Jelinek 2014-04-22 11:35:14 UTC
GCC 4.9.0 has been released
Comment 14 bin cheng 2014-05-05 07:37:01 UTC
Author: amker
Date: Mon May  5 07:36:30 2014
New Revision: 210059

URL: http://gcc.gnu.org/viewcvs?rev=210059&root=gcc&view=rev
Log:

	PR tree-optimization/60363
	* gcc/tree-ssa-threadupdate.c (get_value_locus_in_path): New.
	(copy_phi_args): New parameters.  Call get_value_locus_in_path.
	(update_destination_phis): New parameter.
	(create_edge_and_update_destination_phis): Ditto.
	(ssa_fix_duplicate_block_edges): Pass new arguments.
	(thread_single_edge): Ditto.

	PR tree-optimization/60363
	* gcc.dg/tree-ssa/ssa-dom-thread-4.c: Revert XFAIL test.


Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-4.c
    trunk/gcc/tree-ssa-threadupdate.c
Comment 15 bin.cheng 2014-05-06 07:16:16 UTC
Should be fixed now.
Comment 16 Richard Biener 2014-05-06 08:25:09 UTC
Fixed for 4.10.