This is the mail archive of the
gcc-bugs@gcc.gnu.org
mailing list for the GCC project.
[Bug tree-optimization/52424] New: dom prematurely pops entries from const_and_copies stack
- From: "wschmidt at gcc dot gnu.org" <gcc-bugzilla at gcc dot gnu dot org>
- To: gcc-bugs at gcc dot gnu dot org
- Date: Tue, 28 Feb 2012 20:50:06 +0000
- Subject: [Bug tree-optimization/52424] New: dom prematurely pops entries from const_and_copies stack
- Auto-submitted: auto-generated
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52424
Bug #: 52424
Summary: dom prematurely pops entries from const_and_copies
stack
Classification: Unclassified
Product: gcc
Version: 4.7.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
AssignedTo: wschmidt@gcc.gnu.org
ReportedBy: wschmidt@gcc.gnu.org
CC: bergner@vnet.ibm.com, jiangning.liu@arm.com,
rguenther@suse.de
Created attachment 26775
--> http://gcc.gnu.org/bugzilla/attachment.cgi?id=26775
Proposed patch of tree-ssa-dom.c
Jiangning Liu reported that the following C code has recently experienced
degraded performance on trunk. (Jiangning, please fill in the
Host/Target/Build fields for your configuration, or tell me what they are if
you don't have access. The problem is not related to specific targets in any
event.)
int *l, *r, *g;
void test_func(int n)
{
int i;
static int j;
static int pos, direction, direction_pre;
pos = 0;
direction = 1;
for ( i = 0; i < n; i++ )
{
direction_pre = direction;
for ( j = 0; j <= 400; j++ )
{
if ( g[pos] == 200 )
break;
if ( direction == 0 )
pos = l[pos];
else
pos = r[pos];
if ( pos == -1 )
{
if ( direction_pre != direction )
break;
pos = 0;
direction = !direction;
}
}
f(g[pos]);
}
}
When compiled with -O2, the dom2 details dump shows that a redundant phi is
detected in block 12:
<bb 12>:
# prephitmp.28_89 = PHI <pos.4_18(10), pos.6_24(11)>
# pos_lsm.33_53 = PHI <pos.4_18(10), pos.6_24(11)>
LKUP STMT prephitmp.28_89 = PHI <pos.4_18, pos.6_24>
prephitmp.28_89 = PHI <pos.4_18(10), pos.6_24(11)>
2>>> STMT prephitmp.28_89 = PHI <pos.4_18, pos.6_24>
prephitmp.28_89 = PHI <pos.4_18(10), pos.6_24(11)>
LKUP STMT pos_lsm.33_53 = PHI <pos.4_18, pos.6_24>
pos_lsm.33_53 = PHI <pos.4_18(10), pos.6_24(11)>
FIND: prephitmp.28_89
0>>> COPY pos_lsm.33_53 = prephitmp.28_89
However, the COPY is incorrectly removed from the const_and_copy stack after
block 13 is processed. It should not be removed until after the last block
dominated by bb12 (bb18) is processed. As a result, the copy is not propagated
into a PHI in block 15:
# pos_lsm.33_123 = PHI <pos_lsm.33_53(14)>
As a result, an extra copy is eventually introduced that degrades performance.
I tracked down the extra removal of const_and_copy stack entries to
tree-ssa-threadedge.c:remove_temporary_equivalences. This removes pairs of
entries that were introduced by thread_across_edge, and also removes a NULL
marker entry beneath those entries. Thus calls to thread_across_edge must be
preceded by a push of such a marker entry.
thread_across_edge is called from tree-ssa-dom.c:dom_thread_across_edge, which
is called in three places within dom_opt_leave_block. In two of those places,
the push of a marker entry is present. In the third, it isn't. Adding the
marker push appears to correct the problem.
I've attached a proposed fix. Jiangning, can you please apply this and see if
your performance problem is resolved?
Thanks,
Bill