This is the mail archive of the
gcc-bugs@gcc.gnu.org
mailing list for the GCC project.
[Bug tree-optimization/49749] New: Reassociation rank algorithm does not include all non-NULL operands
- From: "wschmidt at gcc dot gnu.org" <gcc-bugzilla at gcc dot gnu dot org>
- To: gcc-bugs at gcc dot gnu dot org
- Date: Thu, 14 Jul 2011 16:51:24 +0000
- Subject: [Bug tree-optimization/49749] New: Reassociation rank algorithm does not include all non-NULL operands
- Auto-submitted: auto-generated
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49749
Summary: Reassociation rank algorithm does not include all
non-NULL operands
Product: gcc
Version: 4.7.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
AssignedTo: unassigned@gcc.gnu.org
ReportedBy: wschmidt@gcc.gnu.org
CC: dje@gcc.gnu.org, richard.guenther@gmail.com,
bergner@gcc.gnu.org, pthaugen@gcc.gnu.org,
meissner@gcc.gnu.org
Host: powerpc64-linux
Target: powerpc64-linux
Build: powerpc64-linux
Created attachment 24754
--> http://gcc.gnu.org/bugzilla/attachment.cgi?id=24754
Excerpt from bwaves showing the problem on r161839 and before
tree-ssa-reassoc.c: get_rank () contains the following code:
/* Otherwise, find the maximum rank for the operands, or the bb
rank, whichever is less. */
rank = 0;
maxrank = bb_rank[gimple_bb(stmt)->index];
if (gimple_assign_single_p (stmt))
{
tree rhs = gimple_assign_rhs1 (stmt);
n = TREE_OPERAND_LENGTH (rhs);
if (n == 0)
rank = MAX (rank, get_rank (rhs));
else
{
for (i = 0;
i < n && TREE_OPERAND (rhs, i) && rank != maxrank; i++)
rank = MAX(rank, get_rank (TREE_OPERAND (rhs, i)));
}
}
else
{
n = gimple_num_ops (stmt);
for (i = 1; i < n && rank != maxrank; i++)
{
gcc_assert (gimple_op (stmt, i));
rank = MAX(rank, get_rank (gimple_op (stmt, i)));
}
}
The loop condition:
i < n && TREE_OPERAND (rhs, i) && rank != maxrank; i++)
erroneously stops processing operands as soon as a NULL operand is detected.
As an example, a TARGET_MEM_REF without a symbol operand (operand 0) will be
assigned a rank of zero by that loop.
The condition "rank != maxrank" in both loops seems to indicate that an
expression should never get a rank larger than its block rank. I am seeing
cases where the rank is slightly larger; for example, a block rank of 1638400
and an operand rank of 1638404. If the intent is to limit the value to 1638400
in such a case, this also needs to be fixed.
The existence of this bug has contributed in an odd way to a degradation of the
CPU2006 bwaves benchmark on powerpc64 (affected procedure attached). In trunk
revision 161840 (7/5/2010), Richard Guenther changed the ivopts code to
generate MEM_REFs in preference to TARGET_MEM_REFs. Prior to this change, SSA
names from assignments to TARGET_MEM_REFs were given a rank of 1. Following
this change, SSA names from assignments to MEM_REFs were given a rank of
1638404 (for example), while SSA names from PHI assignments in the same block
were given a rank of 1638400. Thus, prior to this change, PHI assignments had
higher rank than memory references in the same block, but after this change,
they had lower rank.
Giving preference to the PHI expressions over the memory references provided
better reassociation for the bwaves example. In particular, prior to 161840 a
prephitmp was reassociated so that copyrename could generate something like
prephitmp.160_552 = D.1095_522 + prephitmp.160_161;
thus making prephitmp.160 independent of other calculations in the loop
iteration. When the PHI expression was given lowest rank, this did not occur,
causing prephitmp.160 to be dependent upon the result of the previous loop
iteration's calculations.
For powerpc64, we unroll the loop by a factor of 2. When the two iterations
are independent, FMAs from the two iterations can be interleaved leading to
good performance. When the two iterations are not independent, we end up with
a stack of linearly dependent FMAs at the end of the loop that chokes the
floating-point unit.
This leads to the question of whether PHIs at the top of a block should
generally be given preference over memory references in the interior of that
block. It may be that if the PHIs and the memory references have the same
rank, this is exactly what will happen. That might just be serendipity,
though; thoughts?
I will be experimenting with capping the rank of an expression at the rank of
the containing block, and including all operands in the ranking. But I am very
interested in thoughts on this problem from those who are familiar with the
reassociator.
Please let me know if you would like further materials to examine.
The bug goes back a long way; the degradation affects us on 4.6 and 4.7.