This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

[Bug tree-optimization/49749] New: Reassociation rank algorithm does not include all non-NULL operands


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49749

           Summary: Reassociation rank algorithm does not include all
                    non-NULL operands
           Product: gcc
           Version: 4.7.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
        AssignedTo: unassigned@gcc.gnu.org
        ReportedBy: wschmidt@gcc.gnu.org
                CC: dje@gcc.gnu.org, richard.guenther@gmail.com,
                    bergner@gcc.gnu.org, pthaugen@gcc.gnu.org,
                    meissner@gcc.gnu.org
              Host: powerpc64-linux
            Target: powerpc64-linux
             Build: powerpc64-linux


Created attachment 24754
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=24754
Excerpt from bwaves showing the problem on r161839 and before

tree-ssa-reassoc.c: get_rank () contains the following code:

      /* Otherwise, find the maximum rank for the operands, or the bb
     rank, whichever is less.   */
      rank = 0;
      maxrank = bb_rank[gimple_bb(stmt)->index];
      if (gimple_assign_single_p (stmt))
    {
      tree rhs = gimple_assign_rhs1 (stmt);
      n = TREE_OPERAND_LENGTH (rhs);
      if (n == 0)
        rank = MAX (rank, get_rank (rhs));
      else
        {
          for (i = 0;
           i < n && TREE_OPERAND (rhs, i) && rank != maxrank; i++)
        rank = MAX(rank, get_rank (TREE_OPERAND (rhs, i)));
        }
    }
      else
    {
      n = gimple_num_ops (stmt);
      for (i = 1; i < n && rank != maxrank; i++)
        {
          gcc_assert (gimple_op (stmt, i));
          rank = MAX(rank, get_rank (gimple_op (stmt, i)));
        }
    }


The loop condition: 

   i < n && TREE_OPERAND (rhs, i) && rank != maxrank; i++)

erroneously stops processing operands as soon as a NULL operand is detected. 
As an example, a TARGET_MEM_REF without a symbol operand (operand 0) will be
assigned a rank of zero by that loop.

The condition "rank != maxrank" in both loops seems to indicate that an
expression should never get a rank larger than its block rank.  I am seeing
cases where the rank is slightly larger; for example, a block rank of 1638400
and an operand rank of 1638404.  If the intent is to limit the value to 1638400
in such a case, this also needs to be fixed.

The existence of this bug has contributed in an odd way to a degradation of the
CPU2006 bwaves benchmark on powerpc64 (affected procedure attached).  In trunk
revision 161840 (7/5/2010), Richard Guenther changed the ivopts code to
generate MEM_REFs in preference to TARGET_MEM_REFs.  Prior to this change, SSA
names from assignments to TARGET_MEM_REFs were given a rank of 1.  Following
this change, SSA names from assignments to MEM_REFs were given a rank of
1638404 (for example), while SSA names from PHI assignments in the same block
were given a rank of 1638400.  Thus, prior to this change, PHI assignments had
higher rank than memory references in the same block, but after this change,
they had lower rank.

Giving preference to the PHI expressions over the memory references provided
better reassociation for the bwaves example.  In particular, prior to 161840 a
prephitmp was reassociated so that copyrename could generate something like

    prephitmp.160_552 = D.1095_522 + prephitmp.160_161;

thus making prephitmp.160 independent of other calculations in the loop
iteration.  When the PHI expression was given lowest rank, this did not occur,
causing prephitmp.160 to be dependent upon the result of the previous loop
iteration's calculations.

For powerpc64, we unroll the loop by a factor of 2.  When the two iterations
are independent, FMAs from the two iterations can be interleaved leading to
good performance.  When the two iterations are not independent, we end up with
a stack of linearly dependent FMAs at the end of the loop that chokes the
floating-point unit.

This leads to the question of whether PHIs at the top of a block should
generally be given preference over memory references in the interior of that
block.  It may be that if the PHIs and the memory references have the same
rank, this is exactly what will happen.  That might just be serendipity,
though; thoughts?

I will be experimenting with capping the rank of an expression at the rank of
the containing block, and including all operands in the ranking.  But I am very
interested in thoughts on this problem from those who are familiar with the
reassociator.

Please let me know if you would like further materials to examine.

The bug goes back a long way; the degradation affects us on 4.6 and 4.7.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]