This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

[Bug tree-optimization/52424] New: dom prematurely pops entries from const_and_copies stack


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52424

             Bug #: 52424
           Summary: dom prematurely pops entries from const_and_copies
                    stack
    Classification: Unclassified
           Product: gcc
           Version: 4.7.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
        AssignedTo: wschmidt@gcc.gnu.org
        ReportedBy: wschmidt@gcc.gnu.org
                CC: bergner@vnet.ibm.com, jiangning.liu@arm.com,
                    rguenther@suse.de


Created attachment 26775
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=26775
Proposed patch of tree-ssa-dom.c

Jiangning Liu reported that the following C code has recently experienced
degraded performance on trunk.  (Jiangning, please fill in the
Host/Target/Build fields for your configuration, or tell me what they are if
you don't have access.  The problem is not related to specific targets in any
event.)

int *l, *r, *g;
void test_func(int n)
{
    int i;
    static int j;
    static int pos, direction, direction_pre;

    pos = 0;
    direction = 1;

    for ( i = 0; i < n; i++ )
    {
        direction_pre = direction;

        for ( j = 0; j <= 400; j++ )
        {

            if ( g[pos] == 200 )
                break;

            if ( direction == 0 )
                pos = l[pos];
            else
                pos = r[pos];

            if ( pos == -1 )
            {
                if ( direction_pre != direction )
                    break;

                pos = 0;
                direction = !direction;
            }

        }

        f(g[pos]);
    }
}

When compiled with -O2, the dom2 details dump shows that a redundant phi is
detected in block 12:

<bb 12>:
  # prephitmp.28_89 = PHI <pos.4_18(10), pos.6_24(11)>
  # pos_lsm.33_53 = PHI <pos.4_18(10), pos.6_24(11)>

LKUP STMT prephitmp.28_89 = PHI <pos.4_18, pos.6_24>
          prephitmp.28_89 = PHI <pos.4_18(10), pos.6_24(11)>
2>>> STMT prephitmp.28_89 = PHI <pos.4_18, pos.6_24>
          prephitmp.28_89 = PHI <pos.4_18(10), pos.6_24(11)>
LKUP STMT pos_lsm.33_53 = PHI <pos.4_18, pos.6_24>
          pos_lsm.33_53 = PHI <pos.4_18(10), pos.6_24(11)>
FIND: prephitmp.28_89
0>>> COPY pos_lsm.33_53 = prephitmp.28_89

However, the COPY is incorrectly removed from the const_and_copy stack after
block 13 is processed.  It should not be removed until after the last block
dominated by bb12 (bb18) is processed.  As a result, the copy is not propagated
into a PHI in block 15:

  # pos_lsm.33_123 = PHI <pos_lsm.33_53(14)>

As a result, an extra copy is eventually introduced that degrades performance.

I tracked down the extra removal of const_and_copy stack entries to
tree-ssa-threadedge.c:remove_temporary_equivalences.  This removes pairs of
entries that were introduced by thread_across_edge, and also removes a NULL
marker entry beneath those entries.  Thus calls to thread_across_edge must be
preceded by a push of such a marker entry.

thread_across_edge is called from tree-ssa-dom.c:dom_thread_across_edge, which
is called in three places within dom_opt_leave_block.  In two of those places,
the push of a marker entry is present.  In the third, it isn't.  Adding the
marker push appears to correct the problem.

I've attached a proposed fix.  Jiangning, can you please apply this and see if
your performance problem is resolved?

Thanks,
Bill


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]