This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.

[Bug tree-optimization/59643] New: Predictive commoning unnecessarily punts on scimark2 SOR

From: "jakub at gcc dot gnu.org" <gcc-bugzilla at gcc dot gnu dot org>
To: gcc-bugs at gcc dot gnu dot org
Date: Mon, 30 Dec 2013 22:00:47 +0000
Subject: [Bug tree-optimization/59643] New: Predictive commoning unnecessarily punts on scimark2 SOR
Auto-submitted: auto-generated

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59643

            Bug ID: 59643
           Summary: Predictive commoning unnecessarily punts on scimark2
                    SOR
           Product: gcc
           Version: 4.9.0
            Status: UNCONFIRMED
          Keywords: missed-optimization
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: jakub at gcc dot gnu.org

I've noticed GCC performs badly on scimark2 SOR compared to LLVM 3.3 and 3.4,
and I believe the difference is in predictive commoning, which IMHO
unnecessarily gives up on the loop.

https://cmssdt.cern.ch/SDT/lxr/source/Validation/Performance/bin/SOR.c?v=Sat

The inner loop is:
    for (j=1; j<Nm1; j++)
        Gi[j] = omega_over_four * (Gim1[j] + Gip1[j] + Gi[j-1]
                    + Gi[j+1]) + one_minus_omega * Gi[j];
and the problem is that data ref analysis doesn't know that the Gim1[j] and
Gip1[j] reads don't conflict with the Gi[j] write.  They don't in the
benchmark, but the compiler can't know that (unless -flto and some extra smart
IPA analysis hints it).  That is primarily a bad choice of data structures in
the benchmark: instead of using an array of pointers to double where each inner
array is malloced separately, using a two dimensional array might make it clear
to the compiler that there is no aliasing.
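
To illustrate the data structure point, here is a minimal sketch of the same sweep over both layouts. The function names and signatures are made up for this example and are not taken from scimark2. In the first form every row is reached through a separately loaded pointer, so nothing in the loop relates Gim1, Gi and Gip1 to each other; in the second form the three rows are G[i-1], G[i] and G[i+1] of one array, a fixed N elements apart, which is the kind of information that might let the compiler rule out overlap.

```c
/* Hypothetical helper: rows reached through an array of row pointers,
   as in the benchmark.  Gim1, Gi and Gip1 are unrelated pointers here. */
void sor_sweep_ptrs(int M, int N, double omega, double **G)
{
    double omega_over_four = omega * 0.25;
    double one_minus_omega = 1.0 - omega;
    for (int i = 1; i < M - 1; i++) {
        double *Gi = G[i], *Gim1 = G[i - 1], *Gip1 = G[i + 1];
        for (int j = 1; j < N - 1; j++)
            Gi[j] = omega_over_four * (Gim1[j] + Gip1[j] + Gi[j - 1]
                        + Gi[j + 1]) + one_minus_omega * Gi[j];
    }
}

/* Hypothetical helper: the same sweep over one contiguous M x N array
   (C99 variably modified parameter).  Rows i-1, i and i+1 lie at known
   distances from each other inside the same object. */
void sor_sweep_2d(int M, int N, double omega, double G[M][N])
{
    double omega_over_four = omega * 0.25;
    double one_minus_omega = 1.0 - omega;
    for (int i = 1; i < M - 1; i++)
        for (int j = 1; j < N - 1; j++)
            G[i][j] = omega_over_four * (G[i - 1][j] + G[i + 1][j]
                          + G[i][j - 1] + G[i][j + 1])
                      + one_minus_omega * G[i][j];
}
```

Both functions compute the same values; only the information visible to the compiler differs.
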
When constructing components, pcom ignores read-read dependencies with offset
that can't be determined, but in this case there is a write and thus all the
data references are put into the same component and that component is
unsuitable, because the offset can't be determined.

For two writes with an unknown dependence there is nothing that can be done,
but I wonder whether, for a (suitable) write and some other read whose offset
relative to it we can't determine, we really have to give up on both data refs
rather than just the read.  On this testcase, giving up on the Gim1[j] and
Gip1[j] reads that could possibly overlap with the Gi[j] write is IMHO fine: we
just keep them as is and don't attempt to optimize them.  And pcom doesn't
optimize away writes either (or does it?  If so, we'd need to mark the
component so that it doesn't do so in that case).
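
As a rough sketch of what this would amount to at the source level (hand-written for illustration, not actual pcom output, with made-up function names): the Gi[j-1], Gi[j] and Gi[j+1] values are carried in scalars across iterations, the Gim1[j] and Gip1[j] loads are left untouched, and the Gi[j] store is still performed every iteration so that a possibly overlapping read sees it.

```c
/* Reference form of the inner loop from the report. */
void sor_row_plain(int Nm1, double omega_over_four, double one_minus_omega,
                   double *Gi, const double *Gim1, const double *Gip1)
{
    for (int j = 1; j < Nm1; j++)
        Gi[j] = omega_over_four * (Gim1[j] + Gip1[j] + Gi[j - 1]
                    + Gi[j + 1]) + one_minus_omega * Gi[j];
}

/* Hand-applied commoning of the Gi chain only (illustrative sketch). */
void sor_row_commoned(int Nm1, double omega_over_four, double one_minus_omega,
                      double *Gi, const double *Gim1, const double *Gip1)
{
    if (Nm1 <= 1)
        return;
    double left = Gi[0];   /* plays the role of Gi[j-1] */
    double mid = Gi[1];    /* plays the role of Gi[j]   */
    for (int j = 1; j < Nm1; j++) {
        double right = Gi[j + 1];  /* the only Gi load left per iteration */
        /* Reads with unknown offset to the write: kept as plain loads. */
        double up = Gim1[j];
        double down = Gip1[j];
        double val = omega_over_four * (up + down + left + right)
                     + one_minus_omega * mid;
        Gi[j] = val;   /* the store is kept, not optimized away */
        left = val;    /* next iteration's Gi[j-1] is the value just stored */
        mid = right;   /* next iteration's Gi[j] is this iteration's Gi[j+1] */
    }
}
```

Because only reads are left unanalyzed and every store still reaches memory, the two functions agree even if Gim1 or Gip1 happens to point into the Gi row.
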
With the untested patch I'll attach, scimark2 improved from
SOR             Mflops:  1135.50    (1000 x 1000)
to
SOR             Mflops:  1617.87    (1000 x 1000)

Follow-Ups:
- [Bug tree-optimization/59643] Predictive commoning unnecessarily punts on scimark2 SOR
  - From: jakub at gcc dot gnu.org
