This is the mail archive of the
gcc-bugs@gcc.gnu.org
mailing list for the GCC project.
[Bug tree-optimization/41783] New: r151561 (PRE fix) regresses zeusmp
- From: "matz at gcc dot gnu dot org" <gcc-bugzilla at gcc dot gnu dot org>
- To: gcc-bugs at gcc dot gnu dot org
- Date: 21 Oct 2009 14:34:14 -0000
- Subject: [Bug tree-optimization/41783] New: r151561 (PRE fix) regresses zeusmp
- Reply-to: gcc-bugzilla at gcc dot gnu dot org
zeusmp regressed by about 5% again with the PRE fix for PR41101, which is
r151561. The problem is that PRE now finds a partial redundancy (where in
reality there isn't any), and the PHI node inserted to compensate for it
prevents vectorization of a loop because its value is used outside that loop.
Testcase extracted from zeusmp:
% cat hsmoc-1.f
      subroutine hsmoc ( )
      implicit NONE
      integer ijkn
      parameter(ijkn = 128+5)
      real*8 dt, fact, db(ijkn), w1dt(ijkn)
      integer i, is, ie, j, js, je
      common /rootr/ dt
      common /scratch/ w1dt
      do 9 i=is,ie
        do 807 j=js-1,je+1
          db (j ) = j
  807   continue
        fact = dt * i
        do 808 j=js,je+1
          w1dt(j)= fact * db (j)
  808   continue
    9 continue
      return
      end
(compile with -march=barcelona -O3 -ffast-math -funroll-loops -fpeel-loops)
The problem is the access to 'dt' (rootr.dt), which PRE thinks is partially
redundant in the first loop (!?), hence it creates this code:
  pretmp.11_53 = rootr.dt;
Loop-i:
  prephitmp.12_51 = PHI <pretmp.11_53(9), D.1376_20(20)>
  ...
Loop-j1:
  prephitmp.12_49 = PHI <prephitmp.12_51(11), pretmp.11_52(14)>
  ...
  pretmp.11_52 = rootr.dt;
  goto Loop-j1
  prephitmp.12_23 = PHI <prephitmp.12_51(12), prephitmp.12_49(13)>
  D.1376_20 = prephitmp.12_23;
  ...
Loop-j2:
Notice especially how we now read rootr.dt on the backedge of loop-j1,
which executes much more often than before. Originally we read it about
ie-is times (once per iteration of loop-i); now we read it about
(ie-is)*(je-js) times.
It's possible that this alone explains the speed regression, and not
necessarily the missed vectorization. But the missed vectorization was
much easier to detect.
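To make the difference in load counts concrete, here is a rough C model of
the two loop shapes. It is only a sketch, not what GCC emits: load_dt(),
count_original() and count_after_pre() are hypothetical names, the read of
rootr.dt is simulated by a counted function call, and loop-j1 is assumed to
run at least once.

```c
enum { IJKN = 128 + 5 };

static double dt = 0.5;                     /* stands in for rootr.dt */
static double db[IJKN + 2], w1dt[IJKN + 2];
static long loads;                          /* simulated reads of rootr.dt */

/* Hypothetical helper: every call models one memory read of rootr.dt. */
static double load_dt(void) { loads++; return dt; }

/* Shape of the source: dt is read once per iteration of loop-i. */
static long count_original(int is, int ie, int js, int je)
{
    loads = 0;
    for (int i = is; i <= ie; i++) {
        for (int j = js - 1; j <= je + 1; j++)
            db[j] = j;
        double fact = load_dt() * i;
        for (int j = js; j <= je + 1; j++)
            w1dt[j] = fact * db[j];
    }
    return loads;
}

/* Shape after PRE: one read before loop-i, plus one read on every
   backedge of loop-j1. Assumes loop-j1 runs at least once. */
static long count_after_pre(int is, int ie, int js, int je)
{
    loads = 0;
    double pre = load_dt();                 /* pretmp.11_53 = rootr.dt */
    for (int i = is; i <= ie; i++) {
        double phi = pre;                   /* prephitmp.12_51 */
        int j = js - 1;
        for (;;) {
            db[j] = j;
            if (j == je + 1)
                break;
            j++;
            phi = load_dt();                /* pretmp.11_52, on the backedge */
        }
        double fact = phi * i;              /* uses prephitmp.12_23 */
        for (int k = js; k <= je + 1; k++)
            w1dt[k] = fact * db[k];
        pre = phi;                          /* D.1376_20 feeds the outer PHI */
    }
    return loads;
}
```

With is=1, ie=10, js=1, je=100 the first function performs 10 simulated
reads and the second 1011, i.e. one initial read plus (ie-is+1)*(je-js+2)
reads on the backedge. The exact trip counts differ slightly from the
rounded figures above, but the order of magnitude is the same.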
--
Summary: r151561 (PRE fix) regresses zeusmp
Product: gcc
Version: 4.5.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: matz at gcc dot gnu dot org
GCC host triplet: x86_64-linux
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41783