This is the mail archive of the
gcc@gcc.gnu.org
mailing list for the GCC project.
Re: Failed attempt to improve FP register allocation on alpha
- To: gcc at gcc dot gnu dot org, lucier at math dot purdue dot edu
- Subject: Re: Failed attempt to improve FP register allocation on alpha
- From: Brad Lucier <lucier at math dot purdue dot edu>
- Date: Thu, 6 Jan 2000 16:12:37 -0500 (EST)
- Cc: wilker at math dot purdue dot edu, feeley at iro dot umontreal dot ca
OK, I've done some more experiments and I think I can say pretty
exactly how much those extra fmovs in IEEE floating point code on
the alpha ev6 are costing me.
I have two versions of the electrostatic test problem. The computation
per atom pair is the same for each code, but the loop in one blocks
the atom list to use the cache hierarchy better; the loop in the other
naively thrashes the cache as it goes through the atom list linearly.
(It's one of those quadratic algorithms I'm always complaining about.)
I rewrote the C code in the naive test so that there are no explicit
FP ops of the form x = x op y; this removed most of the fmovs.
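
A minimal sketch of the kind of rewrite described above, assuming a
hypothetical charge-over-distance accumulation. The function names, the
arrays q and inv_r, and the choice of operations are illustrative only and
are not taken from the actual test code. Whether a given compiler emits fewer
fmovs for the second form is not guaranteed; the sketch only shows the source
shape.

```c
#include <stddef.h>

/* Hypothetical kernel with explicit updates of the form x = x op y. */
double sum_inplace(const double *q, const double *inv_r, size_t n)
{
    double e = 0.0;
    size_t i;
    for (i = 0; i < n; i++) {
        double t = q[i];
        t = t * inv_r[i];   /* destination is also a source operand */
        e = e + t;          /* destination is also a source operand */
    }
    return e;
}

/* Same arithmetic, but each FP result is written to a fresh name, so no
   source-level operation has its destination among its inputs. */
double sum_fresh(const double *q, const double *inv_r, size_t n)
{
    double e = 0.0;
    size_t i;
    for (i = 0; i < n; i++) {
        const double t = q[i] * inv_r[i];
        const double e_next = e + t;
        e = e_next;         /* plain copy carries the sum to the next pass */
    }
    return e;
}
```

Both functions compute the same value; only the naming of intermediate
results differs.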
The others were generated because
w = w op (x op (y op z))
was expanded into RTL as
temp = y op z
temp = x op temp
w = w op temp
so the second operation does not satisfy the early-clobber requirements
of IEEE FP on the 21264, and another temporary register and an fmov are
generated by global register allocation.
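
The expansion above can be mirrored at the C level as a sketch. Here "op" is
arbitrarily replaced by *, + and - (an assumption, the original does not say
which operations were involved), and the function names are hypothetical. The
first function reuses one temporary the way the RTL does; the second gives
each intermediate result its own name, so no operation has a destination that
is also one of its sources. The compiler still does its own register
allocation, so this illustrates the data flow rather than guaranteeing any
particular code.

```c
/* Mirrors the three-address expansion: one temporary, reused. */
double nested_reused_temp(double w, double x, double y, double z)
{
    double temp;
    temp = y * z;       /* temp = y op z */
    temp = x + temp;    /* temp = x op temp: destination equals a source */
    return w - temp;    /* w = w op temp */
}

/* Same computation with a distinct temporary per operation. */
double nested_distinct_temps(double w, double x, double y, double z)
{
    const double t1 = y * z;    /* t1 = y op z  */
    const double t2 = x + t1;   /* t2 = x op t1 */
    return w - t2;              /* result = w op t2 */
}
```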
The timings for 200,000,000 electrostatic calculations are as follows,
using the options
gcc -mcpu=ev6 -fno-math-errno -mieee -fPIC -O2
with gcc 2.95.1 (I was mistaken about -O2 pessimizing this code):
       blocking code -O2   naive code -O2   rewritten naive code -O2
ccc        33040 ms           70139 ms
gcc        51233 ms          100852 ms           89021 ms
The blocking code was too complicated to rewrite by hand, but it should
benefit in the same way. The rewrite saved 100852 - 89021 = 11831 ms on
the naive code, so the blocking code would likely go from 51233 ms to
about 39402 ms, and the unneeded fmovs cost about 30%. If this code
generation problem were fixed, then the ccc code should be only 19% faster
than the gcc code rather than 55% faster.
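
The estimate can be checked with a few lines of C. This is only a sketch of
the arithmetic, under the same assumption made in the text: that the blocking
code would save the same number of milliseconds as the naive code did. The
struct and function names are made up for the illustration.

```c
#include <assert.h>

/* Results of the back-of-the-envelope estimate (ratios are fractions,
   e.g. 0.30 means 30%). */
typedef struct {
    double blocking_est_ms;  /* projected gcc blocking time without fmovs */
    double fmov_cost;        /* slowdown of gcc blocking code due to fmovs */
    double ccc_gap_now;      /* how much faster ccc is today */
    double ccc_gap_fixed;    /* how much faster ccc would be after a fix */
} estimate;

/* All arguments are timings in ms from the table of measurements. */
estimate estimate_blocking(double ccc_blocking, double gcc_blocking,
                           double gcc_naive, double gcc_rewritten)
{
    estimate e;
    double saved;

    assert(ccc_blocking > 0.0 && gcc_naive >= gcc_rewritten);

    saved = gcc_naive - gcc_rewritten;          /* 11831 ms for the data above */
    e.blocking_est_ms = gcc_blocking - saved;   /* assumes the same saving */
    e.fmov_cost = gcc_blocking / e.blocking_est_ms - 1.0;
    e.ccc_gap_now = gcc_blocking / ccc_blocking - 1.0;
    e.ccc_gap_fixed = e.blocking_est_ms / ccc_blocking - 1.0;
    return e;
}
```

Calling estimate_blocking(33040, 51233, 100852, 89021) reproduces the
39402 ms projection and the roughly 30%, 55% and 19% figures.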
Brad Lucier