This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.
- From: Dale Johannesen <dalej at apple dot com>
- To: Paolo Bonzini <paolo dot bonzini at lu dot unisi dot ch>
- Cc: GCC Development <gcc at gcc dot gnu dot org>, Dale Johannesen <dalej at apple dot com>
- Date: Tue, 26 Jul 2005 15:34:17 -0700
- Subject: Re: rfa (x86): 387<=>sse moves
- References: <0f330882304cdfaf28ef9c2b1360380c@apple.com> <42E5EB70.8080303@lu.unisi.ch>
On Jul 26, 2005, at 12:51 AM, Paolo Bonzini wrote:
> Dale Johannesen wrote:
>> With -march=pentium4 -mfpmath=sse -O2, we get an extra move for code like
>>
>>     double d = atof(foo);
>>     int i = d;
>>
>>     call      atof
>>     fstpl     -8(%ebp)
>>     movsd     -8(%ebp), %xmm0
>>     cvttsd2si %xmm0, %eax
>>
>> (This is Linux; Darwin is similar.) I think the difficulty is that for
>>
>>     (set (reg/v:DF 58 [ d ]) (reg:DF 8 st)) 64 {*movdf_nointeger}
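For anyone wanting to reproduce this, here is a minimal sketch of the quoted test case as a complete translation unit. The function name `truncate_atof` is hypothetical; its body is just the two quoted lines. On 32-bit x86 the ABI returns a `double` in %st(0), which is why the value has to travel from the x87 stack into an SSE register before `cvttsd2si`. Compiling with something like `gcc -m32 -march=pentium4 -mfpmath=sse -O2 -S` (the `-m32` assumes an x86-64 host) should show whether the `fstpl`/`movsd` pair is still emitted.

```c
#include <stdlib.h> /* atof */

/* Hypothetical reproducer for the extra 387 -> SSE move.
   On 32-bit x86, atof() returns its double in %st(0); with
   -mfpmath=sse the conversion to int uses cvttsd2si, which
   needs the value in an XMM register. */
int truncate_atof(const char *foo)
{
    double d = atof(foo);  /* result arrives on the x87 stack */
    int i = d;             /* C truncates toward zero, as cvttsd2si does */
    return i;
}
```

For example, `truncate_atof("3.75")` returns 3 and `truncate_atof("-2.5")` returns -2.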
> Try the attached patch. It gave a 3% speedup for tramp3d with
> -mfpmath=sse. Richard Henderson asked for SPEC testing; after that it
> may go in.
Thanks, that's progress. The cost computation in regclass now figures out
that memory is the fastest place to put R58:
    Register 58 costs: AD_REGS:87000 Q_REGS:87000 NON_Q_REGS:87000
      INDEX_REGS:87000 LEGACY_REGS:87000 GENERAL_REGS:87000
      FP_TOP_REG:49000 FP_SECOND_REG:50000 FLOAT_REGS:50000 SSE_REGS:50000
      FP_TOP_SSE_REGS:75000 FP_SECOND_SSE_REGS:75000 FLOAT_SSE_REGS:75000
      FLOAT_INT_REGS:87000 INT_SSE_REGS:91000 FLOAT_INT_SSE_REGS:91000
      ALL_REGS:91000 MEM:40000
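A small helper for reading such dumps: it scans the `NAME:COST` tokens after the first colon and reports the cheapest entry. The name `cheapest_entry` and the token format are assumptions based only on the excerpt above, not on any documented GCC dump format; whitespace (including line breaks) between tokens is skipped.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical parser for a regclass-style cost line such as
   "Register 58 costs: AD_REGS:87000 ... MEM:40000".
   The format is assumed from the dump excerpt; returns 1 and fills
   name/cost with the cheapest entry, or 0 if no token was parsed. */
int cheapest_entry(const char *dump, char *name, size_t name_len, long *cost)
{
    const char *p = strchr(dump, ':');  /* skip the "Register N costs:" prefix */
    int found = 0;

    if (p == NULL)
        return 0;
    p++;
    while (*p != '\0') {
        char tok_name[64];
        long tok_cost;
        int consumed;

        /* Skip whitespace, read NAME up to ':', then the numeric cost. */
        if (sscanf(p, " %63[^:\t\n ]:%ld%n", tok_name, &tok_cost, &consumed) != 2)
            break;
        if (!found || tok_cost < *cost) {
            *cost = tok_cost;
            snprintf(name, name_len, "%s", tok_name);
            found = 1;
        }
        p += consumed;
    }
    return found;
}
```

Applied to the register 58 line above, it reports MEM at 40000, below the cheapest register class (FP_TOP_REG at 49000).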
Unfortunately, local-alloc insists on putting it in a register anyway
(ST(0) rather than an XMM register, but the final code is unchanged):

    ;; Register 58 in 8.
I think the register allocator may be missing the idea that memory might
be faster than any available register. I'll dig further.
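A toy model of the decision the allocator appears to be missing: treat the pseudo's memory cost as one more candidate rather than a last resort, and keep the value in memory when that is cheapest. The struct, `USE_MEMORY`, and `choose_location` are hypothetical illustrations, not how local-alloc is actually written; the costs are the register 58 numbers from the dump.

```c
#include <stddef.h>

/* Hypothetical cost record for one register class. */
struct class_cost {
    const char *name;
    long cost;
};

#define USE_MEMORY (-1)

/* Toy allocation policy (not GCC's local-alloc): return the index of the
   cheapest register class, or USE_MEMORY when keeping the pseudo in its
   stack slot is cheaper than every register class. */
int choose_location(const struct class_cost *classes, size_t n, long mem_cost)
{
    int best = USE_MEMORY;
    long best_cost = mem_cost;  /* memory competes on equal terms */

    for (size_t i = 0; i < n; i++) {
        if (classes[i].cost < best_cost) {
            best_cost = classes[i].cost;
            best = (int)i;
        }
    }
    return best;
}
```

With memory at 40000 and FP_TOP_REG at 49000, this returns `USE_MEMORY`. A policy that compared only register classes would pick FP_TOP_REG instead, which is consistent with the `;; Register 58 in 8.` outcome (hard register 8 is `st`, as in the RTL quoted earlier).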