This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.



Re: rfa (x86): 387<=>sse moves


On Jul 26, 2005, at 12:51 AM, Paolo Bonzini wrote:
Dale Johannesen wrote:
With -march=pentium4 -mfpmath=sse -O2, we get an extra move for code like

    double d = atof(foo);
    int i = d;

    call atof
    fstpl -8(%ebp)
    movsd -8(%ebp), %xmm0
    cvttsd2si %xmm0, %eax

(This is Linux, Darwin is similar.) I think the difficulty is that for

(set (reg/v:DF 58 [ d ]) (reg:DF 8 st)) 64 {*movdf_nointeger}

Try the attached patch. It gave a 3% speedup on -mfpmath=sse for tramp3d. Richard Henderson asked for SPEC testing first; after that it may go in.

Thanks. That's progress; the cost computation in regclass now figures out that memory
is the fastest place to put R58:


Register 58 costs: AD_REGS:87000 Q_REGS:87000 NON_Q_REGS:87000
INDEX_REGS:87000 LEGACY_REGS:87000 GENERAL_REGS:87000 FP_TOP_REG:49000
FP_SECOND_REG:50000 FLOAT_REGS:50000 SSE_REGS:50000 FP_TOP_SSE_REGS:75000
FP_SECOND_SSE_REGS:75000 FLOAT_SSE_REGS:75000 FLOAT_INT_REGS:87000
INT_SSE_REGS:91000 FLOAT_INT_SSE_REGS:91000
ALL_REGS:91000 MEM:40000


Unfortunately local-alloc insists on putting it in a register anyway (ST(0) instead of an XMM,
but the end codegen is unchanged):


;; Register 58 in 8.

I think the RA may be missing the concept that memory might be faster than any possible register...
will dig further.


