This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.
Index Nav: | [Date Index] [Subject Index] [Author Index] [Thread Index] | |
---|---|---|
Message Nav: | [Date Prev] [Date Next] | [Thread Prev] [Thread Next] |
Other format: | [Raw text] |
On Feb 2, 2007, at 2:16 PM, Richard Henderson wrote:
On Mon, Jan 29, 2007 at 09:57:29AM -0800, Stuart Hastings wrote:Last ping is here:
http://gcc.gnu.org/ml/gcc-patches/2007-01/msg00299.html
Patch awaiting review is here:
http://gcc.gnu.org/ml/gcc-patches/2006-12/msg01541.html
I had a look at this, and there are a number of things wrong with it. Rather than iterate over this lots of times, I fixed them up; the result is attached. In summary:
(Pretty big "summary." ;-)
I'm not fond of TARGET_KEEPS_VECTOR_ALIGNED_STACK preventing these, but it's a reasonable hack for now. It'd be nice if we could prove these vector values never got spilled; if the user is using SSE on their own, then we can piggy-back on that and use it too. Or, as I think I saw Jan mention somewhere in the thread, expand these as post-reload splitters. Which doesn't seem like it'd be *that* hard to do cleanly. Failing that, we could arrange for reload to emit movups, et al, when spilling. All of which is not really relevant for Darwin, so we'll ignore it for now.
I added a comment about the HImode conversions.
You added a "floatunsdidf3" expander, which would be ignored. It must be merged with the 64-bit "floatunsdidf2" expander above.
The movdi_to_sse pattern used which_alternative, but only had one alternative. It was clearly intended to have a second alternative for load from memory,
No,
but got it wrong.
...and yes.
Actually, the original version worked fine without the memory alternative. Jan insisted on the alternative, and it was easier to accommodate him than to convince him. ;-)
Ideally, this would be done in the regular movdi pattern that allows inter-unit moves, but we don't actually have one of those. I think they were disabled because it made reload do Truely Horrible Things with register class choices. We might could revisit that at some point...
I'd rather tackle x86_inter_unit_moves in a separate patch.
All uses of ix86_fp_from_int_const used ULL suffixes, which is not valid for C90, which we're supposed to support. I was able to replace all instances with real_ldexp.
I was able to merge the SF and DF to SI conversion routines. I think it's almost possible to write this without using actual vector modes at all, except for the actual cvttpd2dq insn. We need to ensure that the high fp values are valid (prefereably zero). That's easy to do with vector insns coming up to it, so it seems silly to hack in a "xorps xmm0, xmm0; movss xmm1, xmm0" before the actual conversion. Again, something that wouldn't be relevant if we expanded this post-reload.
Calling gen_rtx_SUBREG directly is generally bad practice. In all of these instances we wanted gen_lowpart or gen_highpart.
I did do away with the clamping in the fp to uint conversions; the C standard is quite clear that out-of-range values can produce any result.
I'm not sure the purpose of store_xmm_as_DF. In what cases does ix86_expand_vector_extract do the wrong thing?
Historical. IIRC, an ancestor of ix86_expand_vector_{move,extract} generated an MMX instruction in one instance. (Apple's current GCC is still based on 4.0.)
During initial rtl expansion, it's almost always a mistake to select between MEM and REG. Many REGs will have been loaded from REGs, and combine could be convinced to put it back if you give it half a chance.
Yes, movdi_to_sse was created to enable combine to do this.
I *think* my changes to ix86_expand_convert_uns_didf_sse in this regard are correct, but I havn't done exhaustive testing.
TARGET_INTER_UNIT_MOVES is always FALSE, so this will always go through memory. This is exactly what I was trying to avoid with movdi_to_sse.
Not fixed. I still think movdi_to_sse is the best answer short of a generalized "x86_inter_unit_moves" facility, but I claim no special expertise here.
ix86_expand_convert_uns_sidf_sse didn't need any vector ops.
The unsigned sidf and signed didf operations would be interesting to expand after reload, Just In Case the value happened to be in SSE registers or memory to begin with. For instance, with this patch we currently generate
cvtsi2sd 20(%esp), %xmm1 mulsd .LC7, %xmm1 { 0x1.0p32 } movl 16(%esp), %eax subl $-2147483648, %eax cvtsi2sd %eax, %xmm0 addsd .LC3, %xmm0 { 0x1.0p31 } addsd %xmm0, %xmm1
Add two more insns if we disallow inter-unit moves. Best case with SSE3 might look like
movq 16(%esp), %xmm0 psubd LC0, %xmm0 { INT_MIN, 0, 0, 0 } cvtpi2pd %xmm0, %xmm0
I'm not sure about this one; AFAICT, there is no such instruction.
addsd LC1, %xmm0 { 0x1.0p31 } mulpd LC2, %xmm0 { 1.0, 0x1.0p32 } haddpd %xmm0, %xmm0
I've only tested this lightly so far. I'll add a hack to force linux to assume aligned stacks (true during bootstrap at least) and build there.
I fixed a typo in x86_emit_floatuns.
I added one more conversion.
I added your name to the ChangeLog entry because you rewrote so much of the patch.
2007-02-08 Stuart Hastings <stuart@apple.com> Richard Henderson <rth@redhat.com>
* gcc/config/i386/i386.h (TARGET_KEEPS_VECTOR_ALIGNED_STACK): New.
* gcc/config/i386/darwin.h: (TARGET_KEEPS_VECTOR_ALIGNED_STACK): New.
* gcc/config/i386/i386.md (fixuns_trunc<mode>si2, fixuns_truncsfhi2,
fixuns_truncdfhi2): New.
(fix_truncsfdi_sse): Call ix86_expand_convert_sign_didf_sse.
(floatunsdidf2): Call ix86_expand_convert_uns_didf_sse.
(floatunssisf2): Add call to ix86_expand_convert_uns_sisf_sse.
(floatunssidf2): Allow nonimmediate source.
* gcc/config/i386/sse.md (movdi_to_sse): New. (vec_concatv2di): Drop '*'.
* gcc/config/i386/i386-protos.h (ix86_expand_convert_uns_si_sse,
ix86_expand_convert_uns_didf_sse, ix86_expand_convert_uns_sidf_sse,
ix86_expand_convert_uns_sisf_sse, ix86_expand_convert_sign_didf_sse): New.
* gcc/config/i386/i386.c (ix86_expand_convert_uns_si_sse,
ix86_expand_convert_uns_didf_sse, ix86_expand_convert_uns_sidf_sse,
ix86_expand_convert_uns_sisf_sse, ix86_expand_convert_sign_didf_sse,
ix86_build_const_vector, ix86_expand_vector_init_one_nonzero): New.
(ix86_build_signbit_mask): Fix decl of v, refactor to call ix86_build_const_vector.
(x86_emit_floatuns): Rewrite.
Bootstrapped, DejaGnu for C/C++/ObjC/Obj-C++ on Darwin/x86_32, no regressions.
Bootstrapped, DejaGnu for C/C++/ObjC/Obj-C++ on Linux/x86_32 with TARGET_KEEPS_VECTOR_ALIGNED_STACK kludged ON, and -march=nocona, - mfpmath=sse; no regressions.
RTH: The only "loose end" I'm aware of in this patch is the TARGET_INTER_UNIT_MOVES issue, and I see you're currently addressing it. This patch has been tested pretty thoroughly, so I propose to commit it now and kill movdi_to_sse when you're done.
O.K. to commit onto trunk?
Attachment:
gcc.fsf.cvt6.diffs.txt
Description: Text document
stuart hastings Apple
Index Nav: | [Date Index] [Subject Index] [Author Index] [Thread Index] | |
---|---|---|
Message Nav: | [Date Prev] [Date Next] | [Thread Prev] [Thread Next] |