This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]
Other format:	[Raw text]

x86 patch: SSE-based FP<=>int conversions, round 4

From: Stuart Hastings <stuart at apple dot com>
To: gcc-patches at gcc dot gnu dot org
Date: Thu, 8 Feb 2007 17:09:13 -0800
Subject: x86 patch: SSE-based FP<=>int conversions, round 4
References: <DC118988-83C5-4D6F-B600-FBBBB4D67834@apple.com>

On Feb 2, 2007, at 2:16 PM, Richard Henderson wrote:
On Mon, Jan 29, 2007 at 09:57:29AM -0800, Stuart Hastings wrote:
Last ping is here:

http://gcc.gnu.org/ml/gcc-patches/2007-01/msg00299.html

Patch awaiting review is here:

http://gcc.gnu.org/ml/gcc-patches/2006-12/msg01541.html
I had a look at this, and there are a number of things wrong with it.
Rather than iterate over this lots of times, I fixed them up; the
result is attached.  In summary:
(Pretty big "summary." ;-)
I'm not fond of TARGET_KEEPS_VECTOR_ALIGNED_STACK preventing these,
but it's a reasonable hack for now.  It'd be nice if we could prove
these vector values never got spilled; if the user is using SSE on
their own, then we can piggy-back on that and use it too.  Or, as
I think I saw Jan mention somewhere in the thread, expand these as
post-reload splitters.  Which doesn't seem like it'd be *that* hard
to do cleanly.  Failing that, we could arrange for reload to emit
movups, et al, when spilling.  All of which is not really relevant
for Darwin, so we'll ignore it for now.
I added a comment about the HImode conversions.
You added a "floatunsdidf3" expander, which would be ignored.  It
must be merged with the 64-bit "floatunsdidf2" expander above.
The movdi_to_sse pattern used which_alternative, but only had one
alternative.  It was clearly intended to have a second alternative
for load from memory,
No,

but got it wrong.
...and yes.

Actually, the original version worked fine without the memory alternative. Jan insisted on the alternative, and it was easier to accommodate him than to convince him. ;-)
 Ideally, this would be done
in the regular movdi pattern that allows inter-unit moves, but we
don't actually have one of those.  I think they were disabled because
it made reload do Truely Horrible Things with register class choices.
We might could revisit that at some point...
I'd rather tackle x86_inter_unit_moves in a separate patch.
All uses of ix86_fp_from_int_const used ULL suffixes, which is not
valid for C90, which we're supposed to support.  I was able to
replace all instances with real_ldexp.
I was able to merge the SF and DF to SI conversion routines.  I
think it's almost possible to write this without using actual
vector modes at all, except for the actual cvttpd2dq insn.  We
need to ensure that the high fp values are valid (prefereably zero).
That's easy to do with vector insns coming up to it, so it seems
silly to hack in a "xorps xmm0, xmm0; movss xmm1, xmm0" before
the actual conversion.  Again, something that wouldn't be
relevant if we expanded this post-reload.
Calling gen_rtx_SUBREG directly is generally bad practice.  In
all of these instances we wanted gen_lowpart or gen_highpart.
I did do away with the clamping in the fp to uint conversions;
the C standard is quite clear that out-of-range values can
produce any result.
I'm not sure the purpose of store_xmm_as_DF.  In what cases does
ix86_expand_vector_extract do the wrong thing?
Historical. IIRC, an ancestor of ix86_expand_vector_{move,extract} generated an MMX instruction in one instance. (Apple's current GCC is still based on 4.0.)
During initial rtl expansion, it's almost always a mistake to
select between MEM and REG.  Many REGs will have been loaded
from REGs, and combine could be convinced to put it back if
you give it half a chance.
Yes, movdi_to_sse was created to enable combine to do this.
 I *think* my changes to
ix86_expand_convert_uns_didf_sse in this regard are correct,
but I havn't done exhaustive testing.
TARGET_INTER_UNIT_MOVES is always FALSE, so this will always go through memory. This is exactly what I was trying to avoid with movdi_to_sse.

Not fixed. I still think movdi_to_sse is the best answer short of a generalized "x86_inter_unit_moves" facility, but I claim no special expertise here.
ix86_expand_convert_uns_sidf_sse didn't need any vector ops.
The unsigned sidf and signed didf operations would be interesting
to expand after reload, Just In Case the value happened to be in
SSE registers or memory to begin with.  For instance, with this
patch we currently generate
       cvtsi2sd    20(%esp), %xmm1
       mulsd        .LC7, %xmm1    { 0x1.0p32 }
       movl        16(%esp), %eax
       subl        $-2147483648, %eax
       cvtsi2sd    %eax, %xmm0
       addsd        .LC3, %xmm0    { 0x1.0p31 }
       addsd        %xmm0, %xmm1
Add two more insns if we disallow inter-unit moves.  Best case
with SSE3 might look like
   movq        16(%esp), %xmm0
   psubd        LC0, %xmm0    { INT_MIN, 0, 0, 0 }
   cvtpi2pd    %xmm0, %xmm0
I'm not sure about this one; AFAICT, there is no such instruction.
   addsd        LC1, %xmm0    { 0x1.0p31 }
   mulpd        LC2, %xmm0    { 1.0, 0x1.0p32 }
   haddpd        %xmm0, %xmm0
I've only tested this lightly so far.  I'll add a hack to force
linux to assume aligned stacks (true during bootstrap at least)
and build there.
I fixed a typo in x86_emit_floatuns.

I added one more conversion.

I added your name to the ChangeLog entry because you rewrote so much of the patch.
2007-02-08  Stuart Hastings  <stuart@apple.com>
        Richard Henderson  <rth@redhat.com>
* gcc/config/i386/i386.h (TARGET_KEEPS_VECTOR_ALIGNED_STACK): New. * gcc/config/i386/darwin.h: (TARGET_KEEPS_VECTOR_ALIGNED_STACK): New. * gcc/config/i386/i386.md (fixuns_trunc<mode>si2, fixuns_truncsfhi2, fixuns_truncdfhi2): New. (fix_truncsfdi_sse): Call ix86_expand_convert_sign_didf_sse. (floatunsdidf2): Call ix86_expand_convert_uns_didf_sse. (floatunssisf2): Add call to ix86_expand_convert_uns_sisf_sse. (floatunssidf2): Allow nonimmediate source. * gcc/config/i386/sse.md (movdi_to_sse): New. (vec_concatv2di): Drop '*'. * gcc/config/i386/i386-protos.h (ix86_expand_convert_uns_si_sse, ix86_expand_convert_uns_didf_sse, ix86_expand_convert_uns_sidf_sse, ix86_expand_convert_uns_sisf_sse, ix86_expand_convert_sign_didf_sse): New. * gcc/config/i386/i386.c (ix86_expand_convert_uns_si_sse, ix86_expand_convert_uns_didf_sse, ix86_expand_convert_uns_sidf_sse, ix86_expand_convert_uns_sisf_sse, ix86_expand_convert_sign_didf_sse, ix86_build_const_vector, ix86_expand_vector_init_one_nonzero): New. (ix86_build_signbit_mask): Fix decl of v, refactor to call ix86_build_const_vector. (x86_emit_floatuns): Rewrite.

Bootstrapped, DejaGnu for C/C++/ObjC/Obj-C++ on Darwin/x86_32, no regressions. Bootstrapped, DejaGnu for C/C++/ObjC/Obj-C++ on Linux/x86_32 with TARGET_KEEPS_VECTOR_ALIGNED_STACK kludged ON, and -march=nocona, - mfpmath=sse; no regressions.

RTH: The only "loose end" I'm aware of in this patch is the TARGET_INTER_UNIT_MOVES issue, and I see you're currently addressing it. This patch has been tested pretty thoroughly, so I propose to commit it now and kill movdi_to_sse when you're done.

O.K. to commit onto trunk?

I suppose the review would be easier if I attached a patch to review. Duh.

Attachment: gcc.fsf.cvt6.diffs.txt
Description: Text document

stuart hastings
Apple

:ADDPATCH target/x86:

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]