This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]
Other format:	[Raw text]

Re: Ping 2: x86 patch: SSE-based FP<=>int conversions, round 3

From: Richard Henderson <rth at redhat dot com>
To: Stuart Hastings <stuart at apple dot com>
Cc: gcc-patches at gcc dot gnu dot org
Date: Fri, 2 Feb 2007 14:16:45 -0800
Subject: Re: Ping 2: x86 patch: SSE-based FP<=>int conversions, round 3
References: <2FBF78EC-9BF4-4F9F-BD7A-8C2D747DA7F9@apple.com> <F2F2BC3E-F74C-4F27-88BC-7E652104DE54@apple.com>

On Mon, Jan 29, 2007 at 09:57:29AM -0800, Stuart Hastings wrote:
> Last ping is here:
> 
> 	http://gcc.gnu.org/ml/gcc-patches/2007-01/msg00299.html
> 
> Patch awaiting review is here:
> 
> 	http://gcc.gnu.org/ml/gcc-patches/2006-12/msg01541.html

I had a look at this, and there are a number of things wrong with it.
Rather than iterate over this lots of times, I fixed them up; the
result is attached.  In summary:

I'm not fond of TARGET_KEEPS_VECTOR_ALIGNED_STACK preventing these,
but it's a reasonable hack for now.  It'd be nice if we could prove
these vector values never got spilled; if the user is using SSE on
their own, then we can piggy-back on that and use it too.  Or, as
I think I saw Jan mention somewhere in the thread, expand these as
post-reload splitters.  Which doesn't seem like it'd be *that* hard
to do cleanly.  Failing that, we could arrange for reload to emit
movups, et al, when spilling.  All of which is not really relevant
for Darwin, so we'll ignore it for now.

I added a comment about the HImode conversions.

You added a "floatunsdidf3" expander, which would be ignored.  It 
must be merged with the 64-bit "floatunsdidf2" expander above.

The movdi_to_sse pattern used which_alternative, but only had one
alternative.  It was clearly intended to have a second alternative
for load from memory, but got it wrong.  Ideally, this would be done
in the regular movdi pattern that allows inter-unit moves, but we
don't actually have one of those.  I think they were disabled because
it made reload do Truely Horrible Things with register class choices.
We might could revisit that at some point...

I'd rather tackle x86_inter_unit_moves in a separate patch.

All uses of ix86_fp_from_int_const used ULL suffixes, which is not
valid for C90, which we're supposed to support.  I was able to 
replace all instances with real_ldexp.

I was able to merge the SF and DF to SI conversion routines.  I 
think it's almost possible to write this without using actual 
vector modes at all, except for the actual cvttpd2dq insn.  We
need to ensure that the high fp values are valid (prefereably zero).
That's easy to do with vector insns coming up to it, so it seems
silly to hack in a "xorps xmm0, xmm0; movss xmm1, xmm0" before
the actual conversion.  Again, something that wouldn't be 
relevant if we expanded this post-reload.

Calling gen_rtx_SUBREG directly is generally bad practice.  In
all of these instances we wanted gen_lowpart or gen_highpart.

I did do away with the clamping in the fp to uint conversions;
the C standard is quite clear that out-of-range values can 
produce any result.

I'm not sure the purpose of store_xmm_as_DF.  In what cases does
ix86_expand_vector_extract do the wrong thing?

During initial rtl expansion, it's almost always a mistake to 
select between MEM and REG.  Many REGs will have been loaded 
from REGs, and combine could be convinced to put it back if 
you give it half a chance.  I *think* my changes to 
ix86_expand_convert_uns_didf_sse in this regard are correct, 
but I havn't done exhaustive testing.

ix86_expand_convert_uns_sidf_sse didn't need any vector ops.

The unsigned sidf and signed didf operations would be interesting
to expand after reload, Just In Case the value happened to be in 
SSE registers or memory to begin with.  For instance, with this 
patch we currently generate

        cvtsi2sd	20(%esp), %xmm1
        mulsd	 	.LC7, %xmm1	{ 0x1.0p32 }
        movl		16(%esp), %eax
        subl		$-2147483648, %eax
        cvtsi2sd	%eax, %xmm0
        addsd		.LC3, %xmm0	{ 0x1.0p31 }
        addsd		%xmm0, %xmm1

Add two more insns if we disallow inter-unit moves.  Best case
with SSE3 might look like

	movq		16(%esp), %xmm0
	psubd		LC0, %xmm0	{ INT_MIN, 0, 0, 0 }
	cvtpi2pd	%xmm0, %xmm0
	addsd		LC1, %xmm0	{ 0x1.0p31 }
	mulpd		LC2, %xmm0	{ 1.0, 0x1.0p32 }
	haddpd		%xmm0, %xmm0

I've only tested this lightly so far.  I'll add a hack to force
linux to assume aligned stacks (true during bootstrap at least)
and build there.

r~

Attachment: z
Description: Text document

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]