This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



Re: [PATCH, i386]: Expand round(a) = sgn(a) * floor(fabs(a) + 0.5) using SSE4 ROUND insn


On Mon, Aug 15, 2011 at 5:25 PM, Michael Matz <matz@suse.de> wrote:

> On Mon, 15 Aug 2011, Michael Matz wrote:
>
>> > > .LFB0:
>> > >         .cfi_startproc
>> > >         movsd   .LC0(%rip), %xmm2
>> > >         movapd  %xmm0, %xmm1
>> > >         andpd   %xmm2, %xmm1
>> > >         andnpd  %xmm0, %xmm2
>> > >         addsd   .LC1(%rip), %xmm1
>> > >         roundsd $1, %xmm1, %xmm1
>> > >         orpd    %xmm2, %xmm1
>> > >         movapd  %xmm1, %xmm0
>> > >         ret
>> >
>> > Hm, why do we need the sign-copy?  If I read the docs correctly
>> > we can simply use roundsd directly, no?
>>
>> round-half-away-from-zero breaks your neck.  round[ps][sd] only supports
>> the usual four IEEE rounding modes.
>
> But, you should be able to apply the sign to the 0.5, which wouldn't
> require building the absolute value of input:
>
> round(x) = trunc(x + (copysign (0.5, x)))
>
> which should roughly be expanded to:
>
>         movsd   signbits(%rip), %xmm1
>         andpd   %xmm0, %xmm1
>         movsd   nextof0.5(%rip), %xmm2
>         orpd    %xmm1, %xmm2
>         addpd   %xmm2, %xmm0
>         roundsd $1, %xmm0, %xmm0
>         ret
>
> Which has one logical operation less (and one move because I chose a more
> optimal register assignment).

Thanks for the suggestion, I will implement and test it ASAP.

Uros.

