This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.

Re: Fldcw, rounding and optimizations

From: Sebastien Loisel <loisel at math dot mcgill dot ca>
To: Jamie Lokier <jamie at shareable dot org>
Cc: "Loisel, Sebastien" <loisel at amazon dot com>, Fergus Henderson <fjh at cs dot mu dot OZ dot AU>, Andrew Pinski <pinskia at physics dot uc dot edu>, <gcc at gnu dot org>
Date: Sat, 16 Aug 2003 04:57:54 -0400 (EDT)
Subject: Re: Fldcw, rounding and optimizations

On Sat, 16 Aug 2003, Jamie Lokier wrote:

> Loisel, Sebastien wrote:
> > > 2) Is there a way to do this already? I know you can already specify
> > > dependencies of asm blocks on variables and such, and I presume this
> > > is used in the optimizer to prevent foul-ups. What's the best way of
> > > convincing gcc not to move my asm blocks around?
> > 
> > Nothing to see here, move along. I just found out about asm volatile. Sorry for the noise.
> 
> Does it fix your problem?

Well, it helped, but the generated code was really poor. My best
solution so far is this:

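// DEFAULT_CW is not shown here; it is assumed to hold the default x87
// control word. 0x800 sets bit 11 of the rounding-control field (bits
// 10-11): round toward +infinity if DEFAULT_CW leaves that field at 00.
int __roundceil=DEFAULT_CW|0x800;
#define RU "fldcw __roundceil\n"
template <class T>
inline T add_hi(T a, T b)
{ asm (RU "fadd %1,%2" : "=&t"(a) : "f"(b), "0"(a)); return a; }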

This way, the optimizer is still free to reorder FPU operations, but
because each fadd carries its own fldcw in the same asm block, the
rounding mode stays correct wherever I need it to be.

The code is still not great (only every fourth or fifth instruction is
an fadd).

I'm still not 100% certain the generated code is correct; I'll have to
write extensive testing software to determine that.

What I fear, though, is that it may be very difficult to write efficient
interval arithmetic code with gcc. The portable code (using fenv.h) is
bound to be slow, with one call per rounding mode change
(bits/fenvinline.h really ought to have asm volatile inlines -- but even
if it did, the generated code would suck). The nonportable approach I'm
using above seems to do a bit better (I'll be exercising it later to make
sure), but I think it still falls short of what one might want.
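
For comparison, the portable route could look roughly like the sketch
below. The Interval struct and the name add_portable are hypothetical, and
the volatile temporaries are a workaround assumed here to keep the
optimizer from folding the sums or moving them across the fesetround
calls. It assumes the platform defines FE_DOWNWARD and FE_UPWARD.

```cpp
#include <cfenv>

// Hypothetical interval type, for illustration only.
struct Interval { double lo, hi; };

// Portable interval addition: two rounding-mode switches plus a restore,
// each of them a library call, for just two additions.
inline Interval add_portable(Interval x, Interval y)
{
    const int saved = std::fegetround();

    // volatile forces the loads, and therefore the additions, to stay
    // between the surrounding fesetround calls.
    volatile double xlo = x.lo, ylo = y.lo, xhi = x.hi, yhi = y.hi;

    std::fesetround(FE_DOWNWARD);
    volatile double lo = xlo + ylo;   // lower bound, rounded toward -infinity

    std::fesetround(FE_UPWARD);
    volatile double hi = xhi + yhi;   // upper bound, rounded toward +infinity

    std::fesetround(saved);           // leave the caller's mode untouched
    return Interval{lo, hi};
}
```

The three fesetround calls per interval addition are the overhead in
question, and the volatile traffic adds loads and stores on top of that.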

Cheers,

Sebastien Loisel

Follow-Ups:
- Re: Fldcw, rounding and optimizations
  - From: Jamie Lokier

References:
- Re: Fldcw, rounding and optimizations
  - From: Jamie Lokier
