This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.



Re: floor on i386


On Tue, Sep 25, 2001 at 03:46:59PM +0200, Jan Hubicka wrote:
> > It would probably be best to introduce a hard register to indicate the
> > rounding mode, and use OPTIMIZE_MODE_SWITCHING to do as few mode
> > changes as possible.  For reference, have a look at the SH4
> > implementation of floating-point support, that defines an explicit
> > floating-point control register, mode-switching RTL and USEs that
> 
> The USEs themselves are a problem - you lose a lot of optimizations then.
> The trick can be to lower code before reload using pre-reload splitting.
> 
> A major problem still remains in reload.
> If we don't want to get exact IEEE behavior by setting the proper precision before each
> arithmetic operation (as SH4 does, IMO), we will run into problems with spills,
> since these can be placed where the control word is set to some wrong value,
> resulting in wrong rounding before storing.
> 
> That's the main reason why my original patch didn't contain it.
> 
> The problem can be solved by a mode switching pass after reload, when all spills
> are visible - you use the existing pass before reload to compute control word values,
> as these need pseudos, and after reload just insert fldcw/fstcw at strategic places.
> 
> If you insert them at the last optimal position in the code, you will get them after the
> lazy code to compute the control word inserted by the pre-reload pass.
> 
> As discussed with Timothy, the benefits are relatively small compared to the first
> half (computing control word values optimally), as CPUs do have hardware bypass.
> 
> > register in all instructions that depend on the floating-point mode,
> > indicating in an attribute which mode the register is supposed to be
> > in.  The difference is that SH4 uses the floating-point control
> > register to switch between single- and double-precision operations,
> > that have the same encoding but different behavior depending on the
> > state of the control register.  Modeling mode switching for purposes
> > of rounding on x86 should be far simpler.  In fact, I'm not even sure
> > you'd need the hard register: just define unspec patterns that switch
> > back and forth and you're done.
> You need a scheduling barrier, but it is a big problem.
> 

Note that this optimization is necessary if gcc doesn't want to be stuck at 4% of
the performance of icc on Intel IA-32. For example, an MPEG-2 Layer 2 decoder
spends 65% of its execution time rounding floats to integers (Athlon).
This is not a joke; it's a flaw of the compiler.

Currently gcc is unusable if you need fast float-to-int conversion
(video).
______________________________________________________________________

Another work-around is the following. It can be implemented very quickly.

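// These values match the rounding-control field (bits 10-11) of the x87 control word.
enum rounding_model_e {
    round_default = 0x0000,   // round to nearest (the x87 default)
    round_floor   = 0x0400,   // round toward -infinity
    round_ceil    = 0x0800,   // round toward +infinity
    round_trunc   = 0x0C00,   // round toward zero
    round_round   = 0x0000    // round to nearest
};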

enum rounding_model_e  set_rounding_model ( enum rounding_model_e );

double         rint    ( double );
float          rintf   ( float );
long double    rintl   ( long double );
int            irint   ( double );		// 64 bit float to 32 bit int
long long      llrintl ( long double );		// 80 bit float to 64 bit int

Other target types (signed|unsigned, char|short|int|long|long long) are
also possible, as are other saturation models (wrap|saturate|integerinfinity).
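
A minimal sketch of how set_rounding_model and irint from the proposal above
might look with GCC-style inline assembly on x86. These are assumptions, not
part of the proposal itself: the enum values are written straight into the
rounding-control field of the x87 control word with fnstcw/fldcw, irint is a
single fistpl under whatever mode is currently set, and X87_RC_MASK is a
made-up helper name. It says nothing about how the compiler would do this
internally.

```c
#include <assert.h>
#include <stdint.h>

#if !(defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__)))
#error "This sketch uses GCC-style x87 inline assembly and needs an x86 target."
#endif

enum rounding_model_e {
    round_default = 0x0000,
    round_floor   = 0x0400,
    round_ceil    = 0x0800,
    round_trunc   = 0x0C00,
    round_round   = 0x0000
};

/* Hypothetical helper: mask of the rounding-control bits (10-11). */
#define X87_RC_MASK 0x0C00u

/* Install a new rounding model and return the one that was active before. */
enum rounding_model_e set_rounding_model(enum rounding_model_e model)
{
    uint16_t cw;

    /* The model must not touch any control word bits outside the RC field. */
    assert(((unsigned)model & ~X87_RC_MASK) == 0);

    /* Read the current control word. */
    __asm__ __volatile__("fnstcw %0" : "=m"(cw));
    enum rounding_model_e old = (enum rounding_model_e)(cw & X87_RC_MASK);

    /* Replace only the RC field; precision and exception masks are kept. */
    cw = (uint16_t)((cw & ~X87_RC_MASK) | (unsigned)model);
    __asm__ __volatile__("fldcw %0" : : "m"(cw));
    return old;
}

/* Convert using the current rounding model: one fistpl, no mode switch.
   fistpl pops st(0), so "st" is listed as clobbered. */
int irint(double x)
{
    int32_t result;
    __asm__ __volatile__("fistpl %0" : "=m"(result) : "t"(x) : "st");
    return result;
}
```

The idea is to call set_rounding_model once outside a hot loop, convert with
irint inside it, and restore the saved model afterwards, so that no control
word switch is paid per conversion.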

-- 
Frank Klemm

PS: There are CPUs with the following mapping of 32 bit integers:
    0x80000001...0x7FFFFFFF:   -2^31+1 ... +2^31-1
    0x80000000:                integer NAN
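
A minimal sketch of a conversion that produces this mapping in portable C,
assuming truncation toward zero and treating NaN and every out-of-range
input as the integer NAN. The names INT32_NAN and to_int32_nan are made up
for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical name for the reserved bit pattern 0x80000000. */
#define INT32_NAN INT32_MIN

/* Convert with truncation; anything that does not land in
   -2^31+1 ... +2^31-1 becomes the integer NAN. */
int32_t to_int32_nan(double x)
{
    /* Comparisons with a floating-point NaN are false, so NaN inputs
       also take this branch. */
    if (!(x > -2147483648.0 && x < 2147483648.0))
        return INT32_NAN;

    int32_t r = (int32_t)x;

    /* An in-range result can never collide with the reserved pattern. */
    assert(r != INT32_NAN);
    return r;
}
```

A caller can then test a result against INT32_NAN instead of checking the
range before every conversion.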


