This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.



Re: Optimizing floating point *(2^c) and /(2^c)


On Mar 29, 2010, at 16:30, Tim Prince wrote:
> gcc used to have the ability to replace division by a power of 2 by an fscale instruction, for appropriate targets (maybe still does).
The problem (again) is that floating point multiplication is 
just too damn fast. On x86, even though the latency may 
be 5 cycles, since the multiplier is fully pipelined, the 
throughput is one multiplication per clock cycle, and that's
for non-vectorized code!

For comparison, the fscale instruction breaks down to something
like 30 µops, compared to a single µop for most
forms of floating point multiplication. Given that Jeroen
also needs to do floating-point additions, just bouncing
the values between integer and float registers will be
more expensive than the entire multiplication is in the
first place.

> Such targets have nearly disappeared from everyday usage.  What remains is the possibility of replacing the division by constant power of 2 by multiplication, but it's generally considered the programmer should have done that in the beginning.

No, this is something the compiler does and should do.
It is well understood that in binary floating point,
division by a power of two is identical to multiplication by its reciprocal
(as long as that reciprocal is exactly representable),
and it's the compiler's job to select the fastest instruction.

  -Geert
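
The equivalence claimed above can be checked empirically. What follows is
only a minimal sketch, not anything taken from GCC: the helper names
(exact_power, div_equals_mul, count_mismatches) are hypothetical, and the
exponent is restricted to 0 <= c <= 1022 on the assumption that both 2^c and
2^-c must stay normal doubles. The volatile qualifiers are there so the
compiler cannot fold the division and the multiplication into the same
instruction before the comparison happens.

```c
#include <assert.h>

/* Hypothetical helper: builds base^c by repeated multiplication.
   With base 2.0 or 0.5 and 0 <= c <= 1022 every step is exact,
   because each intermediate value is itself a power of two. */
static double exact_power(double base, int c)
{
    double r = 1.0;
    for (int i = 0; i < c; i++)
        r *= base;
    return r;
}

/* Returns 1 when x / 2^c and x * 2^-c compare equal, 0 otherwise.
   Note: a NaN input compares unequal to itself and so yields 0. */
int div_equals_mul(double x, int c)
{
    /* Assumption of this sketch: keep 2^c and 2^-c normal doubles. */
    assert(c >= 0 && c <= 1022);

    /* volatile keeps the two computations separate at run time. */
    volatile double pow2  = exact_power(2.0, c);
    volatile double recip = exact_power(0.5, c);

    double quotient = x / pow2;
    double product  = x * recip;
    return quotient == product;
}

/* Counts disagreements over a small sample of inputs and exponents. */
int count_mismatches(void)
{
    static const double samples[] = {
        0.0, 1.0, -3.0, 0.1, 1e-300, 1e300,
        5e-324,                  /* smallest subnormal double */
        1.7976931348623157e308   /* largest finite double */
    };
    int mismatches = 0;

    for (unsigned i = 0; i < sizeof samples / sizeof samples[0]; i++)
        for (int c = 0; c <= 1022; c++)
            if (!div_equals_mul(samples[i], c))
                mismatches++;
    return mismatches;
}
```

Both operations compute the correctly rounded value of the same real number,
so they should agree even when the result underflows to a subnormal or to
zero. A small sample like this is a sanity check, not a proof.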

