This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.
Re: Optimizing floating point *(2^c) and /(2^c)
On Mar 29, 2010, at 16:30, Tim Prince wrote:
> gcc used to have the ability to replace division by a power of 2 by an fscale instruction, for appropriate targets (maybe still does).
The problem (again) is that floating point multiplication is
just too damn fast. On x86, even though the latency may
be 5 cycles, since the multiplier is fully pipelined, the
throughput is one multiplication per clock cycle, and that's
for non-vectorized code!
For comparison, the fscale instruction breaks down to 30 µops
or something like that, compared to a single µop for most
forms of floating point multiplication. Given that Jeroen
also needs to do floating-point additions, just bouncing
the values between integer and float registers will be
more expensive than the entire multiplication is in the
first place.
> Such targets have nearly disappeared from everyday usage. What remains is the possibility of replacing division by a constant power of 2 with multiplication, but it's generally considered that the programmer should have done that in the first place.
No, this is something the compiler does and should do.
It is well understood that in binary floating point, division by a
power of two is identical to multiplication by its reciprocal (as long
as that reciprocal is itself representable), and it's the compiler's
job to select the fastest instruction.
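
To make the equivalence concrete, here is a minimal sketch in C that
compares x / 2^c against x * 2^-c bit for bit. The helper names
(same_bits, count_mismatches) and the sample values are made up for
illustration; it assumes IEEE 754 doubles and a build without
-ffast-math. The exponent range is chosen so that both 2^c and 2^-c
are normal doubles.

```c
#include <stdint.h>
#include <string.h>

/* Return 1 if a and b have exactly the same bit pattern. */
static int same_bits(double a, double b)
{
    uint64_t ua, ub;
    memcpy(&ua, &a, sizeof ua);
    memcpy(&ub, &b, sizeof ub);
    return ua == ub;
}

/* Count the cases where dividing by a power of two and multiplying
   by its reciprocal give different results, for c = 1 .. 1022.
   (Hypothetical test harness, not anything taken from GCC.) */
int count_mismatches(void)
{
    static const double samples[] = {
        1.0, 3.0, 0.1, -7.25, 1e300, 1e-300, 1e-310
    };
    const size_t n = sizeof samples / sizeof samples[0];
    int mismatches = 0;

    /* volatile keeps the compiler from folding the arithmetic away,
       so the division and the multiplication really happen at run time. */
    volatile double p = 1.0;   /* becomes 2^c  */
    volatile double r = 1.0;   /* becomes 2^-c */

    for (int c = 1; c <= 1022; c++) {
        p = p * 2.0;
        r = r * 0.5;
        for (size_t i = 0; i < n; i++) {
            volatile double x = samples[i];
            /* Division by 2^c versus multiplication by 2^-c. */
            if (!same_bits(x / p, x * r))
                mismatches++;
            /* Division by 2^-c versus multiplication by 2^c. */
            if (!same_bits(x / r, x * p))
                mismatches++;
        }
    }
    return mismatches;
}
```

Calling count_mismatches() from a small main() should return 0 under
those assumptions, including for the samples that overflow to infinity
or underflow to subnormals: both operations round the same exact real
value. The same check fails for a divisor such as 3.0, whose reciprocal
is not exactly representable, which is why the compiler can only do
this substitution unconditionally for powers of two.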
-Geert