This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: Optimizing floating point *(2^c) and /(2^c)


On 29 March 2010 19:51, Geert Bosch <bosch@adacore.com> wrote:
>
> On Mar 29, 2010, at 13:19, Jeroen Van Der Bossche wrote:
>
>> 've recently written a program where taking the average of 2 floating
>> point numbers was a real bottleneck. I've looked into the assembly
>> generated by gcc -O3 and apparently gcc treats multiplication and
>> division by a hard-coded 2 like any other multiplication with a
>> constant. I think, however, that *(2^c) and /(2^c) for floating
>> points, where the c is known at compile-time, should be able to be
>> optimized with the following pseudo-code:
>>
>> e = exponent bits of the number
>> if (e > c && e < (0b111...11)-c) {
>> e += c or e -= c
>> } else {
>> do regular multiplication
>> }
>>
>> Even further optimizations may be possible, such as bitshifting the
>> significand when e=0. However, that would require checking for a lot
>> of special cases and require so many conditional jumps that it's most
>> likely not going to be any faster.
>>
>> I'm not skilled enough with assembly to write this myself and test if
>> this actually performs faster than how it's implemented now. Its
>> performance will most likely also depend on the processor
>> architecture, and I could only test this code on one machine.
>> Therefore I ask to those who are familiar with gcc's optimization
>> routines to give this 2 seconds of thought, as this is probably rather
>> easy to implement and many programs could benefit from this.
>
> For any optimization suggestions, you should start with showing some real, compilable, code with a performance problem that you think the compiler could address. Please include details about compilation options, GCC versions and target hardware, as well as observed performance numbers. How do you see that averaging two floating point numbers is a bottleneck? This should only be a single addition and multiplication, and will execute in a nanosecond or so on a moderately modern system.
>
> Your particular suggestion is flawed. Floating-point multiplication is very fast on most targets. It is hard to see how on any target with floating-point hardware, manual mucking with the representation can be a win. In particular, your sketch doesn't at all address underflow and overflow. Likely a complete implementation would be many times slower than a floating-point multiply.
>
> ?-Geert

The program in question was a scientific computation where most of the
main loop was spent doing two floating point operations, one of which
was taking the average of two numbers. Granted, there is no
"performance problem" with the code as it is now, it executes in as
much time as would be expected. However, an optimization need not be a
10% performance gain to be called an optimization, right? If we could
get this multiplication, which you say executes in about a nanosecond,
to execute in half a nanosecond, why not do it?

With all due respect, your statement that "your sketch doesn't at all
address underflow and overflow" is just plain wrong. Did you even look
at the code? The if statement is there exactly to address under- and
overflow and nothing else. It's not because it looks simple that I
didn't think it through. I know exactly how floating point
multiplication works, and this implementation of *(2^c) doesn't
violate under/overflow, denormalisation when e=0, rounding rules or
rules about NaN/infinity.

On my processor (Intel Core2, x86-64), a mulsd instruction takes about
as much time as 5 add instructions, though I have to admit that my
method of determining that are highly unscientific. Nevertheless, if
someone competent with optimizing assembly would be able to write the
entire thing so that the if=true-path uses 4 instructions that takes
as much time as an add instruction, then that would constitute a 20%
gain.

Jeroen


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]