


Re: const folding vs multiply-add


On 8 Nov, Dale Johannesen wrote:

> Of course, the current behavior will be better in some other
> cases.  Getting all the cases right seems difficult, and
> it's not obvious to me how to approach this.  I could
> probably teach combine to make x+x-z into a multiply-add
> instruction, but then it's too late to pull the resulting
> 2.0 out of the loop.  I'm inclined to just whack the tree-based
> optimization out; since it's done before const propagation,
> the gain isn't great anyway, and many users routinely write
> x+x to begin with if they care about performance.  There's
> no hook to do that, so this would presumably remain an
> Apple-specific change.  Any better ideas?
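
For illustration (my own minimal sketch; Dale's actual loop isn't
shown here), the pattern being discussed might look something like
this:

/* Hypothetical example of the problem: the source multiplies by the
   constant 2.0; tree-level folding rewrites 2.0*x[i] as x[i] + x[i];
   combine would then have to turn x[i] + x[i] - z[i] back into
   2.0*x[i] - z[i] to emit a fused multiply-subtract, but by that
   point it is too late to hoist the load of the 2.0 constant out of
   the loop.  */
void
f (double *a, const double *x, const double *z, int n)
{
  int i;
  for (i = 0; i < n; i++)
    a[i] = 2.0 * x[i] - z[i];
}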

I'd say: IF a CPU has a fused multiply-add instruction, AND it has
the same latency as a normal add, AND we don't need extra work to
generate the operand to multiply with, THEN go for the fused
instruction; otherwise use the simple add.
On CPUs like the 7400 we should normally end up with the most complex
instruction available, as long as it doesn't take additional work to
get the extra operands loaded into registers, because those
instructions get more work done in the same time.
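
As a rough sketch (my own example, assuming a PowerPC-style fmadd),
the kind of case where the fused form is a clear win is when both
operands of the multiply are already sitting in registers:

/* Hypothetical illustration: with a fused multiply-add (e.g. fmadd
   on the 7400), a*b + c can be issued as a single instruction, so
   folding the multiply into the add costs nothing extra as long as
   no further instructions are needed to load its operands.  */
double
fma_candidate (double a, double b, double c)
{
  return a * b + c;
}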

Back to your question: do you see a pattern in the behaviour you
noticed?  We could peephole it then....  I probably don't understand
your question to its full extent; maybe I should start playing around
a bit with the example loop above to figure out what we could do here.

--
Servus,
       Daniel

