This is the mail archive of the gcc@gcc.gnu.org
mailing list for the GCC project.
Re: const folding vs multiply-add
- To: dalej at apple dot com
- Subject: Re: const folding vs multiply-add
- From: degger at fhm dot edu
- Date: Sat, 10 Nov 2001 00:07:33 +0100 (CET)
- Cc: gcc at gcc dot gnu dot org
- Reply-To: degger at fhm dot edu
On 8 Nov, Dale Johannesen wrote:
> Of course, the current behavior will be better in some other
> cases. Getting all the cases right seems difficult, and
> it's not obvious to me how to approach this. I could
> probably teach combine to make x+x-z into a multiply-add
> instruction, but then it's too late to pull the resulting
> 2.0 out of the loop. I'm inclined to just whack the tree-based
> optimization out; since it's done before const propagation,
> the gain isn't great anyway, and many users routinely write
> x+x to begin with if they care about performance. There's
> no hook to do that, so this would presumably remain an
> Apple-specific change. Any better ideas?
I'd say: IF a CPU has a fused multiply-add instruction, AND it
has the same latency as a normal add, AND we don't need to
generate the operand to multiply the number with, then go for
the fused instruction; otherwise use the simple add.
Normally we should end up with the most complex instructions on
CPUs like the 7400, provided no additional work is needed to get
the operands loaded into registers, because they have the ability
to get more work done in the same time.
Back to your question: do you see a pattern in the noticed behaviour?
We could peephole it then.... I probably don't understand your question
to its complete extent; maybe I should start playing around a bit with
the example loop above to figure out what we could do here.