This is the mail archive of the
mailing list for the GCC project.
Re: P3 SSE/MMX support: adding the patterns
On Wed, Sep 06, 2000 at 02:41:16PM +0100, Bernd Schmidt wrote:
> > Perhaps we ought to make calls.c generate these? Seems silly to
> > have to define nine variants of the same thing.
> It is silly, but the easiest thing to do. I can think of two ways to
> improve this: either implement a macro mechanism for md files, or fix
> emit_push_insn so that it uses pushxx as a named pattern, and falls back
> to add/move when it fails. Both are likely to be quite a bit of extra
> work (the latter because we may need to change a lot of ports).
I wouldn't have guessed that fixing emit_push_insn would require
changing lots of ports. In any case it's no big deal, just something
that's sorta irritated me for a while.
> This pattern is generated by the loadhps/storehps builtins, both of which
> ensure that one argument is a MEM.
> I have to admit I'm bewildered by the ia64 "mf" pattern. It creates
> (mem:BLK (mem:BLK (scratch)))
> and I fail to see the point of the two nested MEMs.
Heh. The nested mems are actually a cut and paste error. But
discounting that, we've got a read and a write to unspecified
volatile memory, which should alias with everything, and so prevent
any memory reference from crossing it.
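
For readers more at home in C than in RTL, a rough source-level analogue of that idea is an empty volatile asm with a "memory" clobber. This is only a sketch, not the ia64 "mf" pattern itself: it tells the compiler the statement may read and write arbitrary memory, so loads and stores are not moved across it, but it emits no hardware fence instruction. The names compiler_barrier and publish are made up for illustration.

```c
/* Compiler-level barrier (illustrative): the "memory" clobber says this
   asm may read or write any memory, so the compiler will not move
   memory references across it.  No fence instruction is emitted. */
static inline void compiler_barrier(void)
{
    __asm__ __volatile__("" ::: "memory");
}

/* Hypothetical use: the store to *data must be issued by the compiler
   before the store to *flag. */
int publish(int *data, int *flag, int value)
{
    *data = value;
    compiler_barrier();   /* nothing crosses this point at compile time */
    *flag = 1;
    return *data;
}
```

Calling publish(&d, &f, 7) stores 7 through the first pointer, sets the flag to 1, and returns 7; the barrier only constrains the order in which the compiler emits the two stores.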
> Isn't the point where the prefetch is added rather critical for getting
> good performance? At least I think we should disallow scheduling memory
> references across a prefetch,
Yeah, it should be early enough, but not too early. But think of it
the other way around -- with the volatile, the prefetch can't move up
either. And really, the prefetch should percolate up to the first
pipeline bubble after its address is ready.
> or we might end up moving the prefetch after the "real"
> memory reference that it belongs to.
If the real memory reference is that close, you're wasting your time
with the prefetch. You should be prefetching memory 4 to 8 iterations
in front of where you're working. Remember, the point is to overcome
the 35-cycle wait for L2/3 cache or the 100-cycle wait for main memory.
That's a lot of time, which implies you've got to put the prefetch
well in advance of when the data will be needed.
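
To make the arithmetic concrete, here is a minimal sketch in C. It assumes a compiler that provides __builtin_prefetch (a GCC builtin; whether a given compiler has it is an assumption here), and the cycles-per-iteration figure is invented for illustration. The helper names prefetch_distance and sum_with_prefetch are hypothetical. The distance is simply the latency divided by the cost of one iteration, rounded up.

```c
#include <stddef.h>

/* Hypothetical helper: number of iterations to prefetch ahead, given an
   assumed latency and an assumed cost per loop iteration (in cycles). */
size_t prefetch_distance(size_t latency_cycles, size_t cycles_per_iter)
{
    if (cycles_per_iter == 0)
        return 0;
    /* Round up so the data arrives no later than it is needed. */
    return (latency_cycles + cycles_per_iter - 1) / cycles_per_iter;
}

/* Sum an array while prefetching `dist` iterations ahead of the element
   currently being read. */
long sum_with_prefetch(const long *a, size_t n, size_t dist)
{
    long total = 0;
    for (size_t i = 0; i < n; i++) {
        if (i + dist < n)
            __builtin_prefetch(&a[i + dist], 0, 3); /* read, keep in cache */
        total += a[i];
    }
    return total;
}
```

With a 100-cycle wait and an assumed 20 cycles per iteration, prefetch_distance(100, 20) gives 5, which falls in the 4 to 8 iteration range mentioned above. A real loop would likely issue one prefetch per cache line rather than one per element; the per-element form just keeps the sketch short.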