preliminary patch: prefetch support for i386

Jan Hubicka jh@suse.cz
Wed Nov 21 12:34:00 GMT 2001


> On Fri, Nov 30, 2001 at 09:18:19PM +0100, Jan Hubicka wrote:
> > first of all, thanks for the patch. It is something I really
> > wanted to do for long time.
> 
> I know, I'm hoping that the existence of a general framework will
> inspire you to update your prefetch optimizations for arrays in loops

I will install your updated patch on cfg-branch. As 3.1 goes into feature
freeze on the 15th, we have some time before the 3.2.x release. I hope that we
will have a working CFG based loop optimizer by then, and a working AST loop
optimizer as well. The prefetch code should then, most probably, recognize the
possibilities at the tree level and emit the prefetches at a lower level.

> and perhaps greedy prefetching of addresses in pointers!  Please let me

I still have the primitive code to do that.  What is missing is recognizing
the pointers in structures whose addresses are fetched, and prefetching them.
I am not sure this indirection is correct in C.  May I assume that if
I have a pointer to a structure and I know that the program reads some of its
fields, the other fields are accessible too?
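
To make the question concrete, here is a minimal sketch of greedy pointer
prefetching written by hand with __builtin_prefetch.  The struct and function
names are hypothetical and only for illustration.  It deliberately avoids the
doubtful case: it prefetches through a field (next) that the loop reads
anyway, so no extra field load is introduced.

```c
#include <stddef.h>

/* Hypothetical list node, used only for illustration. */
struct node {
    int value;
    struct node *next;
};

/* Sum a linked list, hinting the next node into cache while the
   current one is being processed. */
long sum_list(const struct node *head)
{
    long total = 0;
    for (const struct node *p = head; p != NULL; p = p->next) {
        /* p->next is loaded by the loop itself, so reading it early
           adds no access the program would not perform anyway. */
        __builtin_prefetch(p->next);
        total += p->value;
    }
    return total;
}
```

The open case is a pointer field the program never reads on this path: to
prefetch its target, the compiler would first have to load that field itself.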

> know what your plans are for this and what I can do to help.
> 
> > > Any suggestions on how to provide generic data prefetch support in
> > > i386.md would be greatly appreciated.
> > What do you mean exactly?
> [this was at the end of your mail]
> 
> Sorry, that wasn't very clear.  What I'm really after is knowing if I'm
> using the correct approach by supporting one or the other flavor of data
> prefetch instructions for i386 targets based on a combination of
> cpu-type and extensions.  Having additional cpu-types would make it more
> clear, but it sounds like there are also some machines that would still
> require a combination of cpu-type and an option like -msse.  If so, that
> complicates getting the right values for SIMULTANEOUS_PREFETCH and
> PREFETCH_BLOCK, which currently come from the cost structures, as in
> your prefetch proposal from May 2000.  I hadn't realized until today

I think it is OK for the block size and number of prefetches to come from the
structure, as every major CPU core has those features fixed. The flavours of
CPUs just differ in which extensions they support (SSE, 3dNOW, etc.), and we
may get that from the TARGET_* macros.

As I've mentioned in the other mail, I believe it is sane to have a switch for
every CPU flavour sold, like -mcpu=pentium and -mcpu=pentiummmx.

It is still a bit confusing, as there are Athlon, Athlon tbirds, Mobile
Athlons and Athlon MPs, where only the last matters for us as it has
SSE support.

> that it doesn't cover targets that only support prefetch with an
> extension set like SSE.
> 
> I put together some tests to check that the correct (or no) prefetch
> instructions are generated for various i386 targets; they are appended
> to this message.  There's also a need to check that invalid combinations
> of cpu-type and extension options aren't allowed, but that's something I
> don't know about at all.  I know that my lists are not complete, they're
> just a start.

I am also not sure. The idea of -mcpu selection gets crazy: should we have
-mcpu=athlon-tbird just because we need to support -mcpu=athlon-xp? We do not
want to leave a confused user wondering whether his tbird is more like
-mcpu=athlon or -mcpu=athlon-xp or -mcpu=athlon-whatever.

Having separate switches looks confusing to me too, as Joe, the user, probably
doesn't know what flavour of SSE, 3dNOW and other features his CPU supports.
> 
> > I think the proper solution is to include the pentium2/pentium3 switches
> > now when we can use some of their features.  I can bring it soon,
> > but as your patch appears to be getting in, I can wait for it to get
> > installed.
> 
> I'll resubmit the generic and ia64 prefetch patch on Monday after
> bootstrapping and testing again, but I'm still quite uncomfortable with
> the i386 support patch.  I could submit it without the broken
> SIMULTANEOUS_PREFETCH and PREFETCH_BLOCK support and you could work from
> that, or else you could rework my i386 prefetch patch as part of your
> patch to add new cpu-type switches if you decide to do that.
> 
> > I remember that the property of SSE prefetch is that it is nop for older
> > CPUs, so I guess it should be controlled by -mcpu instead of -march.
> 
> It might be best to not generate prefetch instructions for CPUs where
> they are nops, but then again if there is a call to __builtin_prefetch
> we could assume that the programmer really wants them.  Even as nops,
> though, they make the code larger without adding anything.

What I was wondering about is a switch like -mcpu=pentium4 saying: optimize
for pentium4, but do not use anything incompatible with i386. That can still
generate the SSE prefetch instructions, so that setting should depend on the
CPU selection, while the 3dNOW prefetch is an invalid instruction on earlier
CPUs, so it must depend on the ARCH selection.
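
A rough sketch of that rule, under the assumption stated earlier in the thread
that the SSE prefetch is a nop on older CPUs.  All names here are hypothetical
and this is not code from i386.h or the patch; preferring 3dNOW when both are
available is an arbitrary choice made for the sketch.

```c
/* Hypothetical names, for illustration only. */
enum prefetch_kind { PREFETCH_NONE, PREFETCH_SSE, PREFETCH_3DNOW };

struct target {
    int tune_has_sse_prefetch; /* from -mcpu: the CPU we optimize for */
    int arch_has_3dnow;        /* from -march: the minimum CPU required */
};

/* Pick the prefetch flavour to emit for a given target. */
enum prefetch_kind choose_prefetch(const struct target *t)
{
    /* The 3dNOW prefetch is invalid on CPUs without 3dNOW, so it may
       be used only when the ARCH selection guarantees the extension. */
    if (t->arch_has_3dnow)
        return PREFETCH_3DNOW;
    /* The SSE prefetch is assumed harmless on older CPUs, so the
       tuning CPU alone may enable it. */
    if (t->tune_has_sse_prefetch)
        return PREFETCH_SSE;
    return PREFETCH_NONE;
}
```

With this rule, tuning for pentium4 while keeping i386 compatibility would
still select the SSE flavour.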

> 
> > Also, when writing the program, how will I get informed about whether the
> > prefetch builtin is supported or not?
> 
> As things are now, by looking at the generated code.  I had thought it
> was a feature to silently treat __builtin_prefetch as a nop on targets
> that don't have data prefetch support, but perhaps a warning would be
> appropriate.  There aren't other builtins that are safe to use when not
> supported, so I didn't have an example to follow.

I guess silently ignoring them is OK. The prefetch builtin is more of a hint
that the compiler may or may not use. Perhaps we can have a warning as an
option, but I would disable it by default, as code written for a machine with
prefetches should compile on machines w/o prefetches, and there should be a
way to make it compile w/o warnings; otherwise that would need ifdefs.
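
As an illustration of the kind of user code I have in mind, here is a minimal
sketch.  The function name and the prefetch distance are made up for the
example and not tuned for any CPU.  The call is only a hint, so the result is
the same whether or not the target emits a prefetch instruction, and no ifdefs
are needed.

```c
#include <stddef.h>

/* Hypothetical distance, in elements; not tuned for any CPU. */
#define PREFETCH_AHEAD 16

/* Sum an array, hinting elements a fixed distance ahead into cache. */
long sum_array(const int *a, size_t n)
{
    long total = 0;
    for (size_t i = 0; i < n; i++) {
        /* Stay inside the array so the address expression is valid. */
        if (i + PREFETCH_AHEAD < n)
            __builtin_prefetch(&a[i + PREFETCH_AHEAD], 0, 3);
        total += a[i];
    }
    return total;
}
```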

Honza


