This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



Re: [PATCH, i386]: Insert "cld" via optimize mode switching


> Hi,
> 
> On Mon, 4 Dec 2006, Uros Bizjak wrote:
> 
> > >but will fail working with those changes.  IMHO such routines would be 
> > >extremely rare, if they exist at all, so that shouldn't hinder full 
> > >progress by using mode switching for the direction flag.
> >
> > Going a bit further, and based on following assumptions:
> > 
> > 1) function entry and exit modes are mandated by ABI, so we _know_ that
> > direction bit is cleared there
> > 2) asm should take care by itself to issue CLD before exit
> > 3) we never emit STD anywhere
> > 
> > it is possible to argue, that gcc _does not need to emit any_ CLD
> > instructions.
> 
> Yes, it is.  I never quite understood why we emit them anyway, the only 
> possibility is to care for broken libraries (e.g. in some 3rd party 
> libraries) or broken user asm code, i.e. conservativeness.  One data point 
> is that I have one program in /usr/bin which uses 'std', and it's mplayer, 
> and that one does cld afterwards.  I think if you were to rip out the cld 

As the author of the current cld-emitting code, I can only say that I kept
it around for a single reason: the old GCC always emitted it, and I
didn't find any specification in the 32-bit ABI (in the 64-bit ABI we
specify the direction flag as clear, and I did have a patch to disable CLD
codegen there that probably got lost somewhere on its way to mainline).
Another minor reason for me was that we never specified whether inline asm
blocks are required to cld after std, but as Michael correctly observed,
there is very little need for this.
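
To make the "asm should cld after std" convention concrete, here is a
minimal sketch of an inline asm block that sets the direction flag for a
backward copy and restores it before leaving.  The function name
copy_backward is hypothetical and only for illustration; on non-x86
targets the sketch falls back to a plain C loop.

```c
#include <stddef.h>

/* Hypothetical helper: copy n bytes from src to dst, highest address first. */
static void copy_backward(unsigned char *dst, const unsigned char *src, size_t n)
{
    if (n == 0)
        return;
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    /* With DF set, rep movsb decrements the pointers, so start at the last byte. */
    unsigned char *d = dst + n - 1;
    const unsigned char *s = src + n - 1;
    __asm__ volatile ("std\n\t"        /* set the direction flag */
                      "rep movsb\n\t"  /* copy n bytes, walking downwards */
                      "cld"            /* restore DF = 0 before leaving the asm */
                      : "+D" (d), "+S" (s), "+c" (n)
                      :
                      : "memory", "cc");
#else
    /* Portable fallback with the same highest-address-first order. */
    while (n--)
        dst[n] = src[n];
#endif
}
```

The trailing cld is the point here: the compiler can then assume the flag
is clear after the asm, just as it is at function entry and exit.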

I was also originally thinking about the memmove inline expander, but it
doesn't seem that important to me (since we don't really want to copy
backwards anyway, we would probably want to inline memmove internally in
a way that checks that the blocks don't overlap in the negative direction
and are small, and then uses our memcpy inliners, emitting a library call
otherwise).
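
In C terms, the memmove idea could look roughly like the sketch below.
This is only an illustration of the check, not what the expander would
emit; sketch_memmove, forward_copy and SMALL_COPY_LIMIT are hypothetical
names, and the threshold of 64 bytes is an arbitrary assumption.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical size limit below which an inline copy is considered worthwhile. */
#define SMALL_COPY_LIMIT 64

/* Stand-in for an inlined memcpy: strictly forward, byte by byte. */
static void forward_copy(unsigned char *d, const unsigned char *s, size_t n)
{
    while (n--)
        *d++ = *s++;
}

static void *sketch_memmove(void *dst, const void *src, size_t n)
{
    uintptr_t d = (uintptr_t) dst;
    uintptr_t s = (uintptr_t) src;

    /* A forward copy is safe unless dst lies inside (src, src + n),
       i.e. unless the blocks overlap in the negative direction.  */
    if (n <= SMALL_COPY_LIMIT && (d <= s || d - s >= n)) {
        forward_copy(dst, src, n);
        return dst;
    }

    /* Large or backward-overlapping copy: leave it to the library. */
    return memmove(dst, src, n);
}
```

With this shape the inline path never needs the direction flag set, so
only the library routine would ever have to deal with backward copies.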

I would not be opposed to not emitting it at all - it just never seemed
important enough to me, as it is short and rather fast on most modern
CPUs.  That seems a better alternative to me than introducing the mode
switching pass, which has some compilation time implications.

Also note that CSE/GCSE is currently pretty capable of eliminating many
redundant CLDs.
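
As a toy illustration of what "redundant" means here (this is not how
CSE/GCSE or the mode switching pass are implemented), the sketch below
scans a straight-line instruction list, tracks the known state of the
direction flag, and drops every cld that is issued while the flag is
already known to be clear.  All names (enum insn, enum df_state,
drop_redundant_cld) are hypothetical, and treating user asm as making the
flag unknown is a deliberately conservative assumption.

```c
#include <stddef.h>

enum insn { INSN_CLD, INSN_STD, INSN_STRING_OP, INSN_ASM, INSN_OTHER };
enum df_state { DF_CLEAR, DF_SET, DF_UNKNOWN };

/* Removes redundant CLDs from code[0..n) in place and returns the new
   length.  'entry' is the flag state assumed at the start of the block. */
static size_t drop_redundant_cld(enum insn *code, size_t n, enum df_state entry)
{
    enum df_state df = entry;
    size_t out = 0;

    for (size_t i = 0; i < n; i++) {
        switch (code[i]) {
        case INSN_CLD:
            if (df == DF_CLEAR)
                continue;          /* flag already clear: drop this cld */
            df = DF_CLEAR;
            break;
        case INSN_STD:
            df = DF_SET;
            break;
        case INSN_ASM:
            df = DF_UNKNOWN;       /* conservative: asm may have changed DF */
            break;
        default:
            break;                 /* string ops and others leave DF alone */
        }
        code[out++] = code[i];
    }
    return out;
}
```

If the entry state is taken as DF_CLEAR, as the ABI argument suggests, and
asm blocks are trusted to restore the flag, every cld in such a block
becomes removable.
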
> > Finally, according to pentium optimization guide by Agner Fog, std and 
> > cld have astonishing latency of 48 and 52 clks (I still hope for the 
> > possibility that there is some kind of error).

This is pretty weird.  Does this apply to the original Pentium?

Honza
> 
> Nah, that's wrong since long.  For instance K8: cld has latency 1 (it's a 
> directpath insn) and std latency 2 (a double directpath insn).  It does 
> incur a dependency on the flag register (though I wonder if that isn't 
> short circuited if there was no actual change), so it probably doesn't 
> matter much either way (especially considering that it's usually followed 
> by string ops, which take much longer).
> 
> 
> Ciao,
> Michael.

