This is the mail archive of the
mailing list for the GCC project.
Re: [PATCH, i386]: Insert "cld" via optimize mode switching
On Mon, 4 Dec 2006, Uros Bizjak wrote:
> >but will fail working with those changes. IMHO such routines would be
> >extremely rare, if they exist at all, so that shouldn't hinder full
> >progress by using mode switching for the direction flag.
> Going a bit further, and based on the following assumptions:
> 1) function entry and exit modes are mandated by the ABI, so we _know_ that
> the direction bit is cleared there
> 2) asm should take care by itself to issue CLD before exit
> 3) we never emit STD anywhere
> it is possible to argue that gcc _does not need to emit any_ CLD
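
The argument in the quoted list can be sketched as a tiny straight-line mode tracker. This is only an illustration, not GCC's mode switching pass: the instruction kinds, the `count_cld_needed` name and the `asm_restores_df` switch are all made up for this sketch, and a real pass works on the control flow graph rather than on a flat array.

```c
#include <stddef.h>

/* Hypothetical instruction kinds for this sketch; not GCC's internal IR. */
enum insn_kind {
    INSN_OTHER,     /* does not care about the direction flag */
    INSN_STRING_OP, /* needs DF = 0, e.g. an inlined rep movs */
    INSN_USER_ASM   /* user asm that may touch the direction flag */
};

enum df_state { DF_CLEAR, DF_UNKNOWN };

/* Count how many CLDs a straight-line tracker would have to insert.
   asm_restores_df != 0 models assumption 2 (asm issues CLD before exit).
   Assumption 3 (no STD is ever emitted) is implicit: nothing here can set DF
   except user asm. */
static size_t count_cld_needed(const enum insn_kind *insns, size_t n,
                               int asm_restores_df)
{
    enum df_state state = DF_CLEAR; /* assumption 1: ABI says DF = 0 on entry */
    size_t clds = 0;

    for (size_t i = 0; i < n; i++) {
        switch (insns[i]) {
        case INSN_STRING_OP:
            if (state != DF_CLEAR) { /* only emit CLD when DF is not known clear */
                clds++;
                state = DF_CLEAR;
            }
            break;
        case INSN_USER_ASM:
            if (!asm_restores_df)
                state = DF_UNKNOWN;  /* distrust the asm: DF may be set now */
            break;
        case INSN_OTHER:
            break;
        }
    }
    return clds;
}
```

With all three assumptions in force the tracker never leaves the DF_CLEAR state, so the count is zero for any input; CLDs only appear once user asm is distrusted.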
Yes, it is. I never quite understood why we emit them anyway; the only
reason I can see is to cope with broken code (e.g. in some 3rd party
libraries) or broken user asm code, i.e. conservativeness. One data point
is that I have one program in /usr/bin which uses 'std', and it's mplayer,
and that one does cld afterwards. I think if you were to rip out the cld
generation of gcc, no one would ever see any error resulting from that, so
perhaps we should try that (unless someone remembers a very good reason why
that is too optimistic).
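
For illustration, here is a minimal sketch of the kind of well-behaved user asm described above: it sets the direction flag with std, does its string op, and issues cld before leaving. The names `copy_backward`, `direction_flag_is_clear` and `HAVE_X86_DF` are invented for this example (this is not mplayer's code), and the sketch assumes GCC-style extended asm and the `__readeflags()` intrinsic from `<x86intrin.h>`; on other targets it falls back to plain C.

```c
#include <stddef.h>

#if defined(__GNUC__) && \
    ((defined(__x86_64__) && !defined(__ILP32__)) || defined(__i386__))
#include <x86intrin.h> /* __readeflags() */
#define HAVE_X86_DF 1
#else
#define HAVE_X86_DF 0
#endif

/* Hypothetical helper: returns 1 if DF (bit 10 of EFLAGS) is clear.
   On non-x86 targets there is no such flag, so it simply returns 1. */
static int direction_flag_is_clear(void)
{
#if HAVE_X86_DF
    return (__readeflags() & (1u << 10)) == 0;
#else
    return 1;
#endif
}

/* Hypothetical helper: copy n bytes back to front, so an overlapping
   destination above the source is handled correctly. */
static void copy_backward(void *dst, const void *src, size_t n)
{
    if (n == 0)
        return;

    unsigned char *d = (unsigned char *)dst + n - 1;
    const unsigned char *s = (const unsigned char *)src + n - 1;
#if HAVE_X86_DF
    __asm__ volatile(
        "std\n\t"       /* DF = 1: string ops now walk downwards */
        "rep movsb\n\t" /* copy n bytes, last byte first */
        "cld"           /* DF = 0 again before leaving the asm */
        : "+D"(d), "+S"(s), "+c"(n)
        :
        : "memory", "cc");
#else
    while (n--)         /* portable fallback, same back-to-front order */
        *d-- = *s--;
#endif
}
```

Since the asm restores DF itself, the compiler-generated code around it never needs its own cld; dropping the trailing cld from the asm is exactly the "broken user asm" case the conservative cld generation guards against.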
> Finally, according to the Pentium optimization guide by Agner Fog, std and
> cld have astonishing latencies of 48 and 52 clks (I still hope for the
> possibility that there is some kind of error).
Nah, that has been wrong for a long time. For instance on K8: cld has
latency 1 (it's a directpath insn) and std latency 2 (a double directpath
insn). It does incur a dependency on the flags register (though I wonder if
that isn't short-circuited if there was no actual change), so it probably
doesn't matter much either way (especially considering that it's usually
followed by string ops, which take much longer).