This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.
Re: gcc will become the best optimizing x86 compiler
On Wed, Jul 30, 2008 at 5:14 PM, Agner Fog <agner@agner.org> wrote:
> Denys Vlasenko wrote:
>>>
>>> 3164 line source file which implements memcpy().
>>> You got to be kidding.
>>> How much of L1 icache it blows away in the process?
>>> I bet it performs wonderfully on microbenchmarks though.
>>>
>
> I agree that the OpenSolaris memcpy is bigger than necessary. However, it is
> necessary to have 16 branches for covering all possible alignments modulo
> 16. This is because, unfortunately, there is no XMM shift instruction with a
> variable count, only with a constant count, so we need one branch for each
> value of the shift count. Since only one of the branches is used, it doesn't
> take much space in the code cache. The speed is improved by a factor 4-5 by
> this 16-branch algorithm, so it is certainly worth the extra complexity.
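
A minimal sketch of the dispatch described in the quoted paragraph, in portable C rather than SSE assembly. It is not the OpenSolaris code. The names block16, copy_shift_N and copy_with_constant_shifts are hypothetical, and the caller is assumed to provide a source buffer "base" with at least 16 * (nblocks + 1) readable bytes, because each step reads two adjacent 16-byte blocks. Each generated function has its shift count fixed at compile time, which stands in for the immediate operand that the SSE byte-shift instructions (PSRLDQ, PSLLDQ, PALIGNR) require; the switch picks one of the 16 variants once per call.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for one 16-byte XMM register. */
typedef struct { uint8_t b[16]; } block16;

/*
 * Generate one copy loop per shift count. SHIFT is a compile-time constant
 * in each instance, mirroring the immediate operand of the SSE byte shifts.
 * Each output block is bytes SHIFT..15 of one aligned block followed by
 * bytes 0..SHIFT-1 of the next aligned block.
 */
#define DEFINE_COPY(SHIFT)                                                 \
    static void copy_shift_##SHIFT(uint8_t *dst, const uint8_t *base,      \
                                   size_t nblocks)                         \
    {                                                                      \
        for (size_t i = 0; i < nblocks; i++) {                             \
            block16 lo, hi, out;                                           \
            memcpy(&lo, base + 16 * i, 16);      /* current block */       \
            memcpy(&hi, base + 16 * i + 16, 16); /* following block */     \
            for (int k = 0; k < 16 - SHIFT; k++)                           \
                out.b[k] = lo.b[k + SHIFT];                                \
            for (int k = 16 - SHIFT; k < 16; k++)                          \
                out.b[k] = hi.b[k - (16 - SHIFT)];                         \
            memcpy(dst + 16 * i, &out, 16);                                \
        }                                                                  \
    }

/* List of all 16 possible misalignments modulo 16. */
#define ALL_SHIFTS(X)                                                      \
    X(0) X(1) X(2) X(3) X(4) X(5) X(6) X(7)                                \
    X(8) X(9) X(10) X(11) X(12) X(13) X(14) X(15)

ALL_SHIFTS(DEFINE_COPY)

#define DISPATCH_CASE(SHIFT)                                               \
    case SHIFT: copy_shift_##SHIFT(dst, base, nblocks); break;

/*
 * Copy 16 * nblocks bytes starting at base + shift into dst.
 * Assumption: base has at least 16 * (nblocks + 1) readable bytes.
 * Only one of the 16 branches runs for any given call.
 */
void copy_with_constant_shifts(uint8_t *dst, const uint8_t *base,
                               unsigned shift, size_t nblocks)
{
    assert(shift < 16);
    switch (shift) {
        ALL_SHIFTS(DISPATCH_CASE)
    }
}
```

Calling copy_with_constant_shifts(dst, base, 5, 3), for example, should give the same bytes as memcpy(dst, base + 5, 48). The sketch only illustrates why the code size grows by a factor of 16; it makes no claim about the speedup quoted above.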
You forgot to look at the PowerPC version:
http://cvs.opensolaris.org/source/xref/ppc-dev/ppc-dev/usr/src/lib/libc/ppc/gen/memcpy.s
Is that nice and small?
Dennis Clarke