This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.


Re: gcc will become the best optimizing x86 compiler


On Wednesday 30 July 2008 19:14, Agner Fog wrote:
> I agree that the OpenSolaris memcpy is bigger than necessary. However, 
> it is necessary to have 16 branches for covering all possible alignments 
> modulo 16. This is because, unfortunately, there is no XMM shift 
> instruction with a variable count, only with a constant count, so we 
> need one branch for each value of the shift count. Since only one of the 
> branches is used, it doesn't take much space in the code cache. The 
> speed is improved by a factor of 4-5 by this 16-branch algorithm, so it is 
> certainly worth the extra complexity.
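To make the quoted argument concrete, here is a minimal C sketch of the 16-branch idea. It assumes x86/x86-64 with SSE2 and the GCC/Clang intrinsics in `<emmintrin.h>`, and it is not the OpenSolaris code. `memcpy_shift16`, `MERGE`, `shift_loop_K` and `check_copy` are hypothetical names. After the destination is aligned, the source is read only with aligned 16-byte loads, and each output block is stitched together from two neighbouring loads with `_mm_srli_si128`/`_mm_slli_si128` (PSRLDQ/PSLLDQ). Because those shifts accept only an immediate count, the sketch instantiates one loop per misalignment 1..15 and dispatches once through a table, so only one of them runs for a given call.

```c
#include <assert.h>
#include <emmintrin.h> /* SSE2 intrinsics (x86/x86-64 only) */
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Bytes K..15 of lo followed by bytes 0..K-1 of hi, i.e. the 16 bytes that
 * start K bytes into lo. _mm_srli_si128/_mm_slli_si128 only take an
 * immediate byte count, so K must be a compile-time constant. */
#define MERGE(lo, hi, K) \
    _mm_or_si128(_mm_srli_si128((lo), (K)), _mm_slli_si128((hi), 16 - (K)))

/* One specialised loop per shift count K: aligned loads from s - K and
 * aligned stores to d (requires d % 16 == 0 and s % 16 == K). */
#define DEFINE_SHIFT_LOOP(K)                                             \
    static void shift_loop_##K(unsigned char *d, const unsigned char *s, \
                               size_t blocks) {                          \
        const __m128i *a = (const __m128i *)(s - (K));                   \
        __m128i lo = _mm_load_si128(a);                                  \
        for (size_t i = 0; i < blocks; i++) {                            \
            __m128i hi = _mm_load_si128(a + i + 1);                      \
            _mm_store_si128((__m128i *)d + i, MERGE(lo, hi, K));         \
            lo = hi;                                                     \
        }                                                                \
    }

DEFINE_SHIFT_LOOP(1)  DEFINE_SHIFT_LOOP(2)  DEFINE_SHIFT_LOOP(3)
DEFINE_SHIFT_LOOP(4)  DEFINE_SHIFT_LOOP(5)  DEFINE_SHIFT_LOOP(6)
DEFINE_SHIFT_LOOP(7)  DEFINE_SHIFT_LOOP(8)  DEFINE_SHIFT_LOOP(9)
DEFINE_SHIFT_LOOP(10) DEFINE_SHIFT_LOOP(11) DEFINE_SHIFT_LOOP(12)
DEFINE_SHIFT_LOOP(13) DEFINE_SHIFT_LOOP(14) DEFINE_SHIFT_LOOP(15)

typedef void (*shift_loop_fn)(unsigned char *, const unsigned char *, size_t);
static const shift_loop_fn shift_loops[16] = {
    NULL,          shift_loop_1,  shift_loop_2,  shift_loop_3,
    shift_loop_4,  shift_loop_5,  shift_loop_6,  shift_loop_7,
    shift_loop_8,  shift_loop_9,  shift_loop_10, shift_loop_11,
    shift_loop_12, shift_loop_13, shift_loop_14, shift_loop_15,
};

/* Copy n bytes between non-overlapping buffers. */
static void memcpy_shift16(void *dst, const void *src, size_t n) {
    unsigned char *d = dst;
    const unsigned char *s = src;
    while (n > 0 && ((uintptr_t)d & 15) != 0) { *d++ = *s++; n--; } /* align dst */
    unsigned k = (unsigned)((uintptr_t)s & 15); /* relative misalignment */
    if (k == 0) {
        for (; n >= 16; n -= 16, d += 16, s += 16)
            _mm_store_si128((__m128i *)d, _mm_load_si128((const __m128i *)s));
    } else if (n >= 32) {
        /* Copy one block first so the first aligned load (at s - k) stays
         * inside the source buffer, and stop early enough that the last
         * aligned load does not run past s + n either. */
        memcpy(d, s, 16);
        d += 16; s += 16; n -= 16;
        size_t blocks = (n - 16) / 16;
        shift_loops[k](d, s, blocks); /* the one branch actually taken */
        d += 16 * blocks; s += 16 * blocks; n -= 16 * blocks;
    }
    while (n > 0) { *d++ = *s++; n--; } /* tail */
}

/* Hypothetical self-check: compare against the library memcpy. */
static _Alignas(16) unsigned char sbuf[512], dbuf[512], ref[512];

static int check_copy(size_t dst_off, size_t src_off, size_t n) {
    assert(dst_off + n <= sizeof dbuf && src_off + n <= sizeof sbuf);
    for (size_t i = 0; i < sizeof sbuf; i++) sbuf[i] = (unsigned char)(i * 7 + 1);
    memset(dbuf, 0, sizeof dbuf);
    memset(ref, 0, sizeof ref);
    memcpy_shift16(dbuf + dst_off, sbuf + src_off, n);
    memcpy(ref + dst_off, sbuf + src_off, n);
    return memcmp(dbuf, ref, sizeof dbuf) == 0;
}
```

The scalar copy of one block before the vector loop, plus the conservative block count, keeps every aligned load inside the source buffer, at the cost of leaving up to 31 bytes to the byte loop. Hand-written assembly versions may instead rely on the fact that an aligned 16-byte load can never cross a page boundary.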

I tend to doubt that odd-byte aligned large memcpys are anywhere
near typical. malloc and mmap both return well-aligned buffers
(say, 8-byte aligned). Static and on-stack objects are also
at least word-aligned 99% of the time.

memcpy can just use "relatively simple" code for copies in which
either src or dst is not word-aligned. This cuts possibilities down
from 16 to 4 (or even 2?).
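The "16 to 4 (or even 2?)" arithmetic follows from one step: if both pointers are multiples of a word size w (a power of two up to 16), their difference is a multiple of w as well, so (src - dst) mod 16 can take only 16/w values, which is 4 for 4-byte words and 2 for 8-byte words. The C sketch that follows checks this by enumeration and shows a hypothetical dispatcher, `memcpy_dispatch`, that sends any copy with a non-word-aligned pointer to a plain byte loop. Treating `size_t` as the word, and using a word-at-a-time loop as a stand-in for the SSE paths, are assumptions for illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Distinct values of (src - dst) mod 16 that a 16-byte SSE loop would have
 * to handle if both pointers are always multiples of align. */
static unsigned shift_cases(unsigned align) {
    assert(align >= 1 && align <= 16 && 16 % align == 0);
    int seen[16] = {0};
    unsigned count = 0;
    for (unsigned s = 0; s < 16; s += align)
        for (unsigned d = 0; d < 16; d += align) {
            unsigned rel = (s - d) & 15u; /* unsigned wrap, then mod 16 */
            if (!seen[rel]) { seen[rel] = 1; count++; }
        }
    return count;
}

/* Hypothetical dispatcher: the "relatively simple" byte loop whenever
 * either pointer is not word-aligned, a fast path otherwise. */
static void *memcpy_dispatch(void *dst, const void *src, size_t n) {
    const size_t w = sizeof(size_t); /* assumption: "word" = size_t */
    unsigned char *d = dst;
    const unsigned char *s = src;
    if ((((uintptr_t)d | (uintptr_t)s) & (w - 1)) == 0) {
        /* Both aligned: a real memcpy would branch to its SSE loops here;
         * only 16 / w relative shifts remain possible. */
        for (; n >= w; n -= w, d += w, s += w) {
            size_t word;
            memcpy(&word, s, w); /* compiles to one aligned load */
            memcpy(d, &word, w); /* and one aligned store */
        }
    }
    while (n > 0) { *d++ = *s++; n--; } /* simple path and tail */
    return dst;
}
```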
--
vda
