This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.


Re: Strict aliasing affects glibc 2.1.1 as well as Linux


Martin v. Loewis wrote:
> 
> I believe Linux could do
> 
> #define memcpy(t, f, n) \
> (__builtin_constant_p(n) ? \
>  __builtin_memcpy((t),(f),(n)) : \
>  __memcpy((t),(f),(n)))
> 
> today and not lose anything (at least on x86, didn't check all
> architectures). IMHO, it is the compiler's job to know all about
> efficient copying on a specific processor. Of course, the kernel

yep, that works; in fact that's exactly what my memcpy has looked
like for a while :)
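
For reference, a minimal user-space sketch of the dispatch quoted above.
fallback_memcpy, fallback_calls, my_memcpy and demo_my_memcpy are
hypothetical names used only for illustration (fallback_memcpy stands in
for the kernel's __memcpy); the sketch assumes GCC or a compatible compiler
that provides __builtin_constant_p and __builtin_memcpy.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical out-of-line copy; stands in for the kernel's __memcpy. */
static int fallback_calls = 0;

static void *fallback_memcpy(void *to, const void *from, size_t n)
{
    unsigned char *d = to;
    const unsigned char *s = from;

    fallback_calls++;               /* lets the demo see which path ran */
    while (n--)
        *d++ = *s++;
    return to;
}

/* Compile-time-constant sizes go to the builtin, everything else
   to the out-of-line routine. */
#define my_memcpy(t, f, n) \
    (__builtin_constant_p(n) ? \
     __builtin_memcpy((t), (f), (n)) : \
     fallback_memcpy((t), (f), (n)))

static void demo_my_memcpy(void)
{
    char src[16] = "0123456789abcde";
    char dst[16] = {0};
    char dst2[8] = {0};
    volatile size_t n = 5;          /* volatile: never a constant to the compiler */

    (void)my_memcpy(dst, src, 16);  /* literal size: builtin path */
    assert(memcmp(dst, src, 16) == 0);
    assert(fallback_calls == 0);

    (void)my_memcpy(dst2, src, n);  /* runtime size: fallback path */
    assert(memcmp(dst2, "01234", 5) == 0 && dst2[5] == 0);
    assert(fallback_calls == 1);
}
```

Note that the builtin is still free to emit a plain call to memcpy for
sizes it decides not to expand inline, so the macro only selects the path;
it does not guarantee inline code.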


> developer very often also knows these things, but I'm sure he'll use
> this knowledge to complain when the compiler emits inefficient code
> :-)

ok, then here are some thoughts, after having spent way too
much time optimizing linux/include/asm-i386/string.h :)

o the above memcpy is a bit better than the current kernel memcpy
o with a similar memset(p,0,c) it's not obvious which one wins
o __builtin_strlen is, hmm, different [1]
o what are the requirements for __builtin_strcmp to _not_ fall back
  to the library version?
o why is the compiler emitting multiple CLDs within one function
  (no branches in between)?
o how do I turn off the CLDs? (all, except after STDs)
  [I've seen a 17% slowdown for a 32-byte memcpy due to the CLD]
o how do I turn off the library fallback?
o what's the point of -ffreestanding (specifically the implied
  -fno-builtin)? [e.g. why should it not inline abs()? this
  shows up when compiling the kernel :) Turning the lib fallback
  off completely would be a much more logical thing to do in a
  freestanding environment]
o in general the mem builtins seem to reduce the register
  pressure a bit - that's why I sometimes chose them, even if
  they themselves were not an improvement.
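
On the -ffreestanding point: as far as I know, the __builtin_-prefixed
names stay usable even when -fno-builtin is in effect, so the inlining can
be requested explicitly. A rough sketch, assuming GCC or a compatible
compiler; k_abs, k_memset, fallback_memset and demo_builtins are
hypothetical names, not anything from the kernel or gcc:

```c
#include <assert.h>
#include <stddef.h>

/* Explicit builtin: does not depend on the plain name abs() being
   recognised, which -fno-builtin switches off. */
#define k_abs(x) __builtin_abs(x)

/* Hypothetical out-of-line memset for sizes unknown at compile time. */
static void *fallback_memset(void *p, int c, size_t n)
{
    unsigned char *d = p;

    while (n--)
        *d++ = (unsigned char)c;
    return p;
}

/* Same constant-size dispatch as the memcpy macro, applied to memset. */
#define k_memset(p, c, n) \
    (__builtin_constant_p(n) ? \
     __builtin_memset((p), (c), (n)) : \
     fallback_memset((p), (c), (n)))

static void demo_builtins(void)
{
    unsigned char buf[32];
    volatile size_t n = sizeof buf; /* volatile: forces the runtime path */
    int v = -5;

    (void)k_memset(buf, 0xff, sizeof buf);  /* constant size: builtin */
    assert(buf[0] == 0xff && buf[31] == 0xff);

    (void)k_memset(buf, 0, n);              /* runtime size: fallback */
    assert(buf[0] == 0 && buf[31] == 0);

    assert(k_abs(v) == 5);
}
```

This does not answer the library-fallback question: __builtin_memset may
still turn into a call to memset when the compiler chooses not to expand it.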
  
bottom line: it's currently not possible to rely on gcc doing the
right thing with the builtins where it matters (for "normal" apps it
usually doesn't matter, so it's not an issue there). One can spend a
lot of time trying to trick the compiler into doing what you want (as
there are almost no docs on this, you have to try different solutions
to discover the rules and find the one that works best). At that point
it's simply a lot cheaper to write the routines in question as asm
inlines and not have to second-guess the compiler (even if you know
what one version does, you can't rely on that behaviour in
earlier/later releases).



[1] kernel strlen vs gcc 2.95 __builtin_strlen
    ("-" lines: kernel strlen, "+" lines: __builtin_strlen):
@@ -3728,12 +3727,19 @@ <mtrr_write>:
        movl   $0XXXXXXXX,%eXX
        jmp   XXXXXXXX <mtrr_write+0xXXX>
        movl   0XXXXXXXX(%eXX),%eXX
-       movl   $0XXXXXXXX,%eXX
-       repnz scasb %es:(%eXX),%Xl
-       notl   %eXX
-       decl   %eXX
-       movl   0XXXXXXXX(%eXX),%eXX
-       addl   %eXX,%eXX
+       movl   (%eXX),%eXX
+       testb  %Xl,%Xl
+       jXX    XXXXXXXX <mtrr_write+0xXXX>
+       testb  %Xh,%Xh
+       jXX    XXXXXXXX <mtrr_write+0xXXX>
+       testl  $0XXXXXXXX,%eXX
+       jXX    XXXXXXXX <mtrr_write+0xXXX>
+       addl   $0x4,%eXX
+       testl  $0XXXXXXXX,%eXX
+       jXX   XXXXXXXX <mtrr_write+0xXXX>
+       subl   $0x3,%eXX
+       incl   %eXX
+       incl   %eXX
        leal   0XXXXXXXX(%eXX),%eXX
        movl   %eXX,0XXXXXXXX(%eXX)
        cmpb   $0xa,0XXXXXXXX(%eXX)
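
The removed lines in the diff above (repnz scasb / notl / decl) are the
classic string-scan strlen. A sketch of that approach as an asm inline,
written from scratch for illustration and not copied from the kernel
source: scas_strlen and demo_scas_strlen are hypothetical names, the asm
path assumes GCC-style extended asm on x86 or x86-64, and a plain C loop is
used elsewhere.

```c
#include <assert.h>
#include <stddef.h>

static inline size_t scas_strlen(const char *s)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    size_t count = (size_t)-1;  /* ecx/rcx: effectively unlimited repeat count */
    const char *p = s;          /* edi/rdi: scan pointer, advanced by scasb */

    /* No cld here: the ABI guarantees the direction flag is clear on
       function entry, and nothing in this function sets it. */
    __asm__ __volatile__(
        "repne\n\t"
        "scasb"
        : "+c"(count), "+D"(p)
        : "a"(0)                /* al = 0: the byte to search for */
        : "cc", "memory");

    /* n+1 bytes were scanned (including the NUL), so count is now
       -(n+2); ~count is n+1, minus one gives n (the notl/decl pair). */
    return ~count - 1;
#else
    const char *p = s;          /* portable fallback for other targets */

    while (*p)
        p++;
    return (size_t)(p - s);
#endif
}

static void demo_scas_strlen(void)
{
    assert(scas_strlen("") == 0);
    assert(scas_strlen("mtrr_write") == 10);
}
```

Whether this beats the unrolled byte tests that __builtin_strlen emits
depends on the processor and the typical string length; the sketch only
shows what the two sides of the diff correspond to.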


