This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: RELEASE BLOCKER: Linux doesn't follow x86/x86-64 ABI wrt direction flag


Olivier Galibert wrote:
> On Wed, Mar 05, 2008 at 05:12:07PM -0800, H. Peter Anvin wrote:
>> It's a kernel bug, and it needs to be fixed.
> 
> I'm not convinced.  It's been that way for 15 years, it's that way in
> the BSD kernels, at that point it's a feature.  The bug is in the
> documentation, nowhere else.  And in gcc for blindly trusting the
> documentation.

well, you could see this either way -- either the kernel is buggy and
needs to be fixed or the current behavior is correct and the abi needs
an errata. If there were no performance implications i'd go for the
latter, mostly because of the security aspect.
But this thread made me dig up an old benchmark and apparently omitting
the cld before the string ops makes a significant difference; on P2 it
was ~8%, on P4 it's ~6% for 1480 byte copies; for 32 byte ones the gain
is more like 90% on a P4 [1].

So the impact on small structure memcpy/memset etc is significant, hence
fixing the kernel looks like a better long term plan.

artur

[1]
P4 # ./bcsp m
IACCK 0.9.29  Artur Skawina <...>
[ exec time; lower is better  ] [speed ] [ time ]  [ok?]
TIME-N+S TIME32 TIME33 TIME1480 MBYTES/S TIMEXXXX  CSUM FUNCTION ( rdtsc_overhead=0  null=0 )
       0      0      0        0      inf        0  ffff csum_partial_copy_null
    1885    375    389      156  7589.74    39350     0 generic_memcpy
   10894    532    666     1696   698.11   108557     0 kernel_memcpylib
    1804    325    346      151  7841.06    19614     0 kernel_memcpy686
    1804    325    346      151  7841.06    19693     0 kernel_memcpy686ncld
    1744    323    381      148  8000.00    19687     0 kernel_memcpy686as1
    1332    157    232      139  8517.99    19235     0 kernel_memcpy686as1ncld
    1782    318    339      148  8000.00    19607     0 kernel_memcpy686as2
    1371    168    189      139  8517.99    19221     0 kernel_memcpy686as2ncld

P2 # ./bcsp m
IACKK 0.9.28  Artur Skawina <...>
TIME-N+S   TIME32   TIME33 TIME1480 MBYTES/S TIMEXXXX   CKSUM FUNCTION ( rdtsc_overhead=1  null=0 )
       0        0        0        0      inf        0 :  ffff csum_partial_copy_null
    7121      746     1215      730  1621.92   127418 :     0 generic_memcpy
   43604     2032     1709     6574   180.10   416409 :     0 kernel_memcpylib
    7480      771      726      684  1730.99    96084 :     0 kernel_memcpy686
    7036      735      543      685  1728.47    95508 :     0 kernel_memcpy686ncld
    7498     1015      711      716  1653.63    92200 :     0 kernel_memcpy686as1
    5826      438      489      662  1788.52    91598 :     0 kernel_memcpy686as1ncld
    6667      657      488      708  1672.32    89366 :     0 kernel_memcpy686as2
    6614      456      270      658  1799.39    91203 :     0 kernel_memcpy686as2ncld


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]