This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.



Re: Code generated for a simple memory copy loop


Richard, thanks for the reply.

I'd love to check out the generated code on a later gcc, but unfortunately we are not in a position to upgrade our gcc. We just use the default gcc that came with FreeBSD 7.0.

I'm interested in understanding why gcc generates the code the way it does. I'm probably missing something, and I'd like to know what. Is counting up better than counting down in some way (add vs. sub)?

Thanks again for any help.

Regards,
N Datta
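
One way to explore the add-versus-sub question empirically is to compile a count-down loop and a count-up loop side by side with the same flags (cc -S -O2) and compare the output. The sketch below is only an illustration: the function names copy_down and copy_up are hypothetical and do not appear in the original test program. The two loops produce the same result only when the buffers do not overlap, since they copy bytes in opposite orders.

```c
/* Hypothetical helper: counts down like MYMEMCOPY, copying the last byte first. */
void copy_down(unsigned char *dp, const unsigned char *sp, long len)
{
    while (--len >= 0)
        dp[len] = sp[len];
}

/* Hypothetical helper: counts up, copying the first byte first. */
void copy_up(unsigned char *dp, const unsigned char *sp, long len)
{
    long i; /* declared outside the loop so it also builds in C89/gnu89 mode */

    for (i = 0; i < len; i++)
        dp[i] = sp[i];
}
```

Which form is cheaper depends on how the compiler rewrites the induction variables, so the generated assembly is a better guide than the source.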

--- On Mon, 16/2/09, Richard Guenther <richard.guenther@gmail.com> wrote:

> From: Richard Guenther <richard.guenther@gmail.com>
> Subject: Re: Code generated for a simple memory copy loop
> To: "Narasimha Datta" <dattann@yahoo.com>
> Cc: gcc@gcc.gnu.org
> Date: Monday, 16 February, 2009, 3:54 PM
> On Mon, Feb 16, 2009 at 11:19 AM, Narasimha Datta
> <dattann@yahoo.com> wrote:
> > Hello,
> >
> > Here's a simple memory copy macro:
> >
> > #define MYMEMCOPY(dp, sp, len) \
> > do { \
> >        long __len = len; \
> >        while (--__len >= 0) \
> >                (dp)[__len] = (sp)[__len]; \
> > } while (0)
> >
> > void foo(unsigned char *dp, const unsigned char *sp, unsigned long size) {
> >        MYMEMCOPY(dp, sp, size);
> > }
> >
> > void bar(unsigned char *dp, const unsigned char *sp) {
> >        MYMEMCOPY(dp, sp, 128);
> > }
> >
> > The code fragments generated for the foo and bar functions at -O
> > and at -O2 are as follows:
> >
> > /* ===== With -O switch ===== */
> > /* function foo */
> > .L4:
> >        movzbl  -1(%rcx), %eax
> >        movb    %al, -1(%rdx)
> >        subq    $1, %rcx
> >        subq    $1, %rdx
> >        subq    $1, %r8
> >        jns     .L4
> >
> > /* function bar */
> >        movl    $126, %edx
> > .L8:
> > .LBB3:
> >        .loc 1 13 0
> >        movzbl  1(%rdx,%rsi), %eax
> >        movb    %al, 1(%rdx,%rdi)
> >        subq    $1, %rdx
> >        cmpq    $-2, %rdx
> >        jne     .L8
> >
> > /* ===== With -O2 switch =====*/
> > /* function foo */
> > .L4:
> >        movzbl  -1(%rsi), %eax
> >        addq    $1, %rdi
> >        subq    $1, %rsi
> >        movb    %al, -1(%rcx)
> >        subq    $1, %rcx
> >        cmpq    %rdx, %rdi
> >        jne     .L4
> >
> > /* function bar */
> >        movl    $126, %edx
> > .L9:
> > .LBB3:
> >        .loc 1 13 0
> >        movzbl  1(%rdx,%rsi), %eax
> >        movb    %al, 1(%rdx,%rdi)
> >        subq    $1, %rdx
> >        cmpq    $-2, %rdx
> >        jne     .L9
> >
> > Now my questions are:
> > (i) Why does the compiler generate an addq, cmpq and jne for the
> > foo function with -O2? Isn't subq/jns more efficient, as seen
> > from the output from -O?
> > (ii) For function bar, why is the "cmpq $-2, %rdx" instruction
> > generated? Won't it be better to count down from 128 to 0 instead
> > of 126 to -2?
> >
> > Here's my OS and compiler version (I'm running a 64-bit FreeBSD):
> > $ uname -a
> > FreeBSD xxx 7.0-RELEASE FreeBSD 7.0-RELEASE #0: Wed Nov 12 18:54:21 PST 2008     root@WC7:/usr/obj/usr/src/sys/SMKERNEL  amd64
> > $ cc --version
> > cc (GCC) 4.2.1 20070719  [FreeBSD]
> > Copyright (C) 2007 Free Software Foundation, Inc.
> > This is free software; see the source for copying conditions.  There is NO
> > warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
> >
> > And these are the commands I used to compile the program:
> > cc -S -O -g test.c
> > cc -S -O2 -g test.c
> >
> > Any pointers would be appreciated. Thanks!
> 
> 1) Try a more recent GCC
> 2) Use memcpy.  It is properly inlined/optimized.
> 
> Richard.
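
For reference, Richard's second suggestion could look like the sketch below. The names foo_memcpy and bar_memcpy are hypothetical counterparts of foo and bar from the quoted message, not code from the thread. Note that memcpy has undefined behavior when the source and destination overlap, whereas the MYMEMCOPY loop always copies from the last byte down; memmove is the standard alternative if overlap is possible.

```c
#include <string.h>

/* Hypothetical memcpy-based counterpart of foo(): length known only at run time. */
void foo_memcpy(unsigned char *dp, const unsigned char *sp, unsigned long size)
{
    memcpy(dp, sp, size);
}

/* Hypothetical memcpy-based counterpart of bar(): constant 128-byte copy. */
void bar_memcpy(unsigned char *dp, const unsigned char *sp)
{
    memcpy(dp, sp, 128);
}
```

Compiling this with the same cc -S -O2 command shows what a given compiler does with each call, including whether the constant-size copy is expanded inline.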



