Is there any other optimization for memory?

Thu Jul 14 18:17:00 GMT 2011

Parmenides <mobile.parmenides@gmail.com> writes:

> To understand some of gcc's features without knowing the details of
> its internals, I have to write examples in C, compile them into
> assembly code, and study the output to get some ideas. Caching memory
> values in registers is one optimization performed by gcc;
> reordering instructions is another. A "memory" clobber in an inline
> asm may influence both. I have written an example in C
> to try to understand the former.
>
> int s = 0;
> int tst(int lim)
> {
>      int i;
>
>      for (i = 1; i < lim; i++)
>           s = s + i;
>
>      asm volatile(
>           "nop"
>           );
>
>      s = s * 10;
>
>      return s;
> }
>
> To compile the C source, the following command is executed:
> gcc -S -O tst.c
>
> The corresponding assembly code is as follows:
> tst:
>         pushl   %ebp
>         movl    %esp, %ebp
>         movl    8(%ebp), %ecx
>         cmpl    $1, %ecx
>         jle     .L2
>         movl    s, %edx
>         movl    $1, %eax
> .L4:
>         addl    %eax, %edx
>         incl    %eax
>         cmpl    %eax, %ecx
>         jne     .L4
>         movl    %edx, s     <--- After the loop, s is written back to memory.
> .L2:
>         movl    s, %eax     <--- Before evaluating 's = s * 10', s is reloaded into a register.
>         leal    (%eax,%eax,4), %eax
>         addl    %eax, %eax
>         movl    %eax, s
>         popl    %ebp
>         ret
>
> So, the "memory" clobber has prevented the optimization. But for the
> latter case, namely reordering instructions, I cannot come up with an
> example like the one above to illustrate how a "memory" clobber prevents
> instruction reordering. I don't know the circumstances under which
> gcc will reorder instructions. Without them, I cannot observe the effect
> of the "memory" clobber.

Instruction reordering is easier to observe on a machine other than the
x86, one with long load latencies.  Here is an example, though:

int
f (int *a, int *b, int c)
{
  int i, j;

  for (i = 0; i < c; i++)
    {
      int a0, a1, a2, a3;

      asm ("nop" : "=r" (j) : "r" (i));
      a0 = a[0];
      a1 = a[1];
      a2 = a[2];
      a3 = a[3];
      b[1] = a0;
      b[3] = a1;
      b[0] = a2;
      b[2] = a3;
    }
  return j;
}

When optimizing, the memory load instructions will be reordered to occur
before the asm.  At least, that's what I see with current mainline gcc
on x86_64.  This isn't a case of memory caching; it's reordering of the
load instructions across the asm.
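
For comparison, here is a sketch of the same loop with a "memory"
clobber added to the asm. The function name f_clobber and the
initialization of j are additions for illustration, not part of the
example above. Since the clobber tells gcc that the asm may read or
write arbitrary memory, the loads from a[] should stay after the asm
in the generated code, though the exact output depends on the gcc
version and target.

```c
/* Sketch: the loop from the example above, with a "memory" clobber.
   f_clobber is a hypothetical name used only for this illustration. */
int
f_clobber (int *a, int *b, int c)
{
  int i, j = 0;   /* j initialized so the c <= 0 case is well defined */

  for (i = 0; i < c; i++)
    {
      int a0, a1, a2, a3;

      /* "memory" says the asm may read or write any memory, so gcc
         should not move the loads below to a point before it.
         __asm__ is the same as asm but also accepted under -std=c99.
         "nop" never writes j, so j ends up holding whatever was in
         the register gcc picked for it. */
      __asm__ ("nop" : "=r" (j) : "r" (i) : "memory");
      a0 = a[0];
      a1 = a[1];
      a2 = a[2];
      a3 = a[3];
      b[1] = a0;
      b[3] = a1;
      b[0] = a2;
      b[2] = a3;
    }
  return j;
}
```

Compiling both versions with something like "gcc -O2 -S" and comparing
where the loads from a[] appear relative to the nop should show the
difference.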

Ian