Is there any other optimization for memory?
Ian Lance Taylor
iant@google.com
Thu Jul 14 18:17:00 GMT 2011
Parmenides <mobile.parmenides@gmail.com> writes:
> To understand some of gcc's features without knowing the details of
> its internals, I have to write examples in C, compile them into
> assembly code, and study the output. Caching memory values in
> registers is one optimization gcc performs; reordering instructions
> is another. A "memory" clobber in an inline assembly statement may
> affect both. I have written an example in C to try to understand the
> former.
>
> int s = 0;
> int tst(int lim)
> {
>     int i;
>
>     for (i = 1; i < lim; i++)
>         s = s + i;
>
>     asm volatile(
>         "nop"
>         : : : "memory"
>     );
>
>     s = s * 10;
>
>     return s;
> }
>
> To compile the C source, I ran the following command:
> gcc -S -O tst.c
>
> The corresponding assembly code is as follows:
> tst:
>         pushl   %ebp
>         movl    %esp, %ebp
>         movl    8(%ebp), %ecx
>         cmpl    $1, %ecx
>         jle     .L2
>         movl    s, %edx
>         movl    $1, %eax
> .L4:
>         addl    %eax, %edx
>         incl    %eax
>         cmpl    %eax, %ecx
>         jne     .L4
>         movl    %edx, s         <--- After the loop, s is written back to memory.
> .L2:
>         movl    s, %eax         <--- Before evaluating 's = s * 10', s is reloaded into a register.
>         leal    (%eax,%eax,4), %eax
>         addl    %eax, %eax
>         movl    %eax, s
>         popl    %ebp
>         ret
>
> So the "memory" clobber has prevented the optimization. But for the
> latter case, namely instruction reordering, I cannot come up with a
> similar example that illustrates how a "memory" clobber prevents
> reordering. I don't know the circumstances under which gcc reorders
> instructions, and without knowing them I cannot observe the effect of
> the "memory" clobber.
Instruction reordering is easier to observe on a machine other than the
x86, one with long load latencies. Here is an example, though:

int
f (int *a, int *b, int c)
{
  int i, j;
  for (i = 0; i < c; i++)
    {
      int a0, a1, a2, a3;
      asm ("nop" : "=r" (j) : "r" (i));
      a0 = a[0];
      a1 = a[1];
      a2 = a[2];
      a3 = a[3];
      b[1] = a0;
      b[3] = a1;
      b[0] = a2;
      b[2] = a3;
    }
  return j;
}

When optimizing, the memory load instructions will be reordered to occur
before the asm. At least, that's what I see with current mainline gcc
on x86_64. This isn't a case of memory caching; it's reordering of the
load instructions across the asm.
Ian