This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.



Re: gcc 4.3.2 vectorizes access to volatile array


Andrew Haley wrote:
Till Straumann wrote:
gcc-4.3.2 seems to produce bad code when
accessing an array of small 'volatile'
objects -- it may try to access multiple
such objects in a 'parallel' fashion.
E.g., instead of reading two consecutive
'volatile short's sequentially, it reads
a single 32-bit longword. This may crash,
e.g., when accessing a memory-mapped device
which allows only 16-bit accesses.

If I compile this code fragment

void volarrcpy(short *d, volatile short *s, int n)
{
    int i;
    for (i = 0; i < n; i++)
        d[i] = s[i];
}


with '-O3' (the critical option seems to be '-ftree-vectorize'), gcc-4.3.2 produces quite complicated code, but the essential section is (powerpc):

.L7:
   lhz 0,0(11)
   addi 11,11,2
   lwzx 0,4,9
   stwx 0,3,9
   addi 9,9,4
   bdnz .L7

or i386

.L7:
   movw    (%ecx), %ax
   movl    (%esi,%edx,4), %eax
   movl    %eax, (%ebx,%edx,4)
   incl    %edx
   addl    $2, %ecx
   cmpl    %edx, -20(%ebp)
   ja  .L7


Translated back into C, this reads:


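uint32_t *dst_l = (uint32_t*)d;
uint32_t *src_l = (uint32_t*)s;   /* volatile qualifier is lost here */

for (i=0; i<n/2; i++) {
   (void)s[i];            /* 16-bit volatile read; the value is discarded */
   dst_l[i] = src_l[i];   /* one 32-bit read covers s[2*i] and s[2*i+1] */
}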

This code seems neither optimal nor correct.
Besides reading half of the locations twice,
which violates the semantics of volatile
objects, accessing such objects in a 'vectorized'
way (in this case: emitting a single 32-bit
read instead of reading two adjacent shorts)
seems illegal to me.
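
To make the access pattern concrete, here is a small C model. It is not GCC code and does not touch real hardware: struct bus_stats, bus_read16(), bus_read32(), copy_as_written() and copy_as_vectorized() are hypothetical names that simply count accesses to an ordinary uint16_t array standing in for a device that tolerates only 16-bit reads. copy_as_written() follows the source loop; copy_as_vectorized() mimics the .L7 loop quoted above (even n only), as read from the assembly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_ELEMS 64 /* size limit of this toy model */

/* Hypothetical access counters for a device window that only
 * tolerates 16-bit accesses. */
struct bus_stats {
    unsigned reads16;               /* 16-bit accesses (allowed)          */
    unsigned reads32;               /* 32-bit accesses (would fault)      */
    unsigned elem_reads[MAX_ELEMS]; /* times each 16-bit element was read */
};

/* One 16-bit read of element idx. */
static uint16_t bus_read16(struct bus_stats *st, const uint16_t *win, size_t idx)
{
    assert(idx < MAX_ELEMS);
    st->reads16++;
    st->elem_reads[idx]++;
    return win[idx];
}

/* One 32-bit read covering elements idx and idx + 1. */
static uint32_t bus_read32(struct bus_stats *st, const uint16_t *win, size_t idx)
{
    uint32_t v;
    assert(idx + 1 < MAX_ELEMS);
    st->reads32++;
    st->elem_reads[idx]++;
    st->elem_reads[idx + 1]++;
    memcpy(&v, &win[idx], sizeof v); /* raw bytes; endianness does not matter here */
    return v;
}

/* What the C source asks for: exactly one 16-bit read per element. */
static void copy_as_written(struct bus_stats *st, uint16_t *d,
                            const uint16_t *win, size_t n)
{
    for (size_t i = 0; i < n; i++)
        d[i] = bus_read16(st, win, i);
}

/* Access pattern of the vectorized .L7 loop (even n only): a 16-bit
 * read of element i whose value is thrown away, followed by a 32-bit
 * copy of elements 2*i and 2*i + 1. */
static void copy_as_vectorized(struct bus_stats *st, uint16_t *d,
                               const uint16_t *win, size_t n)
{
    for (size_t i = 0; i < n / 2; i++) {
        (void)bus_read16(st, win, i);
        uint32_t w = bus_read32(st, win, 2 * i);
        memcpy(&d[2 * i], &w, sizeof w);
    }
}
```

With n = 8, copy_as_written() performs eight 16-bit reads and no 32-bit reads, while copy_as_vectorized() performs four of each, and elements 0 to 3 are read twice. Both leave the same data in d, which is why the problem goes unnoticed on ordinary RAM but can break a 16-bit-only device.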

Similar behavior seems to be present in 4.3.3.

Does anybody have some insight? Should I file
a bug report?

I can't reproduce this with "GCC: (GNU) 4.3.3 20081110 (prerelease)":


.L8:
	movzwl	(%ecx), %eax
	addl	$1, %ebx
	addl	$2, %ecx
	movw	%ax, (%edx)
	addl	$2, %edx
	cmpl	%ebx, 16(%ebp)
	jg	.L8

I think you should upgrade.

Andrew.

OK, try this then:


void
c(char *d, volatile char *s)
{
    int i;
    for (i = 0; i < 32; i++)
        d[i] = s[i];
}


(gcc --version: gcc (Ubuntu 4.3.3-5ubuntu4) 4.3.3)


'gcc -m32 -c -S -O3' produces an unrolled sequence:

   movzbl  (%ecx), %eax
   leal    20(%ebx), %edx
   movl    (%ecx), %eax
   movl    %eax, (%edi)
   movzbl  1(%ecx), %eax
   movl    4(%ecx), %eax
   movl    %eax, 4(%edi)
   movzbl  2(%ecx), %eax
   movl    4(%ebx), %eax
   movl    %eax, 4(%esi)
   movzbl  3(%ecx), %eax
   movl    8(%ebx), %eax
   movl    %eax, 8(%esi)
... < snip >...

The 64-bit version even uses SSE registers to
load the volatile data:

(gcc -c -S -O3)

.L7:
   movzbl  (%rsi), %eax
   movdqu  (%rsi), %xmm0
   movdqa  %xmm0, (%rdi)
   movzbl  1(%rsi), %eax
   movdqu  (%rdx), %xmm0
   movdqa  %xmm0, 16(%rdi)

Not sure an upgrade helps ;-)

-- Till

