Bug 39075

Summary: alignment for "unsigned short a[10000]" vs "extern unsigned short a[10000]"
Product: gcc
Component: middle-end
Version: 4.4.0
Status: NEW
Severity: enhancement
Priority: P3
Target Milestone: ---
Keywords: missed-optimization
Reporter: Dan Nicolaescu <dann>
Assignee: Not yet assigned to anyone <unassigned>
CC: gcc-bugs, rguenth
Last reconfirmed: 2009-02-02 15:03:08

Description Dan Nicolaescu 2009-02-02 14:39:52 UTC
 
Comment 1 Dan Nicolaescu 2009-02-02 14:50:01 UTC
This code:
unsigned short a[10000];
void test()
{
  int i;
  for (i = 0; i < 10000; ++i)  a[i] = 5;
}

will be vectorized with -O3 -march=core2 to this:

.L2:
        movdqa  %xmm0, a(%eax)
        addl    $16, %eax
        cmpl    $20000, %eax
        jne     .L2


but this one:

extern unsigned short a[10000];

void test()
{
  int i;
  for (i = 0; i < 10000; ++i)  a[i] = 5;
}

will get a lot of extra code before the loop because the vectorizer thinks it needs to do peeling for alignment:
test.c:7: note: Alignment of access forced using peeling.

Intel's compiler does not generate the extra peeling code.
Comment 2 Richard Biener 2009-02-02 14:53:09 UTC
The ABI does not guarantee alignment bigger than 2 for the external array.  The
vectorizer adjusts the alignment for the internal one.
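
If the translation unit that defines the array is under the user's control, one possible way to state the alignment explicitly is GCC's `aligned` variable attribute on both the declaration and the definition. This is only a sketch: the attribute is not part of the original test case, the helper `check_a_alignment` is hypothetical, and whether the vectorizer of a given GCC release then skips the peeling has not been verified here.

```c
#include <assert.h>
#include <stdint.h>

/* Declaration as it might appear in a shared header. The aligned(16)
   attribute is an assumption added for illustration; on an extern
   declaration it is a promise that the definition has to keep. */
extern unsigned short a[10000] __attribute__((aligned(16)));

/* Definition, normally in a different file. It carries the same
   attribute so that the promise above is true. */
unsigned short a[10000] __attribute__((aligned(16)));

void test(void)
{
  int i;
  for (i = 0; i < 10000; ++i)
    a[i] = 5;
}

/* Hypothetical helper: aborts if the array does not start on a
   16-byte boundary. */
void check_a_alignment(void)
{
  assert(((uintptr_t)a % 16) == 0);
}
```

Comparing the output of -O3 -march=core2 with and without the attribute would show whether the peeling code goes away for a particular compiler version.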
Comment 3 Richard Biener 2009-02-02 14:55:27 UTC
Err, it seems at least the x86_64 ABI guarantees alignment of 16 bytes for
arrays bigger than 16 bytes (including variable length arrays).
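
The two things being discussed, the alignment the element type requires and the address a particular array actually gets, can be told apart with a small check. The following is a sketch with hypothetical helper names (`is_aligned`, `element_alignment`, `aligned_buf`); an observed address only shows what one compiler and linker did, not what the ABI guarantees.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: nonzero if address p is a multiple of n bytes. */
static int is_aligned(const void *p, size_t n)
{
  return ((uintptr_t)p % n) == 0;
}

/* Alignment required by the element type alone, using the GCC
   extension __alignof__. */
static size_t element_alignment(void)
{
  return __alignof__(unsigned short);
}

/* Array with an explicitly requested 16-byte alignment, for comparison. */
static unsigned short aligned_buf[16] __attribute__((aligned(16)));

void check_alignment_facts(void)
{
  /* The start of the array honors the requested 16-byte alignment. */
  assert(is_aligned(aligned_buf, 16));
  /* Two bytes further in, the address is no longer 16-byte aligned. */
  assert(!is_aligned((const char *)aligned_buf + 2, 16));
  /* Every element still satisfies the alignment of its own type. */
  assert(is_aligned(aligned_buf + 1, element_alignment()));
}
```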