Bug 39075 - alignment for "unsigned short a[10000]" vs "extern unsigned short a[10000]"
Status: NEW
Alias: None
Product: gcc
Classification: Unclassified
Component: middle-end
Version: 4.4.0
Importance: P3 enhancement
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords: missed-optimization
Depends on:
Blocks:
 
Reported: 2009-02-02 14:39 UTC by Dan Nicolaescu
Modified: 2009-02-02 15:03 UTC
CC List: 2 users

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed: 2009-02-02 15:03:08



Description Dan Nicolaescu 2009-02-02 14:39:52 UTC
 
Comment 1 Dan Nicolaescu 2009-02-02 14:50:01 UTC
This code:
unsigned short a[10000];
void test()
{
  int i;
  for (i = 0; i < 10000; ++i) a[i] = 5;
}

will be vectorized with -O3 -march=core2 to this:

.L2:
        movdqa  %xmm0, a(%eax)
        addl    $16, %eax
        cmpl    $20000, %eax
        jne     .L2


but this one:

extern unsigned short a[10000];

void test()
{
  int i;
  for (i = 0; i < 10000; ++i) a[i] = 5;
}

will get a lot of extra code before the loop, because the vectorizer cannot prove that a is 16-byte aligned and therefore peels scalar iterations until it reaches an aligned address (peeling for alignment):
test.c:7: note: Alignment of access forced using peeling.

Intel's compiler does not generate the extra peeling code.
Comment 2 Richard Biener 2009-02-02 14:53:09 UTC
The ABI does not guarantee an alignment greater than 2 bytes (the alignment of unsigned short) for the external array. For the array defined in the same file, the vectorizer can increase the alignment itself.
Comment 3 Richard Biener 2009-02-02 14:55:27 UTC
Err, it seems at least the x86_64 ABI guarantees 16-byte alignment for arrays of 16 bytes or more (including variable length arrays).