Code below reads and writes data using different indices what is checked by "if" above loop. This can be autovectorized, as both memory areas do not overlap. Code compiled with -O3 -march=skylake-avx512 [code] struct S { double* __restrict__ * __restrict__ d; }; void test(S* __restrict__ s, int n, int k) { if (n > k) { for (int n = 0; n < 2; ++n) { s->d[n][0] = s->d[k][0]; s->d[n][1] = s->d[k][1]; } } } [/code] [asm] test(S*, int, int): cmp esi, edx jle .L3 mov rcx, QWORD PTR [rdi] movsx rdx, edx mov rax, QWORD PTR [rcx+rdx*8] mov rdx, QWORD PTR [rcx] vmovsd xmm0, QWORD PTR [rax] vmovsd QWORD PTR [rdx], xmm0 vmovsd xmm0, QWORD PTR [rax+8] vmovsd QWORD PTR [rdx+8], xmm0 vmovsd xmm0, QWORD PTR [rax] mov rdx, QWORD PTR [rcx+8] vmovsd QWORD PTR [rdx], xmm0 vmovsd xmm0, QWORD PTR [rax+8] vmovsd QWORD PTR [rdx+8], xmm0 .L3: ret [/asm]
Ehh, small typo. This is correct version, also not vectorized: [code] struct S { double* __restrict__ * __restrict__ d; }; void test(S* __restrict__ s, int n, int k) { if (n > k) { for (int i = 0; i < 2; ++i) { s->d[n][0] = s->d[k][0]; s->d[n][1] = s->d[k][1]; } } } [/code] [asm] test(S*, int, int): cmp esi, edx jle .L3 mov rax, QWORD PTR [rdi] movsx rdx, edx mov rdx, QWORD PTR [rax+rdx*8] movsx rsi, esi vmovsd xmm0, QWORD PTR [rdx] mov rax, QWORD PTR [rax+rsi*8] vmovsd QWORD PTR [rax], xmm0 vmovsd xmm0, QWORD PTR [rdx+8] vmovsd QWORD PTR [rax+8], xmm0 vmovsd xmm0, QWORD PTR [rdx] vmovsd QWORD PTR [rax], xmm0 vmovsd xmm0, QWORD PTR [rdx+8] vmovsd QWORD PTR [rax+8], xmm0 .L3: ret [/asm]
Confirmed. _33 = MEM[(double *)_28 clique 1 base 2]; MEM[(double *)_32 clique 1 base 2] = _33; _35 = MEM[(double *)_28 + 8B clique 1 base 2]; MEM[(double *)_32 + 8B clique 1 base 2] = _35; _49 = MEM[(double *)_28 clique 1 base 2]; MEM[(double *)_32 clique 1 base 2] = _49; _51 = MEM[(double *)_28 + 8B clique 1 base 2]; MEM[(double *)_32 + 8B clique 1 base 2] = _51; t.c:13:24: note: can't determine dependence between MEM[(double *)_32 clique 1 base 2] and MEM[(double *)_28 + 8B clique 1 base 2] As you can see we do not exploit that n != k when assigning restrict tags to the indirect loaded pointers. Nor would we do that when you load those in an unrolled loop. Our restrict machinery is simply not set up for this. I don't see how a compiler could reliably and reasonably implement this.
In this case s->d is pointer to pointer to double, and both pointer levels have restrict qualifier. I wonder if you could add some tag that s->d[n] and s->d[k] points to separate memory areas. This tag could be later used to determine that s->d[n][0] and s->d[k][0] also do not overlap.
On Fri, 14 Dec 2018, bugzilla@poradnik-webmastera.com wrote: > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88490 > > --- Comment #3 from Daniel Fruzynski <bugzilla@poradnik-webmastera.com> --- > In this case s->d is pointer to pointer to double, and both pointer levels have > restrict qualifier. I wonder if you could add some tag that s->d[n] and s->d[k] > points to separate memory areas. This tag could be later used to determine that > s->d[n][0] and s->d[k][0] also do not overlap. Not easily. Consider the loads being hoisted before the if (k > n) check for example. Note I do not think the C standard is sufficiently clear with regarding to restrict qualified pointers loaded from memory. Clearly s->d[n][0] and s->d[n][0] alias but they are not based on each other.
On Fri, 14 Dec 2018, rguenther at suse dot de wrote: > Note I do not think the C standard is sufficiently clear with regarding > to restrict qualified pointers loaded from memory. I think this is where "Every access that modifies X shall be considered also to modify P, for the purposes of this subclause." comes in. (See what I said at <https://gcc.gnu.org/ml/gcc-bugs/2005-01/msg03394.html>.) Modifying s->d[n][0] is considered to modify s->d[n], and so considered to modify s->d, and so considered to modify s. (It's still perfectly valid to have n == k; what's not valid is aliasing between objects accessed via s->d[0] and s->d[1], for example.)
On Fri, 14 Dec 2018, joseph at codesourcery dot com wrote: > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88490 > > --- Comment #5 from joseph at codesourcery dot com <joseph at codesourcery dot com> --- > On Fri, 14 Dec 2018, rguenther at suse dot de wrote: > > > Note I do not think the C standard is sufficiently clear with regarding > > to restrict qualified pointers loaded from memory. > > I think this is where "Every access that modifies X shall be considered > also to modify P, for the purposes of this subclause." comes in. (See > what I said at <https://gcc.gnu.org/ml/gcc-bugs/2005-01/msg03394.html>.) > Modifying s->d[n][0] is considered to modify s->d[n], and so considered to > modify s->d, and so considered to modify s. (It's still perfectly valid > to have n == k; what's not valid is aliasing between objects accessed via > s->d[0] and s->d[1], for example.) OK, thanks for clarifying. I think we are fine implementation-wise and I have a hard time thinking of a way to improve here given we have to compute sth conservatively correct.