Bug 88490 - Missed autovectorization when indices are different
Summary: Missed autovectorization when indices are different
Status: NEW
Alias: None
Product: gcc
Classification: Unclassified
Component: tree-optimization (show other bugs)
Version: 9.0
: P3 enhancement
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords: alias, missed-optimization
Depends on:
Blocks: restrict vectorizer
  Show dependency treegraph
 
Reported: 2018-12-13 18:06 UTC by Daniel Fruzynski
Modified: 2021-08-10 22:59 UTC (History)
2 users (show)

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed: 2018-12-14 00:00:00


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Daniel Fruzynski 2018-12-13 18:06:53 UTC
Code below reads and writes data using different indices what is checked by "if" above loop. This can be autovectorized, as both memory areas do not overlap. Code compiled with -O3 -march=skylake-avx512

[code]
struct S
{
    double* __restrict__ * __restrict__ d;
};

void test(S* __restrict__ s, int n, int k)
{
    if (n > k)
    {
        for (int n = 0; n < 2; ++n)
        {
            s->d[n][0] = s->d[k][0];
            s->d[n][1] = s->d[k][1];
        }
    }
}
[/code]

[asm]
test(S*, int, int):
        cmp     esi, edx
        jle     .L3
        mov     rcx, QWORD PTR [rdi]
        movsx   rdx, edx
        mov     rax, QWORD PTR [rcx+rdx*8]
        mov     rdx, QWORD PTR [rcx]
        vmovsd  xmm0, QWORD PTR [rax]
        vmovsd  QWORD PTR [rdx], xmm0
        vmovsd  xmm0, QWORD PTR [rax+8]
        vmovsd  QWORD PTR [rdx+8], xmm0
        vmovsd  xmm0, QWORD PTR [rax]
        mov     rdx, QWORD PTR [rcx+8]
        vmovsd  QWORD PTR [rdx], xmm0
        vmovsd  xmm0, QWORD PTR [rax+8]
        vmovsd  QWORD PTR [rdx+8], xmm0
.L3:
        ret
[/asm]
Comment 1 Daniel Fruzynski 2018-12-13 18:10:38 UTC
Ehh, small typo. This is correct version, also not vectorized:

[code]
struct S
{
    double* __restrict__ * __restrict__ d;
};

void test(S* __restrict__ s, int n, int k)
{
    if (n > k)
    {
        for (int i = 0; i < 2; ++i)
        {
            s->d[n][0] = s->d[k][0];
            s->d[n][1] = s->d[k][1];
        }
    }
}
[/code]

[asm]
test(S*, int, int):
        cmp     esi, edx
        jle     .L3
        mov     rax, QWORD PTR [rdi]
        movsx   rdx, edx
        mov     rdx, QWORD PTR [rax+rdx*8]
        movsx   rsi, esi
        vmovsd  xmm0, QWORD PTR [rdx]
        mov     rax, QWORD PTR [rax+rsi*8]
        vmovsd  QWORD PTR [rax], xmm0
        vmovsd  xmm0, QWORD PTR [rdx+8]
        vmovsd  QWORD PTR [rax+8], xmm0
        vmovsd  xmm0, QWORD PTR [rdx]
        vmovsd  QWORD PTR [rax], xmm0
        vmovsd  xmm0, QWORD PTR [rdx+8]
        vmovsd  QWORD PTR [rax+8], xmm0
.L3:
        ret
[/asm]
Comment 2 Richard Biener 2018-12-14 10:46:42 UTC
Confirmed.

  _33 = MEM[(double *)_28 clique 1 base 2];
  MEM[(double *)_32 clique 1 base 2] = _33;
  _35 = MEM[(double *)_28 + 8B clique 1 base 2];
  MEM[(double *)_32 + 8B clique 1 base 2] = _35;
  _49 = MEM[(double *)_28 clique 1 base 2];
  MEM[(double *)_32 clique 1 base 2] = _49;
  _51 = MEM[(double *)_28 + 8B clique 1 base 2];
  MEM[(double *)_32 + 8B clique 1 base 2] = _51;

t.c:13:24: note: can't determine dependence between MEM[(double *)_32 clique 1 base 2] and MEM[(double *)_28 + 8B clique 1 base 2]

As you can see we do not exploit that n != k when assigning restrict tags
to the indirect loaded pointers.  Nor would we do that when you load
those in an unrolled loop.  Our restrict machinery is simply not set up
for this.

I don't see how a compiler could reliably and reasonably implement this.
Comment 3 Daniel Fruzynski 2018-12-14 12:00:28 UTC
In this case s->d is pointer to pointer to double, and both pointer levels have restrict qualifier. I wonder if you could add some tag that s->d[n] and s->d[k] points to separate memory areas. This tag could be later used to determine that s->d[n][0] and s->d[k][0] also do not overlap.
Comment 4 rguenther@suse.de 2018-12-14 12:12:29 UTC
On Fri, 14 Dec 2018, bugzilla@poradnik-webmastera.com wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88490
> 
> --- Comment #3 from Daniel Fruzynski <bugzilla@poradnik-webmastera.com> ---
> In this case s->d is pointer to pointer to double, and both pointer levels have
> restrict qualifier. I wonder if you could add some tag that s->d[n] and s->d[k]
> points to separate memory areas. This tag could be later used to determine that
> s->d[n][0] and s->d[k][0] also do not overlap.

Not easily.  Consider the loads being hoisted before the if (k > n) check
for example.  Note I do not think the C standard is sufficiently
clear with regarding to restrict qualified pointers loaded from
memory.  Clearly s->d[n][0] and s->d[n][0] alias but they are not
based on each other.
Comment 5 jsm-csl@polyomino.org.uk 2018-12-14 18:29:49 UTC
On Fri, 14 Dec 2018, rguenther at suse dot de wrote:

> Note I do not think the C standard is sufficiently clear with regarding 
> to restrict qualified pointers loaded from memory.

I think this is where "Every access that modifies X shall be considered 
also to modify P, for the purposes of this subclause." comes in.  (See 
what I said at <https://gcc.gnu.org/ml/gcc-bugs/2005-01/msg03394.html>.)  
Modifying s->d[n][0] is considered to modify s->d[n], and so considered to 
modify s->d, and so considered to modify s.  (It's still perfectly valid 
to have n == k; what's not valid is aliasing between objects accessed via 
s->d[0] and s->d[1], for example.)
Comment 6 rguenther@suse.de 2018-12-17 08:06:20 UTC
On Fri, 14 Dec 2018, joseph at codesourcery dot com wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88490
> 
> --- Comment #5 from joseph at codesourcery dot com <joseph at codesourcery dot com> ---
> On Fri, 14 Dec 2018, rguenther at suse dot de wrote:
> 
> > Note I do not think the C standard is sufficiently clear with regarding 
> > to restrict qualified pointers loaded from memory.
> 
> I think this is where "Every access that modifies X shall be considered 
> also to modify P, for the purposes of this subclause." comes in.  (See 
> what I said at <https://gcc.gnu.org/ml/gcc-bugs/2005-01/msg03394.html>.)  
> Modifying s->d[n][0] is considered to modify s->d[n], and so considered to 
> modify s->d, and so considered to modify s.  (It's still perfectly valid 
> to have n == k; what's not valid is aliasing between objects accessed via 
> s->d[0] and s->d[1], for example.)

OK, thanks for clarifying.  I think we are fine implementation-wise
and I have a hard time thinking of a way to improve here given we
have to compute sth conservatively correct.