Bug 56624 - Vectorizer gives up on a group-access if it contains stores to the same location
Status: NEW
Alias: None
Product: gcc
Classification: Unclassified
Component: tree-optimization
Version: 4.8.0
Importance: P3 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords: missed-optimization
Depends on:
Blocks: vectorizer
Reported: 2013-03-15 11:51 UTC by Michael Zolotukhin
Modified: 2024-03-10 05:25 UTC (History)
See Also:
Host:
Target:
Build:
Known to work: 11.1.0, 9.1.0
Known to fail: 10.1.0, 4.8.0
Last reconfirmed: 2013-03-15 00:00:00


Attachments
Reproducer (96 bytes, text/plain)
2013-03-15 11:51 UTC, Michael Zolotukhin
Reproducer 2 (120 bytes, text/plain)
2013-03-15 12:26 UTC, Michael Zolotukhin

Description Michael Zolotukhin 2013-03-15 11:51:29 UTC
Created attachment 29672 [details]
Reproducer

GCC can't vectorize the following loop:
void foo (double *a)
{
  int i;
  for (i = 0; i < 100; i+=2)
    {
      a[i+1] = 2;
      a[i] = 3;
      a[i+1] = 2;
      a[i] = 3;
    }
}
The vectorizer reports the following:

note: === vect_analyze_data_ref_accesses === 
note: Detected interleaving of size 2 
note: Two store stmts share the same dr. 
note: not vectorized: complicated access pattern.

Obviously, vectorization is possible in this case, because the first two stores have no effect: each is overwritten by an identical store before the location is read.
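
To illustrate why the first pair of stores is dead, here is a minimal hand-written sketch (not compiler output). `foo` is the loop from the report; `foo_reduced` and `check_foo_variants` are hypothetical names introduced only for this illustration. The reduced loop keeps a single store per location, and the check confirms both variants leave the array in the same state.

```c
#include <assert.h>
#include <string.h>

/* Loop from the report: every location is stored twice per iteration.  */
void foo (double *a)
{
  int i;
  for (i = 0; i < 100; i+=2)
    {
      a[i+1] = 2;
      a[i] = 3;
      a[i+1] = 2;
      a[i] = 3;
    }
}

/* Hypothetical reduced form: the loop with the dead first pair of
   stores removed by hand.  */
void foo_reduced (double *a)
{
  int i;
  for (i = 0; i < 100; i+=2)
    {
      a[i+1] = 2;
      a[i] = 3;
    }
}

/* Runs both variants on zeroed arrays and asserts that the resulting
   contents are identical.  */
void check_foo_variants (void)
{
  double x[100], y[100];
  memset (x, 0, sizeof x);
  memset (y, 0, sizeof y);
  foo (x);
  foo_reduced (y);
  assert (memcmp (x, y, sizeof x) == 0);
}
```

The reduced loop is a plain interleaved store group of size 2, with no two stores sharing a data reference.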

This test reproduces a similar problem encountered in SPEC2006/470.lbm: there, if-conversion can produce stores to the same location, which stops the vectorizer.

The test is attached, command line to reproduce:
gcc group_access.c -O3 -c -ftree-vectorizer-verbose=15
Comment 1 Richard Biener 2013-03-15 11:54:50 UTC
> This test is a reproducer of similar problem encountered on Spec2006/470.lbm -
> there if-conversion could produce stores to the same location which will stop
> vectorizer.

Can you reproduce a testcase for that instead?  It doesn't make sense
to handle code that should be optimized earlier (by DSE).  Is it from
code like

 if (cond)
   a[i] = 3;
 else
   a[i] = 3;

?
Comment 2 Michael Zolotukhin 2013-03-15 12:19:50 UTC
> Can you reproduce a testcase for that instead?  It doesn't make sense
> to handle code that should be optimized earlier (by DSE).  Is it from
> code like
> 
>  if (cond)
>    a[i] = 3;
>  else
>    a[i] = 3;
> 
> ?

Yes, it originally comes from code similar to your example, but that example has one more problem, which hides the one described in this bug. I've filed a separate bug (56625) with a test almost identical to yours.
Comment 3 Michael Zolotukhin 2013-03-15 12:26:46 UTC
Created attachment 29674 [details]
Reproducer 2
Comment 4 Michael Zolotukhin 2013-03-15 12:27:51 UTC
Sorry, it turns out a reproducer with an if can be made after all; here it is:
void foo (long *a)
{
  int i;
  for (i = 0; i < 100; i+=2)
    {
      if (a[i] == 0)
        {
          a[i+1] = 2;
          a[i] = 3;
        }
      else
        {
          a[i+1] = 3;
          a[i] = 4;
        }
    }
}
In this example we have:
group_access2.c:4: note: === vect_analyze_data_ref_accesses ===
group_access2.c:4: note: READ_WRITE dependence in interleaving.
group_access2.c:4: note: not vectorized: complicated access pattern.
group_access2.c:4: note: bad data access.
group_access2.c:1: note: vectorized 0 loops in function.

The diagnostic is a bit different, but I guess the root cause is the same.

The test is attached (reproducer 2).
Comment 5 Richard Biener 2013-03-15 12:31:57 UTC
Thanks and confirmed.
Comment 6 Richard Biener 2020-09-14 12:44:46 UTC
(In reply to Michael Zolotukhin from comment #4)
> Sorry, it looks like the reproducer with if could be made, and here it is:
> void foo (long *a)
> {
>   int i;
>   for (i = 0; i < 100; i+=2)
>     {
>       if (a[i] == 0)
>         {
>           a[i+1] = 2;
>           a[i] = 3;
>         }
>       else
>         {
>           a[i+1] = 3;
>           a[i] = 4;
>         }
>     }
> }
> In this example we have:
> group_access2.c:4: note: === vect_analyze_data_ref_accesses ===
> group_access2.c:4: note: READ_WRITE dependence in interleaving.
> group_access2.c:4: note: not vectorized: complicated access pattern.
> group_access2.c:4: note: bad data access.
> group_access2.c:1: note: vectorized 0 loops in function.
> 
> The diagnostic is a bit different, but rootcause is the same I guess.
> 
> The test is attached (reproducer 2).

We now vectorize this loop (not with plain SSE2 but with SSE4.2 for example):

.L2:
        movq    (%rdi), %xmm0
        movdqa  %xmm2, %xmm4
        addq    $16, %rdi
        punpcklqdq      %xmm0, %xmm0
        pcmpeqq %xmm1, %xmm0
        pblendvb        %xmm0, %xmm3, %xmm4
        movups  %xmm4, -16(%rdi)
        cmpq    %rdi, %rax
        jne     .L2

probably because we now sink the common stores from the if arm.  Modifying
the testcase to the following reproduces the original issue again:

void foo (long *a)
{
  int i;
  for (i = 0; i < 100; i+=2)
    {
      if (a[i] == 0)
        {
          a[i+1] = 2;
          a[i] = 3;
        }
      else
        {
          a[i] = 4;
          a[i+1] = 3;
        }
    }
}
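
As a rough illustration of what sinking the stores out of the if arms amounts to, here is a hand-written sketch (an assumption about the shape of the transformed loop, not GCC's actual output). `foo` is the modified testcase above; `foo_sunk` and `check_sunk_form` are hypothetical names. Because only even indices are read and each is read before it is written, the order of the two stores within an arm does not affect the result, so both arms reduce to the same pair of conditional stores.

```c
#include <assert.h>
#include <string.h>

/* Modified testcase: the else arm stores in the opposite order.  */
void foo (long *a)
{
  int i;
  for (i = 0; i < 100; i+=2)
    {
      if (a[i] == 0)
        {
          a[i+1] = 2;
          a[i] = 3;
        }
      else
        {
          a[i] = 4;
          a[i+1] = 3;
        }
    }
}

/* Hypothetical sunk form: the condition is evaluated once, and each
   location gets a single store of a selected value.  */
void foo_sunk (long *a)
{
  int i;
  for (i = 0; i < 100; i+=2)
    {
      int is_zero = a[i] == 0;
      a[i+1] = is_zero ? 2 : 3;
      a[i] = is_zero ? 3 : 4;
    }
}

/* Fills two arrays so that even indices alternate between 0 and 2
   (both arms are taken), runs both variants, and asserts that the
   results are identical.  */
void check_sunk_form (void)
{
  long x[100], y[100];
  int i;
  for (i = 0; i < 100; i++)
    x[i] = y[i] = i % 4;
  foo (x);
  foo_sunk (y);
  assert (memcmp (x, y, sizeof x) == 0);
}
```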
Comment 7 Andrew Pinski 2024-03-10 05:22:00 UTC
As far as I can tell, all of the testcases have been vectorized since GCC 11.
Comment 8 Andrew Pinski 2024-03-10 05:25:41 UTC
For aarch64, though, the testcase in comment #6 has only been vectorized since GCC 13.