Bug 43436 - Missed vectorization: "unhandled data-ref"
Summary: Missed vectorization: "unhandled data-ref"
Status: UNCONFIRMED
Alias: None
Product: gcc
Classification: Unclassified
Component: tree-optimization
Version: 4.5.0
Importance: P3 enhancement
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords: missed-optimization
Depends on:
Blocks: vectorizer
Reported: 2010-03-18 21:51 UTC by Sebastian Pop
Modified: 2021-07-21 02:43 UTC (History)

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed:


Description Sebastian Pop 2010-03-18 21:51:06 UTC
This kernel from FFmpeg is not vectorized with:
gcc-4.5 -c sub_hfyu_median_prediction.c -O3 -ffast-math -ftree-vectorizer-verbose=7 -msse2
[...]
sub_hfyu_median_prediction.c:18: note: not vectorized: unhandled data-ref 

Looking with GDB at it, I get:
(gdb) p debug_data_references (datarefs)
(Data Ref: 
  stmt: D.2736_16 = *D.2735_15;
  ref: *D.2735_15;
  base_object: *src1_14(D);
  Access function 0: {0B, +, 1}_1
)
(Data Ref: 
  stmt: 
  ref: 
  base_object: 
)

I think it is the dst data ref that is NULL.  Might be an aliasing
problem for the data dep analysis, but still, the data ref should be
analyzed correctly first.


typedef short DCTELEM;
typedef unsigned char uint8_t;
typedef long int x86_reg;
typedef unsigned int uint32_t;
typedef unsigned long int uint64_t;

void
sub_hfyu_median_prediction_c (uint8_t * dst, const uint8_t * src1,
			      const uint8_t * src2, int w, int *left,
			      int *left_top)
{
  int i;
  uint8_t l, lt;

  l = *left;
  lt = *left_top;

  for (i = 0; i < w; i++)
    {
      const int pred = mid_pred (l, src1[i], (l + src1[i] - lt) & 0xFF);
      lt = src1[i];
      l = src2[i];
      dst[i] = l - pred;
    }

  *left = l;
  *left_top = lt;
}

void add_hfyu_median_prediction_c(uint8_t *dst, const uint8_t *src1, const uint8_t *diff, int w, int *left, int *left_top)
{
  int i;
  uint8_t l, lt;

  l= *left;
  lt= *left_top;

  for(i=0; i<w; i++)
    {
      l= mid_pred(l, src1[i], (l + src1[i] - lt)&0xFF) + diff[i];
      lt= src1[i];
      dst[i]= l;
    }

  *left= l;
  *left_top= lt;
}
Comment 1 Sebastian Pop 2010-03-18 22:04:50 UTC
Also note that a similar problem occurs for hadamard8:
gcc-4.5 -c hadamard8.c -O3 -ffast-math -ftree-vectorizer-verbose=7 -msse2
[...]
hadamard8_diff.c:44: note: not vectorized: unhandled data-ref 
hadamard8_diff.c:26: note: not vectorized: data ref analysis failed D.2771_12 = *D.2770_11;

Here, too, we fail to analyze one of the data references.
Note that ICC 11.0 does vectorize this kernel.


typedef unsigned char uint8_t;
typedef unsigned long int uint64_t;
typedef long int x86_reg;

#define BUTTERFLY2(o1,o2,i1,i2) \
o1= (i1)+(i2);\
o2= (i1)-(i2);

#define BUTTERFLY1(x,y) \
{\
    int a,b;\
    a= x;\
    b= y;\
    x= a+b;\
    y= a-b;\
}

#define BUTTERFLYA(x,y) (FFABS((x)+(y)) + FFABS((x)-(y)))

int hadamard8_diff8x8_c(void *s, uint8_t *dst, uint8_t *src, int stride, int h)
{
    int i;
    int temp[64];
    int sum=0;

    for(i=0; i<8; i++){
        //FIXME try pointer walks
        BUTTERFLY2(temp[8*i+0], temp[8*i+1], src[stride*i+0]-dst[stride*i+0],src[stride*i+1]-dst[stride*i+1]);
        BUTTERFLY2(temp[8*i+2], temp[8*i+3], src[stride*i+2]-dst[stride*i+2],src[stride*i+3]-dst[stride*i+3]);
        BUTTERFLY2(temp[8*i+4], temp[8*i+5], src[stride*i+4]-dst[stride*i+4],src[stride*i+5]-dst[stride*i+5]);
        BUTTERFLY2(temp[8*i+6], temp[8*i+7], src[stride*i+6]-dst[stride*i+6],src[stride*i+7]-dst[stride*i+7]);

        BUTTERFLY1(temp[8*i+0], temp[8*i+2]);
        BUTTERFLY1(temp[8*i+1], temp[8*i+3]);
        BUTTERFLY1(temp[8*i+4], temp[8*i+6]);
        BUTTERFLY1(temp[8*i+5], temp[8*i+7]);

        BUTTERFLY1(temp[8*i+0], temp[8*i+4]);
        BUTTERFLY1(temp[8*i+1], temp[8*i+5]);
        BUTTERFLY1(temp[8*i+2], temp[8*i+6]);
        BUTTERFLY1(temp[8*i+3], temp[8*i+7]);
    }

    for(i=0; i<8; i++){
        BUTTERFLY1(temp[8*0+i], temp[8*1+i]);
        BUTTERFLY1(temp[8*2+i], temp[8*3+i]);
        BUTTERFLY1(temp[8*4+i], temp[8*5+i]);
        BUTTERFLY1(temp[8*6+i], temp[8*7+i]);

        BUTTERFLY1(temp[8*0+i], temp[8*2+i]);
        BUTTERFLY1(temp[8*1+i], temp[8*3+i]);
        BUTTERFLY1(temp[8*4+i], temp[8*6+i]);
        BUTTERFLY1(temp[8*5+i], temp[8*7+i]);

        sum +=
             BUTTERFLYA(temp[8*0+i], temp[8*4+i])
            +BUTTERFLYA(temp[8*1+i], temp[8*5+i])
            +BUTTERFLYA(temp[8*2+i], temp[8*6+i])
            +BUTTERFLYA(temp[8*3+i], temp[8*7+i]);
    }
    return sum;
}
Comment 2 Ira Rosen 2010-03-28 10:58:02 UTC
(In reply to comment #0)

> sub_hfyu_median_prediction.c:18: note: not vectorized: unhandled data-ref 
> 
> Looking with GDB at it, I get:
> (gdb) p debug_data_references (datarefs)
> (Data Ref: 
>   stmt: D.2736_16 = *D.2735_15;
>   ref: *D.2735_15;
>   base_object: *src1_14(D);
>   Access function 0: {0B, +, 1}_1
> )
> (Data Ref: 
>   stmt: 
>   ref: 
>   base_object: 
> )
> 
> I think it is the dst data ref that is NULL.  Might be an aliasing
> problem for the data dep analysis, but still, the data ref should be
> analyzed correctly first.

Data refs analysis fails because of the function call in the loop.

The vectorizer should check the return value of compute_data_dependences_for_loop() and print a better error message, though.
Comment 3 Ira Rosen 2010-03-28 11:07:54 UTC
(In reply to comment #1)

> hadamard8_diff.c:44: note: not vectorized: unhandled data-ref 

There is a function call in this loop as well.

> hadamard8_diff.c:26: note: not vectorized: data ref analysis failed D.2771_12 =
> *D.2770_11;

Scalar evolution analysis fails here with:
failed: evolution of base is not affine.

  D.2768_8 = i_361 * stride_7(D);
  D.2769_9 = (long unsigned int) D.2768_8;
  D.2770_11 = src_10(D) + D.2769_9;
  D.2771_12 = *D.2770_11;

stride is a function parameter.
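
The address in the statements above is src + i * stride, with stride unknown at compile time, and the kernel itself carries a "FIXME try pointer walks" note. As a rough sketch of what the two addressing forms look like side by side, the helpers below (row_diff_sum_indexed and row_diff_sum_walk are hypothetical names, not FFmpeg code) compute the same sum once with base + stride*i indexing and once by advancing the pointers by stride per row. Whether the pointer-walk form gets past the data-ref analysis in GCC 4.5 has not been checked here; the sketch only shows that the two forms are interchangeable.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, not part of FFmpeg: sums src - dst over h rows of
   8 bytes, addressing each row as base + stride * i like the kernel.  */
static int row_diff_sum_indexed(const uint8_t *dst, const uint8_t *src,
                                int stride, int h)
{
    int i, j, sum = 0;

    assert(h >= 0);
    for (i = 0; i < h; i++)
        for (j = 0; j < 8; j++)
            sum += src[stride * i + j] - dst[stride * i + j];
    return sum;
}

/* Same computation with pointer walks: the row base is advanced by
   stride after each row, so no i * stride product appears in the
   address computation.  */
static int row_diff_sum_walk(const uint8_t *dst, const uint8_t *src,
                             int stride, int h)
{
    int i, j, sum = 0;

    assert(h >= 0);
    for (i = 0; i < h; i++) {
        for (j = 0; j < 8; j++)
            sum += src[j] - dst[j];
        src += stride;
        dst += stride;
    }
    return sum;
}
```

Both helpers return the same value for any buffers that hold at least h rows of stride bytes.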
Comment 4 Sebastian Pop 2010-03-28 16:28:32 UTC
What about fixing the diagnostic message like this:

diff --git a/gcc/tree-vect-data-refs.c b/gcc/tree-vect-data-refs.c
index 37ae9b5..44248b3 100644
--- a/gcc/tree-vect-data-refs.c
+++ b/gcc/tree-vect-data-refs.c
@@ -1866,10 +1866,21 @@ vect_analyze_data_refs (loop_vec_info loop_vinfo, bb_vec_info bb_vinfo)
 
   if (loop_vinfo)
     {
+      bool res;
+
       loop = LOOP_VINFO_LOOP (loop_vinfo);
-      compute_data_dependences_for_loop (loop, true,
-                                         &LOOP_VINFO_DATAREFS (loop_vinfo),
-                                         &LOOP_VINFO_DDRS (loop_vinfo));
+      res = compute_data_dependences_for_loop
+	(loop, true, &LOOP_VINFO_DATAREFS (loop_vinfo),
+	 &LOOP_VINFO_DDRS (loop_vinfo));
+
+      if (!res)
+        {
+          if (vect_print_dump_info (REPORT_UNVECTORIZED_LOCATIONS))
+	    fprintf (vect_dump, "not vectorized: loop contains function calls"
+		     " or data references that cannot be analyzed");
+          return false;
+        }
+
       datarefs = LOOP_VINFO_DATAREFS (loop_vinfo);
     }
   else
Comment 5 Sebastian Pop 2010-03-28 16:35:13 UTC
When defining the missing function like this:

static inline int mid_pred(int a, int b, int c)
{
    int t= (a-b)&((a-b)>>31);
    a-=t;
    b+=t;
    b-= (b-c)&((b-c)>>31);
    b+= (a-b)&((a-b)>>31);

    return b;
}

The vectorizer reports: "not vectorized: unsupported use in stmt."

When this function is defined like this:
static inline int mid_pred(int a, int b, int c)
{
  if(a>b){
    if(c>b){
      if(c>a) b=a;
      else    b=c;
    }
  }else{
    if(b>c){
      if(c>a) b=c;
      else    b=a;
    }
  }
  return b;
}

the vectorizer stops with: "not vectorized: control flow in loop."
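
For reference, both definitions compute the median of their three arguments, so they are interchangeable as far as the kernel is concerned. A small sketch to check that is below; the names mid_pred_branchless, mid_pred_branchy and count_mid_pred_mismatches are made up for this example, and the branch-free variant assumes a 32-bit int and an arithmetic right shift of negative values (which is what GCC does).

```c
#include <assert.h>

/* Branch-free variant: (x) & ((x) >> 31) is x when x is negative and 0
   otherwise, assuming 32-bit int and arithmetic right shift.  */
static inline int mid_pred_branchless(int a, int b, int c)
{
    int t = (a - b) & ((a - b) >> 31);
    a -= t;                             /* a = max (a, b) */
    b += t;                             /* b = min (a, b) */
    b -= (b - c) & ((b - c) >> 31);     /* b = max (b, c) */
    b += (a - b) & ((a - b) >> 31);     /* b = min (a, b) */

    return b;
}

/* Variant with explicit control flow.  */
static inline int mid_pred_branchy(int a, int b, int c)
{
    if (a > b) {
        if (c > b) {
            if (c > a) b = a;
            else       b = c;
        }
    } else {
        if (b > c) {
            if (c > a) b = c;
            else       b = a;
        }
    }
    return b;
}

/* Hypothetical check: counts the inputs in [0, 255]^3 (the range the
   uint8_t kernel can produce) on which the two variants disagree.  */
static int count_mid_pred_mismatches(void)
{
    int a, b, c, mismatches = 0;

    assert(sizeof(int) == 4);   /* the >> 31 trick needs 32-bit int */
    for (a = 0; a < 256; a++)
        for (b = 0; b < 256; b++)
            for (c = 0; c < 256; c++)
                if (mid_pred_branchless(a, b, c) != mid_pred_branchy(a, b, c))
                    mismatches++;
    return mismatches;
}
```

So the different vectorizer diagnostics come from the shape of the code (straight-line arithmetic versus branches), not from a difference in what the two functions compute.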
Comment 6 Ira Rosen 2010-03-28 18:05:52 UTC
(In reply to comment #4)
> What about fixing the diagnostic message like this:
> 

It would be nice to do the same for SLP (compute_data_dependences_for_bb) for completeness.

Thanks,
Ira

> diff --git a/gcc/tree-vect-data-refs.c b/gcc/tree-vect-data-refs.c
> index 37ae9b5..44248b3 100644
> --- a/gcc/tree-vect-data-refs.c
> +++ b/gcc/tree-vect-data-refs.c
> @@ -1866,10 +1866,21 @@ vect_analyze_data_refs (loop_vec_info loop_vinfo,
> bb_vec_info bb_vinfo)
> 
>    if (loop_vinfo)
>      {
> +      bool res;
> +
>        loop = LOOP_VINFO_LOOP (loop_vinfo);
> -      compute_data_dependences_for_loop (loop, true,
> -                                         &LOOP_VINFO_DATAREFS (loop_vinfo),
> -                                         &LOOP_VINFO_DDRS (loop_vinfo));
> +      res = compute_data_dependences_for_loop
> +       (loop, true, &LOOP_VINFO_DATAREFS (loop_vinfo),
> +        &LOOP_VINFO_DDRS (loop_vinfo));
> +
> +      if (!res)
> +        {
> +          if (vect_print_dump_info (REPORT_UNVECTORIZED_LOCATIONS))
> +           fprintf (vect_dump, "not vectorized: loop contains function calls"
> +                    " or data references that cannot be analyzed");
> +          return false;
> +        }
> +
>        datarefs = LOOP_VINFO_DATAREFS (loop_vinfo);
>      }
>    else
> 

Comment 7 Ira Rosen 2010-03-28 18:22:06 UTC
(In reply to comment #5)
> When defining the missing function like this:
> 
> static inline int mid_pred(int a, int b, int c)
> {
>     int t= (a-b)&((a-b)>>31);
>     a-=t;
>     b+=t;
>     b-= (b-c)&((b-c)>>31);
>     b+= (a-b)&((a-b)>>31);
> 
>     return b;
> }
> 
> The vectorization reports: "not vectorized: unsupported use in stmt."

Yes, we have unsupported cycles for l and lt, since they don't match the regular reduction pattern.

> 
> When this function is defined like this:
> static inline int mid_pred(int a, int b, int c)
> {
>   if(a>b){
>     if(c>b){
>       if(c>a) b=a;
>       else    b=c;
>     }
>   }else{
>     if(b>c){
>       if(c>a) b=c;
>       else    b=a;
>     }
>   }
>   return b;
> }
> 
> the vectorizer stops with: "not vectorized: control flow in loop."
> 
 
if-conversion fails with 
l_34 = *D.2750_33;
tree could trap...
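
On the cycles for l and lt: in sub_hfyu_median_prediction_c (not in the add_ variant, where l is a true recurrence), l and lt are just src2[i-1] and src1[i-1] from the second iteration on. A possible source-level rewrite, sketched below under the assumption that dst does not overlap src1 or src2, peels the first iteration so that no scalar is carried around the loop. The names sub_median_ref and sub_median_peeled are made up for this sketch, and I have not checked whether GCC 4.5 vectorizes the peeled loop.

```c
#include <assert.h>
#include <stdint.h>

static inline int mid_pred(int a, int b, int c)
{
    int t = (a - b) & ((a - b) >> 31);
    a -= t;
    b += t;
    b -= (b - c) & ((b - c) >> 31);
    b += (a - b) & ((a - b) >> 31);
    return b;
}

/* The loop from the report, kept as a reference.  */
static void sub_median_ref(uint8_t *dst, const uint8_t *src1,
                           const uint8_t *src2, int w, int *left,
                           int *left_top)
{
    int i;
    uint8_t l = *left, lt = *left_top;

    for (i = 0; i < w; i++) {
        const int pred = mid_pred(l, src1[i], (l + src1[i] - lt) & 0xFF);
        lt = src1[i];
        l = src2[i];
        dst[i] = l - pred;
    }
    *left = l;
    *left_top = lt;
}

/* Hypothetical rewrite: first iteration peeled, the remaining ones read
   src2[i-1] and src1[i-1] directly instead of carrying l and lt.
   Assumes dst does not overlap src1 or src2.  */
static void sub_median_peeled(uint8_t *dst, const uint8_t *src1,
                              const uint8_t *src2, int w, int *left,
                              int *left_top)
{
    int i;
    uint8_t l = *left, lt = *left_top;

    assert(dst != src1 && dst != src2);
    if (w > 0) {
        dst[0] = src2[0] - mid_pred(l, src1[0], (l + src1[0] - lt) & 0xFF);
        for (i = 1; i < w; i++) {
            const int a = src2[i - 1], b = src1[i], c = src1[i - 1];
            dst[i] = src2[i] - mid_pred(a, b, (a + b - c) & 0xFF);
        }
        l = src2[w - 1];
        lt = src1[w - 1];
    }
    *left = l;
    *left_top = lt;
}
```

With non-overlapping buffers the two functions write the same dst bytes and the same final *left and *left_top.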

Comment 8 Sebastian Pop 2010-03-29 16:38:47 UTC
Subject: Bug 43436

Author: spop
Date: Mon Mar 29 16:38:34 2010
New Revision: 157800

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=157800
Log:
Improve vectorization diagnostic: loop contains function calls.

2010-03-29  Sebastian Pop  <sebastian.pop@amd.com>

	PR middle-end/43436
	* tree-vect-data-refs.c (vect_analyze_data_refs): When
	compute_data_dependences_for_loop returns false, early exit
	and output an extra diagnostic for the failed data reference
	analysis.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/tree-vect-data-refs.c