This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

RE: [PATCH] PING:fold two vector_cst in const_binop


Richard,
Here is the modified version. It works both on integer and floating-point. I couldn't avoid nreverse by reverse counting because number of elements are not the same as vector size. Basically, my code is similar to following code in the fold_unary function:

      else if (TREE_CODE (arg0) == VECTOR_CST)
	{
	  tree elements = TREE_VECTOR_CST_ELTS (arg0), elem, list = NULL_TREE;
	  int count = TYPE_VECTOR_SUBPARTS (type), i;

	  for (i = 0; i < count; i++)
	    {
	      if (elements)
		{
		  elem = TREE_VALUE (elements);
		  elem = fold_unary (BIT_NOT_EXPR, TREE_TYPE (type), elem);
		  if (elem == NULL_TREE)
		    break;
		  elements = TREE_CHAIN (elements);
		}
	      else
		elem = build_int_cst (TREE_TYPE (type), -1);
	      list = tree_cons (NULL_TREE, elem, list);
	    }
	  if (i == count)
	    return build_vector (type, nreverse (list));
	}


Another thing I am not sure is how to form the test case because which vectors used here are target-specific. Maybe I can put the test in gcc.dg/vect directory and make use of vect.exp infrastructure.

Thanks,
Bingfeng


test code: 

/* { dg-do compile } */
/* { dg-options "-O2 -fdump-tree-ccp1" } */

typedef unsigned int v4si __attribute__ ((vector_size (16)));
typedef float v4sf __attribute__ ((vector_size (16))); 

v4si c;
void foo1()
{
  v4si a = { 1, 2, 3, 4 };
  v4si b = { 5, 6, 7, 8 };
  c = a + b;
}

void foo2()
{
  v4si a = { 1, 2};
  v4si b = { 5, 6, 7 };
  c = a + b;
}

v4sf d;
void foo3()
{
  v4sf a = { 1.0, 2.0, 3.0, 4.0};
  v4sf b = { 5.0, 6.0, 7.0, 8.0};
  d = a + b;
}

/* { dg-final { scan-tree-dump-times "c =.* { 6, 8, 10, 12 }" 1 "ccp1" } } */
/* { dg-final { scan-tree-dump-times "c =.* { 6, 8, 7, 0 }" 1 "ccp1" } } */
/* { dg-final { scan-tree-dump-times "c =.* { 6.0e+0, 8.0e+0, 1.0e+1, 1.2e+1 }" 1 "ccp1" } } */
/* { dg-final { cleanup-tree-dump "ccp1" } } */

ccp1 file:

;; Function foo1 (foo1)

foo1 ()
{
<bb 2>:
  c = { 6, 8, 10, 12 };
  return;

}



;; Function foo2 (foo2)

foo2 ()
{
<bb 2>:
  c = { 6, 8, 7, 0 };
  return;

}



;; Function foo3 (foo3)

foo3 ()
{
<bb 2>:
  d = { 6.0e+0, 8.0e+0, 1.0e+1, 1.2e+1 };
  return;

}


The updated patch:
Index: fold-const.c
===================================================================
--- fold-const.c        (revision 145659)
+++ fold-const.c        (working copy)
@@ -141,6 +141,7 @@
 static tree fold_negate_const (tree, tree);
 static tree fold_not_const (tree, tree);
 static tree fold_relational_const (enum tree_code, tree, tree, tree);
+static tree fold_convert_const (enum tree_code, tree, tree);


 /* We know that A1 + B1 = SUM1, using 2's complement arithmetic and ignoring
@@ -1998,6 +1999,50 @@
        return build_complex (type, real, imag);
     }

+  if (TREE_CODE (arg1) == VECTOR_CST)
+    {
+      tree type = TREE_TYPE(arg1);
+      int count = TYPE_VECTOR_SUBPARTS (type), i;
+      tree elements1, elements2, list = NULL_TREE;
+
+      if(TREE_CODE(arg2) != VECTOR_CST)
+        return NULL_TREE;
+
+      elements1 = TREE_VECTOR_CST_ELTS (arg1);
+      elements2 = TREE_VECTOR_CST_ELTS (arg2);
+
+      for (i = 0; i < count; i++)
+       {
+          tree elem1, elem2, elem;
+
+          /* The trailing elements can be empty and should be treated as 0 */
+          if(!elements1)
+            elem1 = fold_convert_const (NOP_EXPR, TREE_TYPE (type), integer_zero_node);
+          else
+            {
+              elem1 = TREE_VALUE(elements1);
+              elements1 = TREE_CHAIN (elements1);
+            }
+
+          if(!elements2)
+            elem2 = fold_convert_const (NOP_EXPR, TREE_TYPE (type), integer_zero_node);
+          else
+            {
+              elem2 = TREE_VALUE(elements2);
+              elements2 = TREE_CHAIN (elements2);
+            }
+
+          elem = const_binop (code, elem1, elem2, notrunc);
+
+          /* It is possible that const_binop cannot handle the given
+            code and return NULL_TREE */
+          if(elem == NULL_TREE)
+            return NULL_TREE;
+
+          list = tree_cons (NULL_TREE, elem, list);
+       }
+      return build_vector(type, nreverse(list));
+    }
   return NULL_TREE;
 }
> -----Original Message-----
> From: Richard Guenther [mailto:richard.guenther@gmail.com] 
> Sent: 07 April 2009 14:17
> To: Bingfeng Mei
> Cc: gcc-patches
> Subject: Re: [PATCH] PING:fold two vector_cst in const_binop
> 
> On Tue, Apr 7, 2009 at 3:09 PM, Bingfeng Mei 
> <bmei@broadcom.com> wrote:
> >
> >
> >> -----Original Message-----
> >> From: Richard Guenther [mailto:richard.guenther@gmail.com]
> >> Sent: 07 April 2009 14:00
> >> To: Bingfeng Mei
> >> Cc: gcc-patches
> >> Subject: Re: [PATCH] PING:fold two vector_cst in const_binop
> >>
> >> On Tue, Apr 7, 2009 at 2:19 PM, Bingfeng Mei
> >> <bmei@broadcom.com> wrote:
> >> > Hello, Richard,
> >> >
> >> > The following is the patch I submitted before, which was
> >> not appropriate for late stage of 4.4. I retested the patch
> >> on current trunk.
> >> >
> >> > --
> >> > GCC fails to fold two vector_cst in ccp1 pass into one for
> >> vector modes that are natively supported by target. For
> >> example, the following test will fail in powerpc target with
> >> -maltivec turned on (v4si is supported by altivec natively).
> >> Ironically, if the vector mode is not supported natively by
> >> target, the folding can be successful through scalar operations :-)
> >> >
> >> > /* { dg-do compile } */
> >> > /* { dg-options "-O2 -fdump-tree-ccp1" } */
> >> >
> >> > typedef unsigned int v4si __attribute__ ((vector_size (16)));
> >> >
> >> > v4si c;
> >> >
> >> > void foo()
> >> > {
> >> > ?v4si a = { 1, 2, 3, 4 };
> >> > ?v4si b = { 5, 6, 7, 8 };
> >> > ?c = a + b;
> >> > }
> >> >
> >> > /* { dg-final { scan-tree-dump-times "c =.* { 6, 8, 10, 12
> >> }" 1 "ccp1" } } */
> >> > /* { dg-final { cleanup-tree-dump "ccp1" } } */
> >> >
> >> >
> >> > GCC produces following intermediate code (.ccp1):
> >> >
> >> > ;; Function foo (foo)
> >> >
> >> > foo ()
> >> > {
> >> > ?v4si c.0;
> >> >
> >> > <bb 2>:
> >> > ?c.0_3 = { 1, 2, 3, 4 } + { 5, 6, 7, 8 };
> >> > ?c ={v} c.0_3;
> >> > ?return;
> >> >
> >> > }
> >> >
> >> > The following patch fixes this problem. With the patch, GCC
> >> produces following intermediate code (.ccp1)
> >> >
> >> >
> >> > ;; Function foo (foo)
> >> >
> >> > foo ()
> >> > {
> >> > <bb 2>:
> >> > ?c ={v} { 6, 8, 10, 12 };
> >> > ?return;
> >> >
> >> > }
> >> >
> >> > The patch was tested on x86_64-unknown-linux-gnu. I am not
> >> sure how to generalize above test case, since different
> >> targets support different vector modes. OK for mainline?
> >> >
> >> > Bingfeng Mei
> >> > Broadcom UK
> >> >
> >> >
> >> > 2009-04-07 Bingfeng Mei <bmei@broadcom.com>
> >> >
> >> > ? ? ? ?* fold-const.c (const_binop): Combine two VECTOR_CST
> >> under operation CODE to produce a new one
> >> >
> >> > Index: fold-const.c
> >> > 
> ===================================================================
> >> > --- fold-const.c ? ? ? ?(revision 145659)
> >> > +++ fold-const.c ? ? ? ?(working copy)
> >> > @@ -1998,6 +1998,51 @@
> >> > ? ? ? ?return build_complex (type, real, imag);
> >> > ? ? }
> >> >
> >> > + ?if (TREE_CODE (arg1) == VECTOR_CST)
> >>
> >> ? && TREE_CODE (arg2) == VECTOR_CST)
> >>
> >> as if we are ever called for shifts the shift argument may be
> >> a scalar.
> >>
> >> > + ? ?{
> >> > + ? ? ?tree type = TREE_TYPE(arg1);
> >> > + ? ? ?int count = TYPE_VECTOR_SUBPARTS (type), i;
> >> > + ? ? ?bool fail = false;
> >> > + ? ? ?tree elements1, elements2, list = NULL_TREE;
> >> > + ? ? ?elements1 = TREE_VECTOR_CST_ELTS (arg1);
> >> > + ? ? ?elements2 = TREE_VECTOR_CST_ELTS (arg2);
> >> > +
> >> > + ? ? ?for (i = 0; i < count; i++)
> >> > + ? ? ? {
> >> > + ? ? ? ? ?tree elem1, elem2, elem;
> >> > +
> >> > + ? ? ? ? ?/* The trailing elements can be empty and should
> >> be treated as 0 */
> >> > + ? ? ? ? ?if(!elements1)
> >> > + ? ? ? ? ? ?elem1 = build_int_cst(TREE_TYPE(type), 0);
> >>
> >> Hmm, where is this documented? ? build_zero_vector fills in
> >> explicit zero members. ?Also using an integer type here is wrong
> >> for floating point vectors. ?Please instead bail out if 
> this happens
> >> on either vector.
> >
> > I don't know where it is documented. But it is how GCC is 
> implemented currently, and I just try to preserve the same 
> behaviour :-).
> >
> > For example:
> >
> > typedef unsigned int v4si __attribute__ ((vector_size (16)));
> > v4si bar()
> > {
> > ?v4si a = {1, 2};
> > ?return a;
> > }
> >
> > will produces on PowerPC.
> >
> > ? ? ? ?.file ? "tst2.c"
> > ? ? ? ?.gnu_attribute 4, 1
> > ? ? ? ?.gnu_attribute 8, 2
> > ? ? ? ?.gnu_attribute 12, 1
> > ? ? ? ?.section ? ? ? ?".text"
> > ? ? ? ?.align 2
> > ? ? ? ?.globl bar
> > ? ? ? ?.type ? bar, @function
> > bar:
> > ? ? ? ?lis 9,.LC0@ha
> > ? ? ? ?la 9,.LC0@l(9)
> > ? ? ? ?lvx 2,0,9
> > ? ? ? ?blr
> > ? ? ? ?.size ? bar, .-bar
> > ? ? ? ?.section ? ? ? ?.rodata.cst16,"aM",@progbits,16
> > ? ? ? ?.align 4
> > .LC0:
> > ? ? ? ?.4byte ?1
> > ? ? ? ?.4byte ?2
> > ? ? ? ?.4byte ?0
> > ? ? ? ?.4byte ?0
> > ? ? ? ?.ident ?"GCC: (GNU) 4.5.0 20090407 (experimental) 
> [trunk revision 143368]"
> 
> Ok, but then you need instead of build_int_cst(TREE_TYPE(type), 0)
> fold_convert (TREE_TYPE (type), integer_zero_node).
> 
> Richard.
> 
> 


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]