This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH] PING:fold two vector_cst in const_binop


On Tue, Apr 7, 2009 at 3:09 PM, Bingfeng Mei <bmei@broadcom.com> wrote:
>
>
>> -----Original Message-----
>> From: Richard Guenther [mailto:richard.guenther@gmail.com]
>> Sent: 07 April 2009 14:00
>> To: Bingfeng Mei
>> Cc: gcc-patches
>> Subject: Re: [PATCH] PING:fold two vector_cst in const_binop
>>
>> On Tue, Apr 7, 2009 at 2:19 PM, Bingfeng Mei
>> <bmei@broadcom.com> wrote:
>> > Hello, Richard,
>> >
>> > The following is the patch I submitted before, which was
>> not appropriate for late stage of 4.4. I retested the patch
>> on current trunk.
>> >
>> > --
>> > GCC fails to fold two vector_cst in ccp1 pass into one for
>> vector modes that are natively supported by target. For
>> example, the following test will fail in powerpc target with
>> -maltivec turned on (v4si is supported by altivec natively).
>> Ironically, if the vector mode is not supported natively by
>> target, the folding can be successful through scalar operations :-)
>> >
>> > /* { dg-do compile } */
>> > /* { dg-options "-O2 -fdump-tree-ccp1" } */
>> >
>> > typedef unsigned int v4si __attribute__ ((vector_size (16)));
>> >
>> > v4si c;
>> >
>> > void foo()
>> > {
>> > ?v4si a = { 1, 2, 3, 4 };
>> > ?v4si b = { 5, 6, 7, 8 };
>> > ?c = a + b;
>> > }
>> >
>> > /* { dg-final { scan-tree-dump-times "c =.* { 6, 8, 10, 12
>> }" 1 "ccp1" } } */
>> > /* { dg-final { cleanup-tree-dump "ccp1" } } */
>> >
>> >
>> > GCC produces following intermediate code (.ccp1):
>> >
>> > ;; Function foo (foo)
>> >
>> > foo ()
>> > {
>> > ?v4si c.0;
>> >
>> > <bb 2>:
>> > ?c.0_3 = { 1, 2, 3, 4 } + { 5, 6, 7, 8 };
>> > ?c ={v} c.0_3;
>> > ?return;
>> >
>> > }
>> >
>> > The following patch fixes this problem. With the patch, GCC
>> produces following intermediate code (.ccp1)
>> >
>> >
>> > ;; Function foo (foo)
>> >
>> > foo ()
>> > {
>> > <bb 2>:
>> > ?c ={v} { 6, 8, 10, 12 };
>> > ?return;
>> >
>> > }
>> >
>> > The patch was tested on x86_64-unknown-linux-gnu. I am not
>> sure how to generalize above test case, since different
>> targets support different vector modes. OK for mainline?
>> >
>> > Bingfeng Mei
>> > Broadcom UK
>> >
>> >
>> > 2009-04-07 Bingfeng Mei <bmei@broadcom.com>
>> >
>> > ? ? ? ?* fold-const.c (const_binop): Combine two VECTOR_CST
>> under operation CODE to produce a new one
>> >
>> > Index: fold-const.c
>> > ===================================================================
>> > --- fold-const.c ? ? ? ?(revision 145659)
>> > +++ fold-const.c ? ? ? ?(working copy)
>> > @@ -1998,6 +1998,51 @@
>> > ? ? ? ?return build_complex (type, real, imag);
>> > ? ? }
>> >
>> > + ?if (TREE_CODE (arg1) == VECTOR_CST)
>>
>> ? && TREE_CODE (arg2) == VECTOR_CST)
>>
>> as if we are ever called for shifts the shift argument may be
>> a scalar.
>>
>> > + ? ?{
>> > + ? ? ?tree type = TREE_TYPE(arg1);
>> > + ? ? ?int count = TYPE_VECTOR_SUBPARTS (type), i;
>> > + ? ? ?bool fail = false;
>> > + ? ? ?tree elements1, elements2, list = NULL_TREE;
>> > + ? ? ?elements1 = TREE_VECTOR_CST_ELTS (arg1);
>> > + ? ? ?elements2 = TREE_VECTOR_CST_ELTS (arg2);
>> > +
>> > + ? ? ?for (i = 0; i < count; i++)
>> > + ? ? ? {
>> > + ? ? ? ? ?tree elem1, elem2, elem;
>> > +
>> > + ? ? ? ? ?/* The trailing elements can be empty and should
>> be treated as 0 */
>> > + ? ? ? ? ?if(!elements1)
>> > + ? ? ? ? ? ?elem1 = build_int_cst(TREE_TYPE(type), 0);
>>
>> Hmm, where is this documented? ? build_zero_vector fills in
>> explicit zero members. ?Also using an integer type here is wrong
>> for floating point vectors. ?Please instead bail out if this happens
>> on either vector.
>
> I don't know where it is documented. But it is how GCC is implemented currently, and I just try to preserve the same behaviour :-).
>
> For example:
>
> typedef unsigned int v4si __attribute__ ((vector_size (16)));
> v4si bar()
> {
> ?v4si a = {1, 2};
> ?return a;
> }
>
> will produces on PowerPC.
>
> ? ? ? ?.file ? "tst2.c"
> ? ? ? ?.gnu_attribute 4, 1
> ? ? ? ?.gnu_attribute 8, 2
> ? ? ? ?.gnu_attribute 12, 1
> ? ? ? ?.section ? ? ? ?".text"
> ? ? ? ?.align 2
> ? ? ? ?.globl bar
> ? ? ? ?.type ? bar, @function
> bar:
> ? ? ? ?lis 9,.LC0@ha
> ? ? ? ?la 9,.LC0@l(9)
> ? ? ? ?lvx 2,0,9
> ? ? ? ?blr
> ? ? ? ?.size ? bar, .-bar
> ? ? ? ?.section ? ? ? ?.rodata.cst16,"aM",@progbits,16
> ? ? ? ?.align 4
> .LC0:
> ? ? ? ?.4byte ?1
> ? ? ? ?.4byte ?2
> ? ? ? ?.4byte ?0
> ? ? ? ?.4byte ?0
> ? ? ? ?.ident ?"GCC: (GNU) 4.5.0 20090407 (experimental) [trunk revision 143368]"

Ok, but then you need instead of build_int_cst(TREE_TYPE(type), 0)
fold_convert (TREE_TYPE (type), integer_zero_node).

Richard.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]