62283 – basic-block vectorization fails

Bug 62283 - basic-block vectorization fails

Summary: basic-block vectorization fails

Status:	REOPENED

Alias:	None

Product:	gcc
Classification:	Unclassified
Component:	tree-optimization (show other bugs)
Version:	4.8.2

Importance:	P3 normal
Target Milestone:	6.0
Assignee:	Richard Biener

URL:
Keywords:	missed-optimization

Depends on:
Blocks:	vectorizer
	Show dependency tree / graph

Reported:	2014-08-27 11:49 UTC by Henrik Holst
Modified:	2024-03-15 21:41 UTC (History)
CC List:	2 users (show)

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed:	2014-08-27 00:00:00

Attachments
vect dump (2.93 KB, text/plain) 2014-09-01 13:37 UTC, Rainer Orth	Details
bb-slp-11.c.126t.slp2 dump (3.44 KB, text/plain) 2014-09-02 11:08 UTC, Rainer Orth	Details
bb-slp-26.c.120t.slp1 dump (5.31 KB, text/plain) 2014-09-02 11:08 UTC, Rainer Orth	Details
bb-slp-14.c.141t.slp2 dump (2.53 KB, text/plain) 2015-05-04 14:44 UTC, Rainer Orth	Details
bb-slp-32.c.141t.slp2 dump (1.44 KB, text/plain) 2015-05-06 11:31 UTC, Rainer Orth	Details
32-bit sparc-sun-solaris2.11 vect-33.c.265t.optimized (1.63 KB, text/plain) 2024-02-28 15:01 UTC, Rainer Orth	Details
View All Add an attachment (proposed patch, testcase, etc.)

Note You need to log in before you can comment on or make changes to this bug.

Description Henrik Holst 2014-08-27 11:49:43 UTC

If the `value` specifier below is removed, GFortran generates scalar code instead of vectorized code as it should. 

$ cat saxpy.f
      subroutine saxpy(alpha,x,y)
      value alpha
      real x(4),y(4)
      y(1)=y(1)+alpha*x(1)
      y(2)=y(2)+alpha*x(2)
      y(3)=y(3)+alpha*x(3)
      y(4)=y(4)+alpha*x(4)
end
$ gfortran -S -msse2 -O3 saxpy.f
$ more saxpy.s

Comment 1 Henrik Holst 2014-08-27 11:52:22 UTC

(forgot to indent the end statement above.)

The expected assembler code should be something like:

	movaps	%xmm0, %xmm1
	movups	(%rdi), %xmm0
	shufps	$0, %xmm1, %xmm1
	movups	(%rsi), %xmm2
	mulps	%xmm1, %xmm0
	addps	%xmm2, %xmm0
	movups	%xmm0, (%rsi)
	ret

Comment 2 Richard Biener 2014-08-27 12:00:29 UTC

Confirmed.  This is because SLP analysis sees a load from *alpha which it doesn't
consider handling as extern def as it handles parameters.

Let me see if I can easily fix this.

Comment 3 Richard Biener 2014-08-27 12:43:31 UTC

Like

Index: gcc/tree-vect-slp.c
===================================================================
--- gcc/tree-vect-slp.c (revision 214572)
+++ gcc/tree-vect-slp.c (working copy)
@@ -329,6 +329,13 @@ vect_get_and_check_slp_defs (loop_vec_in
             }
         }
 
+      /* If this is an internal def but the definition is not vectorizable
+         treat it as external def for basic-block SLP.  */
+      if (!loop
+         && dt == vect_internal_def
+         && !STMT_VINFO_VECTORIZABLE (vinfo_for_stmt (def_stmt)))
+       dt = vect_external_def;
+
       if (first)
        {
          oprnd_info->first_dt = dt;

if one can sort out placing of the initializer (currently we place it at
the start of the basic-block which is wrong).  Like with

Index: gcc/tree-vect-slp.c
===================================================================
--- gcc/tree-vect-slp.c (revision 214572)
+++ gcc/tree-vect-slp.c (working copy)
@@ -2447,6 +2454,8 @@ vect_get_constant_vectors (tree op, slp_
 
   number_of_places_left_in_vector = nunits;
   elts = XALLOCAVEC (tree, nunits);
+  gimple_stmt_iterator where;
+  bool where_p = false;
   for (j = 0; j < number_of_copies; j++)
     {
       for (i = group_size - 1; stmts.iterate (i, &stmt); i--)
@@ -2517,6 +2526,20 @@ vect_get_constant_vectors (tree op, slp_
 
           /* Create 'vect_ = {op0,op1,...,opn}'.  */
           number_of_places_left_in_vector--;
+         /* For "externals" defined in BB compute an insert location.  */
+         if (TREE_CODE (op) == SSA_NAME
+             && (STMT_VINFO_BB_VINFO (stmt_vinfo)->bb
+                 == gimple_bb (SSA_NAME_DEF_STMT (op))))
+           {
+             gimple stmt = NULL;
+             if (where_p)
+               stmt = get_earlier_stmt (SSA_NAME_DEF_STMT (op),
+                                        gsi_stmt (where));
+             if (!stmt || stmt == gsi_stmt (where))
+               stmt = SSA_NAME_DEF_STMT (op);
+             where = gsi_for_stmt (stmt);
+             where_p = true;
+           }
          if (!types_compatible_p (TREE_TYPE (vector_type), TREE_TYPE (op)))
            {
              if (CONSTANT_CLASS_P (op))
@@ -2558,8 +2581,12 @@ vect_get_constant_vectors (tree op, slp_
                    CONSTRUCTOR_APPEND_ELT (v, NULL_TREE, elts[k]);
                  vec_cst = build_constructor (vector_type, v);
                }
+             if (where_p)
+               gsi_next (&where);
               voprnds.quick_push (vect_init_vector (stmt, vec_cst,
-                                                   vector_type, NULL));
+                                                   vector_type,
+                                                   where_p ? &where : NULL));
+             where_p = false;
              if (ctor_seq != NULL)
                {
                  gimple init_stmt = SSA_NAME_DEF_STMT (voprnds.last ());

a bit too hackish for my taste.  Also handling unvectorizable stmts as external
should instead turned into sth more general like forcing 'external' handling
as soon as the SLP build fails for children (and letting cost handling reject
the awful cases).

Comment 4 Henrik Holst 2014-08-27 14:26:03 UTC

Thank you Richard for looking into this issue.

You probably know already exactly why this bug appeared. I just wanted to stress the severity of this issue, and especially for Fortran which is often used in "number crunching" applications AND subroutine arguments are passed by reference by default.

The following F2008 code works as expected:

      subroutine test1(x,y)
      real x(4),y(4)
      gamma=3.141593
      block
        beta=gamma
        y(1)=y(1)+beta*x(1)
        y(2)=y(2)+beta*x(2)
        y(3)=y(3)+beta*x(3)
        y(4)=y(4)+beta*x(4)
      end block
      end

but when `beta=gamma` is replaced with `beta=alpha` it again fails and generates scalar code. So I ask: Does this bug force *ALL* computations which involves directly or indirectly parameter values to subroutines and functions, to be done in scalar? If so, its pretty bad.

Related or not, the following codes also generates scalar code:

      subroutine test2(x,y)
      real x(4),y(4)
      beta=3.141593
      do i=1,4
        y(i)=y(i)+beta*x(i)
      end do
      end

and

      subroutine test3(x,y)
      real x(4),y(4)
      beta=3.141593
      y=y+beta*x
      end

as well. I can create another bug for this if you think they are unrelated.

Comment 5 Henrik Holst 2014-08-27 14:52:27 UTC

The `alpha` argument in test1 subroutine went missing. (Visual copy-paste between two machines... :-))

Comment 6 Richard Biener 2014-08-28 09:17:05 UTC

(In reply to Henrik Holst from comment #4)
> Thank you Richard for looking into this issue.
> 
> You probably know already exactly why this bug appeared. I just wanted to
> stress the severity of this issue, and especially for Fortran which is often
> used in "number crunching" applications AND subroutine arguments are passed
> by reference by default.
> 
> The following F2008 code works as expected:
> 
>       subroutine test1(x,y)
>       real x(4),y(4)
>       gamma=3.141593
>       block
>         beta=gamma
>         y(1)=y(1)+beta*x(1)
>         y(2)=y(2)+beta*x(2)
>         y(3)=y(3)+beta*x(3)
>         y(4)=y(4)+beta*x(4)
>       end block
>       end
> 
> but when `beta=gamma` is replaced with `beta=alpha` it again fails and
> generates scalar code. So I ask: Does this bug force *ALL* computations
> which involves directly or indirectly parameter values to subroutines and
> functions, to be done in scalar? If so, its pretty bad.

For basic-block vectorization?  It depends whether the first use of
the indirect parameter is in the same basic-block that is supposed
to be vectorized or not ...

> Related or not, the following codes also generates scalar code:
> 
>       subroutine test2(x,y)
>       real x(4),y(4)
>       beta=3.141593
>       do i=1,4
>         y(i)=y(i)+beta*x(i)
>       end do
>       end

That's because we decide to peel for alignment and then decide the
result is not profitable to vectorize.  With -fno-vect-cost-model
the loop is vectorized (but in an awkward way).  You may want to
file a separate bug about this.

> and
> 
>       subroutine test3(x,y)
>       real x(4),y(4)
>       beta=3.141593
>       y=y+beta*x
>       end

Same issue (add it as another testcase to the new bug).

It is of course pointless to peel for alignment if the remaining loop
will always run less than the vectorization factor.  Versioning for
alignment may still be applied here.

> as well. I can create another bug for this if you think they are unrelated.

Comment 7 Richard Biener 2014-08-28 09:42:22 UTC

(In reply to Richard Biener from comment #6)
> (In reply to Henrik Holst from comment #4)
> > Thank you Richard for looking into this issue.
> > 
> > You probably know already exactly why this bug appeared. I just wanted to
> > stress the severity of this issue, and especially for Fortran which is often
> > used in "number crunching" applications AND subroutine arguments are passed
> > by reference by default.
> > 
> > The following F2008 code works as expected:
> > 
> >       subroutine test1(x,y)
> >       real x(4),y(4)
> >       gamma=3.141593
> >       block
> >         beta=gamma
> >         y(1)=y(1)+beta*x(1)
> >         y(2)=y(2)+beta*x(2)
> >         y(3)=y(3)+beta*x(3)
> >         y(4)=y(4)+beta*x(4)
> >       end block
> >       end
> > 
> > but when `beta=gamma` is replaced with `beta=alpha` it again fails and
> > generates scalar code. So I ask: Does this bug force *ALL* computations
> > which involves directly or indirectly parameter values to subroutines and
> > functions, to be done in scalar? If so, its pretty bad.
> 
> For basic-block vectorization?  It depends whether the first use of
> the indirect parameter is in the same basic-block that is supposed
> to be vectorized or not ...
> 
> > Related or not, the following codes also generates scalar code:
> > 
> >       subroutine test2(x,y)
> >       real x(4),y(4)
> >       beta=3.141593
> >       do i=1,4
> >         y(i)=y(i)+beta*x(i)
> >       end do
> >       end
> 
> That's because we decide to peel for alignment and then decide the
> result is not profitable to vectorize.  With -fno-vect-cost-model
> the loop is vectorized (but in an awkward way).  You may want to
> file a separate bug about this.
> 
> > and
> > 
> >       subroutine test3(x,y)
> >       real x(4),y(4)
> >       beta=3.141593
> >       y=y+beta*x
> >       end
> 
> Same issue (add it as another testcase to the new bug).
> 
> It is of course pointless to peel for alignment if the remaining loop
> will always run less than the vectorization factor.  Versioning for
> alignment may still be applied here.

I have a simple patch for this.

> > as well. I can create another bug for this if you think they are unrelated.

Comment 8 Richard Biener 2014-08-28 13:14:18 UTC

Author: rguenth
Date: Thu Aug 28 13:13:45 2014
New Revision: 214678

URL: https://gcc.gnu.org/viewcvs?rev=214678&root=gcc&view=rev
Log:
2014-08-28  Richard Biener  <rguenther@suse.de>

	PR tree-optimization/62283
	* tree-vect-data-refs.c (vect_enhance_data_refs_alignment):
	Do not peel loops for alignment where the vector loop likely
	doesn't run at least VF times.

	* gfortran.dg/vect/pr62283.f: New testcase.
	* gcc.dg/tree-ssa/cunroll-5.c: Adjust.
	* gcc.dg/vect/costmodel/i386/costmodel-vect-31.c: Likewise.
	* gcc.dg/vect/costmodel/i386/costmodel-vect-33.c: Likewise.
	* gcc.dg/vect/costmodel/x86_64/costmodel-vect-31.c: Likewise.
	* gcc.dg/vect/costmodel/x86_64/costmodel-vect-33.c: Likewise.
	* gcc.dg/vect/vect-33.c: Likewise.

Added:
    trunk/gcc/testsuite/gfortran.dg/vect/pr62283.f
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/testsuite/gcc.dg/tree-ssa/cunroll-5.c
    trunk/gcc/testsuite/gcc.dg/vect/costmodel/i386/costmodel-vect-31.c
    trunk/gcc/testsuite/gcc.dg/vect/costmodel/i386/costmodel-vect-33.c
    trunk/gcc/testsuite/gcc.dg/vect/costmodel/x86_64/costmodel-vect-31.c
    trunk/gcc/testsuite/gcc.dg/vect/costmodel/x86_64/costmodel-vect-33.c
    trunk/gcc/testsuite/gcc.dg/vect/vect-33.c
    trunk/gcc/tree-vect-data-refs.c

Comment 9 Richard Biener 2014-08-28 13:14:32 UTC

testcases in comment#4 should now work (no need to file a separate bug).

Comment 10 Rainer Orth 2014-09-01 13:33:06 UTC

The last patch introduced a failure on SPARC:

FAIL: gcc.dg/vect/vect-33.c scan-tree-dump-times vect "Alignment of access forced using versioning" 1
FAIL: gcc.dg/vect/vect-33.c -flto -ffat-lto-objects  scan-tree-dump-times vect "Alignment of access forced using versioning" 1

The vect dump still says

note: Alignment of access forced using peeling.

  Rainer

Comment 11 rguenther@suse.de 2014-09-01 13:35:19 UTC

On Mon, 1 Sep 2014, ro at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=62283
> 
> Rainer Orth <ro at gcc dot gnu.org> changed:
> 
>            What    |Removed                     |Added
> ----------------------------------------------------------------------------
>                  CC|                            |ro at gcc dot gnu.org
> 
> --- Comment #10 from Rainer Orth <ro at gcc dot gnu.org> ---
> The last patch introduced a failure on SPARC:
> 
> FAIL: gcc.dg/vect/vect-33.c scan-tree-dump-times vect "Alignment of access
> forced using versioning" 1
> FAIL: gcc.dg/vect/vect-33.c -flto -ffat-lto-objects  scan-tree-dump-times vect
> "Alignment of access forced using versioning" 1
> 
> The vect dump still says
> 
> note: Alignment of access forced using peeling.

Can you attach the vect dump?  I suppose vectors are rather short on 
SPARC?

Comment 12 Rainer Orth 2014-09-01 13:37:41 UTC

Created attachment 33426 [details]
vect dump

Sure, I meant to but forgot...

Comment 13 Richard Biener 2014-09-01 14:11:27 UTC

(In reply to Rainer Orth from comment #12)
> Created attachment 33426 [details]
> vect dump
> 
> Sure, I meant to but forgot...

Ok:

/vol/gcc/src/hg/trunk/local/gcc/testsuite/gcc.dg/vect/vect-33.c:19:3: note: vectorization factor = 8

so we have a vector of 8 chars, v8qi but vect64 is not enabled for SPARC?
(target-supports.exp).  The { ! vect64 } / && vect64 cases were supposed
to catch the case where vectorization uses v8qi (instead of the more
common v16qi).

So ... can you test enabling vect64 for sparc?

Comment 14 ro@CeBiTec.Uni-Bielefeld.DE 2014-09-01 14:28:47 UTC

> --- Comment #13 from Richard Biener <rguenth at gcc dot gnu.org> ---
> (In reply to Rainer Orth from comment #12)
>> Created attachment 33426 [details]
>> vect dump
>> 
>> Sure, I meant to but forgot...
>
> Ok:
>
> /vol/gcc/src/hg/trunk/local/gcc/testsuite/gcc.dg/vect/vect-33.c:19:3: note:
> vectorization factor = 8
>
> so we have a vector of 8 chars, v8qi but vect64 is not enabled for SPARC?
> (target-supports.exp).  The { ! vect64 } / && vect64 cases were supposed
> to catch the case where vectorization uses v8qi (instead of the more
> common v16qi).
>
> So ... can you test enabling vect64 for sparc?

While it works for the test at hand, the two other tests using vect64
(bb-slp-11.c and bb-slp-26.c) now FAIL:

FAIL: gcc.dg/vect/bb-slp-11.c scan-tree-dump-times slp2 "basic block vectorized" 1
FAIL: gcc.dg/vect/bb-slp-26.c scan-tree-dump-times slp1 "basic block vectorized" 1

	Rainer

Comment 15 rguenther@suse.de 2014-09-01 14:40:46 UTC

On Mon, 1 Sep 2014, ro at CeBiTec dot Uni-Bielefeld.DE wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=62283
> 
> --- Comment #14 from ro at CeBiTec dot Uni-Bielefeld.DE <ro at CeBiTec dot Uni-Bielefeld.DE> ---
> > --- Comment #13 from Richard Biener <rguenth at gcc dot gnu.org> ---
> > (In reply to Rainer Orth from comment #12)
> >> Created attachment 33426 [details]
> >> vect dump
> >> 
> >> Sure, I meant to but forgot...
> >
> > Ok:
> >
> > /vol/gcc/src/hg/trunk/local/gcc/testsuite/gcc.dg/vect/vect-33.c:19:3: note:
> > vectorization factor = 8
> >
> > so we have a vector of 8 chars, v8qi but vect64 is not enabled for SPARC?
> > (target-supports.exp).  The { ! vect64 } / && vect64 cases were supposed
> > to catch the case where vectorization uses v8qi (instead of the more
> > common v16qi).
> >
> > So ... can you test enabling vect64 for sparc?
> 
> While it works for the test at hand, the two other tests using vect64
> (bb-slp-11.c and bb-slp-26.c) now FAIL:
> 
> FAIL: gcc.dg/vect/bb-slp-11.c scan-tree-dump-times slp2 "basic block
> vectorized" 1
> FAIL: gcc.dg/vect/bb-slp-26.c scan-tree-dump-times slp1 "basic block
> vectorized" 1

I wonder about the dump files for those...  also if they fail on the
other vect64 target (arm-little-endian with NEON).

Richard.

Comment 16 Rainer Orth 2014-09-02 11:08:04 UTC

Created attachment 33433 [details]
bb-slp-11.c.126t.slp2 dump

Comment 17 Rainer Orth 2014-09-02 11:08:31 UTC

Created attachment 33434 [details]
bb-slp-26.c.120t.slp1 dump

Comment 18 ro@CeBiTec.Uni-Bielefeld.DE 2014-09-02 11:12:25 UTC

>> > So ... can you test enabling vect64 for sparc?
>> 
>> While it works for the test at hand, the two other tests using vect64
>> (bb-slp-11.c and bb-slp-26.c) now FAIL:
>> 
>> FAIL: gcc.dg/vect/bb-slp-11.c scan-tree-dump-times slp2 "basic block
>> vectorized" 1
>> FAIL: gcc.dg/vect/bb-slp-26.c scan-tree-dump-times slp1 "basic block
>> vectorized" 1
>
> I wonder about the dump files for those...  also if they fail on the

Attached now.

	Rainer

Comment 19 Richard Biener 2014-09-02 12:10:56 UTC

(In reply to Rainer Orth from comment #16)
> Created attachment 33433 [details]
> bb-slp-11.c.126t.slp2 dump

/vol/gcc/src/hg/trunk/local/gcc/testsuite/gcc.dg/vect/bb-slp-11.c:19:8: note: not vectorized: relevant stmt not supported: _4 = (unsigned short) _3;

seems SPARC VIS doesn't support int -> ushort conversion?  The idea is
that we need v4hi vectors for the conversion result and use two
v2si vectors for the input.

  _3 = MEM[(unsigned int *)&in];
  _4 = (unsigned short) _3;
  _5 = _4 + 23;
  a0_6 = (short int) _5;
...
  _19 = (unsigned int) a0_6;
  _21 = _19 * x_20(D);

Thus the testcase misses dg-require-effective-target vect_unpack and
vect_pack_trunc which both SPARC doesn't support.

Thus a testcase bug.

Comment 20 Richard Biener 2014-09-02 12:17:53 UTC

(In reply to Rainer Orth from comment #17)
> Created attachment 33434 [details]
> bb-slp-26.c.120t.slp1 dump

For this we see

/vol/gcc/src/hg/trunk/local/gcc/testsuite/gcc.dg/vect/bb-slp-26.c:19:14: note: Build SLP failed: unsupported unaligned load _12 = *src_132;

which means that SPARC VIS has no unalign scheme implemented (well, the
only valid one for basic-block vect being unaligned HW loads).

Not sure if that was intended for the testcase.

A fix is to see how arm manages to pass this (it isn't a hw_misalign
target - ah, but it only requires aligned vector elements - vect_element_align).

So again a testsuite bug, it should require vect_element_align
(which includes hw_misalign targets).

Comment 21 Richard Biener 2015-04-28 08:31:16 UTC

Author: rguenth
Date: Tue Apr 28 08:30:44 2015
New Revision: 222514

URL: https://gcc.gnu.org/viewcvs?rev=222514&root=gcc&view=rev
Log:
2015-04-28  Richard Biener  <rguenther@suse.de>

	PR tree-optimization/62283
	* tree-vect-slp.c (vect_build_slp_tree): When the SLP build
	fails fatally and we are vectorizing a basic-block simply
	cause the child to be constructed piecewise.
	(vect_analyze_slp_cost_1): Adjust.
	(vect_detect_hybrid_slp_stmts): Likewise.
	(vect_bb_slp_scalar_cost): Likewise.
	(vect_get_constant_vectors): For piecewise constructed
	constants place them after the last def.
	(vect_get_slp_defs): Adjust.
	* tree-vect-stmts.c (vect_is_simple_use): Detect in-BB
	externals for basic-block vectorization.

	* gfortran.dg/vect/pr62283-2.f: New testcase.
	* gcc.dg/vect/bb-slp-14.c: Adjust.

Added:
    trunk/gcc/testsuite/gfortran.dg/vect/pr62283-2.f
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/testsuite/gcc.dg/vect/bb-slp-14.c
    trunk/gcc/tree-vect-slp.c
    trunk/gcc/tree-vect-stmts.c

Comment 22 Richard Biener 2015-04-28 08:31:58 UTC

Fixed for GCC 6.

Comment 23 Rainer Orth 2015-05-04 14:43:31 UTC

The bb-slp-14.c testcase now FAILs on Solaris/SPARC.  Attaching the dump.

  Rainer

Comment 24 Rainer Orth 2015-05-04 14:44:31 UTC

Created attachment 35454 [details]
bb-slp-14.c.141t.slp2 dump

Comment 25 Richard Biener 2015-05-06 06:47:39 UTC

(In reply to Rainer Orth from comment #23)
> The bb-slp-14.c testcase now FAILs on Solaris/SPARC.  Attaching the dump.
> 
>   Rainer

/vol/gcc/src/hg/trunk/local/gcc/testsuite/gcc.dg/vect/bb-slp-14.c:19:10: note: not vectorized: relevant stmt not supported: _11 = a0_4 * x_10(D);

ok, so we miss to test for vect_int_mult.

Comment 26 Richard Biener 2015-05-06 06:48:04 UTC

Fixed.

Comment 27 Richard Biener 2015-05-06 06:48:10 UTC

Author: rguenth
Date: Wed May  6 06:47:38 2015
New Revision: 222843

URL: https://gcc.gnu.org/viewcvs?rev=222843&root=gcc&view=rev
Log:
2015-05-06  Richard Biener  <rguenther@suse.de>

	PR tree-optimization/62283
	* gcc.dg/vect/bb-slp-14.c: Adjust.

Modified:
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/testsuite/gcc.dg/vect/bb-slp-14.c

Comment 28 Rainer Orth 2015-05-06 11:31:47 UTC

Created attachment 35476 [details]
bb-slp-32.c.141t.slp2 dump

A reghunt just confirmed that the patch also caused 

XPASS: gcc.dg/vect/bb-slp-32.c -flto -ffat-lto-objects  scan-tree-dump slp2 "vectorization is not profitable"

Dump attached.

Comment 29 rguenther@suse.de 2015-05-06 12:18:24 UTC

On Wed, 6 May 2015, ro at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=62283
> 
> --- Comment #28 from Rainer Orth <ro at gcc dot gnu.org> ---
> Created attachment 35476 [details]
>   --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=35476&action=edit
> bb-slp-32.c.141t.slp2 dump
> 
> A reghunt just confirmed that the patch also caused 
> 
> XPASS: gcc.dg/vect/bb-slp-32.c -flto -ffat-lto-objects  scan-tree-dump slp2
> "vectorization is not profitable"
> 
> Dump attached.

Well - for vect_no_align && { ! vect_hw_misalign } it now tries to
build the vector from scalar loads and that (rightfully) fails due
to the cost model:

  Vector inside of basic block cost: 4
  Vector prologue cost: 3
  Vector epilogue cost: 0
  Scalar cost of basic block: 4

so I bet we can now simply remove the XFAIL for all archs.  Will do that.

Comment 30 Richard Biener 2015-05-06 12:21:32 UTC

Author: rguenth
Date: Wed May  6 12:21:01 2015
New Revision: 222849

URL: https://gcc.gnu.org/viewcvs?rev=222849&root=gcc&view=rev
Log:
2015-05-06  Richard Biener  <rguenther@suse.de>

	PR tree-optimization/62283
	* gcc.dg/vect/bb-slp-32.c: Remove XFAIL.

Modified:
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/testsuite/gcc.dg/vect/bb-slp-32.c

Comment 31 Rainer Orth 2024-02-28 14:59:55 UTC

gcc.dg/vect/vect-33.c still FAILs on 32 and 64-bit SPARC:

FAIL: gcc.dg/vect/vect-33.c -flto -ffat-lto-objects  scan-tree-dump-not optimized "Invalid sum"
FAIL: gcc.dg/vect/vect-33.c scan-tree-dump-not optimized "Invalid sum"

Comment 32 Rainer Orth 2024-02-28 15:01:28 UTC

Created attachment 57563 [details]
32-bit sparc-sun-solaris2.11 vect-33.c.265t.optimized

Comment 33 anlauf 2024-03-15 21:27:21 UTC

Why should this be a fortran bug?

Can't we reassign it to target or middle-end?