19049 – not vectorizing a fortran loop

Summary: not vectorizing a fortran loop

Status:	RESOLVED FIXED

Alias:	None

Product:	gcc
Classification:	Unclassified
Component:	tree-optimization
Version:	4.0.0

Importance:	P2 enhancement
Target Milestone:	---
Assignee:	Not yet assigned to anyone

URL:
Keywords:	missed-optimization

Depends on:
Blocks:

Reported:	2004-12-17 04:02 UTC by Andrew Pinski
Modified:	2015-10-22 13:34 UTC
CC List:	2 users

See Also:	65962
Host:
Target:
Build:
Known to work:	6.0
Known to fail:	4.6.0
Last reconfirmed:	2013-02-01 00:00:00

Attachments
patch which fixes the non vectorizor problem (750 bytes, patch) 2004-12-17 04:27 UTC, Andrew Pinski

Description Andrew Pinski 2004-12-17 04:02:55 UTC

In a benchmark that exercises loop vectorization, I noticed that none of the loops were being 
vectorized because we don't merge two BBs together.
      subroutine s111 (ntimes,ld,n,ctime,dtime,a,b,c,d,e,aa,bb,cc)
c
c     linear dependence testing
c     no dependence - vectorizable
c
      integer ntimes, ld, n, i, nl
      real a(n), b(n), c(n), d(n), e(n), aa(ld,n), bb(ld,n), cc(ld,n)
      real t1, t2, second, chksum, ctime, dtime, cs1d
      do 1 nl = 1,2*ntimes
      do 10 i = 2,n,2
         a(i) = a(i-1) + b(i)
  10  continue
      call dummy(ld,n,a,b,c,d,e,aa,bb,cc,1.)
  1   continue
      return
      end

Why we don't merge the loop with the label 10, I don't know.
I wonder if it is related to PR 19038.
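
For readers more familiar with C, the inner loop of s111 can be sketched as below. This is only an illustrative translation, not part of the benchmark: the function name s111_inner and the 0-based index j are assumptions made for this sketch. The Fortran index i = 2,4,... maps to j = 1,3,..., so the loop writes only odd positions and reads only even ones, which is why the comment in the testcase calls it "no dependence - vectorizable".

```c
/* Illustrative C analogue of the inner loop of s111 (hypothetical name).
   Fortran:  do i = 2,n,2 ;  a(i) = a(i-1) + b(i)
   The 1-based i = 2,4,... becomes the 0-based j = 1,3,... */
static void s111_inner(float *a, const float *b, int n)
{
    for (int j = 1; j < n; j += 2)
        a[j] = a[j - 1] + b[j];   /* writes odd j, reads even j-1 */
}
```

Called with a = {1,0,2,0,3,0,4,0}, b = {0,10,0,20,0,30,0,40} and n = 8, this leaves a as {1,11,2,22,3,33,4,44}.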

Comment 1 Andrew Pinski 2004-12-17 04:05:43 UTC

If anyone wants the full benchmark I can attach it or send it to them.

Comment 2 Andrew Pinski 2004-12-17 04:15:01 UTC

The problem is that tree_can_merge_blocks_p returns false because BB a has a user label; what should 
happen instead is that the user label is just moved.

Comment 3 Andrew Pinski 2004-12-17 04:27:00 UTC

Created attachment 7768
patch which fixes the non vectorizor problem

This patch fixes the merging of the BB and the vectorizer can see the loop now.

Comment 4 Andrew Pinski 2005-04-02 00:16:58 UTC

pr19049.f:10: note: not vectorized: can't determine dependence between: (*a_38)[D.722_49] and (*a_38)[D.721_51]
pr19049.f:10: note: bad data dependence.

Comment 5 Ira Rosen 2005-04-25 09:58:44 UTC

The vectorizer fails to determine dependence between: (*a_38)[D.719_49] and 
(*a_38)[D.718_51], since it fails to determine that both of the data-refs have 
the same base, *a_38. This is already fixed in the autovect branch, and I am 
working on a patch to bring the changes in data-refs analysis to mainline.
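
Once both references are known to share the base *a_38, only the subscripts need comparing. The sketch below uses the textbook GCD test for that step; it is an assumption chosen for illustration, not a description of what the autovect branch actually implements, and the names gcd_int and gcd_test_may_depend are hypothetical. For the write a(i) with i = 2k and the read a(i-1) with i = 2k', a conflict would need 2k - 2k' = -1, which has no integer solution.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Greatest common divisor of |x| and |y| (Euclid's algorithm). */
static int gcd_int(int x, int y)
{
    x = abs(x);
    y = abs(y);
    while (y != 0) {
        int t = x % y;
        x = y;
        y = t;
    }
    return x;
}

/* GCD test for two subscripts of the same base array:
   s1*k + c1 and s2*k2 + c2 for arbitrary integers k, k2.
   Returns false only when independence is proven; true means
   "may depend" (the test ignores loop bounds, so it is conservative). */
static bool gcd_test_may_depend(int s1, int c1, int s2, int c2)
{
    int g = gcd_int(s1, s2);
    if (g == 0)
        return c1 == c2;          /* both subscripts are constants */
    return (c2 - c1) % g == 0;    /* solvable iff g divides c2 - c1 */
}
```

For this loop, gcd_test_may_depend(2, 0, 2, -1) is false, so the store and the load can never touch the same element. With a unit stride, gcd_test_may_depend(1, 0, 1, -1) is true and the dependence would be real.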

Comment 6 Ira Rosen 2005-07-26 07:07:20 UTC

The data dependence issue was solved by this patch 
http://gcc.gnu.org/ml/gcc-patches/2005-07/msg01195.html (committed). However, this loop is still not 
vectorizable because of noncontiguous access.

Comment 7 Ira Rosen 2006-09-19 07:29:09 UTC

Even though vectorization of strided accesses is already implemented in the autovect branch (and will be committed to the mainline 4.3), this case contains
a store with a gap (store to a[i] without a store to a[i-1]), and such stores are not supported (the current implementation supports only loads with gaps).

Note, however, that adding a store to a[i-1] will create a data dependence in the loop.

Ira
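
To make the "store with a gap" concrete, here is a hand-written C sketch of the loop processed in chunks, with the stores done one element at a time at stride 2 so the gap elements are never written. It is only an illustration of the access pattern under stated assumptions, not the code GCC generates: the chunk size VF = 4 and the names s111_scalar and s111_blocked are made up for this example.

```c
#define VF 4   /* assumed chunk size, standing in for a vector length */

/* Reference scalar loop (0-based analogue of the Fortran inner loop). */
static void s111_scalar(float *a, const float *b, int n)
{
    for (int j = 1; j < n; j += 2)
        a[j] = a[j - 1] + b[j];
}

/* Chunked version: VF results per iteration, stored with stride 2. */
static void s111_blocked(float *a, const float *b, int n)
{
    int j = 1;
    /* Main loop runs while the last lane, j + 2*(VF-1), is in bounds. */
    for (; j + 2 * (VF - 1) < n; j += 2 * VF) {
        float t[VF];
        for (int l = 0; l < VF; l++)   /* strided loads, gaps skipped */
            t[l] = a[j - 1 + 2 * l] + b[j + 2 * l];
        for (int l = 0; l < VF; l++)   /* strided stores, gaps untouched */
            a[j + 2 * l] = t[l];
    }
    /* Scalar epilogue for the remaining iterations. */
    for (; j < n; j += 2)
        a[j] = a[j - 1] + b[j];
}
```

Doing all loads of a chunk before its stores is safe here because the loads touch only even positions and the stores only odd ones. A single contiguous store covering the whole chunk would also write the gap elements a[j-1+2l], which is the situation the note above about a store to a[i-1] refers to.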

Comment 8 Thomas Koenig 2010-11-09 20:07:30 UTC

Still working on this?

$ gfortran -S -O3 -ftree-vectorizer-verbose=8 vect.f

vect.f:9: note: not vectorized: inner-loop count not invariant.
vect.f:10: note: Detected single element interleaving *a_107(D)[D.1623_106] step 8
vect.f:10: note: Detected single element interleaving *b_111(D)[D.1620_110] step 8
vect.f:10: note: not vectorized: complicated access pattern.
vect.f:1: note: vectorized 0 loops in function.

Comment 9 Ira Rosen 2010-11-10 06:59:44 UTC

This is still not implemented. And at the moment I am not planning to do that.

Ira

Comment 10 Andrew Pinski 2014-12-01 05:33:30 UTC

We now get (at least on aarch64):
t.f90:11:0: note: === vect_pattern_recog ===
t.f90:11:0: note: === vect_analyze_data_ref_accesses ===
t.f90:11:0: note: Detected single element interleaving *a_23(D)[_22] step 8
t.f90:11:0: note: Data access with gaps requires scalar epilogue loop
t.f90:11:0: note: not consecutive access *a_23(D)[_25] = _28;

t.f90:11:0: note: not vectorized: complicated access pattern.
t.f90:11:0: note: bad data access.

Comment 11 Richard Biener 2015-10-22 13:33:49 UTC

Author: rguenth
Date: Thu Oct 22 13:33:17 2015
New Revision: 229172

URL: https://gcc.gnu.org/viewcvs?rev=229172&root=gcc&view=rev
Log:
2015-10-22  Richard Biener  <rguenther@suse.de>

	PR tree-optimization/19049
	PR tree-optimization/65962
	* tree-vect-data-refs.c (vect_analyze_group_access_1): Fall back
	to strided accesses if single-element interleaving doesn't work.

	* gcc.dg/vect/vect-strided-store-pr65962.c: New testcase.
	* gcc.dg/vect/vect-63.c: Adjust.
	* gcc.dg/vect/vect-70.c: Likewise.
	* gcc.dg/vect/vect-strided-u8-i2-gap.c: Likewise.
	* gcc.dg/vect/vect-strided-a-u8-i2-gap.c: Likewise.
	* gfortran.dg/vect/pr19049.f90: Likewise.
	* gfortran.dg/vect/vect-8.f90: Likewise.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/testsuite/gcc.dg/vect/vect-63.c
    trunk/gcc/testsuite/gcc.dg/vect/vect-70.c
    trunk/gcc/testsuite/gcc.dg/vect/vect-strided-a-u8-i2-gap.c
    trunk/gcc/testsuite/gcc.dg/vect/vect-strided-u8-i2-gap.c
    trunk/gcc/testsuite/gfortran.dg/vect/pr19049.f90
    trunk/gcc/testsuite/gfortran.dg/vect/vect-8.f90
    trunk/gcc/tree-vect-data-refs.c

Comment 12 Richard Biener 2015-10-22 13:34:00 UTC

Fixed.