56595 – Tree-ssa-pre can create loop carried dependencies which prevent loop vectorization.

Status:	RESOLVED DUPLICATE of bug 35229

Alias:	None

Product:	gcc
Classification:	Unclassified
Component:	tree-optimization
Version:	4.8.0

Importance:	P3 normal
Target Milestone:	---
Assignee:	Not yet assigned to anyone

Blocks:	vectorizer

Reported:	2013-03-11 13:34 UTC by Yuri Rumyantsev
Modified:	2013-09-26 14:57 UTC
CC List:	2 users

Last reconfirmed:	2013-03-11 00:00:00

Attachments
testcase (563 bytes, application/octet-stream) 2013-03-11 13:38 UTC, Yuri Rumyantsev

Description Yuri Rumyantsev 2013-03-11 13:34:38 UTC

In some cases PRE creates loop-carried dependencies spanning multiple iterations (in effect, scalar replacement), which prevents the loop from being vectorized. The attached test case illustrates this deficiency. After PRE, for the statement

            DO I = 0,I2
               T1 = 0.5D0 * (U1(I,J,K)    + U1(I+1,J,K))

PRE creates a loop-carried dependence:

  <bb 172>:
...
  pretmp_690 = MEM[(real(kind=8)[0:] *)pretmp_675][pretmp_689];
...
  <bb 107>:
  # i_1 = PHI <0(172), i_437(175)>
  # prephitmp_691 = PHI <pretmp_690(172), _440(175)>
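
At the source level, the effect of the PHI above can be sketched in C roughly as follows. This is only an illustration, not the attached test case: the function names `avg_reload`, `avg_carried` and `forms_agree`, the array `u`, and the `stride` parameter are hypothetical, and the mapping of variables to SSA names in the comments is my reading of the dump. The first function reloads both operands in every iteration; the second mirrors what PRE appears to produce, keeping the value loaded for `i+1` in a scalar that is consumed by the next iteration, which is the loop-carried dependence.

```c
#include <stddef.h>

/* Original form: each iteration loads u[i] and u[i+1] independently,
   so iterations do not depend on each other. */
void avg_reload(const double *u, double *t, size_t n, size_t stride)
{
    for (size_t i = 0; i < n; i++)
        t[i] = 0.5 * (u[i * stride] + u[(i + 1) * stride]);
}

/* Sketch of the form after PRE: the second operand of iteration i is
   reused as the first operand of iteration i+1 via the scalar prev. */
void avg_carried(const double *u, double *t, size_t n, size_t stride)
{
    if (n == 0)
        return;
    double prev = u[0];                     /* role of pretmp_690 (preheader load) */
    for (size_t i = 0; i < n; i++) {
        double next = u[(i + 1) * stride];  /* presumably the role of _440 */
        t[i] = 0.5 * (prev + next);
        prev = next;                        /* role of the prephitmp_691 PHI */
    }
}

/* Returns 1 if both forms produce identical results on a small input. */
int forms_agree(void)
{
    double u[18], a[8], b[8];
    for (size_t k = 0; k < 18; k++)
        u[k] = 0.25 * (double)(k * k);
    avg_reload(u, a, 8, 2);   /* stride 2 touches u[0] .. u[16] */
    avg_carried(u, b, 8, 2);
    for (size_t i = 0; i < 8; i++)
        if (a[i] != b[i])
            return 0;
    return 1;
}
```

Both forms perform the same floating-point operations on the same values, so they compute identical results; the second only saves one load per iteration at the cost of serializing the iterations through `prev`.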


Note that in this particular test case the arrays have an unknown stride in the first dimension (stride1). With arrays where stride1 == 1 the transformation does not happen, as in the following simple test case, which is vectorized successfully:

	subroutine bar(a,b,c,d,n, m)
	integer n, m
	real*8 a(n,*), b(n,*), c(n,*), d(n,*)
	do j=1,m
	do i=1,m
	x1 = 0.5 * (a(i,j) + a(i+1,j))
	x2 = 0.5 * (b(i,j) + b(i+1,j))
	x3 = 0.5 * (c(i,j) + c(i+1,j))
	d(i,j) = (x1 + x2 + x3) / 3.0
	enddo
	enddo
	end

Comment 1 Yuri Rumyantsev 2013-03-11 13:38:25 UTC

Created attachment 29636
testcase

This test must be compiled with the following options for x86:

-ffree-line-length-none -m64 -Ofast -march=core-avx-i -mavx

Comment 2 Richard Biener 2013-03-11 14:25:02 UTC

Confirmed (and known).  See inhibit_phi_insertion in PRE.

I bet you will find a duplicate in the list of missed-vectorization bugs.

Loop store-motion can result in similar issues.

Note that the problem with limiting things even further is that PRE would
then be limited even when the resulting loop is _not_ vectorized, which is
of course bad.

So the best solution is to teach the vectorizer to handle this kind of
dependency (after all, user code can be written that way from the start).
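
To illustrate why this kind of dependence is not a fundamental obstacle: the carried scalar is never modified in the loop, it only forwards a loaded value by one iteration, so a block of iterations can take it from the last element of the previous block. The C sketch below processes the loop in fixed-size blocks to show the idea. The block size `VF`, the function name `avg_carried_blocked`, and the unit stride are assumptions made for illustration; this does not describe how GCC's vectorizer is or would be implemented.

```c
#include <stddef.h>

#define VF 4  /* hypothetical block ("vector") width */

/* Computes t[i] = 0.5 * (u[i] + u[i+1]) for i in [0, n), written with a
   carried scalar but evaluated block by block. */
void avg_carried_blocked(const double *u, double *t, size_t n)
{
    if (n == 0)
        return;
    double prev = u[0];          /* value carried into the first block */
    size_t i = 0;
    for (; i + VF <= n; i += VF) {
        double lo[VF], hi[VF];
        for (size_t k = 0; k < VF; k++)
            hi[k] = u[i + k + 1];        /* the "next" values of the block */
        lo[0] = prev;                    /* carried in from the previous block */
        for (size_t k = 1; k < VF; k++)
            lo[k] = hi[k - 1];           /* the same values shifted by one lane */
        for (size_t k = 0; k < VF; k++)
            t[i + k] = 0.5 * (lo[k] + hi[k]);
        prev = hi[VF - 1];               /* carried out to the next block */
    }
    for (; i < n; i++) {                 /* scalar tail for leftover iterations */
        double next = u[i + 1];
        t[i] = 0.5 * (prev + next);
        prev = next;
    }
}
```

Within a block, the only cross-iteration data flow is the one-lane shift from `hi` to `lo`, so the three inner loops have independent iterations.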

Comment 3 Richard Biener 2013-09-26 14:57:45 UTC

Dup.

*** This bug has been marked as a duplicate of bug 35229 ***