Bug 47341

Summary: unnecessary versioning in the vectorizer, not implemented affine-affine test
Product: gcc
Reporter: Joost VandeVondele <Joost.VandeVondele>
Component: tree-optimization
Assignee: Not yet assigned to anyone <unassigned>
Status: NEW
Severity: enhancement
CC: Joost.VandeVondele, mikael, vincenzo.innocente
Priority: P3
Keywords: missed-optimization
Version: 4.6.0
Target Milestone: ---
Last reconfirmed: 2013-03-29 00:00:00
Bug Blocks: 53947

Description Joost VandeVondele 2011-01-18 10:50:46 UTC
with current trunk:

> cat test.f90
   SUBROUTINE HARD_NN_4_4_4_5_1_2_4(C,A,B)
      REAL(KIND=8) :: C(4,*)
      REAL(KIND=8) :: B(4,*), A(4,*)
      INTEGER ::i,j,l
      l=           1
      DO j=           1 ,           4 ,           2
      DO i=           1 ,           4 ,           1
        C(i+0,j+0)=C(i+0,j+0)+A(i+0,l+0)*B(l+0,j+0)
        C(i+0,j+0)=C(i+0,j+0)+A(i+0,l+1)*B(l+1,j+0)
        C(i+0,j+0)=C(i+0,j+0)+A(i+0,l+2)*B(l+2,j+0)
        C(i+0,j+0)=C(i+0,j+0)+A(i+0,l+3)*B(l+3,j+0)
        C(i+0,j+1)=C(i+0,j+1)+A(i+0,l+0)*B(l+0,j+1)
        C(i+0,j+1)=C(i+0,j+1)+A(i+0,l+1)*B(l+1,j+1)
        C(i+0,j+1)=C(i+0,j+1)+A(i+0,l+2)*B(l+2,j+1)
        C(i+0,j+1)=C(i+0,j+1)+A(i+0,l+3)*B(l+3,j+1)
      ENDDO
      ENDDO
    END SUBROUTINE

> gfortran-trunk -c -O2 -fno-unroll-loops -ftree-vectorize -ftree-vectorizer-verbose=1 -march=core2 -msse4.2 test.f90

test.f90:7: note: created 1 versioning for alias checks.

test.f90:7: note: LOOP VECTORIZED.
test.f90:1: note: vectorized 1 loops in function.

The compiler should not need to generate multiple versions of this loop. With the bounds info provided, nothing can alias (I think).
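The no-alias claim can be checked by brute force. The sketch below is only a model, not compiler output: it assumes the usual column-major layout for C(4,*), so that C(i,j) sits at element offset (i-1) + 4*(j-1), and the helper names (`offset`, `inner_loop_offsets`, `columns_overlap`) are made up for illustration.

```python
# Assumed model: column-major storage of C(4,*), offsets counted in
# REAL(KIND=8) elements. C(i,j) -> (i-1) + 4*(j-1).
LEADING_DIM = 4


def offset(i, j):
    """Linear element offset of C(i,j) under the assumed layout."""
    return (i - 1) + LEADING_DIM * (j - 1)


def inner_loop_offsets(j):
    """Offsets touched by one run of the inner i loop for a fixed j."""
    col_j = {offset(i, j) for i in range(1, 5)}       # C(i+0,j+0)
    col_j1 = {offset(i, j + 1) for i in range(1, 5)}  # C(i+0,j+1)
    return col_j, col_j1


def columns_overlap():
    """True if C(:,j) and C(:,j+1) share an element for any j the loop visits."""
    for j in range(1, 5, 2):  # DO j = 1, 4, 2
        col_j, col_j1 = inner_loop_offsets(j)
        if col_j & col_j1:
            return True
    return False


print(columns_overlap())  # False
```

Under that model the two column accesses inside the vectorized i loop are always exactly 4 elements apart, and the loop runs only 4 iterations, so the ranges never meet.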
Comment 1 Richard Biener 2011-01-18 11:21:06 UTC
t.f90:7: note: versioning for alias required: can't determine dependence between *c_18(D)[D.1552_17] and *c_18(D)[D.1603_128]

  pretmp.22_273 = (integer(kind=8)) j_2;

  pretmp.22_274 = pretmp.22_273 * 4;

  pretmp.30_287 = pretmp.22_273 + 1;
  pretmp.30_288 = pretmp.30_287 * 4;

  D.1548_13 = (integer(kind=8)) i_1;
  D.1551_16 = D.1548_13 + pretmp.22_274;
  D.1552_17 = D.1551_16 + -5;

  D.1602_127 = D.1548_13 + pretmp.30_288;
  D.1603_128 = D.1602_127 + -5;

thus we can't determine the dependence between

  *(c_18(D) + (integer(kind=8)) i_1 + ((integer(kind=8)) j_2) * 4)

vs

  *(c_18(D) + (integer(kind=8)) i_1 + (((integer(kind=8)) j_2) + 1) * 4)
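To make the relation between the two index expressions concrete, here is a small sketch that transcribes the arithmetic of the dump above. It assumes the casts to integer(kind=8) do not change any value (no overflow), which is exactly the thing the analysis has trouble proving; the function names are hypothetical.

```python
def index_a(i, j):
    """D.1552_17 from the dump: i + j*4 - 5."""
    pretmp_22_274 = j * 4            # pretmp.22_273 * 4
    d_1551 = i + pretmp_22_274       # D.1548_13 + pretmp.22_274
    return d_1551 + -5


def index_b(i, j):
    """D.1603_128 from the dump: i + (j+1)*4 - 5."""
    pretmp_30_288 = (j + 1) * 4      # pretmp.30_287 * 4
    d_1602 = i + pretmp_30_288       # D.1548_13 + pretmp.30_288
    return d_1602 + -5


# Distance between the two accesses for every (i, j) the loops visit.
distances = {index_b(i, j) - index_a(i, j)
             for j in (1, 3) for i in range(1, 5)}
print(distances)  # {4}
```

The distance is the constant 4 regardless of i and j, so a test that could look through the casts would only need to compare that constant against the trip count of the i loop.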


(compute_affine_dependence
  (stmt_a = 
D.1553_19 = *c_18(D)[D.1552_17];
)       
  (stmt_b = 
D.1604_129 = *c_18(D)[D.1603_128];
)
(subscript_dependence_tester
(analyze_overlapping_iterations 
  (chrec_a = {pretmp.22_274 + -4, +, 1}_2)
  (chrec_b = {pretmp.30_288 + -4, +, 1}_2)
(analyze_siv_subscript 
siv test failed: unimplemented.
)

the SCEVs cannot be expanded properly because of the casts.
Comment 2 Mikael Morin 2011-01-18 12:30:09 UTC
(In reply to comment #1)
> the SCEVs cannot be expanded properly because of the casts.

It doesn't seem to work any better with i, j, l declared as integer(8), i.e. without the casts.
Comment 3 Joost VandeVondele 2012-06-30 13:39:57 UTC
versioning still happens with 4.8
Comment 4 Richard Biener 2012-07-19 11:21:01 UTC
can't determine dependence between *c_9(D)[D.1882_8] and *c_9(D)[D.1933_40]

  pretmp.22_166 = (integer(kind=8)) j_2;
  D.1880_6 = (integer(kind=8)) i_1;

  pretmp.22_167 = pretmp.22_166 * 16;  /* j * 16 */
  D.1881_7 = D.1880_6 + pretmp.22_167;
  D.1882_8 = D.1881_7 + -17;

  pretmp.30_181 = pretmp.22_166 + 1;
  pretmp.30_182 = pretmp.30_181 * 16;  /* (j + 1) * 16 */
  D.1932_39 = D.1880_6 + pretmp.30_182;
  D.1933_40 = D.1932_39 + -17;

(compute_affine_dependence
  stmt_a: D.1883_10 = *c_9(D)[D.1882_8];
  stmt_b: D.1934_41 = *c_9(D)[D.1933_40];
(subscript_dependence_tester
(analyze_overlapping_iterations
  (chrec_a = {{0, +, 32}_1, +, 1}_2)
  (chrec_b = {{16, +, 32}_1, +, 1}_2)
(analyze_miv_subscript
(analyze_subscript_affine_affine
affine-affine test failed: unimplemented.
) -> dependence analysis failed

so we seem to be one step further ;)  In fact we now hit the issue that
the Fortran frontend presents us with lowered array accesses: we
see a one-dimensional access and do not consider the two indices to
be independent.  In the above case, though, we know the number of iterations
of loop 2 and thus could see that there is never any overlap.
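The last point can be illustrated by evaluating the two chrecs over a bounded iteration space. This is a brute-force sketch, not how an affine-affine test would be implemented in GCC. It reads {{base, +, s1}_1, +, s2}_2 as base + s1*x1 + s2*x2, and it assumes trip counts of 2 (loop 1) and 4 (loop 2) taken from the loops in the description; whether the dump in this comment comes from exactly that testcase is not stated.

```python
def chrec(base, step1, step2):
    """Return f(x1, x2) for the chrec {{base, +, step1}_1, +, step2}_2."""
    return lambda x1, x2: base + step1 * x1 + step2 * x2


chrec_a = chrec(0, 32, 1)    # {{0, +, 32}_1, +, 1}_2
chrec_b = chrec(16, 32, 1)   # {{16, +, 32}_1, +, 1}_2


def conflicts(niter_1, niter_2):
    """All iteration pairs where both accesses hit the same index."""
    hits = []
    for x1 in range(niter_1):
        for x2 in range(niter_2):
            for y1 in range(niter_1):
                for y2 in range(niter_2):
                    if chrec_a(x1, x2) == chrec_b(y1, y2):
                        hits.append(((x1, x2), (y1, y2)))
    return hits


print(conflicts(2, 4))        # []
print(conflicts(2, 17)[:1])   # [((0, 16), (0, 0))]
```

With 4 iterations of loop 2 there is no conflict. An overlap only appears once loop 2 runs more than 16 iterations, which is why knowing the iteration count is enough to rule the dependence out.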
Comment 5 Joost VandeVondele 2013-03-29 08:29:53 UTC
still versioning for trunk 4.9.0