This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
[PATCH] Fix PR34163, long standing performance regression in Polyhedron NF
- From: Richard Guenther <rguenther at suse dot de>
- To: gcc-patches at gcc dot gnu dot org
- Date: Fri, 3 Jul 2009 16:09:20 +0200 (CEST)
- Subject: [PATCH] Fix PR34163, long standing performance regression in Polyhedron NF
Since fold became more careful about not introducing new overflow,
it has stopped doing some transformations that it previously did.
For the case in question SCEV, or rather data-ref analysis, is
confused when it sees seemingly unrelated access functions such as
(Data Ref:
stmt: D.1627_57 = (*x_56(D))[D.1625_52];
ref: (*x_56(D))[D.1625_52];
base_object: (*x_56(D))[0];
Access function 0: {(integer(kind=8)) i_43 + -1, +, 1}_1
Access function 1: 0B
)
(Data Ref:
stmt: D.1634_67 = (*x_58(D))[D.1632_62];
ref: (*x_58(D))[D.1632_62];
base_object: (*x_58(D))[0];
Access function 0: {(integer(kind=8)) (i_43 + -1) + -1, +, 1}_1
Access function 1: 0B
)
Fold does not convert (integer(kind=8)) (i_43 + -1) to
(integer(kind=8)) i_43 + -1 only because doing so is not profitable, not
because it would be invalid. By performing this canonicalizing
transformation in SCEV analysis we get
(Data Ref:
stmt: D.1632_65 = (*x_56(D))[D.1630_60];
ref: (*x_56(D))[D.1630_60];
base_object: (*x_56(D))[0];
Access function 0: {(integer(kind=8)) i_43 + -2, +, 1}_1
Access function 1: 0B
)
instead, which makes predictive commoning optimize NF again (and does
not trick the vectorizer into thinking that a runtime alias check
might be profitable).
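
As an illustration of why the canonicalization is valid only under the
conditions named in the ChangeLog (widening conversion, overflow
undefined in the narrow type), here is a minimal C sketch. It is not
GCC code; the helper names widen_after, widen_before, forms_agree and
unsigned_forms_differ are hypothetical and exist only for this example.
It assumes a 32-bit int.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Hypothetical helper: the form fold leaves alone, i.e. a narrow
   subtraction followed by a widening cast, (long)(i - x).  */
static int64_t widen_after (int32_t i, int32_t x)
{
  return (int64_t) (i - x);
}

/* Hypothetical helper: the canonical form, (long)i - (long)x.  */
static int64_t widen_before (int32_t i, int32_t x)
{
  return (int64_t) i - (int64_t) x;
}

/* Return 1 if both forms agree for every i in [lo, hi] for which the
   narrow subtraction i - x does not overflow.  Overflowing inputs are
   skipped because evaluating them would be undefined behavior.  */
static int forms_agree (int32_t lo, int32_t hi, int32_t x)
{
  assert (lo <= hi);
  for (int64_t i = lo; i <= hi; i++)
    {
      int64_t wide = i - (int64_t) x;
      if (wide < INT32_MIN || wide > INT32_MAX)
        continue;
      if (widen_after ((int32_t) i, x) != widen_before ((int32_t) i, x))
        return 0;
    }
  return 1;
}

/* With a wrapping (unsigned) narrow type the two forms differ, which
   is why the rewrite must be limited to types with undefined
   overflow.  Returns 1 when the results differ.  */
static int unsigned_forms_differ (void)
{
  uint32_t i = 0;
  uint64_t after = (uint64_t) (uint32_t) (i - 1u);  /* wraps to 4294967295 */
  uint64_t before = (uint64_t) i - 1u;              /* wraps in 64 bits */
  return after != before;
}
```

Calling forms_agree (-1000, 1000, 1) checks the signed case over a
small range, and unsigned_forms_differ () shows the counterexample for
a wrapping type.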
Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk.
Richard.
2009-07-03 Richard Guenther <rguenther@suse.de>
PR middle-end/34163
* tree-chrec.c (chrec_convert_1): Fold (T2)(t +- x) to
(T2)t +- (T2)x if t +- x is known to not overflow and
the conversion widens the operation.
* Makefile.in (tree-chrec.o): Add $(FLAGS_H) dependency.
* gfortran.dg/pr34163.f90: New testcase.
Index: gcc/tree-chrec.c
===================================================================
*** gcc/tree-chrec.c (revision 149204)
--- gcc/tree-chrec.c (working copy)
*************** along with GCC; see the file COPYING3.
*** 37,42 ****
--- 37,43 ----
#include "tree-chrec.h"
#include "tree-pass.h"
#include "params.h"
+ #include "flags.h"
#include "tree-scalar-evolution.h"
*************** chrec_convert_1 (tree type, tree chrec,
*** 1286,1292 ****
/* If we cannot propagate the cast inside the chrec, just keep the cast. */
keep_cast:
! res = fold_convert (type, chrec);
/* Don't propagate overflows. */
if (CONSTANT_CLASS_P (res))
--- 1287,1305 ----
/* If we cannot propagate the cast inside the chrec, just keep the cast. */
keep_cast:
! /* Fold will not canonicalize (long)(i - 1) to (long)i - 1 because that
! may be more expensive. We do want to perform this optimization here
! though for canonicalization reasons. */
! if (use_overflow_semantics
! && (TREE_CODE (chrec) == PLUS_EXPR
! || TREE_CODE (chrec) == MINUS_EXPR)
! && TYPE_PRECISION (type) > TYPE_PRECISION (ct)
! && TYPE_OVERFLOW_UNDEFINED (ct))
! res = fold_build2 (TREE_CODE (chrec), type,
! fold_convert (type, TREE_OPERAND (chrec, 0)),
! fold_convert (type, TREE_OPERAND (chrec, 1)));
! else
! res = fold_convert (type, chrec);
/* Don't propagate overflows. */
if (CONSTANT_CLASS_P (res))
Index: gcc/Makefile.in
===================================================================
*** gcc/Makefile.in (revision 149204)
--- gcc/Makefile.in (working copy)
*************** omega.o : omega.c omega.h $(CONFIG_H) $(
*** 2414,2420 ****
$(GGC_H) $(TREE_H) $(DIAGNOSTIC_H) varray.h $(TREE_PASS_H) $(PARAMS_H)
tree-chrec.o: tree-chrec.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) \
$(GGC_H) $(TREE_H) $(REAL_H) $(SCEV_H) $(TREE_PASS_H) $(PARAMS_H) \
! $(DIAGNOSTIC_H) $(CFGLOOP_H) $(TREE_FLOW_H)
tree-scalar-evolution.o: tree-scalar-evolution.c $(CONFIG_H) $(SYSTEM_H) \
coretypes.h $(TM_H) $(GGC_H) $(TREE_H) $(REAL_H) $(RTL_H) \
$(BASIC_BLOCK_H) $(DIAGNOSTIC_H) $(TREE_FLOW_H) $(TREE_DUMP_H) \
--- 2414,2420 ----
$(GGC_H) $(TREE_H) $(DIAGNOSTIC_H) varray.h $(TREE_PASS_H) $(PARAMS_H)
tree-chrec.o: tree-chrec.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) \
$(GGC_H) $(TREE_H) $(REAL_H) $(SCEV_H) $(TREE_PASS_H) $(PARAMS_H) \
! $(DIAGNOSTIC_H) $(CFGLOOP_H) $(TREE_FLOW_H) $(FLAGS_H)
tree-scalar-evolution.o: tree-scalar-evolution.c $(CONFIG_H) $(SYSTEM_H) \
coretypes.h $(TM_H) $(GGC_H) $(TREE_H) $(REAL_H) $(RTL_H) \
$(BASIC_BLOCK_H) $(DIAGNOSTIC_H) $(TREE_FLOW_H) $(TREE_DUMP_H) \
Index: gcc/testsuite/gfortran.dg/pr34163.f90
===================================================================
*** gcc/testsuite/gfortran.dg/pr34163.f90 (revision 0)
--- gcc/testsuite/gfortran.dg/pr34163.f90 (revision 0)
***************
*** 0 ****
--- 1,16 ----
+ ! { dg-do compile }
+ ! { dg-options "-O2 -fpredictive-commoning -fdump-tree-pcom-details" }
+ subroutine trisolve2(x,i1,i2,nxyz)
+ integer :: nxyz
+ real,dimension(nxyz):: au1
+ real,allocatable,dimension(:) :: gi
+ integer :: i1 , i2
+ real,dimension(i2)::x
+ integer :: i
+ allocate(gi(nxyz))
+ do i = i1+1 , i2
+ x(i) = gi(i)*(x(i)-au1(i-1)*x(i-1))
+ enddo
+ end subroutine trisolve2
+ ! { dg-final { scan-tree-dump "Executing predictive commoning" "pcom" } }
+ ! { dg-final { cleanup-tree-dump "pcom" } }
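
For readers unfamiliar with predictive commoning, the following C
sketch shows roughly the kind of rewrite the testcase expects the pass
to perform on the loop above. It is a 0-based C analogue written for
illustration, not output of GCC; the function names trisolve_naive and
trisolve_commoned and the parameter names are assumptions. The value
stored to x[i] in one iteration is the x[i - 1] loaded in the next, so
it can be carried in a scalar instead of being reloaded.

```c
#include <assert.h>
#include <stddef.h>

/* Straightforward form of the loop: x[i - 1] is reloaded from memory
   in every iteration.  Requires first >= 1.  */
static void trisolve_naive (float *x, const float *g, const float *a,
                            size_t first, size_t n)
{
  assert (first >= 1);
  for (size_t i = first; i < n; i++)
    x[i] = g[i] * (x[i] - a[i - 1] * x[i - 1]);
}

/* Hand-written equivalent of what predictive commoning aims for: the
   value just stored to x[i] is kept in PREV and reused as x[i - 1]
   in the next iteration.  Requires first >= 1.  */
static void trisolve_commoned (float *x, const float *g, const float *a,
                               size_t first, size_t n)
{
  assert (first >= 1);
  if (first >= n)
    return;
  float prev = x[first - 1];
  for (size_t i = first; i < n; i++)
    {
      prev = g[i] * (x[i] - a[i - 1] * prev);
      x[i] = prev;
    }
}
```

Recognizing that x[i] and x[i - 1] are the same reference at distance
one requires their access functions to be comparable, which is what
the canonicalization in chrec_convert_1 restores.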