This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



[PATCH PR79347]Maintain profile counter information in vect_do_peeling


Hi,
This patch fixes the issue reported in PR79347 by calculating and maintaining profile counter
information on the fly in vect_do_peeling.  Because we first peel the prologue loop, then peel the
epilogue loop, and only then add the guarding edge that skips the prolog and vector loops when niter
is small, this patch uses a trick: it first scales down the counters of the loop before peeling, and
scales the counters back after adding the aforementioned guarding edge.  Otherwise, more work would be
needed to calculate the counters for the prolog and vector loops.  With this patch, the number of
profile counter mismatches for the tramp3d benchmark improves from:

tramp3d-v4.cpp.157t.ifcvt:296
tramp3d-v4.cpp.158t.vect:1118
tramp3d-v4.cpp.159t.dce6:1118
tramp3d-v4.cpp.160t.pcom:1118
tramp3d-v4.cpp.161t.cunroll:1019
tramp3d-v4.cpp.162t.slp1:1019
tramp3d-v4.cpp.164t.ivopts:1019
tramp3d-v4.cpp.165t.lim4:1019
tramp3d-v4.cpp.166t.loopdone:1007
tramp3d-v4.cpp.167t.no_loop:31
...
tramp3d-v4.cpp.226t.optimized:1009

to:

tramp3d-v4.cpp.157t.ifcvt:296
tramp3d-v4.cpp.158t.vect:814
tramp3d-v4.cpp.159t.dce6:814
tramp3d-v4.cpp.160t.pcom:814
tramp3d-v4.cpp.161t.cunroll:723
tramp3d-v4.cpp.162t.slp1:723
tramp3d-v4.cpp.164t.ivopts:723
tramp3d-v4.cpp.165t.lim4:723
tramp3d-v4.cpp.166t.loopdone:711
tramp3d-v4.cpp.167t.no_loop:31
...
tramp3d-v4.cpp.226t.optimized:831

Bootstrapped and tested on x86_64 and AArch64.  Is it OK?

BTW, with the patch, the vectorizer only introduces mismatches through the code below in vect_transform_loop:

  /* Reduce loop iterations by the vectorization factor.  */
  scale_loop_profile (loop, GCOV_COMPUTE_SCALE (1, vf),
		      expected_iterations / vf);

Though it makes sense to scale down according to the vectorization factor, doing so definitely
introduces a mismatch between the vector loop's frequency and the rest of the program.  I also believe
it is not that useful to scale here, especially without profiling information.  At the least, we need
to make the vector loop's frequency consistent with the rest of the program.

Thanks,
bin
2017-02-13  Bin Cheng  <bin.cheng@arm.com>

	PR tree-optimization/79347
	* tree-vect-loop-manip.c (apply_probability_for_bb): New function.
	(vect_do_peeling): Maintain profile counters during peeling.

gcc/testsuite/ChangeLog
2017-02-13  Bin Cheng  <bin.cheng@arm.com>

	PR tree-optimization/79347
	* gcc.dg/vect/pr79347.c: New test.

Attachment: pr79347-20170209.txt

