This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.
Hi,

This patch fixes the issue reported in PR79347 by calculating and maintaining profile counter information on the fly in vect_do_peeling. Because we first peel the prologue loop, then peel the epilogue loop, and only then add the guarding edge that skips the prologue and vector loops when niter is small, the patch uses a trick: it first scales down the counters for the loop before peeling, and scales them back after adding the guarding edge. Otherwise, more work would be needed to calculate the counters for the prologue and vector loops.

After this patch, the number of profile counter mismatches for the tramp3d benchmark improves from:

  tramp3d-v4.cpp.157t.ifcvt:296
  tramp3d-v4.cpp.158t.vect:1118
  tramp3d-v4.cpp.159t.dce6:1118
  tramp3d-v4.cpp.160t.pcom:1118
  tramp3d-v4.cpp.161t.cunroll:1019
  tramp3d-v4.cpp.162t.slp1:1019
  tramp3d-v4.cpp.164t.ivopts:1019
  tramp3d-v4.cpp.165t.lim4:1019
  tramp3d-v4.cpp.166t.loopdone:1007
  tramp3d-v4.cpp.167t.no_loop:31
  ...
  tramp3d-v4.cpp.226t.optimized:1009

to:

  tramp3d-v4.cpp.157t.ifcvt:296
  tramp3d-v4.cpp.158t.vect:814
  tramp3d-v4.cpp.159t.dce6:814
  tramp3d-v4.cpp.160t.pcom:814
  tramp3d-v4.cpp.161t.cunroll:723
  tramp3d-v4.cpp.162t.slp1:723
  tramp3d-v4.cpp.164t.ivopts:723
  tramp3d-v4.cpp.165t.lim4:723
  tramp3d-v4.cpp.166t.loopdone:711
  tramp3d-v4.cpp.167t.no_loop:31
  ...
  tramp3d-v4.cpp.226t.optimized:831

Bootstrapped and tested on x86_64 and AArch64. Is it OK?

BTW, with the patch, the vectorizer only introduces mismatches through the code below in vect_transform_loop:

  /* Reduce loop iterations by the vectorization factor.  */
  scale_loop_profile (loop, GCOV_COMPUTE_SCALE (1, vf),
                      expected_iterations / vf);

Although it makes sense to scale down according to the vectorization factor, this definitely introduces a mismatch between the vector loop's frequency and the rest of the program. I also believe scaling here is not that useful, especially without profiling information. At the very least, we need to make the vector loop's frequency consistent with the rest of the program.
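To make the kind of mismatch described above concrete, here is a toy model in C. It is only a sketch and does not use GCC's data structures or its scale_loop_profile implementation: struct toy_loop, PROB_BASE, rdiv, and every function name below are hypothetical, introduced purely for illustration. The model treats a loop header as consistent when its frequency equals the sum of its incoming edges (entry edge plus back edge), and shows that scaling only the header by 1/vf breaks that equality unless the back-edge probability is adjusted as well.

```c
/* Hypothetical fixed-point base for probabilities in this toy model.  */
#define PROB_BASE 10000

/* Toy stand-in for a loop; NOT a GCC data structure.  */
struct toy_loop
{
  long preheader_freq;  /* frequency of the edge entering the loop */
  long header_freq;     /* frequency of the loop header block */
  long backedge_prob;   /* back-edge probability, in 1/PROB_BASE units */
};

/* Round-to-nearest division for non-negative X and positive Y.  */
static long
rdiv (long x, long y)
{
  return (x + y / 2) / y;
}

/* Frequency flowing into the header: entry edge plus back edge.  */
static long
incoming_freq (const struct toy_loop *l)
{
  return l->preheader_freq
         + rdiv (l->header_freq * l->backedge_prob, PROB_BASE);
}

/* Header frequency minus the sum of incoming edges; 0 means consistent.  */
static long
mismatch (const struct toy_loop *l)
{
  return l->header_freq - incoming_freq (l);
}

/* Scale only the header frequency by 1/VF, leaving the entry edge and
   the back-edge probability untouched.  Assumes VF > 0.  */
static void
scale_header_only (struct toy_loop *l, int vf)
{
  l->header_freq = rdiv (l->header_freq, vf);
}

/* Scale the header by 1/VF and recompute the back-edge probability so
   that the back edge carries exactly header_freq - preheader_freq.
   Assumes VF > 0 and a non-zero scaled header frequency.  */
static void
scale_consistently (struct toy_loop *l, int vf)
{
  l->header_freq = rdiv (l->header_freq, vf);
  l->backedge_prob
    = rdiv ((l->header_freq - l->preheader_freq) * PROB_BASE,
            l->header_freq);
}
```

For example, a loop with preheader_freq 100, header_freq 1600, and backedge_prob 9375 starts with a mismatch of 0. After scale_header_only with vf = 4, the header frequency is 400 but the incoming edges sum to 475, so the mismatch is -75. After scale_consistently on the same starting values, the back-edge probability becomes 7500 and the mismatch is 0 again.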
Thanks,
bin

2017-02-13  Bin Cheng  <bin.cheng@arm.com>

        PR tree-optimization/79347
        * tree-vect-loop-manip.c (apply_probability_for_bb): New function.
        (vect_do_peeling): Maintain profile counters during peeling.

gcc/testsuite/ChangeLog
2017-02-13  Bin Cheng  <bin.cheng@arm.com>

        PR tree-optimization/79347
        * gcc.dg/vect/pr79347.c: New test.
Attachment: pr79347-20170209.txt