This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.
Index Nav: | [Date Index] [Subject Index] [Author Index] [Thread Index] | |
---|---|---|
Message Nav: | [Date Prev] [Date Next] | [Thread Prev] [Thread Next] |
Other format: | [Raw text] |
Hello, This patch defines the cost-model target-specific costs for x86 (only AMD K8 for now). I looked at some Polyhedron benchmarks to arrive at the costs. I also made some small fixes to the cost model itself. - The processor costs for all processors have currently been filled in with the default used by the cost model, except K8, which is tuned. For now, I have filled in size_cost with just 1 byte for all statements. The size_cost is not being used currently, but can be used in the future as there was some discussion at the summit about using size as an alternative to make the decision to vectorize. - An issue I have run into is the testsuite. I have not added any test cases because the test cases under vect/costmodel/x86 are run without any -mtune options and hence they default to mtune=generic. So the k8_costs added by this patch do not get used by make check right now. One way to address this is to add sub directories for individual x86 processors that can be tuned and replicating the test cases for each of those processors with possibly different behavior when vectorized with the cost model. Another way would be to use -mtune=native in the x86 costmodel-vect.exp file and having the actual processor tuned emitted in the vectorizer dump and having some macro along the lines of "istarget" that can search for the processor string. Any input on the approaches would be very welcome. The following changes have been made to the cost model itself: - When we don't know the number of prologue/epilogue iterations we currently assume (VF-1)/2. In case of double precision on x86, this causes prologue and epilogue to be estimated as 0, when alignment is unknown. At this point if the number of iterations is unknown or not exactly divisible by the vectorization factor, the cost model goes on to account for guard branches in the epilogue even though the epilogue iterations are estimated as 0. (This may be a different kind of a problem on the spu also, where the branch cost is 6 and run time test cost is -19. This may result in outside vector cost of (2*6) + (-19) = -7, a negative value). This patch changes the estimated prologue/epilogue, when unknown to vf/2 to avoid being estimated as 0 for double precision (this will still not help eradicate the corner case mentioned above for the spu). - The patch also changes the number of branches per prologue and epilogue each to 1 instead of 2. Looking at some loops, the branches can get optimized out, especially when the alignment and iterations are known at compile time. - The cost model equation is changed to min_profitable_iters = (voc * vf - vic * prologue_iters - vic * epilogue_iters) /(sic * vf - vic) instead of min_iters = (voc * vf)/(sic * vf - vic) - Currently, the cost model checks if the iterations are LESS than whats estimated by the cost model at compile time, instead of LESS than or EQUAL. - The cost model checks at compile time if the loop iterations are less than or equal to whats estimated by the cost model. It also checks at run time if the iterations left after peeling for prologue and epilogue are less than or equal to whats estimated by the cost model. If a loop is determined as profitable at compile-time, it may be evaluated as unprofitable at run-time. For example, num_iters=7 and vf=4 and cost model estimates profitable_iters=6 and data is aligned. At compile time since (num_iters > min_profitable_iters), this loop is profitable to vectorize. However, the run time check compares the iterations left after peeling for vf i.e 4 versus the min_profitable_iters and finds it to be unprofitable. This patch sets the runtime threshold as 0 for cases which can be evaluated for profitability at compile time. - The patch has been bootstrapped on x86-64 and all the 64 and 32-bit tests pass as expected, except costmodel-vect-31.c in x86_64 and i386. The loop at line 66 in vect-31.c was being estimated as not profitable to vectorize. Because of the changes to the cost model, this is now estimated as profitable. Running it on K8 indicates that it is profitable to vectorize and the cost model estimates this correctly for K8. ChangeLogs: gcc/ChangeLog: 2007-09-04 Harsha Jagasia <harsha.jagasia@amd.com> * config/i386/i386.h (processor_costs): Add scalar_stmt_cost, scalar_load_cost, scalar_store_cost, vec_stmt_cost, vec_to_scalar_cost, scalar_to_vec_cost, vec_align_load_cost, vect_unalign_load_cost, vec_store_cost. Define macros for x86 costs. * config/i386/i386.c (size_cost): Set scalar_stmt_cost, scalar_load_cost, scalar_store_cost, vec_stmt_cost, vec_to_scalar_cost, scalar_to_vec_cost, vec_align_load_cost, vect_unalign_load_cost, vec_store_cost to 1. (i386_cost, i486_cost, pentium_cost, pentiumpro_cost, geode_cost, k6_cost, athlon_cost, amdfam10_cost, pentium4_cost, nocona_cost, core2_cost, generic64_cost, generic32_cost): Set to default untuned costs. (k8_cost): Costs for vectorization tuned. (x86_builtin_vectorization_cost): New. * tree-vect-analyze.c (vect_analyze_operations): Change comparison of loop iterations with threshold to less than or equal to instead of less than. * tree-vect-transform.c (vect_estimate_min_profitable_iters): Change prologue and epilogue iterations estimate to vf/2, when unknown at compile-time. Change the number of guards for prologue and epilogue to be 1 each. Change the cost model equation to consider vector iterations as the loop iterations less the prologue and epilogue iterations. (vect_do_peeling_for_loop_bound): Set profitability threshold to zero when loop has known number of iterations and known alignment for data. gcc/testsuite/ChangeLog: 2007-09-04 Harsha Jagasia <harsha.jagasia@amd.com> * gcc.dg/vect/costmodel/i386/costmodel-vect-31.c: Change dg-final to expect 1 non-profitable loop and 3 profitable loops. * gcc.dg/vect/costmodel/x86-64/costmodel-vect-31.c: Change dg-final to expect 1 non-profitable loop and 3 profitable loops. Thanks, Harsha
Attachment:
target.patch.txt
Description: target.patch.txt
Index Nav: | [Date Index] [Subject Index] [Author Index] [Thread Index] | |
---|---|---|
Message Nav: | [Date Prev] [Date Next] | [Thread Prev] [Thread Next] |