We get the following failures on powerpc64-suse-linux:

FAIL: gcc.dg/vect/vect-46.c scan-tree-dump-times vectorized 1 loops 1
FAIL: gcc.dg/vect/vect-50.c scan-tree-dump-times vectorized 1 loops 1
FAIL: gcc.dg/vect/vect-52.c scan-tree-dump-times vectorized 1 loops 1
FAIL: gcc.dg/vect/vect-58.c scan-tree-dump-times vectorized 1 loops 1
FAIL: gcc.dg/vect/vect-60.c scan-tree-dump-times vectorized 1 loops 1
FAIL: gcc.dg/vect/vect-77.c scan-tree-dump-times vectorized 1 loops 1
FAIL: gcc.dg/vect/vect-77a.c scan-tree-dump-times vectorized 1 loops 1

The access function that the evolution analyzer returns for the pointers in these loops is more complicated when the compiler is configured for 64-bit (powerpc64-suse-linux) than when it is configured for 32-bit (powerpc-suse-linux).

When the compiler is configured for 32-bit, the pointer arithmetic in the loop looks like:

  # i_1 = PHI <i_24(5), 0(3)>;
<L0>:;
  i.4_6 = (unsigned int) i_1;
  D.1588_7 = i.4_6 * 4;
  D.1589_8 = (afloat * restrict) D.1588_7;
  D.1591_15 = D.1589_8 + pb_14;
  ... = *D.1591_15;
  ...
  i_24 = i_1 + 1;
  if (n_3 > i_24) goto <L9>; else goto <L10>;

The access function that is computed for the pointer is:

  Access function of ptr: {pb_14, +, 4B}_1

which is simple enough, and the loop is vectorized.

On the other hand, when the compiler is configured for 64-bit, the pointer arithmetic in the loop looks like:

  # i_1 = PHI <i_24(5), 0(3)>;
<L0>:;
  D.1816_6 = (long unsigned int) i_1;
  D.1817_7 = D.1816_6 * 4;
  D.1818_8 = (afloat * restrict) D.1817_7;
  D.1820_15 = D.1818_8 + pb_14;
  ... = *D.1820_15;
  ...
  i_24 = i_1 + 1;
  if (n_3 > i_24) goto <L9>; else goto <L10>;

and in this case the access function that is computed for the pointer is:

  Access function of ptr: (afloat * restrict) ((long unsigned int) {0, +, 1}_1 * 4) + pb_14

The vectorizer does not handle such access functions at the moment, and therefore fails to vectorize the loop:

  loop at vect-46.c:37: not vectorized: pointer access is not simple.
  loop at vect-46.c:37: not vectorized: unhandled data ref: D.1821_16 = *D.1820_15
  loop at vect-46.c:37: bad data references.

These loops should be marked xfail for now for ppc64-linux.

One of the following would allow vectorizing these loops:
- The evolution analyzer knows to ignore the cast to (unsigned int) when it builds the access function, but it doesn't ignore the cast to (long unsigned int). If this cast can be avoided when building the access function, the result would be simple enough to handle later on.
- Enhance the vectorizer to digest such access functions.
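For readers less familiar with the chrec notation, the two access functions above describe the same sequence of addresses; only their shape differs. The following is a minimal sketch in plain C, not GCC code. The helper names are hypothetical, and it assumes the element size equals the 4-byte step shown in the dumps. It computes the address once as the affine form {pb, +, 4B}_1 and once as the widened-counter form, and checks that they agree.

```c
#include <assert.h>
#include <stddef.h>

/* Affine form {pb, +, 4B}_1: start at pb and add the step once per
   iteration.  Hypothetical helper for illustration only.  */
static const float *addr_affine(const float *pb, int i)
{
  const char *p = (const char *) pb;
  assert(i >= 0);
  for (int k = 0; k < i; k++)
    p += sizeof(float);
  return (const float *) p;
}

/* Widened form (ptr) ((long unsigned int) {0, +, 1}_1 * 4) + pb:
   widen the counter, scale it, then add it to the base.  */
static const float *addr_widened(const float *pb, int i)
{
  unsigned long off = (unsigned long) i * sizeof(float);
  return (const float *) ((const char *) pb + off);
}

/* Returns 1 when both forms yield the same address for 0 <= i < n.  */
int access_forms_agree(const float *pb, int n)
{
  for (int i = 0; i < n; i++)
    if (addr_affine(pb, i) != addr_widened(pb, i))
      return 0;
  return 1;
}
```

The forms agree here because the counter never wraps; what the vectorizer lacks in the 64-bit case is the explicit initial value and step, which only the first form exposes directly.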
Confirmed.
At least on powerpc-darwin (with -m64) we now vectorize these loops, but we ICE because we generate pointer_type + int_type, which is not valid. It is even worse on 64-bit targets, where int is 32 bits: we try to move an SI mode register into a DI mode register, and emit_move_insn ICEs because of this.
(In reply to comment #2)
> At least on powerpc-darwin (with -m64) we now vectorize these loops but we ICE because we have:
> pointer_type + int_type which is not valid and is even worse on 64bit targets as int is 32 bit so we try to
> move SI mode register into a DI mode register and it ICE in emit_move_insn because of this.

Yes. I'll have a patch for that shortly. This would take care of testcases vect-[46,50,52,58,60].c.

A separate problem is the testcases that don't get vectorized with -m64; these are vect-[77,77a,78].c and a newcomer, pr18425.c.
*** Bug 18505 has been marked as a duplicate of this bug. ***
patch: http://gcc.gnu.org/ml/gcc-patches/2004-11/msg01301.html
Testcases vect-[77,77a,78].c don't get vectorized with -m64 because the access function that the evolution analyzer returns for the pointers in these loops looks like the following:

  ib_16 + (aint *) ((long unsigned int) {off_11, +, 1}_1 * 4)

Whereas with -m32 it looks like:

  {ib_17 + (aint *) ((unsigned int) off.4_12 * 4), +, 4B}_1

The vectorizer is able to extract the initial condition and step when the access function is represented this way:

  step: 4B
  init: ib_17 + (aint *) ((unsigned int) off.4_12 * 4)

These testcases should temporarily xfail when -m64 is used (or if the compiler is configured with powerpc64*).

Testcase pr18425.c can't be vectorized with -m64 because there's no vector support for 64-bit elements. This testcase should also xfail (not temporarily) when -m64 is used or if the compiler is configured with powerpc64*.
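One plausible reason the analyzer keeps the cast outside the chrec in the -m64 form is that widening a counter and stepping it do not commute if the narrow counter can wrap. The sketch below is an illustration in plain C, not GCC code. It uses unsigned 32-bit arithmetic so the wrap is well defined, whereas the testcases use a signed index, and both function names are hypothetical.

```c
#include <stdint.h>

/* What the source computes: step in 32 bits, then widen.
   Corresponds to (long unsigned int) {start, +, 1}.  */
uint64_t widen_after_step(uint32_t start, uint32_t iter)
{
  uint32_t narrow = (uint32_t) (start + iter);  /* may wrap modulo 2^32 */
  return (uint64_t) narrow;
}

/* What a folded chrec would compute: widen, then step in 64 bits.
   Corresponds to {(long unsigned int) start, +, 1}.  */
uint64_t step_after_widen(uint32_t start, uint32_t iter)
{
  return (uint64_t) start + (uint64_t) iter;
}
```

For a small start value the two functions return the same result, but with start = 0xFFFFFFFE and iter = 3 the first returns 1 and the second returns 0x100000001. Folding the cast into the chrec is therefore only safe when the narrow counter is known not to wrap.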
GCC for powerpc64-*-linux* could be any of the following: (a) a compiler that generates only LP64 code; (b) a biarch compiler that generates ILP32 code by default; or (c) a biarch compiler that generates LP64 code by default. There's currently no way to detect, in an xfail list, that a test is compiled for LP64 code on a powerpc64-*-linux* target. My nightly bootstrap and test run uses (b) and Jon Grimm's uses (c). If a test is unsupported for LP64 then it can always be skipped by using { dg-require-effective-target ilp32 }.
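A testcase restricted this way might look like the sketch below. The loop and the names in it are made up for illustration and are not taken from an existing test. The scan string is the one from the FAIL lines in this report, the "vect" dump name is an assumption, and the vectorizer flags are assumed to be supplied by the gcc.dg/vect harness rather than by the file itself.

```c
/* { dg-do compile } */
/* Skip the test entirely unless int, long and pointers are 32 bits.  */
/* { dg-require-effective-target ilp32 } */

#define N 16

int src[N];
int dst[N];

/* Hypothetical loop with a constant trip count and unit stride.  */
void scale (void)
{
  int i;

  for (i = 0; i < N; i++)
    dst[i] = src[i] * 2;
}

/* Assumed dump name "vect"; the pattern matches the FAIL lines above.  */
/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
```

The drawback is that the test is then skipped for LP64 rather than reported as an expected failure, so a later fix would not show up as an XPASS.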
> Testcase pr18425.c can't be vectorized with -m64 because there's no vector
> support for 64bit elements. This testcase should also xfail (not temporarily)
> when -m64 is used or if the compiler is configured with powerpc64*.

Same on SPARC 64-bit (i.e. sparc64-*-* or sparc-*-* with -m64).
Subject: Bug 18403

CVSROOT: /cvs/gcc
Module name: gcc
Changes by: dorit@gcc.gnu.org 2004-11-23 09:19:25

Modified files:
  gcc : ChangeLog tree-vectorizer.c

Log message:
  PR tree-opt/18403
  PR tree-opt/18505
  * tree-vectorizer.c (vect_create_data_ref_ptr): Use
  lang_hooks.types.type_for_size instead of integer_type_node for the
  type of ptr_update.

Patches:
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/ChangeLog.diff?cvsroot=gcc&r1=2.6482&r2=2.6483
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/tree-vectorizer.c.diff?cvsroot=gcc&r1=2.39&r2=2.40
Just for the record, related comments from http://gcc.gnu.org/ml/gcc-patches/2004-11/msg01394.html:

"
> > A question: how would you write a testcase that when compiled on powerpc*
> > the dg-final check xfails for powerpc64* or if -m64 is used? (I want to
> > xfail testcase pr18425.c for 64bit, and temporarily also xfail testcases
> > vect-[77,77a,78].c for 64bit - see PR18403).
>
> I'll try to come up with another
> solution, but in the meantime it's more important that those tests run
> on powerpc64-*-* for 32-bit code than to xfail them for 64-bit code.

So PR18403 will remain open for now with the remaining 64bit failures.

OK, thanks

dorit
"
vect-[46,50,52,58,60] don't fail anymore, and vect-[77,78] xfail on vectorizing for lp64 targets, so I think we can classify this PR as missed-optimization only, or close it and open a new (missed-optimization) PR for vect-[77,78].c (for lp64 targets).
Another testcase that exhibits a similar problem: vect-5.f90 (patch: http://gcc.gnu.org/ml/gcc-patches/2005-03/msg02840.html)

On powerpc64-linux (lp64) the second loop is not vectorized because the data-references analysis in the vectorizer can't extract the evolution from the access_function returned by the evolution analyzer for the accesses to array b. In ilp32 mode the access function we get from the evolution analyzer is simpler, and the loop gets vectorized.

==> Access function for each case:

lp64: (int8) {i_3, +, 1}_3 + -1
ilp32: {i_3 + -1, +, 1}_3

==> Vectorizer dataref analysis report for each case:

vect-5.f90:3: note: Results of object analysis for: b

lp64:
  base_address: &b
  offset: (<unnamed type>) ((int8) {i_3, +, 1}_3 * 4 + -4)
  step: 0
  base aligned 1

ilp32:
  base_address: &b
  offset: (<unnamed type>) (i_3 * 4 + -4)
  step: 4
  base aligned 1

==> Tree dump for each case:

lp64:
  # j_5 = PHI <i_3(15), j_55(18)>;
<L32>:;
  D.712_48 = (int8) j_5;
  D.713_49 = D.712_48 + 7;
  D.714_51 = D.712_48 + -1;
  D.715_52 = b[D.714_51];
  a[D.713_49] = D.715_52;
  j_55 = j_5 + 1;
  if (j_5 == D.688_44) goto <L46>; else goto <L47>;

ilp32:
  # j_6 = PHI <i_4(21), j_40(25)>;
<L34>:;
  D.518_35 = j_6 + 7;
  D.523_36 = a[D.518_35];
  D.519_37 = j_6 + -1;
  D.520_38 = b[D.519_37];
  if (D.523_36 != D.520_38) goto <L19>; else goto <L52>;

==> Evolution analyzer dumps for each case:

lp64:
(analyze_array
  (ref = b[D.714_51]; )
  (analyze_scalar_evolution
    (loop_nb = 3)
    (scalar = D.714_51)
    (get_scalar_evolution
      (scalar = D.714_51)
      (scalar_evolution = (int8) {i_3, +, 1}_3 + -1))
    (set_scalar_evolution
      (scalar = D.714_51)
      (scalar_evolution = (int8) {i_3, +, 1}_3 + -1))
  )
  (instantiate_parameters
    (loop_nb = 3)
    (chrec = (int8) {i_3, +, 1}_3 + -1)
    (analyze_scalar_evolution
      (loop_nb = 2)
      (scalar = i_3)
      (get_scalar_evolution
        (scalar = i_3)
        (scalar_evolution = {1, +, 1}_2))
      (set_scalar_evolution
        (scalar = i_3)
        (scalar_evolution = {1, +, 1}_2))
    )
    (res = (int8) {{1, +, 1}_2, +, 1}_3 + -1))
)

ilp32:
(analyze_array
  (ref = b[D.835_47]; )
  (analyze_scalar_evolution
    (loop_nb = 3)
    (scalar = D.835_47)
    (get_scalar_evolution
      (scalar = D.835_47)
      (scalar_evolution = {i_3 + -1, +, 1}_3))
    (set_scalar_evolution
      (scalar = D.835_47)
      (scalar_evolution = {i_3 + -1, +, 1}_3))
  )
  (instantiate_parameters
    (loop_nb = 3)
    (chrec = {i_3 + -1, +, 1}_3)
    (analyze_scalar_evolution
      (loop_nb = 2)
      (scalar = i_3)
      (get_scalar_evolution
        (scalar = i_3)
        (scalar_evolution = {1, +, 1}_2))
      (set_scalar_evolution
        (scalar = i_3)
        (scalar_evolution = {1, +, 1}_2))
    )
    (res = {{0, +, 1}_2, +, 1}_3))
)
Subject: Bug 18403

CVSROOT: /cvs/gcc
Module name: gcc
Changes by: spop@gcc.gnu.org 2005-06-07 19:51:26

Modified files:
  gcc : ChangeLog Makefile.in tree-chrec.c tree-chrec.h tree-flow.h tree-scalar-evolution.c tree-ssa-loop-ivopts.c tree-ssa-loop-niter.c tree-vrp.c
  gcc/testsuite/gcc.dg/vect: vect-77.c vect-78.c

Log message:
  Fixes PR 18403 and meta PR 21861.
  * Makefile.in (tree-chrec.o): Depend on CFGLOOP_H and TREE_FLOW_H.
  * tree-chrec.c: Include cfgloop.h and tree-flow.h.
  (evolution_function_is_invariant_rec_p, evolution_function_is_invariant_p): New.
  (chrec_convert): Use an extra parameter AT_STMT for refining the information that is passed down to convert_step. Integrate the code that was in count_ev_in_wider_type.
  * tree-chrec.h (count_ev_in_wider_type): Removed.
  (chrec_convert): Modify its declaration.
  (evolution_function_is_invariant_p): Declared.
  (evolution_function_is_affine_p): Use evolution_function_is_invariant_p.
  * tree-flow.h (can_count_iv_in_wider_type): Renamed convert_step.
  (scev_probably_wraps_p): Declared.
  * tree-scalar-evolution.c (count_ev_in_wider_type): Removed.
  (follow_ssa_edge_in_rhs, interpret_rhs_modify_expr): Use an extra parameter AT_STMT for refining the information that is passed down to convert_step.
  (follow_ssa_edge_inner_loop_phi, follow_ssa_edge, analyze_scalar_evolution_1): Initialize AT_STMT with the current analyzed statement.
  (instantiate_parameters_1): Don't know yet how to initialize AT_STMT.
  * tree-ssa-loop-ivopts.c (idx_find_step): Update the use of can_count_iv_in_wider_type to use convert_step.
  * tree-ssa-loop-niter.c (can_count_iv_in_wider_type_bound): Move code that is independent of the loop over the known iteration bounds to convert_step_widening, the rest is moved to proved_non_wrapping_p.
  (scev_probably_wraps_p): New.
  (can_count_iv_in_wider_type): Renamed convert_step.
  * tree-vrp.c (adjust_range_with_scev): Take an extra AT_STMT parameter. Use scev_probably_wraps_p for computing init_is_max.
  (vrp_visit_assignment): Pass the current analyzed statement to adjust_range_with_scev.
  (execute_vrp): Call estimate_numbers_of_iterations for refining the information provided by scev analyzer.

  testsuite:
  * testsuite/gcc.dg/vect/vect-77.c: Remove xfail from lp64.
  * testsuite/gcc.dg/vect/vect-78.c: Same.

Patches:
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/ChangeLog.diff?cvsroot=gcc&r1=2.9071&r2=2.9072
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/Makefile.in.diff?cvsroot=gcc&r1=1.1500&r2=1.1501
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/tree-chrec.c.diff?cvsroot=gcc&r1=2.19&r2=2.20
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/tree-chrec.h.diff?cvsroot=gcc&r1=2.8&r2=2.9
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/tree-flow.h.diff?cvsroot=gcc&r1=2.118&r2=2.119
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/tree-scalar-evolution.c.diff?cvsroot=gcc&r1=2.27&r2=2.28
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/tree-ssa-loop-ivopts.c.diff?cvsroot=gcc&r1=2.76&r2=2.77
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/tree-ssa-loop-niter.c.diff?cvsroot=gcc&r1=2.28&r2=2.29
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/tree-vrp.c.diff?cvsroot=gcc&r1=2.23&r2=2.24
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/testsuite/gcc.dg/vect/vect-77.c.diff?cvsroot=gcc&r1=1.10&r2=1.11
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/testsuite/gcc.dg/vect/vect-78.c.diff?cvsroot=gcc&r1=1.11&r2=1.12
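The log message above names scev_probably_wraps_p and convert_step without showing what kind of question they answer. As a rough illustration only, and not the GCC implementation, the sketch below checks whether a 32-bit induction variable stays in range over a known number of iterations, which is the condition under which a conversion to a wider type can be moved inside the chrec. The function name and its signature are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper, not a GCC function.  Returns true when the
   unsigned 32-bit induction variable {init, +, step} does not wrap
   during iterations 0..niter, so that widening it to 64 bits gives
   {(uint64_t) init, +, (uint64_t) step}.  */
bool iv_stays_in_u32(uint32_t init, uint32_t step, uint32_t niter)
{
  /* All operands are below 2^32, so this sum cannot overflow 64 bits.  */
  uint64_t last = (uint64_t) init + (uint64_t) step * (uint64_t) niter;
  return last <= UINT32_MAX;
}
```

With a bound like this in hand, an access function such as (long unsigned int) {off, +, 1} can be treated as an affine evolution in the wide type, which is the form the vectorizer already handles.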
Fixed in 4.1.0.