18403 – FAILs to vectorize testcases on ppc64-linux

Summary: FAILs to vectorize testcases on ppc64-linux

Status:	RESOLVED FIXED

Alias:	None

Product:	gcc
Classification:	Unclassified
Component:	tree-optimization
Version:	4.0.0

Importance:	P2 normal
Target Milestone:	4.1.0
Assignee:	Not yet assigned to anyone

URL:
Keywords:	ice-on-valid-code, missed-optimization

Duplicates (1):	18505
Depends on:
Blocks:	21861

Reported:	2004-11-09 15:50 UTC by Dorit Naishlos
Modified:	2023-12-31 19:10 UTC
CC List:	5 users

See Also:
Host:	powerpc64-suse-linux
Target:	powerpc64-suse-linux
Build:	powerpc64-suse-linux
Known to work:
Known to fail:
Last reconfirmed:	2004-11-15 02:34:40

Description Dorit Naishlos 2004-11-09 15:50:01 UTC

We get the following failures on powerpc64-suse-linux:
FAIL: gcc.dg/vect/vect-46.c scan-tree-dump-times vectorized 1 loops 1
FAIL: gcc.dg/vect/vect-50.c scan-tree-dump-times vectorized 1 loops 1
FAIL: gcc.dg/vect/vect-52.c scan-tree-dump-times vectorized 1 loops 1
FAIL: gcc.dg/vect/vect-58.c scan-tree-dump-times vectorized 1 loops 1
FAIL: gcc.dg/vect/vect-60.c scan-tree-dump-times vectorized 1 loops 1
FAIL: gcc.dg/vect/vect-77.c scan-tree-dump-times vectorized 1 loops 1
FAIL: gcc.dg/vect/vect-77a.c scan-tree-dump-times vectorized 1 loops 1

The access function that the evolution analyzer returns for the pointers in 
these loops when the compiler is configured for 64bit (powerpc64-suse-linux)  
is more complicated than when configured for 32bit (powerpc-suse-linux):

When the compiler is configured for 32bit, the pointer arithmetic in the loop 
looks like:
  # i_1 = PHI <i_24(5), 0(3)>;
<L0>:;
  i.4_6 = (unsigned int) i_1;
  D.1588_7 = i.4_6 * 4;
  D.1589_8 = (afloat * restrict) D.1588_7;
  D.1591_15 = D.1589_8 + pb_14;
  ... = *D.1591_15;
  ...
  i_24 = i_1 + 1; 
  if (n_3 > i_24) goto <L9>; else goto <L10>;

the access function that is computed for the pointer is:
Access function of ptr: {pb_14, +, 4B}_1
which is simple enough, and the loop is vectorized.

On the other hand, when the compiler is configured for 64bit, the pointer 
arithmetic in the loop looks like:
  # i_1 = PHI <i_24(5), 0(3)>;
<L0>:;
  D.1816_6 = (long unsigned int) i_1;
  D.1817_7 = D.1816_6 * 4;
  D.1818_8 = (afloat * restrict) D.1817_7;
  D.1820_15 = D.1818_8 + pb_14;
  ... = *D.1820_15;
  ...
  i_24 = i_1 + 1; 
  if (n_3 > i_24) goto <L9>; else goto <L10>;

and in this case the access function that is computed for the pointer is:
Access function of ptr: (afloat * restrict) ((long unsigned int) {0, +, 1}_1 * 4) + pb_14
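The two access functions describe the same sequence of addresses; only their shape differs. A minimal C model, assuming the usual chrec reading ({base, +, step}_1 is base + step * k at iteration k of loop 1), byte addresses held in a uint64_t, and the 4-byte afloat element size from the dumps. The helper names are hypothetical, not GCC functions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of an affine chrec {base, +, step}: its value at
   iteration k is base + step * k. */
struct affine_chrec {
    uint64_t base;
    uint64_t step;
};

static uint64_t chrec_value(struct affine_chrec c, uint64_t k)
{
    return c.base + c.step * k;
}

/* 32-bit shape: {pb_14, +, 4B}_1, with base and step directly visible. */
static uint64_t addr_32bit_shape(uint64_t pb, uint32_t k)
{
    struct affine_chrec ptr = { pb, 4 };
    return chrec_value(ptr, k);
}

/* 64-bit shape: (afloat *) ((long unsigned int) {0, +, 1}_1 * 4) + pb_14.
   The chrec only describes the 32-bit counter i; the widening cast,
   the scaling by 4 and the addition of pb_14 all sit outside it. */
static uint64_t addr_64bit_shape(uint64_t pb, uint32_t k)
{
    uint32_t i = k;                  /* {0, +, 1}_1 in the 32-bit type of i */
    return (uint64_t) i * 4 + pb;    /* cast, scale by 4, add base */
}
```

Both functions return the same address for every k (absent overflow); the vectorizer, however, can only read base and step directly off the first shape.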

The vectorizer does not handle such access functions at the moment, and
therefore fails to vectorize the loop:
loop at vect-46.c:37: not vectorized: pointer access is not simple.
loop at vect-46.c:37: not vectorized: unhandled data ref: D.1821_16 = *D.1820_15
loop at vect-46.c:37: bad data references.

These loops should be marked xfail for now for ppc64-linux.

One of the following would allow vectorizing these loops:

- The evolution analyzer knows to ignore the cast to (unsigned int) when it
builds the access function, but it doesn't ignore the cast to (long unsigned
int). If this cast could be avoided when building the access function, the
access function would be simple enough to handle later on.

- Enhance the vectorizer to digest such access functions.
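The first option amounts to rewriting (long unsigned int) {a, +, b}_1 as {(long unsigned int) a, +, (long unsigned int) b}_1. That is only sound if the narrow evolution never wraps. A small illustration, assuming unsigned 32-bit arithmetic widened to 64 bits (the helper names are hypothetical): the two orders of evaluation agree until the 32-bit value overflows.

```c
#include <assert.h>
#include <stdint.h>

/* Cast outside the chrec: evaluate a + b * k in 32 bits (wrapping
   modulo 2^32), then widen the result. */
static uint64_t widen_after(uint32_t a, uint32_t b, uint32_t k)
{
    uint32_t narrow = a + b * k;
    return (uint64_t) narrow;
}

/* Cast pushed inside the chrec: widen base and step first, then
   evaluate in 64 bits. */
static uint64_t widen_before(uint32_t a, uint32_t b, uint32_t k)
{
    return (uint64_t) a + (uint64_t) b * k;
}
```

For small k both give the same value; starting from a = UINT32_MAX with b = 1, one iteration is enough for widen_after to wrap to 0 while widen_before yields 2^32.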

Comment 1 Andrew Pinski 2004-11-09 17:55:00 UTC

Confirmed.

Comment 2 Andrew Pinski 2004-11-15 02:34:39 UTC

At least on powerpc-darwin (with -m64) we now vectorize these loops, but we ICE because we have:

pointer_type + int_type, which is not valid, and it is even worse on 64-bit targets: int is 32 bits, so we try to
move an SImode register into a DImode register and ICE in emit_move_insn because of this.

Comment 3 Dorit Naishlos 2004-11-15 18:53:15 UTC

(In reply to comment #2)
> At least on powerpc-darwin (with -m64) we now vectorize these loops but we ICE because we have:
> pointer_type + int_type which is not valid and is even worse on 64bit targets as int is 32 bit so we try to
> move SI mode register into a DI mode register and it ICE in emit_move_insn because of this.

Yes. I'll have a patch for that shortly. This would take care of testcases
vect-[46,50,52,58,60].c. A separate problem is the testcases that don't get
vectorized with -m64; these are vect-[77,77a,78].c and a newcomer, pr18425.c.

Comment 4 Andrew Pinski 2004-11-15 22:31:13 UTC

*** Bug 18505 has been marked as a duplicate of this bug. ***

Comment 5 Dorit Naishlos 2004-11-16 12:14:55 UTC

patch: http://gcc.gnu.org/ml/gcc-patches/2004-11/msg01301.html

Comment 6 Dorit Naishlos 2004-11-16 12:47:42 UTC

Testcases vect-[77,77a,78].c don't get vectorized with -m64 because the access 
function that the evolution analyzer returns for the pointers in these loops 
looks like the following:
ib_16 + (aint *) ((long unsigned int) {off_11, +, 1}_1 * 4)

Whereas with -m32 it looks like:
{ib_17 + (aint *) ((unsigned int) off.4_12 * 4), +, 4B}_1
(the vectorizer is able to extract the initial-condition and step when the 
access function is represented this way:
step: 4B
init: ib_17 + (aint *) ((unsigned int) off.4_12 * 4)).
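Why the shape matters: the vectorizer reads the initial condition and step straight off an outermost chrec. A hypothetical miniature of that check on a toy expression tree (not GCC's tree nodes), with every loop-invariant subexpression collapsed into a single leaf: the -m32 shape passes, while the -m64 shape, whose chrec sits under a conversion, a multiplication and an addition, is rejected.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical expression kinds, standing in for GCC tree codes. */
enum expr_kind { E_INVARIANT, E_CHREC, E_PLUS, E_MULT, E_CONVERT };

struct expr {
    enum expr_kind kind;
    const struct expr *op0;  /* chrec: init; other nodes: first operand */
    const struct expr *op1;  /* chrec: step; binary nodes: second operand */
};

/* Succeed only for {init, +, step} with loop-invariant init and step;
   any other shape is rejected, as in "pointer access is not simple". */
static bool extract_init_step(const struct expr *e,
                              const struct expr **init,
                              const struct expr **step)
{
    if (e->kind != E_CHREC
        || e->op0->kind != E_INVARIANT
        || e->op1->kind != E_INVARIANT)
        return false;
    *init = e->op0;
    *step = e->op1;
    return true;
}

/* One loop-invariant leaf (ib_17 + ..., off_11, 4B, 1, ...). */
static const struct expr inv = { E_INVARIANT, NULL, NULL };

/* -m32: {ib_17 + (aint *) ((unsigned int) off.4_12 * 4), +, 4B}_1 */
static const struct expr m32_form = { E_CHREC, &inv, &inv };

/* -m64: ib_16 + (aint *) ((long unsigned int) {off_11, +, 1}_1 * 4) */
static const struct expr m64_iv     = { E_CHREC, &inv, &inv };
static const struct expr m64_wide   = { E_CONVERT, &m64_iv, NULL };
static const struct expr m64_scaled = { E_MULT, &m64_wide, &inv };
static const struct expr m64_ptr    = { E_CONVERT, &m64_scaled, NULL };
static const struct expr m64_form   = { E_PLUS, &inv, &m64_ptr };
```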

These testcases should temporarily xfail when -m64 is used (or if the compiler 
is configured with powerpc64*).

Testcase pr18425.c can't be vectorized with -m64 because there's no vector 
support for 64bit elements. This testcase should also xfail (not temporarily) 
when -m64 is used or if the compiler is configured with powerpc64*.

Comment 7 janis187 2004-11-16 17:06:31 UTC

GCC for powerpc64-*-linux* could be any of the following: (a) a compiler that
generates only LP64 code; (b) a biarch compiler that generates ILP32 code by
default; or (c) a biarch compiler that generates LP64 code by default.  There's
currently no way to detect, in an xfail list, that a test is compiled for LP64
code on a powerpc64-*-linux* target.  My nightly bootstrap and test run uses
(b) and Jon Grimm's uses (c).

If a test is unsupported for LP64 then it can always be skipped by using
{ dg-require-effective-target ilp32 }.
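For example, a testcase that is meaningless for LP64 could start with the directive as below. This is a hypothetical skeleton, not the contents of pr18425.c; it uses long elements because they are 32 bits under ILP32 but 64 bits under LP64, the case with no vector support mentioned in comment 6. The usual main and vectorizer dump checks are omitted.

```c
/* { dg-require-effective-target ilp32 } */
/* When the effective target is not ilp32 (e.g. with -m64), the test is
   skipped and reported as UNSUPPORTED instead of being compiled. */

#include <assert.h>

#define N 16

/* The loop of interest: with 32-bit long it could use 4-element
   AltiVec vectors. */
static void copy_longs(long *restrict dst, const long *restrict src)
{
    int i;
    for (i = 0; i < N; i++)
        dst[i] = src[i];
}

/* Runtime check: returns the number of elements copied correctly. */
static int check_copy(void)
{
    long src[N], dst[N];
    int i, ok = 0;
    for (i = 0; i < N; i++)
        src[i] = 3L * i;
    copy_longs(dst, src);
    for (i = 0; i < N; i++)
        ok += (dst[i] == src[i]);
    return ok;
}
```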

Comment 8 Eric Botcazou 2004-11-20 18:26:21 UTC

> Testcase pr18425.c can't be vectorized with -m64 because there's no vector 
> support for 64bit elements. This testcase should also xfail (not temporarily) 
> when -m64 is used or if the compiler is configured with powerpc64*.

Same on SPARC 64-bit (i.e. sparc64-*-* or sparc-*-* with -m64).

Comment 9 GCC Commits 2004-11-23 09:19:42 UTC

Subject: Bug 18403

CVSROOT:	/cvs/gcc
Module name:	gcc
Changes by:	dorit@gcc.gnu.org	2004-11-23 09:19:25

Modified files:
	gcc            : ChangeLog tree-vectorizer.c 

Log message:
	PR tree-opt/18403
	PR tree-opt/18505
	* tree-vectorizer.c (vect_create_data_ref_ptr): Use
	lang_hooks.types.type_for_size instead of integer_type_node for the
	type of ptr_update.

Patches:
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/ChangeLog.diff?cvsroot=gcc&r1=2.6482&r2=2.6483
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/tree-vectorizer.c.diff?cvsroot=gcc&r1=2.39&r2=2.40
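Roughly, the fix gives the pointer increment an unsigned integer type as wide as a pointer, instead of int, which is only 32 bits on powerpc64 and so produced the SImode/DImode mismatch from comment 2. In C the closest analog of the type that lang_hooks.types.type_for_size returns for the pointer's precision is uintptr_t; the sketch below assumes that, and vect_ptr_update and its vf (vectorization factor) parameter are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Advance a data-reference pointer by vf elements (one vector's worth).
   The byte increment is held in uintptr_t, an unsigned type at least
   as wide as a pointer, rather than in int.  The integer round trip
   assumes a flat address space. */
static float *vect_ptr_update(float *ptr, size_t vf)
{
    uintptr_t update = (uintptr_t) (vf * sizeof (float));
    assert(sizeof update >= sizeof ptr);
    return (float *) ((uintptr_t) ptr + update);
}
```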

Comment 10 Dorit Naishlos 2004-11-23 19:48:26 UTC

Just for the record, related comments from
http://gcc.gnu.org/ml/gcc-patches/2004-11/msg01394.html:
"
> > A question: how would you write a testcase that when compiled on powerpc*
> > the dg-final check xfails for powerpc64* or if -m64 is used? (I want to
> > xfail testcase pr18425.c for 64bit, and temporarily also xfail testcases
> > vect-[77,77a,78].c for 64bit - see PR18403).
>
> I'll try to come up with another
> solution, but in the meantime it's more important that those tests run
> on powerpc64-*-* for 32-bit code than to xfail them for 64-bit code.

So PR18403 will remain open for now with the remaining 64bit failures.

OK, thanks

dorit
"

Comment 11 Dorit Naishlos 2005-01-09 16:35:45 UTC

vect-[46,50,52,58,60] don't fail anymore, and vect-[77,78] xfail the
vectorization check on lp64 targets, so I think we can classify this PR as
missed-optimization only, or close it and open a new (missed-optimization) PR
for vect-[77,78].c (for lp64 targets).

Comment 12 Dorit Naishlos 2005-03-31 12:58:51 UTC

Another testcase that exhibits a similar problem: vect-5.f90
(patch: http://gcc.gnu.org/ml/gcc-patches/2005-03/msg02840.html)

On powerpc64-linux (lp64), the second loop is not vectorized because the
vectorizer's data-reference analysis can't extract the evolution from the
access function returned by the evolution analyzer for the accesses to array b.
In lp32 mode the access function we get from the evolution analyzer is simpler,
and the loop gets vectorized.

==> Access function for each case:
lp64: (int8) {i_3, +, 1}_3 + -1
lp32: {i_3 + -1, +, 1}_3

==> Vectorizer dataref analysis report for each case:
vect-5.f90:3: note: Results of object analysis for: b

lp64:   base_address: &b
        offset: (<unnamed type>) ((int8) {i_3, +, 1}_3 * 4 + -4)
        step: 0
        base aligned 1

lp32:   base_address: &b
        offset: (<unnamed type>) (i_3 * 4 + -4)
        step: 4
        base aligned 1
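The lp32 numbers follow from the lp32 access function: scaling the index evolution {i_3 + -1, +, 1}_3 by the 4-byte element size gives the initial offset i_3 * 4 + -4 and a step of 4. A minimal sketch of that arithmetic, assuming 4-byte elements as in the dump; the struct and function names are hypothetical. In the lp64 case the chrec stays inside the conversion, so no step can be read off this way, which is presumably why that report shows step 0.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical: an index evolution {init, +, step}, in array elements. */
struct index_evolution {
    int64_t init;
    int64_t step;
};

/* Byte-level description, like the offset/step lines of the report. */
struct byte_evolution {
    int64_t offset;   /* bytes from base_address at the first iteration */
    int64_t step;     /* bytes advanced per iteration */
};

static struct byte_evolution scale_by_elem_size(struct index_evolution ev,
                                                int64_t elem_size)
{
    struct byte_evolution r;
    assert(elem_size > 0);
    r.offset = ev.init * elem_size;   /* (i_3 + -1) * 4 == i_3 * 4 + -4 */
    r.step = ev.step * elem_size;     /* 1 * 4 == 4 */
    return r;
}
```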

==> Tree dump for each case:

lp64:
  # j_5 = PHI <i_3(15), j_55(18)>;
<L32>:;
  D.712_48 = (int8) j_5;
  D.713_49 = D.712_48 + 7;
  D.714_51 = D.712_48 + -1;
  D.715_52 = b[D.714_51];
  a[D.713_49] = D.715_52;
  j_55 = j_5 + 1;
  if (j_5 == D.688_44) goto <L46>; else goto <L47>;

lp32:
  # j_6 = PHI <i_4(21), j_40(25)>;
<L34>:;
  D.518_35 = j_6 + 7;
  D.523_36 = a[D.518_35];
  D.519_37 = j_6 + -1;
  D.520_38 = b[D.519_37];
  if (D.523_36 != D.520_38) goto <L19>; else goto <L52>;

==> Evolution analyzer dumps for each case:

lp64:

(analyze_array
  (ref = b[D.714_51];
) 
(analyze_scalar_evolution
  (loop_nb = 3)
  (scalar = D.714_51)
(get_scalar_evolution
  (scalar = D.714_51)
  (scalar_evolution = (int8) {i_3, +, 1}_3 + -1))
(set_scalar_evolution
  (scalar = D.714_51)
  (scalar_evolution = (int8) {i_3, +, 1}_3 + -1))
)
(instantiate_parameters
  (loop_nb = 3)
  (chrec = (int8) {i_3, +, 1}_3 + -1)
(analyze_scalar_evolution
  (loop_nb = 2)
  (scalar = i_3)
(get_scalar_evolution
  (scalar = i_3)
  (scalar_evolution = {1, +, 1}_2))
(set_scalar_evolution
  (scalar = i_3)
  (scalar_evolution = {1, +, 1}_2))
)
  (res = (int8) {{1, +, 1}_2, +, 1}_3 + -1))
)

lp32:

(analyze_array
  (ref = b[D.835_47];
)
(analyze_scalar_evolution
  (loop_nb = 3)
  (scalar = D.835_47)
(get_scalar_evolution
  (scalar = D.835_47)
  (scalar_evolution = {i_3 + -1, +, 1}_3))
(set_scalar_evolution
  (scalar = D.835_47)
  (scalar_evolution = {i_3 + -1, +, 1}_3))
)
(instantiate_parameters
  (loop_nb = 3)
  (chrec = {i_3 + -1, +, 1}_3)
(analyze_scalar_evolution
  (loop_nb = 2)
  (scalar = i_3)
(get_scalar_evolution
  (scalar = i_3)
  (scalar_evolution = {1, +, 1}_2))
(set_scalar_evolution
  (scalar = i_3)
  (scalar_evolution = {1, +, 1}_2))
)
  (res = {{0, +, 1}_2, +, 1}_3))
)
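After instantiating i_3 as {1, +, 1}_2, both results describe the same index values; the lp64 one simply keeps the conversion and the + -1 outside the chrec, while in lp32 the -1 is folded into the initial value. A minimal numeric check, assuming int8 is the 64-bit integer type in these gfortran dumps and that no overflow occurs; the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* {{base, +, s2}_2, +, s3}_3: at iteration k2 of loop 2 and k3 of
   loop 3 its value is base + s2 * k2 + s3 * k3. */
static int32_t nested_chrec(int32_t base, int32_t s2, int32_t s3,
                            int32_t k2, int32_t k3)
{
    return base + s2 * k2 + s3 * k3;
}

/* lp64: (int8) {{1, +, 1}_2, +, 1}_3 + -1, i.e. widen, then add -1. */
static int64_t lp64_index(int32_t k2, int32_t k3)
{
    return (int64_t) nested_chrec(1, 1, 1, k2, k3) + -1;
}

/* lp32: {{0, +, 1}_2, +, 1}_3, with the -1 already in the base. */
static int32_t lp32_index(int32_t k2, int32_t k3)
{
    return nested_chrec(0, 1, 1, k2, k3);
}
```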

Comment 13 GCC Commits 2005-06-07 19:51:37 UTC

Subject: Bug 18403

CVSROOT:	/cvs/gcc
Module name:	gcc
Changes by:	spop@gcc.gnu.org	2005-06-07 19:51:26

Modified files:
	gcc            : ChangeLog Makefile.in tree-chrec.c tree-chrec.h 
	                 tree-flow.h tree-scalar-evolution.c 
	                 tree-ssa-loop-ivopts.c tree-ssa-loop-niter.c 
	                 tree-vrp.c 
	gcc/testsuite/gcc.dg/vect: vect-77.c vect-78.c 

Log message:
	Fixes PR 18403 and meta PR 21861.
	* Makefile.in (tree-chrec.o): Depend on CFGLOOP_H and TREE_FLOW_H.
	* tree-chrec.c: Include cfgloop.h and tree-flow.h.
	(evolution_function_is_invariant_rec_p,
	evolution_function_is_invariant_p): New.
	(chrec_convert): Use an extra parameter AT_STMT for refining the
	information that is passed down to convert_step.  Integrate the
	code that was in count_ev_in_wider_type.
	* tree-chrec.h (count_ev_in_wider_type): Removed.
	(chrec_convert): Modify its declaration.
	(evolution_function_is_invariant_p): Declared.
	(evolution_function_is_affine_p): Use evolution_function_is_invariant_p.
	* tree-flow.h (can_count_iv_in_wider_type): Renamed convert_step.
	(scev_probably_wraps_p): Declared.
	* tree-scalar-evolution.c (count_ev_in_wider_type): Removed.
	(follow_ssa_edge_in_rhs, interpret_rhs_modify_expr):
	Use an extra parameter AT_STMT for refining the information that is
	passed down to convert_step.
	(follow_ssa_edge_inner_loop_phi, follow_ssa_edge,
	analyze_scalar_evolution_1): Initialize AT_STMT with the current
	analyzed statement.
	(instantiate_parameters_1): Don't know yet how to initialize AT_STMT.
	* tree-ssa-loop-ivopts.c (idx_find_step): Update the use of
	can_count_iv_in_wider_type to use convert_step.
	* tree-ssa-loop-niter.c (can_count_iv_in_wider_type_bound): Move
	code that is independent of the loop over the known iteration
	bounds to convert_step_widening, the rest is moved to
	proved_non_wrapping_p.
	(scev_probably_wraps_p): New.
	(can_count_iv_in_wider_type): Renamed convert_step.
	* tree-vrp.c (adjust_range_with_scev): Take an extra AT_STMT parameter.
	Use scev_probably_wraps_p for computing init_is_max.
	(vrp_visit_assignment): Pass the current analyzed statement to
	adjust_range_with_scev.
	(execute_vrp): Call estimate_numbers_of_iterations for refining the
	information provided by scev analyzer.
	
	testsuite:
	
	* testsuite/gcc.dg/vect/vect-77.c: Remove xfail from lp64.
	* testsuite/gcc.dg/vect/vect-78.c: Same.

Patches:
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/ChangeLog.diff?cvsroot=gcc&r1=2.9071&r2=2.9072
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/Makefile.in.diff?cvsroot=gcc&r1=1.1500&r2=1.1501
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/tree-chrec.c.diff?cvsroot=gcc&r1=2.19&r2=2.20
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/tree-chrec.h.diff?cvsroot=gcc&r1=2.8&r2=2.9
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/tree-flow.h.diff?cvsroot=gcc&r1=2.118&r2=2.119
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/tree-scalar-evolution.c.diff?cvsroot=gcc&r1=2.27&r2=2.28
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/tree-ssa-loop-ivopts.c.diff?cvsroot=gcc&r1=2.76&r2=2.77
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/tree-ssa-loop-niter.c.diff?cvsroot=gcc&r1=2.28&r2=2.29
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/tree-vrp.c.diff?cvsroot=gcc&r1=2.23&r2=2.24
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/testsuite/gcc.dg/vect/vect-77.c.diff?cvsroot=gcc&r1=1.10&r2=1.11
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/testsuite/gcc.dg/vect/vect-78.c.diff?cvsroot=gcc&r1=1.11&r2=1.12
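The heart of this fix is that chrec_convert may push a widening conversion into a chrec when the evolution provably does not wrap, with scev_probably_wraps_p deciding that from bounds on the loop's number of iterations. A much simplified sketch of that reasoning for an unsigned 32-bit evolution with a known iteration bound; it is not GCC's implementation (which works on tree types and bounds), and the function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if {base, +, step} computed in 32-bit unsigned arithmetic could
   wrap during iterations 0 .. niter.  The check is done in 64 bits,
   which cannot overflow for these argument ranges. */
static bool u32_chrec_may_wrap(uint32_t base, uint32_t step, uint64_t niter)
{
    assert(niter <= UINT32_MAX);
    uint64_t last = (uint64_t) base + (uint64_t) step * niter;
    return last > UINT32_MAX;
}

/* Rewrite (uint64_t) {base, +, step} as {(uint64_t) base, +, (uint64_t) step}
   when that is safe; otherwise leave the conversion outside the chrec. */
static bool widen_chrec_u32(uint32_t base, uint32_t step, uint64_t niter,
                            uint64_t *wide_base, uint64_t *wide_step)
{
    if (u32_chrec_may_wrap(base, step, niter))
        return false;
    *wide_base = base;
    *wide_step = step;
    return true;
}
```

Because the step is unsigned, the evolution is monotonic, so checking the value at the last iteration is enough.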

Comment 14 Andrew Pinski 2005-06-09 14:44:30 UTC

Fixed in 4.1.0.