25918 – gcc.dg/vect/vect-reduc-dot-s16.c scan-tree-dump-times vectorized 1 loops 1 and gcc.dg/vect/vect-reduc-pattern-2.c scan-tree-dump-times vectorized 2 loops 1 fail

Status:	RESOLVED FIXED

Alias:	None

Product:	gcc
Classification:	Unclassified
Component:	testsuite
Version:	4.2.0

Importance:	P3 normal
Target Milestone:	4.2.0
Assignee:	Not yet assigned to anyone

Reported:	2006-01-23 00:53 UTC by Joseph S. Myers
Modified:	2006-02-16 17:43 UTC
CC List:	4 users

Last reconfirmed:	2006-02-08 02:07:47

Attachments
Dump file for vect-reduc-dot-s16.c, -mlp64. (7.28 KB, text/plain) 2006-01-28 17:04 UTC, Joseph S. Myers
Dump file for vect-reduc-pattern-2.c, -mlp64. (4.46 KB, text/plain) 2006-01-28 17:05 UTC, Joseph S. Myers
[patch] for PR 25918 (autovect) (2.63 KB, patch) 2006-02-14 12:16 UTC, Victor Kaplansky


Description Joseph S. Myers 2006-01-23 00:53:07 UTC

The following appeared on mainline on ia64-hp-hpux11.23 (both -milp32 and -mlp64) between 20060118 (revision 109876) and 20060121 (revision 110062), probably when these tests were added.

FAIL: gcc.dg/vect/vect-reduc-dot-s16.c scan-tree-dump-times vectorized 1 loops 1
FAIL: gcc.dg/vect/vect-reduc-pattern-2.c scan-tree-dump-times vectorized 2 loops 1

Comment 1 Dorit Naishlos 2006-01-26 09:07:43 UTC

Can you please send the dump files generated by -fdump-tree-vect-details?

reduc-dot-s16.c needs the sdot_prodv4hi pattern, which is implemented for ia64, so I'd expect one loop to be vectorized. I wonder what's the problem there.

In vect-reduc-pattern-2.c - does the vectorizer report vectorizing one loop? One loop (the one that sums shorts into an int accumulator) needs the widen_ssumv4hi3 pattern to be vectorized, which is implemented for ia64. Does that loop get vectorized?
The second loop, however (the one that sums chars into an int accumulator), cannot be vectorized on ia64 because the mode of the result of the widen_ssumv8qi3 pattern, as implemented on ia64, is short, not int. If this is indeed the reason for the failure, we'd probably want to introduce finer keywords to represent the available widening support (in target-supports.exp we currently have just a "vect_widen_sum" keyword, which does not distinguish between char-to-short summation and char-to-int summation).
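To make the two cases above concrete, here is a minimal sketch of the two loop shapes being discussed. It is not the actual testcase: the array names, the size N, and the function names are made up for illustration. Whether either loop gets vectorized depends on which widening-sum patterns the target provides, as described above.

```c
#include <assert.h>

#define N 64

/* Hypothetical input arrays; names and size are illustrative only. */
static short short_data[N];
static signed char char_data[N];

/* Sums shorts into an int accumulator: a short-to-int (HI to SI)
   widening summation.  */
int sum_shorts(int len)
{
  int i;
  int acc = 0;
  for (i = 0; i < len; i++)
    acc += short_data[i];
  return acc;
}

/* Sums chars into an int accumulator: a char-to-int (QI to SI)
   widening summation.  A target that only offers a char-to-short
   widening sum cannot use it directly for this loop.  */
int sum_chars(int len)
{
  int i;
  int acc = 0;
  for (i = 0; i < len; i++)
    acc += char_data[i];
  return acc;
}

/* Fill both arrays with 0 .. N-1 (all values fit in a signed char).  */
void init_data(void)
{
  int i;
  for (i = 0; i < N; i++)
    {
      short_data[i] = (short) i;
      char_data[i] = (signed char) i;
    }
}

/* Small self-check: both sums equal 0 + 1 + ... + (N-1).  */
void self_check(void)
{
  init_data();
  assert(sum_shorts(N) == (N - 1) * N / 2);
  assert(sum_chars(N) == (N - 1) * N / 2);
}
```

Both functions compute the same value here; the difference that matters for the vectorizer is only the element type being widened into the int accumulator.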

Comment 2 Joseph S. Myers 2006-01-28 17:04:36 UTC

Created attachment 10747 [details]
Dump file for vect-reduc-dot-s16.c, -mlp64.

Comment 3 Joseph S. Myers 2006-01-28 17:05:13 UTC

Created attachment 10748 [details]
Dump file for vect-reduc-pattern-2.c, -mlp64.

Comment 4 Jim Wilson 2006-02-08 02:07:47 UTC

The vect-reduc-dot-s16.c testcase fails because two loops were vectorized, and we only expected one to be vectorized.

The problem here is that the loop in foo1 is not supposed to be recognized and vectorized as a dot-product loop.  And it isn't, as expected.  However, on IA-64, it is recognized and vectorized as a widen-sum loop.  This happens because the IA-64 port defines the widen_ssumv4hi3 pattern.  The IA-64 port is the only one that defines this pattern, and hence is probably the only port "broken" here.  All others will presumably fail to vectorize this loop.

So the IA-64 port emits the following to the -fdump-tree-vect-details file:
tmp.c:23: note: pattern recognized: prod_14 w+ result_3
tmp.c:23: note: vectorized 1 loops in function.
tmp.c:38: note: pattern recognized:  DOT_PROD_EXPR < D.2127_14 , D.2125_10 ...
tmp.c:38: note: vectorized 1 loops in function.
and the testcase fails because we only expected 1 loop to be vectorized.

I think the only thing wrong here is that the dg-final tests in the testcase are not precise enough to handle this case.
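As a rough illustration of the distinction described in this comment, the sketch below contrasts a loop whose product is first stored in a short (so it is not a dot product, but can still be handled as a short multiply followed by a widening sum) with a loop whose product is accumulated at int width. This is an assumed reconstruction of the general loop shapes, not the testcase itself; the function names, array names, and N are hypothetical.

```c
#include <assert.h>

#define N 64

/* Hypothetical inputs; names and size are illustrative only. */
static short X[N];
static short Y[N];

/* The product is truncated to short before it is accumulated, so this
   is not a dot product.  It can still be seen as a (non-widening)
   multiplication of shorts followed by a short-to-int widening sum.
   Note: if a product did not fit in a short, the conversion result
   would be implementation-defined; the values used below avoid that.  */
int sum_of_short_products(int len)
{
  int i;
  int result = 0;
  short prod;
  for (i = 0; i < len; i++)
    {
      prod = (short) (X[i] * Y[i]);
      result += prod;
    }
  return result;
}

/* The product is computed and accumulated at int width: the shape a
   dot-product pattern is meant to match.  */
int dot_product(int len)
{
  int i;
  int result = 0;
  for (i = 0; i < len; i++)
    result += X[i] * Y[i];
  return result;
}

/* Fill the inputs with small values so every product fits in a short.  */
void init_inputs(void)
{
  int i;
  for (i = 0; i < N; i++)
    {
      X[i] = (short) i;
      Y[i] = 2;
    }
}

/* Small self-check: with small inputs the two loops agree.  */
void self_check(void)
{
  init_inputs();
  assert(sum_of_short_products(N) == dot_product(N));
}
```

With inputs this small the two functions return the same value; they only diverge when a product overflows short, which is why the compiler may not treat the first loop as a dot product.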

Comment 5 Jim Wilson 2006-02-08 02:19:26 UTC

Dorit's analysis of vect-reduc-pattern-2.c in comment #1 is correct.  The loop that sums char into int is not vectorized, because the widen_ssumv8qi3 pattern has the wrong output mode (HI instead of SI).  Dorit's suggested solution looks reasonable to me.

Comment 6 Dorit Naishlos 2006-02-08 14:17:57 UTC

(In reply to comment #4)
> ... This happens
> because the IA-64 port defines the widen_ssumv4hi3 pattern.  The IA-64 port is
> the only one that defines this pattern, and hence is probably the only port
> "broken" here.  All others will presumably fail to vectorize this loop.

That's correct. It's actually a combination of being able to support widen_ssumv4hi3 and (non-widening) multiplication of shorts. Looks like we need to split these loops into separate testcases, and for this particular loop expect vectorization if vect_widen_sum and vect_short_mult (new keyword) are supported.

> and the testcase fails because we only expected 1 loop to be vectorized.
> I think the only thing wrong here is that the dg-final tests in the testcase
> are not precise enough to handle this case.

Indeed. Will take care of that.
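A split-out testcase along the lines proposed in this comment might look roughly like the sketch below. This is only an assumed shape: the function body, array names, and N are hypothetical, and the dg-final line assumes the usual effective-target selector syntax with the two keywords named above (vect_short_mult being the proposed new one). The directive lives in a C comment, so the file compiles as ordinary C.

```c
#include <assert.h>

#define N 64

/* Hypothetical inputs; names and size are illustrative only. */
static short X[N];
static short Y[N];

/* Short multiply followed by a widening sum into int.  */
int foo1(int len)
{
  int i;
  int result = 0;
  short prod;
  for (i = 0; i < len; i++)
    {
      prod = (short) (X[i] * Y[i]);
      result += prod;
    }
  return result;
}

/* Fill the inputs with small values so every product fits in a short.  */
void fill(void)
{
  int i;
  for (i = 0; i < N; i++)
    {
      X[i] = (short) i;
      Y[i] = 3;
    }
}

/* Small self-check of the scalar result: 3 * (0 + 1 + ... + (N-1)).  */
void self_check(void)
{
  fill();
  assert(foo1(N) == 3 * ((N - 1) * N / 2));
}

/* Assumed directive: expect exactly one vectorized loop, but only on
   targets that support both keywords.  */
/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target { vect_widen_sum && vect_short_mult } } } } */
```

Keeping one loop per file means each scan-tree-dump-times count can be tied to exactly the target features that loop needs, instead of one count covering several loops with different requirements.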

Comment 7 Dorit Naishlos 2006-02-08 14:19:34 UTC

(In reply to comment #5)
Will take care of that.

Comment 8 Victor Kaplansky 2006-02-14 12:16:28 UTC

Created attachment 10847 [details]
[patch] for PR 25918 (autovect)

Comment 9 Victor Kaplansky 2006-02-14 12:20:08 UTC

Hello,

I've prepared a patch to the testsuite to make dg-final checks
more precise. The patch was tested on ppc64-yellowdog-linux.

I have no access to IA64 box. Could someone check this patch on ia64?
The patch can be found in "mainline.PR25918-2.patch.txt" attachment.

Thanks,
-- victor


2006-02-12  Victor Kaplansky  <victork@il.ibm.com>
        PR tree-opt/25918

        * lib/target-supports.exp
        (check_effective_target_vect_char_mult): New.
        (check_effective_target_vect_short_mult): New.
        (check_effective_target_vect_widen_sum_qi_to_si): New.
        (check_effective_target_vect_widen_sum_qi_to_hi): New.
        (check_effective_target_vect_widen_sum_hi_to_si): New.
        * gcc.dg/vect/vect-reduc-dot-s16.c: Remove, split into
        vect-reduc-dot-s16a.c and vect-reduc-dot-s16b.c
        * vect-reduc-dot-s16a.c: New, split from vect-reduc-dot-s16.c.
        * vect-reduc-dot-s16b.c: New, split from vect-reduc-dot-s16.c.
        * gcc.dg/vect/vect-reduc-pattern-2.c: Remove, split into
        vect-reduc-pattern-2a.c, vect-reduc-pattern-2b.c and
        vect-reduc-pattern-2c.c
        * gcc.dg/vect/vect-reduc-pattern-1.c: Remove, split into
        vect-reduc-pattern-1a.c, vect-reduc-pattern-1b.c and
        vect-reduc-pattern-1c.c

Comment 10 Jim Wilson 2006-02-14 23:41:50 UTC

On Tue, 2006-02-14 at 04:20, victork at il dot ibm dot com wrote:
> I have no access to IA64 box. Could someone check this patch on ia64?
> The patch can be found in "mainline.PR25918-2.patch.txt" attachment.

The vect-reduc-dot-s16a.c file has a bug.  It is missing a declaration
for the local variable dot.  I added an "int dot;" line to that test.

I did "make check-gcc RUNTESTFLAGS=vect.exp".  Without the patch, I get
two unexpected failures.  With the patch, I get no unexpected failures.
So the patch works fine for me on an ia64-linux system.

Comment 11 victork 2006-02-16 09:59:05 UTC

Subject: Bug 25918

Author: victork
Date: Thu Feb 16 09:59:00 2006
New Revision: 111135

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=111135
Log:
testsuite/
2006-02-16  Victor Kaplansky  <victork@il.ibm.com>

	PR tree-opt/25918
	* lib/target-supports.exp
	(check_effective_target_vect_short_mult): New.
	(check_effective_target_vect_char_mult): New.
	(check_effective_target_vect_widen_sum_qi_to_si): New.
	(check_effective_target_vect_widen_sum_qi_to_hi): New.
	(check_effective_target_vect_widen_sum_hi_to_si): New.
	* gcc.dg/vect/vect-reduc-dot-s16.c: Remove, split into
	vect-reduc-dot-s16a.c and vect-reduc-dot-s16b.c
	* vect-reduc-dot-s16a.c: New, split from vect-reduc-dot-s16.c.
	* vect-reduc-dot-s16b.c: New, split from vect-reduc-dot-s16.c.
	* gcc.dg/vect/vect-reduc-pattern-2.c: Remove, split into
	vect-reduc-pattern-2a.c, vect-reduc-pattern-2b.c and
	vect-reduc-pattern-2c.c
	* gcc.dg/vect/vect-reduc-pattern-1.c: Remove, split into
	vect-reduc-pattern-1a.c, vect-reduc-pattern-1b.c and
	vect-reduc-pattern-1c.c

Added:
    trunk/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-s16a.c
    trunk/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-s16b.c
    trunk/gcc/testsuite/gcc.dg/vect/vect-reduc-pattern-1a.c
    trunk/gcc/testsuite/gcc.dg/vect/vect-reduc-pattern-1b.c
    trunk/gcc/testsuite/gcc.dg/vect/vect-reduc-pattern-1c.c
    trunk/gcc/testsuite/gcc.dg/vect/vect-reduc-pattern-2a.c
    trunk/gcc/testsuite/gcc.dg/vect/vect-reduc-pattern-2b.c
    trunk/gcc/testsuite/gcc.dg/vect/vect-reduc-pattern-2c.c
    trunk/gcc/testsuite/gcc.dg/vect/wrapv-vect-reduc-pattern-2c.c
Removed:
    trunk/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-s16.c
    trunk/gcc/testsuite/gcc.dg/vect/vect-reduc-pattern-1.c
    trunk/gcc/testsuite/gcc.dg/vect/vect-reduc-pattern-2.c
    trunk/gcc/testsuite/gcc.dg/vect/wrapv-vect-reduc-pattern-2.c
Modified:
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/testsuite/lib/target-supports.exp

Comment 12 Andrew Pinski 2006-02-16 17:43:27 UTC

Fixed.