Bug 35252 - No vectorization for complex arrays
Summary: No vectorization for complex arrays
Status: RESOLVED DUPLICATE of bug 30211
Alias: None
Product: gcc
Classification: Unclassified
Component: tree-optimization
Version: 4.4.0
Importance: P3 enhancement
Target Milestone: ---
Assignee: victork
URL:
Keywords: missed-optimization
Depends on:
Blocks: 31485 36099
 
Reported: 2008-02-19 10:20 UTC by Uroš Bizjak
Modified: 2008-08-05 08:16 UTC
CC List: 4 users

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed: 2008-03-12 06:05:04



Description Uroš Bizjak 2008-02-19 10:20:08 UTC
This testcase produces suboptimal code:

_Complex float af[16], bf[16], cf[16];
_Complex double ad[16], bd[16], cd[16];

void testf(void)
{
  int i;

  for (i = 0; i < 16; i++)
    cf[i] = af[i] * bf[i];
}

void testd(void)
{
  int i;

  for (i = 0; i < 16; i++)
    cd[i] = ad[i] + bd[i];
}

gcc -O2 -ftree-vectorize -msse2:

testd:
        xorl    %eax, %eax
        .p2align 4,,7
        .p2align 3
.L7:
        movsd   ad+8(%eax), %xmm1
        movsd   ad(%eax), %xmm0
        addsd   bd+8(%eax), %xmm1
        addsd   bd(%eax), %xmm0
        movsd   %xmm1, cd+8(%eax)
        movsd   %xmm0, cd(%eax)
        addl    $16, %eax
        cmpl    $256, %eax
        jne     .L7
        rep
        ret

And with -ffast-math:

testf:
        xorl    %eax, %eax
        .p2align 4,,7
        .p2align 3
.L2:
        movss   bf(,%eax,8), %xmm2
        movss   bf+4(,%eax,8), %xmm3
        movss   af(,%eax,8), %xmm5
        movss   af+4(,%eax,8), %xmm4
        movaps  %xmm2, %xmm0
        movaps  %xmm3, %xmm1
        mulss   %xmm5, %xmm0
        mulss   %xmm4, %xmm1
        mulss   %xmm4, %xmm2
        mulss   %xmm5, %xmm3
        subss   %xmm1, %xmm0
        addss   %xmm3, %xmm2
        movss   %xmm0, cf(,%eax,8)
        movss   %xmm2, cf+4(,%eax,8)
        addl    $1, %eax
        cmpl    $16, %eax
        jne     .L2
        rep
        ret

Note that the SSE3 addsubps instruction could be used in the latter case.
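The addsubps remark can be illustrated with a scalar model. The sketch below assumes two complex floats packed into one 4-lane register as { re0, im0, re1, im1 }; the type and helper names (v4sf_model, dup_even, dup_odd, swap_pairs, cmul_interleaved) are hypothetical stand-ins for an SSE3 movsldup, movshdup, shufps, mulps, addsubps sequence, not GCC internals. It computes (p + qi)(r + si) = (pr - qs) + (ps + qr)i with two lane-wise multiplies and one addsub per pair of complex numbers, instead of the four mulss, one subss and one addss per element in the scalar testf loop.

```c
/* Hypothetical scalar model of a 4 x float SSE register holding two
   interleaved complex floats: { re0, im0, re1, im1 }. */
typedef struct { float v[4]; } v4sf_model;

/* Models SSE3 addsubps: subtract in even lanes, add in odd lanes. */
static v4sf_model addsub(v4sf_model x, v4sf_model y)
{
  v4sf_model r;
  for (int k = 0; k < 4; k++)
    r.v[k] = (k % 2 == 0) ? x.v[k] - y.v[k] : x.v[k] + y.v[k];
  return r;
}

/* Lane-wise multiply, like mulps. */
static v4sf_model mul(v4sf_model x, v4sf_model y)
{
  v4sf_model r;
  for (int k = 0; k < 4; k++)
    r.v[k] = x.v[k] * y.v[k];
  return r;
}

/* {x0, x0, x2, x2}: broadcast the real parts (like movsldup). */
static v4sf_model dup_even(v4sf_model x)
{
  v4sf_model r = {{ x.v[0], x.v[0], x.v[2], x.v[2] }};
  return r;
}

/* {x1, x1, x3, x3}: broadcast the imaginary parts (like movshdup). */
static v4sf_model dup_odd(v4sf_model x)
{
  v4sf_model r = {{ x.v[1], x.v[1], x.v[3], x.v[3] }};
  return r;
}

/* {x1, x0, x3, x2}: swap re/im within each pair (like a shufps). */
static v4sf_model swap_pairs(v4sf_model x)
{
  v4sf_model r = {{ x.v[1], x.v[0], x.v[3], x.v[2] }};
  return r;
}

/* c[i] = a[i] * b[i] on interleaved complex data, two complex elements
   per modeled register. n counts complex elements and must be even. */
static void cmul_interleaved(const float *a, const float *b, float *c, int n)
{
  for (int i = 0; i < n; i += 2) {
    v4sf_model va, vb, vc;
    for (int k = 0; k < 4; k++) {
      va.v[k] = a[2 * i + k];
      vb.v[k] = b[2 * i + k];
    }
    /* (p + qi)(r + si) = (pr - qs) + (ps + qr)i */
    v4sf_model t1 = mul(dup_even(va), vb);            /* {p*r, p*s, ...} */
    v4sf_model t2 = mul(dup_odd(va), swap_pairs(vb)); /* {q*s, q*r, ...} */
    vc = addsub(t1, t2);
    for (int k = 0; k < 4; k++)
      c[2 * i + k] = vc.v[k];
  }
}
```

The sketch ignores the NaN and infinity handling that C99 complex multiplication requires when -ffast-math is not in effect.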
Comment 1 victork 2008-03-12 06:05:04 UTC
We don't recognize REALPART_EXPR and IMAGPART_EXPR in the vectorizer.

These should be recognized as load operations:
  CR.39_21 = REALPART_EXPR <ad[i_17]>;
  CI.40_22 = IMAGPART_EXPR <ad[i_17]>;
  CR.41_23 = REALPART_EXPR <bd[i_17]>;
  CI.42_24 = IMAGPART_EXPR <bd[i_17]>;

These should be recognized as store operations:
  REALPART_EXPR <cd[i_17]> = CR.43_25;
  IMAGPART_EXPR <cd[i_17]> = CI.44_26;
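Why the REALPART_EXPR/IMAGPART_EXPR pairs can be treated as one group of adjacent loads and stores: C99 gives each complex type the same representation as a two-element array of its real type, real part first, so testd performs the same computation as a flat loop over 32 doubles. A minimal sketch; the helper names add_complex and add_complex_flat are hypothetical.

```c
#include <complex.h>

/* Element-wise complex addition, written like the testd loop. */
static void add_complex(const double complex *a, const double complex *b,
                        double complex *c, int n)
{
  for (int i = 0; i < n; i++)
    c[i] = a[i] + b[i];
}

/* Same computation over the underlying storage: a complex double is laid
   out like double[2] = { real, imag }, so the REALPART_EXPR and
   IMAGPART_EXPR loads of a[i] read the adjacent doubles 2*i and 2*i + 1.
   Treated as one group, the loop becomes a plain loop over 2*n doubles. */
static void add_complex_flat(const double complex *a, const double complex *b,
                             double complex *c, int n)
{
  const double *pa = (const double *) a;
  const double *pb = (const double *) b;
  double *pc = (double *) c;
  for (int k = 0; k < 2 * n; k++)
    pc[k] = pa[k] + pb[k];
}
```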
Comment 2 victork 2008-07-27 21:45:16 UTC
Subject: Bug 35252

Author: victork
Date: Sun Jul 27 21:44:25 2008
New Revision: 138198

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=138198
Log:
2008-07-27  Victor Kaplansky  <victork@il.ibm.com>

        PR tree-optimization/35252
        * tree-vect-analyze.c (vect_build_slp_tree): Make IMAGPART_EXPR and
        REALPART_EXPR to be considered as same load operation.

testsuite

        PR tree-optimization/35252
        * gcc.dg/vect/vect-complex-1.c, gcc.dg/vect/vect-complex-2.c,
        gcc.dg/vect/fast-math-vect-complex-3.c,
        gcc.dg/vect/vect-complex-4.c: New tests.


Added:
    trunk/gcc/testsuite/gcc.dg/vect/fast-math-vect-complex-3.c
    trunk/gcc/testsuite/gcc.dg/vect/vect-complex-1.c
    trunk/gcc/testsuite/gcc.dg/vect/vect-complex-2.c
    trunk/gcc/testsuite/gcc.dg/vect/vect-complex-4.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/tree-vect-analyze.c

Comment 3 victork 2008-07-29 21:54:15 UTC
Revision 138198 fixes vectorization of complex addition. Vectorization of complex multiplication works on PowerPC, but on x86 it is a known issue; see PR30211.

I'm closing this bug as a duplicate of PR30211.

*** This bug has been marked as a duplicate of 30211 ***
Comment 4 Richard Biener 2008-08-02 12:06:56 UTC
Subject: Bug 35252

Author: rguenth
Date: Sat Aug  2 12:05:47 2008
New Revision: 138553

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=138553
Log:
2008-08-02  Richard Guenther  <rguenther@suse.de>

	PR target/35252
	* config/i386/sse.md (SSEMODE4S, SSEMODE2D): New mode iterators.
	(ssedoublesizemode): New mode attribute.
	(sse_shufps): Call gen_sse_shufps_v4sf.
	(sse_shufps_1): Macroize.
	(sse2_shufpd): Call gen_Sse_shufpd_v2df.
	(sse2_shufpd_1): Macroize.
	(vec_extract_odd, vec_extract_even): New expanders.
	(vec_interleave_highv4sf, vec_interleave_lowv4sf,
	vec_interleave_highv2df, vec_interleave_lowv2df): Likewise.
	* i386.c (ix86_expand_vector_init_one_nonzero): Call
	gen_sse_shufps_v4sf instead of gen_sse_shufps_1.
	(ix86_expand_vector_set): Likewise.
	(ix86_expand_reduc_v4sf): Likewise.

	* lib/target-supports.exp (vect_extract_even_odd_wide) Add.
	(vect_strided_wide): Likewise.
	* gcc.dg/vect/fast-math-pr35982.c: Enable for
	vect_extract_even_odd_wide.
	* gcc.dg/vect/fast-math-vect-complex-3.c: Likewise.
	* gcc.dg/vect/vect-1.c: Likewise.
	* gcc.dg/vect/vect-107.c: Likewise.
	* gcc.dg/vect/vect-98.c: Likewise.
	* gcc.dg/vect/vect-strided-float.c: Likewise.
	* gcc.dg/vect/slp-11.c: Enable for vect_strided_wide.
	* gcc.dg/vect/slp-12a.c: Likewise.
	* gcc.dg/vect/slp-12b.c: Likewise.
	* gcc.dg/vect/slp-19.c: Likewise.
	* gcc.dg/vect/slp-23.c: Likewise.
	* gcc.dg/vect/slp-5.c: Likewise.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/config/i386/i386.c
    trunk/gcc/config/i386/sse.md
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/testsuite/gcc.dg/vect/fast-math-pr35982.c
    trunk/gcc/testsuite/gcc.dg/vect/fast-math-vect-complex-3.c
    trunk/gcc/testsuite/gcc.dg/vect/slp-11.c
    trunk/gcc/testsuite/gcc.dg/vect/slp-12a.c
    trunk/gcc/testsuite/gcc.dg/vect/slp-12b.c
    trunk/gcc/testsuite/gcc.dg/vect/slp-19.c
    trunk/gcc/testsuite/gcc.dg/vect/slp-23.c
    trunk/gcc/testsuite/gcc.dg/vect/slp-5.c
    trunk/gcc/testsuite/gcc.dg/vect/vect-1.c
    trunk/gcc/testsuite/gcc.dg/vect/vect-107.c
    trunk/gcc/testsuite/gcc.dg/vect/vect-98.c
    trunk/gcc/testsuite/gcc.dg/vect/vect-strided-float.c
    trunk/gcc/testsuite/lib/target-supports.exp
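The vec_extract_even/odd and vec_interleave_high/low expanders added in this revision provide the permutations the vectorizer uses for strided accesses, such as splitting interleaved complex data into real and imaginary vectors and merging the results back. The scalar model below is only an illustration: the type and function names are hypothetical, and the lane orders are assumed from the x86 instructions involved (shufps for the extracts, unpcklps/unpckhps for the interleaves).

```c
/* Hypothetical scalar model of a 4 x float vector register. */
typedef struct { float v[4]; } v4sf_lanes;

/* {a0, a2, b0, b2}: for interleaved complex data, the real parts. */
static v4sf_lanes extract_even(v4sf_lanes a, v4sf_lanes b)
{
  v4sf_lanes r = {{ a.v[0], a.v[2], b.v[0], b.v[2] }};
  return r;
}

/* {a1, a3, b1, b3}: for interleaved complex data, the imaginary parts. */
static v4sf_lanes extract_odd(v4sf_lanes a, v4sf_lanes b)
{
  v4sf_lanes r = {{ a.v[1], a.v[3], b.v[1], b.v[3] }};
  return r;
}

/* {a0, b0, a1, b1}, as unpcklps produces. */
static v4sf_lanes interleave_low(v4sf_lanes a, v4sf_lanes b)
{
  v4sf_lanes r = {{ a.v[0], b.v[0], a.v[1], b.v[1] }};
  return r;
}

/* {a2, b2, a3, b3}, as unpckhps produces. */
static v4sf_lanes interleave_high(v4sf_lanes a, v4sf_lanes b)
{
  v4sf_lanes r = {{ a.v[2], b.v[2], a.v[3], b.v[3] }};
  return r;
}
```

Extracting even and odd lanes from two registers of interleaved complex values and then interleaving the results restores the original layout, which is the load/compute/store shape a strided complex loop needs.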

Comment 5 Uroš Bizjak 2008-08-04 11:43:54 UTC
Hm, the following testcase doesn't vectorize due to the vect cost model
(-O2 -msse3 -ftree-vectorize -ffast-math) on an i686 target:

--cut here--
void testf(void)
{
  int i;

  for (i = 0; i < 16; i++)
    cf[i] = af[i] + bf[i];
}
--cut here--


Compilation reports:

pr30211.c:8: note: vectorization_factor = 2, niters = 16
pr30211.c:8: note: === vect_update_slp_costs_according_to_vf ===
pr30211.c:8: note: cost model: vector iteration cost = 16 is divisible by scalar iteration cost = 8 by a factor greater than or equal to the vectorization factor = 2 .
pr30211.c:8: note: not vectorized: vectorization not profitable.
pr30211.c:8: note: not vectorized: vector version will never be profitable.
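The wording of the log corresponds to a simple comparison: 16 / 8 = 2, which is not less than the vectorization factor 2, so one vector iteration is judged no cheaper than the two scalar iterations it replaces. A simplified model of that single test follows; the function name is hypothetical, and GCC's real cost model also accounts for prologue and epilogue (outside) costs, which this sketch ignores.

```c
#include <stdbool.h>

/* Simplified model of the "will never be profitable" test in the log:
   the vector loop cannot win if one vector iteration costs at least as
   much as vf scalar iterations. */
static bool vector_never_profitable(int vec_iter_cost, int scalar_iter_cost,
                                    int vf)
{
  return vec_iter_cost >= scalar_iter_cost * vf;
}
```

With the logged values, vector_never_profitable(16, 8, 2) is true. Because the check compares whole-iteration costs, any cost counted twice on the vector side can push an otherwise profitable loop over this threshold.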

However, without the cost model the loop in this testcase compiles to:

.L2:
        movaps  bf(%eax), %xmm0
        addps   af(%eax), %xmm0
        movaps  %xmm0, cf(%eax)
        addl    $16, %eax
        cmpl    $128, %eax
        jne     .L2

which is IMO faster than the equivalent scalar version:

.L2:
        movss   bf+4(,%eax,8), %xmm1
        addss   af+4(,%eax,8), %xmm1
        movss   bf(,%eax,8), %xmm0
        addss   af(,%eax,8), %xmm0
        movss   %xmm0, cf(,%eax,8)
        movss   %xmm1, cf+4(,%eax,8)
        addl    $1, %eax
        cmpl    $16, %eax
        jne     .L2
Comment 6 victork 2008-08-05 08:16:42 UTC
> Hm, following testcase doesn't vectorize due to vect cost model
> (-O2 -msse3 -ftree-vectorize -ffast-math) on i686 target:

The problem is that we count some costs twice: once as vectorized by SLP and once as vectorized by non-SLP. I'm going to submit a patch to fix this.