Hello!
This patch introduces into vectorizable_call() the same approach,
based on the NARROW and WIDEN modifiers, that is already implemented
in vectorizable_conversion(). With these modifiers, gcc can vectorize
calls where (nunits_in == nunits_out / 2) or
(nunits_out == nunits_in / 2).
The attached patch uses this infrastructure to vectorize BUILT_IN_LRINT
using the cvtpd2dq SSE insn. The patch also re-defines all 2-arg i386
builtins as const builtins (each builtin was checked to confirm that
it does not clobber global memory).
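To make the nunits conditions above concrete, here is a minimal
standalone sketch of the classification. It is not the code from the
patch: the enum, its values and classify_call() are hypothetical names
chosen for illustration, and only the two nunits comparisons are taken
from the description above.

```c
/* Illustrative only: hypothetical names, not the GCC source.  */
enum vect_modifier { MOD_NONE, MOD_NARROW, MOD_WIDEN, MOD_UNSUPPORTED };

/* nunits_in / nunits_out: number of elements per vector in the
   argument and result vector types, respectively.  */
static enum vect_modifier
classify_call (int nunits_in, int nunits_out)
{
  if (nunits_in == nunits_out / 2)
    return MOD_NARROW;      /* e.g. V2DF -> V4SI: two inputs fill one result */
  else if (nunits_out == nunits_in)
    return MOD_NONE;        /* same element count, plain 1:1 call */
  else if (nunits_out == nunits_in / 2)
    return MOD_WIDEN;       /* one input spreads over two results */
  else
    return MOD_UNSUPPORTED; /* any other ratio is not handled here */
}
```

For the lrint case, the argument vector holds 2 doubles and the result
vector holds 4 ints, so classify_call (2, 4) falls into the NARROW
branch.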
The following testcase:
--cut here--
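#include <math.h>

/* Declarations assumed; they are not shown in the original mail.  */
double a[256];
int b[256];

void foo(void)
{
  int i;
  for (i=0; i<256; ++i)
    b[i] = lrint (a[i]);
}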
--cut here--
generates (-O2 -msse3 -ffast-math -ftree-vectorize):
.L7:
	cvtpd2dq	a(%eax,%eax), %xmm0
	cvtpd2dq	a+16(%eax,%eax), %xmm1
	punpcklqdq	%xmm1, %xmm0
	movdqa	%xmm0, b(%eax)
	addl	$16, %eax
	cmpl	$1024, %eax
	jne	.L7
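For readers less familiar with these insns, the loop body above can be
modelled in portable C roughly as follows. This is only a sketch: the
types and the emu_* and round_even() names are hypothetical, and
round_even() is a stand-in for the default round-to-nearest-even mode,
valid only for values that fit in an int.

```c
/* Portable emulation, illustrative only; all names are hypothetical.  */
typedef struct { double d[2]; } v2df;   /* two doubles, one XMM register */
typedef struct { int i[4]; } v4si;      /* four 32-bit ints */

/* Round to nearest, ties to even (assumed default rounding mode).
   Only valid for values within the range of int.  */
static int
round_even (double x)
{
  int t = (int) x;                /* truncate toward zero */
  double frac = x - (double) t;   /* signed fractional part */
  if (frac > 0.5 || (frac == 0.5 && (t & 1)))
    return t + 1;
  if (frac < -0.5 || (frac == -0.5 && (t & 1)))
    return t - 1;
  return t;
}

/* Models cvtpd2dq: convert two doubles, zero the upper two ints.  */
static v4si
emu_cvtpd2dq (v2df x)
{
  v4si r = { { round_even (x.d[0]), round_even (x.d[1]), 0, 0 } };
  return r;
}

/* Models punpcklqdq: combine the low 64 bits of both operands.  */
static v4si
emu_punpcklqdq (v4si lo, v4si hi)
{
  v4si r = { { lo.i[0], lo.i[1], hi.i[0], hi.i[1] } };
  return r;
}

/* Two V2DF inputs narrow into one V4SI result (the NARROW case).  */
static v4si
emu_vec_pack_sfix (v2df x, v2df y)
{
  return emu_punpcklqdq (emu_cvtpd2dq (x), emu_cvtpd2dq (y));
}
```

Each loop iteration therefore consumes four doubles from a[] (32 bytes)
and stores four ints to b[] (16 bytes), which matches the addl $16 step
and the doubled index used to address a.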
The patch was bootstrapped and regression tested on i686-pc-linux-gnu
for all default languages. This patch finally closes PR
tree-optimization/24659, as all conversions are now vectorized (on
SSEx targets).
OK for mainline? (The vectorizer part of the patch needs approval.)
2007-06-29 Uros Bizjak <ubizjak@gmail.com>
PR tree-optimization/24659
* tree-vect-transform.c (vectorizable_call): Handle
(nunits_in == nunits_out / 2) and (nunits_out == nunits_in / 2) cases.
* config/i386/sse.md (vec_pack_sfix_v2df): New expander.
* config/i386/i386.c (enum ix86_builtins) [IX86_BUILTIN_VEC_PACK_SFIX]:
New constant.
(struct bdesc_2arg) [__builtin_ia32_vec_pack_sfix]: New builtin
description.
(ix86_init_mmx_sse_builtins): Define all builtins with 2 arguments as
const using def_builtin_const.
(ix86_expand_binop_builtin): Remove bogus assert() that insn wants
input operands in the same modes as the result.
(ix86_builtin_vectorized_function): Handle BUILT_IN_LRINT.
testsuite/ChangeLog:
2007-06-29 Uros Bizjak <ubizjak@gmail.com>
PR tree-optimization/24659
* gcc.target/i386/vectorize2.c: New test.
* gcc.target/i386/sse2-lrint-vec.c: New runtime test.
* gcc.target/i386/sse2-lrintf-vec.c: Ditto.
Uros.