This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.
This patch adds support for handling unaligned loads in the vectorizer.
The parts in tree-vectorizer.c are largely based on Ayal's implementation
on apple-ppc branch.
Passed bootstrap on ppc-darwin and i686-pc-linux, and SPEC on ppc-darwin.
vect-27.c and vect-52.c now get vectorized, and there is a new testcase,
vect-72.c.
It's not entirely complete, but complete enough that there's something
working that I can commit to lno as a preview. I hope it can be
pre-reviewed before I start working on porting it to mainline. Especially
the parts I'm less confident about - the parts of the patch that deal with
RTL expansion (expr.c, optabs.c, builtins.c), and the issues I bring up
below:
(1) The patch includes a little more than what is actually supported; i.e.,
you'll find a declaration of vec_realign_store_optab although misaligned
stores are not supported yet, and a declaration of
BUILT_IN_BUILD_CC_MASK_FOR_LOAD even though it's not implemented. I
included these as an indication of how things are intended to look, and
so I could get feedback on whether the overall picture conforms with the
meaning of the new tree-codes and optabs we have discussed
(http://gcc.gnu.org/ml/gcc/2004-08/msg00317.html). These should probably not
be included in the patch for mainline?
(2) I did not systematically add consideration of the new
ALIGN_INDIRECT_REF and MISALIGNED_INDIRECT_REF wherever INDIRECT_REF is
considered; I only added the minimum that allowed me to build the compiler
and pass the vectorizer testcases and SPEC. I'll complete this before I
prepare the patch for mainline. Anything else to look for besides 'grep'ing
for INDIRECT_REFs?
(3) The function vect_create_data_ref used to return a "(*vp)[indx]" (vp
was a pointer to an array of vectypes). Now it is renamed to
vect_create_data_ref_ptr and returns "vp" pointing to the same address,
computed as follows:
vp = vp_init + (indx * vectype_size_in_bytes). The caller is responsible for
creating an (ALIGN/MISALIGNED_)INDIRECT_REF based on vp.
This change triggered a failure in vect-6.c, which seems to be caused by a
problem in DCE that has been fixed on mainline. I disabled it for now, and
will verify that it passes when I port this patch to mainline.
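A minimal sketch of the address computation described in (3), in plain C
rather than GIMPLE. The type vec16 and the helpers old_style_ref and
new_style_ptr are hypothetical names used only for illustration; they are
not part of the patch, which builds these expressions as trees inside
vect_create_data_ref_ptr.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a 16-byte vectype; not a GCC type.  */
typedef struct { unsigned char bytes[16]; } vec16;

/* Old shape: vp points to an array of vectors and the helper refers
   to the element (*vp)[indx] (returned here as its address).  */
static vec16 *
old_style_ref (vec16 (*vp)[], size_t indx)
{
  assert (vp != NULL);
  return &(*vp)[indx];
}

/* New shape: compute vp = vp_init + indx * vectype_size_in_bytes and
   leave the dereference to the caller.  */
static vec16 *
new_style_ptr (void *vp_init, size_t indx)
{
  assert (vp_init != NULL);
  return (vec16 *) ((unsigned char *) vp_init + indx * sizeof (vec16));
}
```

Both helpers yield the same address for the same index; the difference is
only that the new form leaves the choice of reference kind (aligned or
misaligned) to the caller.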
(4) In tree-pretty-print.c I have ALIGN_INDIRECT_REF represented as "A*"
and MISALIGNED_INDIRECT_REF represented as "M*". e.g:
vect_var_.61_65 = A*vect_p.62_64
vect_var_.61_65 = M*vect_p.62_64{misalignment=1}
Maybe using a "*" like a regular INDIRECT_REF is better?
(5) With respect to:
> I'm also thinking that perhaps the "mask" versions should not use
> an optab at all, but rather a builtin function. The reason here
> is that the form of the mask differs between systems. For Altivec,
> the mask is a complete 16-byte vector. For SPE, the mask is a CCmode
> value. If we have this as a builtin, then the target can easily
> influence the return type of the function, and thus the type of the
> variable that we create for the loop.
I implemented it by introducing a different generic builtin for each type
of mask - BUILT_IN_BUILD_VECTOR_MASK_FOR_LOAD for a vector mask, which is
implemented for altivec, and just as an example I also introduced
BUILT_IN_BUILD_CC_MASK_FOR_LOAD, which may be implemented for SPE. More
forms can be added. Which builtin form is generated is determined by
checking the available target support (i.e., HAVE_build_vector_mask_for_load
or HAVE_build_cc_mask_for_load). Is this a reasonable approach?
An alternative way could be to introduce a single generic builtin whose
return type is determined by the target, as follows (in builtins.def):
#ifdef HAVE_build_vector_mask_for_load
DEF_GCC_BUILTIN (BUILT_IN_BUILD_MASK, "build_mask_for_load",
BT_FN_CHAR_VECTOR_PTR, ATTR_NULL)
#else
#if HAVE_build_cc_mask_for_load
DEF_GCC_BUILTIN (BUILT_IN_BUILD_MASK, "build_mask_for_load",
BT_FN_WORD_PTR, ATTR_NULL)
#else
....
(This one seems cleaner. I think I had a problem when I tried it, but I
don't remember what it was...)
(6) I hardcoded '16' in the declaration of BT_CHAR_VECTOR in
builtin-types.def. I wanted to use UNITS_PER_SIMD_WORD instead, but that
would require including default.h wherever DEF_PRIMITIVE_TYPE and
DEF_FUNCTION_TYPE_1 are defined, so I wasn't sure about that.
(7) altivec.md: addr_floor_v* basically does nothing (copies the input to
the output). Is that ok, or is there a better way to represent that?
(8) i386.md: addr_misaligned_ is implemented only for 16QI because
sse2_movdqu expects only 16QI. As a result I have misalignment support on
i386 only for chars. I know this is wrong, but I wanted to get something
preliminary done quickly so I could get feedback on how to really do this
for i386... ?
Given this testcase:
char ia[N];
char ib[N+1] = {....};
for (i = 1; i < N+1; i++){
ia[i-1] = ib[i];
}
This is what is generated for altivec:
addi r9,r1,193
li r0,8
neg r2,r9
mtctr r0
lvsr v12,0,r2
addi r11,r1,208
lvx v13,0,r9
li r2,0
addi r9,r1,64
L6:
lvx v0,r2,r11
vperm v1,v13,v0,v12
vor v13,v0,v0
stvx v1,r2,r9
addi r2,r2,16
bdnz L6
Given the above testcase, this is what is generated for i386:
movb %al, -280(%ebp,%eax)
incl %eax
cmpl $129, %eax
jne .L4
leal -279(%ebp), %ebx
xorl %ecx, %ecx
leal -136(%ebp), %esi
movl %ebx, %edx
.L6:
movl %esi, %eax
incl %ecx
movdqu (%edx), %xmm0
subl %ebx, %eax
movdqa %xmm0, (%edx,%eax)
addl $16, %edx
cmpl $8, %ecx
jne .L6
thanks,
dorit
Changelog.lno:
* tree.def (ALIGN_INDIRECT_REF, MISALIGNED_INDIRECT_REF)
(REALIGN_LOAD_EXPR, REALIGN_STORE_EXPR): New tree-codes.
* tree.h (REF_ORIGINAL): Consider ALIGN_INDIRECT_REF and
MISALIGNED_INDIRECT_REF.
* alias.c (get_alias_set, nonoverlapping_memrefs_p): Likewise.
* tree-gimple.c (get_base_address): Likewise.
* tree-ssa-loop-ivopts.c (for_each_index, peel_address): Likewise.
* tree-pretty-print.c (op_prio): Likewise.
(dump_generic_node): Likewise + consider REALIGN_LOAD_EXPR.
* tree-ssa-operands.c (get_expr_operands): Same.
* expr.c (safe_from_p, expand_expr_real_1, rewrite_address_base)
(find_interesting_uses_address): Consider
ALIGN_INDIRECT_REF and MISALIGNED_INDIRECT_REF.
(expand_expr_real_1): Consider REALIGN_LOAD_EXPR.
* optabs.h (vec_realign_store_optab, vec_realign_load_optab)
(addr_floor_optab, addr_misaligned_optab): New optabs.
(OTI_vec_realign_store, OTI_vec_realign_load, OTI_addr_floor)
(OTI_addr_misaligned): New optab_index values for the above new
optabs.
(expand_realign_op, expand_addr_floor_op)
(expand_addr_misaligned_op): Declaration for new functions.
* optabs.c (optab_for_tree_code): Add new cases for the above
new tree-codes.
(expand_realign_op, expand_addr_floor_op)
(expand_addr_misaligned_op): New functions.
(init_optabs): Init vec_realign_load_optab, addr_floor_optab and
addr_misaligned_optab.
* genopinit.c (optabs): Handle above new optabs.
* builtin-types.def (BT_CHAR_VECTOR, BT_FN_CHAR_VECTOR_PTR): New
types.
* builtins.def (BUILT_IN_BUILD_VECTOR_MASK_FOR_LOAD): New builtin.
(BUILT_IN_BUILD_CC_MASK_FOR_LOAD): New builtin.
* builtins.c (expand_builtin_build_mask_for_load): New function.
(expand_builtin): New cases for BUILT_IN_BUILD_VECTOR_MASK_FOR_LOAD
and BUILT_IN_BUILD_CC_MASK_FOR_LOAD.
* config/rs6000/altivec.md (build_vector_mask_for_load): New
define_expand.
(addr_floor_v4si, addr_floor_v4sf, addr_floor_v4hi)
(addr_floor_v16qi): New define_expand.
(vec_realign_load_v4si, vec_realign_load_v4sf)
(vec_realign_load_v8hi, vec_realign_load_v16qi): New define_insn.
* config/i386/i386.md (addr_misaligned_v16qi): New define_expand.
* tree-vectorizer.c (vect_create_data_ref): Renamed to
vect_create_data_ref_ptr. Functionality changes reflected in the
function documentation.
(offset, initial_address, only_init): New arguments.
(vect_ptr): New pointer to vectype rather than pointer to array of
vectypes.
(vectorizable_store): Call vect_create_data_ref_ptr with additional
arguments, and create an indirect_ref with its return value
data_ref.
Check aligned_access_p.
(vect_create_cond_for_align_checks): Call vect_create_data_ref_ptr
with additional arguments.
(vect_create_addr_base_for_vector_ref): Takes an additional
argument, offset. Creates &(base[init_val+offset]) instead of
&(base[init_val]) if offset is provided.
(vectorizable_load): Handle misaligned loads, using a
software-pipelined scheme with REALIGN_LOAD_EXPR and
ALIGN_INDIRECT_REF if vec_realign_load_optab and addr_floor_optab
are supported, or using the regular scheme (without
software-pipelining) with MISALIGNED_INDIRECT_REF if
addr_misaligned_optab is supported.
(BUILT_IN_build_mask_for_load): New variable.
(vect_enhance_data_refs_alignment): Don't do versioning for
misaligned loads/stores that can be vectorized. Call
vectorizable_load/store and initialize STMT_VINFO_VECTYPE.
(vect_analyze_data_refs_alignment): Don't fail vectorization in the
presence of misaligned loads.
(vect_create_addr_base_for_vector_ref, vect_create_data_ref)
(vect_compute_data_ref_alignment): Call unshare_expr.
(add_loop_guard_on_edge, vect_create_index_for_vector_ref)
(vect_finish_stmt_generation, vect_compute_data_refs_alignment)
(vect_analyze_data_ref_access): Minor editing fixes (don't overflow
80 columns).
testsuite Changelog.lno:
* gcc.dg/vect/vect-72.c: New test.
* gcc.dg/vect/vect-27.c: Now vectorized on ppc*.
* gcc.dg/vect/vect-52.c: Now vectorized on ppc*.
* gcc.dg/vect/vect-6.c: Temporarily changed from run to compile
* gcc.dg/vect/vect-26.c: Use sse2 instead of sse.
* gcc.dg/vect/vect-27.c: Use sse2 instead of sse.
* gcc.dg/vect/vect-28.c: Use sse2 instead of sse.
* gcc.dg/vect/vect-29.c: Use sse2 instead of sse.
* gcc.dg/vect/vect-4?.c: Use sse2 instead of sse.
* gcc.dg/vect/vect-5?.c: Use sse2 instead of sse.
* gcc.dg/vect/vect-60.c: Use sse2 instead of sse.
* gcc.dg/vect/vect-61.c: Use sse2 instead of sse.
(See attached file: patch.Sept10)
Attachment:
patch.Sept10
Description: Binary data