This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.
This patch adds support for handling unaligned loads in the vectorizer. The parts in tree-vectorizer.c are largely based on Ayal's implementation on the apple-ppc branch. Passed bootstrap on ppc-darwin and i686-pc-linux, and SPEC on ppc-darwin. vect-27.c and vect-52.c now get vectorized, and there is a new testcase, vect-72.c.

It's not entirely complete, but complete enough that there's something working that I can commit to lno as a preview. I hope it can be pre-reviewed before I start working on porting it to mainline, especially the parts I'm less confident about: the parts of the patch that deal with RTL expansion (expr.c, optabs.c, builtins.c), and the issues I bring up below:

(1) The patch includes a little more than what is actually supported; i.e., you'll find a declaration of vec_realign_store_optab although misaligned stores are not supported yet, and a declaration of BUILT_IN_BUILD_CC_MASK_FOR_LOAD even though it's not implemented. I included these as an indication of how things are intended to look, and so I could get feedback on whether the overall picture conforms with the meaning of the new tree-codes and optabs we have discussed (http://gcc.gnu.org/ml/gcc/2004-08/msg00317.html). These should probably not be included in the patch for mainline?

(2) I did not systematically add consideration of the new ALIGN_INDIRECT_REF and MISALIGNED_INDIRECT_REF wherever INDIRECT_REF is considered; I only added the minimum that allowed me to build the compiler and pass the vectorizer testcases and SPEC. I'll complete this before I prepare the patch for mainline. Anything else to look for besides 'grep'ing for INDIRECT_REFs?

(3) The function vect_create_data_ref used to return a "(*vp)[indx]" (vp was a pointer to an array of vectypes). It is now renamed to vect_create_data_ref_ptr and returns "vp" pointing to the same address, computed as follows: vp = vp_init + (indx * vectype_size_in_bytes). The caller is responsible for creating an (ALIGN/MISALIGNED_)INDIRECT_REF based on vp.
This change triggered a failure in vect-6.c, which seems to be caused by a problem in DCE that has been fixed on mainline. I disabled it for now, and will verify that it passes when I port this patch to mainline.

(4) In tree-pretty-print.c I have ALIGN_INDIRECT_REF represented as "A*" and MISALIGNED_INDIRECT_REF represented as "M*", e.g.:

  vect_var_.61_65 = A*vect_p.62_64
  vect_var_.61_65 = M*vect_p.62_64{misalignment=1}

Maybe using a "*" like a regular INDIRECT_REF is better?

(5) With respect to:

> I'm also thinking that perhaps the "mask" versions should not use
> an optab at all, but rather a builtin function. The reason here
> is that the form of the mask differs between systems. For Altivec,
> the mask is a complete 16-byte vector. For SPE, the mask is a CCmode
> value. If we have this as a builtin, then the target can easily
> influence the return type of the function, and thus the type of the
> variable that we create for the loop.

I implemented it by introducing a different generic builtin for each type of mask: BUILT_IN_BUILD_VECTOR_MASK_FOR_LOAD for a vector mask, which is implemented for altivec, and, just as an example, I also introduced BUILT_IN_BUILD_CC_MASK_FOR_LOAD, which may be implemented for SPE. More forms can be added. Which builtin form will be generated is determined by checking available target support (i.e., HAVE_build_vector_mask_for_load or HAVE_build_cc_mask_for_load). Is this a reasonable approach?

An alternative way would be to introduce a single generic builtin whose return type is determined by the target, as follows (in builtins.def):

  #ifdef HAVE_build_vector_mask_for_load
  DEF_GCC_BUILTIN (BUILT_IN_BUILD_MASK, "build_mask_for_load",
                   BT_FN_CHAR_VECTOR_PTR, ATTR_NULL)
  #else
  #ifdef HAVE_build_cc_mask_for_load
  DEF_GCC_BUILTIN (BUILT_IN_BUILD_MASK, "build_mask_for_load",
                   BT_FN_WORD_PTR, ATTR_NULL)
  #else
  ....

? (This one seems cleaner. I think I had a problem when I tried it, but I don't remember what it was...)
(6) I hardcoded '16' in the declaration of BT_CHAR_VECTOR in builtin-types.def. I wanted to use UNITS_PER_SIMD_WORD instead, but that would require including defaults.h wherever DEF_PRIMITIVE_TYPE and DEF_FUNCTION_TYPE_1 are defined, so I wasn't sure about that. ?

(7) altivec.md: addr_floor_v* basically does nothing (copies the input to the output). Is that ok, or is there a better way to represent that?

(8) i386.md: addr_misaligned_ is implemented only for V16QI because sse2_movdqu expects only V16QI. As a result I have misalignment support on i386 only for chars. I know this is wrong, but I wanted to get something preliminary done quickly so I could get feedback on how to really do this for i386... ?

Given this testcase:

  char ia[N];
  char ib[N+1] = {....};

  for (i = 1; i < N+1; i++){
    ia[i-1] = ib[i];
  }

this is what is generated for altivec:

        addi r9,r1,193
        li r0,8
        neg r2,r9
        mtctr r0
        lvsr v12,0,r2
        addi r11,r1,208
        lvx v13,0,r9
        li r2,0
        addi r9,r1,64
L6:
        lvx v0,r2,r11
        vperm v1,v13,v0,v12
        vor v13,v0,v0
        stvx v1,r2,r9
        addi r2,r2,16
        bdnz L6

and this is what is generated for i386:

        movb %al, -280(%ebp,%eax)
        incl %eax
        cmpl $129, %eax
        jne .L4
        leal -279(%ebp), %ebx
        xorl %ecx, %ecx
        leal -136(%ebp), %esi
        movl %ebx, %edx
.L6:
        movl %esi, %eax
        incl %ecx
        movdqu (%edx), %xmm0
        subl %ebx, %eax
        movdqa %xmm0, (%edx,%eax)
        addl $16, %edx
        cmpl $8, %ecx
        jne .L6

thanks,
dorit

Changelog.lno:

	* tree.def (ALIGN_INDIRECT_REF, MISALIGNED_INDIRECT_REF)
	(REALIGN_LOAD_EXPR, REALIGN_STORE_EXPR): New tree-codes.
	* tree.h (REF_ORIGINAL): Consider ALIGN_INDIRECT_REF and
	MISALIGNED_INDIRECT_REF.
	* alias.c (get_alias_set, nonoverlapping_memrefs_p): Likewise.
	* tree-gimple.c (get_base_address): Likewise.
	* tree-ssa-loop-ivopts.c (for_each_index, peel_address): Likewise.
	* tree-pretty-print.c (op_prio): Likewise.
	(dump_generic_node): Likewise + consider REALIGN_LOAD_EXPR.
	* tree-ssa-operands.c (get_expr_operands): Same.
	* expr.c (safe_from_p, expand_expr_real_1, rewrite_address_base)
	(find_interesting_uses_address): Consider ALIGN_INDIRECT_REF and
	MISALIGNED_INDIRECT_REF.
	(expand_expr_real_1): Consider REALIGN_LOAD_EXPR.
	* optabs.h (vec_realign_store_optab, vec_realign_load_optab)
	(addr_floor_optab, addr_misaligned_optab): New optabs.
	(OTI_vec_realign_store, OTI_vec_realign_load, OTI_addr_floor)
	(OTI_addr_misaligned): New optab_index values for the above new
	optabs.
	(expand_realign_op, expand_addr_floor_op, expand_addr_misaligned_op):
	Declare new functions.
	* optabs.c (optab_for_tree_code): Add new cases for the above new
	tree-codes.
	(expand_realign_op, expand_addr_floor_op, expand_addr_misaligned_op):
	New functions.
	(init_optabs): Init vec_realign_load_optab, addr_floor_optab and
	addr_misaligned_optab.
	* genopinit.c (optabs): Handle above new optabs.
	* builtin-types.def (BT_CHAR_VECTOR, BT_FN_CHAR_VECTOR_PTR): New
	types.
	* builtins.def (BUILT_IN_BUILD_VECTOR_MASK_FOR_LOAD): New builtin.
	(BUILT_IN_BUILD_CC_MASK_FOR_LOAD): New builtin.
	* builtins.c (expand_builtin_build_mask_for_load): New function.
	(expand_builtin): New cases for BUILT_IN_BUILD_VECTOR_MASK_FOR_LOAD
	and BUILT_IN_BUILD_CC_MASK_FOR_LOAD.
	* config/rs6000/altivec.md (build_vector_mask_for_load): New
	define_expand.
	(addr_floor_v4si, addr_floor_v4sf, addr_floor_v8hi)
	(addr_floor_v16qi): New define_expand.
	(vec_realign_load_v4si, vec_realign_load_v4sf, vec_realign_load_v8hi)
	(vec_realign_load_v16qi): New define_insn.
	* config/i386/i386.md (addr_misaligned_v16qi): New define_expand.
	* tree-vectorizer.c (vect_create_data_ref): Renamed to
	vect_create_data_ref_ptr.  Functionality changes reflected in the
	function documentation.
	(offset, initial_address, only_init): New arguments.
	(vect_ptr): New pointer to vectype rather than pointer to array of
	vectypes.
	(vectorizable_store): Call vect_create_data_ref_ptr with additional
	arguments, and create an indirect_ref with its return value
	data_ref.  Check aligned_access_p.
	(vect_create_cond_for_align_checks): Call vect_create_data_ref_ptr
	with additional arguments.
	(vect_create_addr_base_for_vector_ref): Take an additional
	argument, offset.  Create &(base[init_val+offset]) instead of
	&(base[init_val]) if offset is provided.
	(vectorizable_load): Handle misaligned loads, using a
	software-pipelined scheme with REALIGN_LOAD_EXPR and
	ALIGN_INDIRECT_REF if vec_realign_load_optab and addr_floor_optab
	are supported, or using a regular scheme (without
	software-pipelining) with MISALIGNED_INDIRECT_REF if
	addr_misaligned_optab is supported.
	(BUILT_IN_build_mask_for_load): New variable.
	(vect_enhance_data_refs_alignment): Don't do versioning for
	misaligned loads/stores that can be vectorized.  Call
	vectorizable_load/store and initialize STMT_VINFO_VECTYPE.
	(vect_analyze_data_refs_alignment): Don't fail vectorization in the
	presence of misaligned loads.
	(vect_create_addr_base_for_vector_ref, vect_create_data_ref)
	(vect_compute_data_ref_alignment): Call unshare_expr.
	(add_loop_guard_on_edge, vect_create_index_for_vector_ref)
	(vect_finish_stmt_generation, vect_compute_data_refs_alignment)
	(vect_analyze_data_ref_access): Minor editing fixes (don't overflow
	80 columns).

testsuite Changelog.lno:

	* gcc.dg/vect/vect-72.c: New test.
	* gcc.dg/vect/vect-27.c: Now vectorized on ppc*.
	* gcc.dg/vect/vect-52.c: Now vectorized on ppc*.
	* gcc.dg/vect/vect-6.c: Temporarily changed from run to compile.
	* gcc.dg/vect/vect-26.c: Use sse2 instead of sse.
	* gcc.dg/vect/vect-27.c: Use sse2 instead of sse.
	* gcc.dg/vect/vect-28.c: Use sse2 instead of sse.
	* gcc.dg/vect/vect-29.c: Use sse2 instead of sse.
	* gcc.dg/vect/vect-4?.c: Use sse2 instead of sse.
	* gcc.dg/vect/vect-5?.c: Use sse2 instead of sse.
	* gcc.dg/vect/vect-60.c: Use sse2 instead of sse.
	* gcc.dg/vect/vect-61.c: Use sse2 instead of sse.

(See attached file: patch.Sept10)