This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



[lno] [patch] misaligned loads support


This patch adds support for handling unaligned loads in the vectorizer.
The parts in tree-vectorizer.c are largely based on Ayal's implementation
on the apple-ppc branch.
It passed bootstrap on ppc-darwin and i686-pc-linux, and SPEC on ppc-darwin.
vect-27.c and vect-52.c are now vectorized, and there is a new testcase,
vect-72.c.

It's not entirely complete, but it's complete enough that there's something
working that I can commit to lno as a preview. I hope it can be
pre-reviewed before I start porting it to mainline, especially the parts
I'm less confident about: the parts of the patch that deal with RTL
expansion (expr.c, optabs.c, builtins.c), and the issues I bring up
below:

(1) The patch includes a little more than what is actually supported; for
example, you'll find a declaration of vec_realign_store_optab although
misaligned stores are not supported yet, and a declaration of
BUILT_IN_BUILD_CC_MASK_FOR_LOAD even though it's not implemented. I
included these as an indication of how things are intended to look, and
so I could get feedback on whether the overall picture conforms with the
meaning of the new tree-codes and optabs we have discussed
(http://gcc.gnu.org/ml/gcc/2004-08/msg00317.html). Should these be left
out of the patch for mainline?

(2) I did not systematically add consideration of the new
ALIGN_INDIRECT_REF and MISALIGNED_INDIRECT_REF wherever INDIRECT_REF is
considered; I only added the minimum that allowed me to build the compiler
and pass the vectorizer testcases and SPEC. I'll complete this before I
prepare the patch for mainline. Is there anything else to look for besides
grepping for INDIRECT_REFs?

(3) The function vect_create_data_ref used to return a "(*vp)[indx]" (vp
was a pointer to an array of vectypes). Now it is renamed to
vect_create_data_ref_ptr and returns "vp" pointing to the same address,
computed as follows:
  vp = vp_init + (indx * vectype_size_in_bytes)
The caller is responsible for creating an (ALIGN_/MISALIGNED_)INDIRECT_REF
based on vp.
This change triggered a failure in vect-6.c, which seems to be caused by a
problem in DCE that has been fixed on mainline. I disabled it for now, and
will verify that it passes when I port this patch to mainline.
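To illustrate the change in (3), here is a small C sketch (not code from
the patch; vectype, old_data_ref_addr and new_data_ref_ptr are hypothetical
names). It shows that the old array-element form "(*vp)[indx]" and the new
pointer arithmetic yield the same address, so the only thing that moves is
the dereference, which now happens in the caller.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a 16-byte vectype.  */
typedef struct { unsigned char b[16]; } vectype;

/* Old scheme: the data reference was "(*vp)[indx]", where vp points to an
   array of vectypes.  Return the address of that element.  */
static vectype *
old_data_ref_addr (vectype (*vp)[], ptrdiff_t indx)
{
  return &(*vp)[indx];
}

/* New scheme: compute the pointer itself,
     vp = vp_init + (indx * vectype_size_in_bytes),
   and leave it to the caller to build an aligned or misaligned
   indirect reference from it.  */
static vectype *
new_data_ref_ptr (void *vp_init, ptrdiff_t indx)
{
  assert (indx >= 0);
  return (vectype *) ((unsigned char *) vp_init
                      + indx * (ptrdiff_t) sizeof (vectype));
}
```

For any array of vectypes arr, old_data_ref_addr (&arr, i) and
new_data_ref_ptr (arr, i) compare equal for every valid i.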

(4) In tree-pretty-print.c I have ALIGN_INDIRECT_REF represented as "A*"
and MISALIGNED_INDIRECT_REF represented as "M*", e.g.:
vect_var_.61_65 = A*vect_p.62_64
vect_var_.61_65 = M*vect_p.62_64{misalignment=1}
Would it be better to use a plain "*", as for a regular INDIRECT_REF?

(5) With respect to:
> I'm also thinking that perhaps the "mask" versions should not use
> an optab at all, but rather a builtin function.  The reason here
> is that the form of the mask differs between systems.  For Altivec,
> the mask is a complete 16-byte vector.  For SPE, the mask is a CCmode
> value.  If we have this as a builtin, then the target can easily
> influence the return type of the function, and thus the type of the
> variable that we create for the loop.

I implemented it by introducing a different generic builtin for each type
of mask - BUILT_IN_BUILD_VECTOR_MASK_FOR_LOAD for a vector mask, which is
implemented for altivec, and just as an example I also introduced
BUILT_IN_BUILD_CC_MASK_FOR_LOAD, which may be implemented for SPE. More
forms can be added. Which builtin form is generated is determined by
checking available target support (i.e., HAVE_build_vector_mask_for_load or
HAVE_build_cc_mask_for_load). Is this a reasonable approach?
An alternative way could be to introduce a single generic builtin whose
return type is determined by the target, as follows (in builtins.def):
#ifdef HAVE_build_vector_mask_for_load
DEF_GCC_BUILTIN  (BUILT_IN_BUILD_MASK, "build_mask_for_load",
                  BT_FN_CHAR_VECTOR_PTR, ATTR_NULL)
#else
#if HAVE_build_cc_mask_for_load
DEF_GCC_BUILTIN  (BUILT_IN_BUILD_MASK, "build_mask_for_load",
                  BT_FN_WORD_PTR, ATTR_NULL)
#else
....
Would that be preferable? (It seems cleaner. I think I had a problem when I
tried it, but I don't remember what it was.)
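For reference when comparing the mask forms, here is a hypothetical C
model of what the AltiVec vector mask does. The helper names are
illustrative; the model follows the documented lvsl, lvsr and vperm
semantics rather than code from the patch. It shows that a 16-entry permute
mask derived from the low address bits selects the misaligned 16 bytes from
two aligned vectors, and that lvsr on the negated address (as in the
AltiVec output further down) yields the same mask as lvsl on the address
itself when that address is misaligned. An SPE CC-mode mask would be a
different kind of value altogether, which is the point of letting the
target choose the type.

```c
#include <stdint.h>

/* lvsl model: with sh = addr & 15, mask[i] = sh + i.  */
static void
model_lvsl (uintptr_t addr, unsigned char mask[16])
{
  unsigned sh = (unsigned) (addr & 15);
  for (unsigned i = 0; i < 16; i++)
    mask[i] = (unsigned char) (sh + i);
}

/* lvsr model: with sh = addr & 15, mask[i] = 16 - sh + i.  */
static void
model_lvsr (uintptr_t addr, unsigned char mask[16])
{
  unsigned sh = (unsigned) (addr & 15);
  for (unsigned i = 0; i < 16; i++)
    mask[i] = (unsigned char) (16 - sh + i);
}

/* vperm model: byte i of the result is byte (mask[i] & 31) of the
   32-byte concatenation msq||lsq.  */
static void
model_vperm (const unsigned char msq[16], const unsigned char lsq[16],
             const unsigned char mask[16], unsigned char out[16])
{
  for (unsigned i = 0; i < 16; i++)
    {
      unsigned sel = mask[i] & 31u;
      out[i] = sel < 16 ? msq[sel] : lsq[sel - 16];
    }
}
```

For an address whose low four bits are 5, both model_lvsl (addr) and
model_lvsr (-addr) give the mask {5, 6, ..., 20}, and model_vperm with that
mask returns bytes 5..20 of the two aligned vectors.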

(6) I hardcoded '16' in the declaration of BT_CHAR_VECTOR in
builtin-types.def. I wanted to use UNITS_PER_SIMD_WORD instead, but that
would require including defaults.h wherever DEF_PRIMITIVE_TYPE and
DEF_FUNCTION_TYPE_1 are defined, and I wasn't sure whether that is
acceptable. Is it?

(7) altivec.md: addr_floor_v* basically does nothing (copies the input to
the output). Is that ok, or is there a better way to represent that?
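A no-op addr_floor can work on AltiVec because lvx itself ignores the low
four bits of the effective address, so the rounding down happens inside the
load. A target without that property would have to compute the floor
explicitly. A minimal C sketch of that computation, assuming 16-byte
vectors (addr_floor_16 and misalignment_16 are hypothetical names, not
functions from the patch):

```c
#include <assert.h>
#include <stdint.h>

/* Round an address down to the nearest 16-byte boundary.  On AltiVec,
   lvx performs this rounding itself by ignoring the low 4 address bits,
   which is why addr_floor can simply copy its input there.  */
static inline uintptr_t
addr_floor_16 (uintptr_t addr)
{
  return addr & ~(uintptr_t) 15;
}

/* Misalignment in bytes relative to the floored address.  */
static inline unsigned
misalignment_16 (uintptr_t addr)
{
  unsigned m = (unsigned) (addr - addr_floor_16 (addr));
  assert (m < 16);
  return m;
}
```

For example, 193 floors to 192 with a misalignment of 1, while 208 is
already aligned and floors to itself.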

(8) i386.md: addr_misaligned_ is implemented only for 16QI because
sse2_movdqu expects only 16QI. As a result, I have misalignment support on
i386 only for chars. I know this is wrong, but I wanted to get something
preliminary done quickly so I could get feedback on how to really do this
for i386. Any suggestions?

Given this testcase:
  char ia[N];
  char ib[N+1] = {....};
  for (i = 1; i < N+1; i++)
    {
      ia[i-1] = ib[i];
    }

This is what is generated for altivec:
        addi r9,r1,193
        li r0,8
        neg r2,r9
        mtctr r0
        lvsr v12,0,r2
        addi r11,r1,208
        lvx v13,0,r9
        li r2,0
        addi r9,r1,64
L6:
        lvx v0,r2,r11
        vperm v1,v13,v0,v12
        vor v13,v0,v0
        stvx v1,r2,r9
        addi r2,r2,16
        bdnz L6
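The loop above is the software-pipelined REALIGN_LOAD scheme: one aligned
load (lvx) per iteration, a vperm that combines the previous and the
current aligned vectors, and a register copy (vor) that carries the current
vector into the next iteration. Below is a portable C model of that
structure, not code from the patch. It assumes 16-byte vectors, models
REALIGN_LOAD as a plain byte-offset extraction, treats base as 16-byte
aligned (only offsets relative to it matter), and requires the source
buffer to be readable up to one aligned vector past the data, as the
extra lvx is. All helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum { VS = 16 };  /* vector size in bytes (assumed) */

/* Stand-in for an aligned vector load at offset off of base.  */
static void
load_aligned (const unsigned char *base, size_t off, unsigned char v[VS])
{
  assert (off % VS == 0);
  memcpy (v, base + off, VS);
}

/* Stand-in for REALIGN_LOAD: bytes m..m+VS-1 of msq||lsq.  */
static void
realign_load (const unsigned char msq[VS], const unsigned char lsq[VS],
              unsigned m, unsigned char out[VS])
{
  for (unsigned i = 0; i < VS; i++)
    out[i] = (m + i < VS) ? msq[m + i] : lsq[m + i - VS];
}

/* Copy nvec vectors starting at byte offset src_off of base into dst,
   using one aligned load per iteration.  */
static void
copy_misaligned_pipelined (const unsigned char *base, size_t src_off,
                           unsigned char *dst, size_t nvec)
{
  size_t floor_off = src_off & ~(size_t) (VS - 1);
  unsigned m = (unsigned) (src_off - floor_off);
  unsigned char msq[VS], lsq[VS];

  load_aligned (base, floor_off, msq);      /* prologue load */
  for (size_t k = 0; k < nvec; k++)
    {
      load_aligned (base, floor_off + (k + 1) * VS, lsq);
      realign_load (msq, lsq, m, dst + k * VS);
      memcpy (msq, lsq, VS);                /* carry into next iteration */
    }
}
```

Each source byte is loaded only once (apart from the prologue); the price
is the value carried across iterations and the extra aligned vector read at
the end.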

Given the above testcase, this is what is generated for i386:
        movb    %al, -280(%ebp,%eax)
        incl    %eax
        cmpl    $129, %eax
        jne     .L4
        leal    -279(%ebp), %ebx
        xorl    %ecx, %ecx
        leal    -136(%ebp), %esi
        movl    %ebx, %edx
.L6:
        movl    %esi, %eax
        incl    %ecx
        movdqu  (%edx), %xmm0
        subl    %ebx, %eax
        movdqa  %xmm0, (%edx,%eax)
        addl    $16, %edx
        cmpl    $8, %ecx
        jne     .L6
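The i386 loop above is the non-pipelined MISALIGNED_INDIRECT_REF scheme: an
unaligned movdqu load followed by an aligned movdqa store, with nothing
carried between iterations. An equivalent C sketch using the SSE2
intrinsics _mm_loadu_si128 and _mm_store_si128, which correspond to those
two instructions, with a memcpy fallback when __SSE2__ is not defined. The
function name is hypothetical and this is not the code the vectorizer
emits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#ifdef __SSE2__
#include <emmintrin.h>
#endif

/* Copy nvec 16-byte vectors from src (any alignment) to dst (which must
   be 16-byte aligned): one unaligned load and one aligned store per
   iteration.  */
static void
copy_misaligned_direct (const unsigned char *src, unsigned char *dst,
                        size_t nvec)
{
  assert (((uintptr_t) dst & 15) == 0);
  for (size_t k = 0; k < nvec; k++)
    {
#ifdef __SSE2__
      /* movdqu: unaligned load.  */
      __m128i v = _mm_loadu_si128 ((const __m128i *) (src + 16 * k));
      /* movdqa: aligned store.  */
      _mm_store_si128 ((__m128i *) (dst + 16 * k), v);
#else
      memcpy (dst + 16 * k, src + 16 * k, 16);  /* portable fallback */
#endif
    }
}
```

Because the intrinsics operate on __m128i, this form is not tied to chars;
the 16QI-only restriction in (8) comes from the pattern, not from the
instruction.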

thanks,

dorit

ChangeLog.lno:

        * tree.def (ALIGN_INDIRECT_REF, MISALIGNED_INDIRECT_REF)
        (REALIGN_LOAD_EXPR, REALIGN_STORE_EXPR): New tree-codes.
        * tree.h (REF_ORIGINAL): Consider ALIGN_INDIRECT_REF and
        MISALIGNED_INDIRECT_REF.
        * alias.c (get_alias_set, nonoverlapping_memrefs_p): Likewise.
        * tree-gimple.c (get_base_address): Likewise.
        * tree-ssa-loop-ivopts.c (for_each_index, peel_address): Likewise.
        * tree-pretty-print.c (op_prio): Likewise.
        (dump_generic_node): Likewise + consider REALIGN_LOAD_EXPR.
        * tree-ssa-operands.c (get_expr_operands): Likewise.
        * expr.c (safe_from_p, expand_expr_real_1, rewrite_address_base)
        (find_interesting_uses_address): Consider
        ALIGN_INDIRECT_REF and MISALIGNED_INDIRECT_REF.
        (expand_expr_real_1): Consider REALIGN_LOAD_EXPR.
        * optabs.h (vec_realign_store_optab, vec_realign_load_optab)
        (addr_floor_optab, addr_misaligned_optab): New optabs.
        (OTI_vec_realign_store, OTI_vec_realign_load, OTI_addr_floor)
        (OTI_addr_misaligned): New optab_index values for the above new
        optabs.
        (expand_realign_op, expand_addr_floor_op)
        (expand_addr_misaligned_op): Declare new functions.
        * optabs.c (optab_for_tree_code): Add new cases for the above
        new tree-codes.
        (expand_realign_op, expand_addr_floor_op)
        (expand_addr_misaligned_op): New functions.
        (init_optabs): Init vec_realign_load_optab, addr_floor_optab and
        addr_misaligned_optab.
        * genopinit.c (optabs): Handle above new optabs.

        * builtin-types.def (BT_CHAR_VECTOR, BT_FN_CHAR_VECTOR_PTR): New
        types.
        * builtins.def (BUILT_IN_BUILD_VECTOR_MASK_FOR_LOAD): New builtin.
        (BUILT_IN_BUILD_CC_MASK_FOR_LOAD): New builtin.
        * builtins.c (expand_builtin_build_mask_for_load): New function.
        (expand_builtin): New cases for BUILT_IN_BUILD_VECTOR_MASK_FOR_LOAD
        and BUILT_IN_BUILD_CC_MASK_FOR_LOAD.

        * config/rs6000/altivec.md (build_vector_mask_for_load): New
        define_expand.
        (addr_floor_v4si, addr_floor_v4sf, addr_floor_v4hi)
        (addr_floor_v16qi): New define_expand.
        (vec_realign_load_v4si, vec_realign_load_v4sf, vec_realign_load_v8hi)
        (vec_realign_load_v16qi): New define_insn.
        * config/i386/i386.md (addr_misaligned_v16qi): New define_expand.

        * tree-vectorizer.c (vect_create_data_ref): Renamed to
        vect_create_data_ref_ptr. Functionality changes reflected in the
        function documentation.
        (offset, initial_address, only_init): New arguments.
        (vect_ptr): New pointer to vectype rather than pointer to array of
        vectypes.
        (vectorizable_store): Call vect_create_data_ref_ptr with additional
        arguments, and create an indirect_ref with its return value data_ref.
        Check aligned_access_p.
        (vect_create_cond_for_align_checks): Call vect_create_data_ref_ptr
        with additional arguments.
        (vect_create_addr_base_for_vector_ref): Take an additional argument,
        offset.  Create &(base[init_val+offset]) instead of &(base[init_val])
        if offset is provided.
        (vectorizable_load): Handle misaligned loads, using
        software-pipelined scheme with REALIGN_LOAD_EXPR and
        ALIGN_INDIRECT_REF if vec_realign_load_optab and addr_floor_optab
        are supported, or using regular scheme (without software-pipelining)
        with MISALIGNED_INDIRECT_REF if addr_misaligned_optab is supported.
        (BUILT_IN_build_mask_for_load): New variable.
        (vect_enhance_data_refs_alignment): Don't do versioning for
        misaligned loads/stores that can be vectorized.  Call
        vectorizable_load/store and initialize STMT_VINFO_VECTYPE.
        (vect_analyze_data_refs_alignment): Don't fail vectorization in the
        presence of misaligned loads.

        (vect_create_addr_base_for_vector_ref, vect_create_data_ref)
        (vect_compute_data_ref_alignment): Call unshare_expr.

        (add_loop_guard_on_edge, vect_create_index_for_vector_ref)
        (vect_finish_stmt_generation, vect_compute_data_refs_alignment)
        (vect_analyze_data_ref_access): Minor editing fixes (don't overflow
        80 columns).
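The choice described in the vectorizable_load entry can be summarized by
the following C sketch. The struct, enum and function names are invented
for illustration (the real code queries the optabs directly), and it
assumes, as the entry's wording suggests, that the software-pipelined
scheme is preferred whenever both of its optabs are available.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical summary of the target support queries.  */
struct target_support
{
  bool vec_realign_load;   /* vec_realign_load_optab supported */
  bool addr_floor;         /* addr_floor_optab supported */
  bool addr_misaligned;    /* addr_misaligned_optab supported */
};

enum misaligned_load_scheme
{
  SCHEME_NONE,               /* misaligned load cannot be vectorized */
  SCHEME_REALIGN_PIPELINED,  /* REALIGN_LOAD_EXPR + ALIGN_INDIRECT_REF */
  SCHEME_MISALIGNED_DIRECT   /* MISALIGNED_INDIRECT_REF */
};

static enum misaligned_load_scheme
choose_misaligned_load_scheme (const struct target_support *t)
{
  assert (t != NULL);
  if (t->vec_realign_load && t->addr_floor)
    return SCHEME_REALIGN_PIPELINED;
  if (t->addr_misaligned)
    return SCHEME_MISALIGNED_DIRECT;
  return SCHEME_NONE;
}
```

With this patch, AltiVec would fall in the first case and i386 (for 16QI
only) in the second.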

testsuite ChangeLog.lno:

        * gcc.dg/vect/vect-72.c: New test.
        * gcc.dg/vect/vect-27.c: Now vectorized on ppc*.
        * gcc.dg/vect/vect-52.c: Now vectorized on ppc*.
        * gcc.dg/vect/vect-6.c: Temporarily changed from run to compile.
        * gcc.dg/vect/vect-26.c: Use sse2 instead of sse.
        * gcc.dg/vect/vect-27.c: Use sse2 instead of sse.
        * gcc.dg/vect/vect-28.c: Use sse2 instead of sse.
        * gcc.dg/vect/vect-29.c: Use sse2 instead of sse.
        * gcc.dg/vect/vect-4?.c: Use sse2 instead of sse.
        * gcc.dg/vect/vect-5?.c: Use sse2 instead of sse.
        * gcc.dg/vect/vect-60.c: Use sse2 instead of sse.
        * gcc.dg/vect/vect-61.c: Use sse2 instead of sse.

(See attached file: patch.Sept10)


