This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



Re: Use of vector instructions in memmov/memset expanding


On Mon, Jul 11, 2011 at 1:57 PM, Michael Zolotukhin
<michael.v.zolotukhin@gmail.com> wrote:
> Sorry for sending this again; I forgot to attach the patch.
>
> On 11 July 2011 23:50, Michael Zolotukhin
> <michael.v.zolotukhin@gmail.com> wrote:
>> The attached patch enables the use of vector instructions when expanding
>> memmove/memset.
>>
>> A new move-mode selection algorithm is implemented for move_by_pieces and
>> store_by_pieces.
>> The x86-specific ix86_expand_movmem and ix86_expand_setmem are changed in a
>> similar way, and the x86 cost-model parameters are slightly adjusted to
>> support this. The implementation checks whether the array's alignment is
>> known at compile time and chooses the expansion algorithm and move mode
>> accordingly.
>>
>> Bootstrapped; there are two new failures, caused by incorrect tests (see
>> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49503). The new implementation
>> gives a sizable performance gain on memset/memcpy in some cases.
>>
>> A number of new tests are added to verify the implementation.
>>
>> Is it ok for trunk?
>>
>> Changelog:
>>
>> 2011-07-11  Zolotukhin Michael  <michael.v.zolotukhin@intel.com>
>>
>>     * config/i386/i386.h (processor_costs): Add second dimension to
>>     stringop_algs array.
>>     (clear_ratio): Tune value to improve performance.
>>     * config/i386/i386.c (cost models): Initialize second dimension of
>>     stringop_algs arrays.  Tune cost model in atom_cost, generic32_cost
>>     and generic64_cost.
>>     (ix86_expand_move): Add support for vector moves, that use half of
>>     vector register.
>>     (expand_set_or_movmem_via_loop_with_iter): New function.
>>     (expand_set_or_movmem_via_loop): Enable reuse of the same iters in
>>     different loops, produced by this function.
>>     (emit_strset): New function.
>>     (promote_duplicated_reg): Add support for vector modes, add
>>     declaration.
>>     (promote_duplicated_reg_to_size): Likewise.
>>     (expand_movmem_epilogue): Add epilogue generation for bigger sizes.
>>     (expand_setmem_epilogue): Likewise.
>>     (expand_movmem_prologue): Likewise for prologue.
>>     (expand_setmem_prologue): Likewise.
>>     (expand_constant_movmem_prologue): Likewise.
>>     (expand_constant_setmem_prologue): Likewise.
>>     (decide_alg): Add new argument align_unknown.  Fix algorithm of
>>     strategy selection if TARGET_INLINE_ALL_STRINGOPS is set.
>>     (decide_alignment): Update desired alignment according to chosen move
>>     mode.
>>     (ix86_expand_movmem): Change unrolled_loop strategy to use SSE-moves.
>>     (ix86_expand_setmem): Likewise.
>>     (ix86_slow_unaligned_access): Implementation of new hook
>>     slow_unaligned_access.
>>     (ix86_promote_rtx_for_memset): Implementation of new hook
>>     promote_rtx_for_memset.
>>     * config/i386/sse.md (sse2_loadq): Add expand for sse2_loadq.
>>     (vec_dupv4si): Add expand for vec_dupv4si.
>>     (vec_dupv2di): Add expand for vec_dupv2di.
>>     * emit-rtl.c (adjust_address_1): Improve algorithm for determining
>>     alignment of address+offset.
>>     (get_mem_align_offset): Add handling of MEM_REFs.
>>     * expr.c (compute_align_by_offset): New function.
>>     (move_by_pieces_insn): New function.
>>     (widest_mode_for_unaligned_mov): New function.
>>     (widest_mode_for_aligned_mov): New function.
>>     (widest_int_mode_for_size): Change type of size from int to
>>     HOST_WIDE_INT.
>>     (set_by_pieces_1): New function (new algorithm of memset expanding).
>>     (set_by_pieces_2): New function.
>>     (generate_move_with_mode): New function for set_by_pieces.
>>     (alignment_for_piecewise_move): Use hook slow_unaligned_access instead
>>     of macros SLOW_UNALIGNED_ACCESS.
>>     (emit_group_load_1): Likewise.
>>     (emit_group_store): Likewise.
>>     (emit_push_insn): Likewise.
>>     (store_field): Likewise.
>>     (expand_expr_real_1): Likewise.
>>     (compute_aligned_cost): New function.
>>     (compute_unaligned_cost): New function.
>>     (vector_mode_for_mode): New function.
>>     (vector_extensions_used_for_mode): New function.
>>     (move_by_pieces): New algorithm of memmove expanding.
>>     (move_by_pieces_ninsns): Update according to changes in
>>     move_by_pieces.
>>     (move_by_pieces_1): Remove as unused.
>>     (store_by_pieces): New algorithm for memset expanding.
>>     (clear_by_pieces): Likewise.
>>     (store_by_pieces_1): Remove incorrect parameters' attributes.
>>     * expr.h (compute_align_by_offset): Add declaration.
>>     * rtl.h (vector_extensions_used_for_mode): Add declaration.
>>     * builtins.c (expand_builtin_memset_args): Update according to changes
>>     in set_by_pieces.
>>     * target.def (DEFHOOK): Add hook slow_unaligned_access and
>>     promote_rtx_for_memset.
>>     * targhooks.c (default_slow_unaligned_access): Add default hook
>>     implementation.
>>     (default_promote_rtx_for_memset): Likewise.
>>     * targhooks.h (default_slow_unaligned_access): Add prototype.
>>     (default_promote_rtx_for_memset): Likewise.
>>     * cse.c (cse_insn): Stop forward propagation of vector constants.
>>     * fwprop.c (forward_propagate_and_simplify): Likewise.
>>     * doc/tm.texi (SLOW_UNALIGNED_ACCESS): Remove documentation for deleted
>>     macro SLOW_UNALIGNED_ACCESS.
>>     (TARGET_SLOW_UNALIGNED_ACCESS): Add documentation on new hook.
>>     (TARGET_PROMOTE_RTX_FOR_MEMSET): Likewise.
>>     * doc/tm.texi.in (SLOW_UNALIGNED_ACCESS): Likewise.
>>     (TARGET_SLOW_UNALIGNED_ACCESS): Likewise.
>>     (TARGET_PROMOTE_RTX_FOR_MEMSET): Likewise.
>>
>> 2011-07-11  Zolotukhin Michael  <michael.v.zolotukhin@intel.com>
>>
>>     * testsuite/gcc.target/i386/memset-s64-a0-1.c: New testcase.
>>     * testsuite/gcc.target/i386/memset-s64-a0-2.c: Ditto.
>>     * testsuite/gcc.target/i386/memset-s768-a0-1.c: Ditto.
>>     * testsuite/gcc.target/i386/memset-s768-a0-2.c: Ditto.
>>     * testsuite/gcc.target/i386/memset-s16-a1-1.c: Ditto.
>>     * testsuite/gcc.target/i386/memcpy-s16-a1-1.c: Ditto.
>>     * testsuite/gcc.target/i386/memset-s64-a0-3.c: Ditto.
>>     * testsuite/gcc.target/i386/memcpy-s64-a0-1.c: Ditto.
>>     * testsuite/gcc.target/i386/memset-s64-a1-1.c: Ditto.
>>     * testsuite/gcc.target/i386/memcpy-s64-a1-1.c: Ditto.
>>     * testsuite/gcc.target/i386/memset-s64-au-1.c: Ditto.
>>     * testsuite/gcc.target/i386/memcpy-s64-au-1.c: Ditto.
>>     * testsuite/gcc.target/i386/memset-s512-a0-1.c: Ditto.
>>     * testsuite/gcc.target/i386/memcpy-s512-a0-1.c: Ditto.
>>     * testsuite/gcc.target/i386/memset-s512-a1-1.c: Ditto.
>>     * testsuite/gcc.target/i386/memcpy-s512-a1-1.c: Ditto.
>>     * testsuite/gcc.target/i386/memset-s512-au-1.c: Ditto.
>>     * testsuite/gcc.target/i386/memcpy-s512-au-1.c: Ditto.
>>     * testsuite/gcc.target/i386/memset-s3072-a1-1.c: Ditto.
>>     * testsuite/gcc.target/i386/memcpy-s3072-a1-1.c: Ditto.
>>     * testsuite/gcc.target/i386/memset-s3072-au-1.c: Ditto.
>>     * testsuite/gcc.target/i386/memcpy-s3072-au-1.c: Ditto.
>>     * testsuite/gcc.target/i386/memset-s64-a0-4.c: Ditto.
>>     * testsuite/gcc.target/i386/memset-s64-a0-5.c: Ditto.
>>     * testsuite/gcc.target/i386/memset-s768-a0-3.c: Ditto.
>>     * testsuite/gcc.target/i386/memset-s768-a0-4.c: Ditto.
>>     * testsuite/gcc.target/i386/memset-s16-a1-2.c: Ditto.
>>     * testsuite/gcc.target/i386/memcpy-s16-a1-2.c: Ditto.
>>     * testsuite/gcc.target/i386/memset-s64-a0-6.c: Ditto.
>>     * testsuite/gcc.target/i386/memcpy-s64-a0-2.c: Ditto.
>>     * testsuite/gcc.target/i386/memset-s64-a1-2.c: Ditto.
>>     * testsuite/gcc.target/i386/memcpy-s64-a1-2.c: Ditto.
>>     * testsuite/gcc.target/i386/memset-s64-au-2.c: Ditto.
>>     * testsuite/gcc.target/i386/memcpy-s64-au-2.c: Ditto.
>>     * testsuite/gcc.target/i386/memset-s512-a0-2.c: Ditto.
>>     * testsuite/gcc.target/i386/memcpy-s512-a0-2.c: Ditto.
>>     * testsuite/gcc.target/i386/memset-s512-a1-2.c: Ditto.
>>     * testsuite/gcc.target/i386/memcpy-s512-a1-2.c: Ditto.
>>     * testsuite/gcc.target/i386/memset-s512-au-2.c: Ditto.
>>     * testsuite/gcc.target/i386/memcpy-s512-au-2.c: Ditto.
>>     * testsuite/gcc.target/i386/memset-s3072-a1-2.c: Ditto.
>>     * testsuite/gcc.target/i386/memcpy-s3072-a1-2.c: Ditto.
>>     * testsuite/gcc.target/i386/memset-s3072-au-2.c: Ditto.
>>     * testsuite/gcc.target/i386/memcpy-s3072-au-2.c: Ditto.
>>     * testsuite/gcc.target/i386/memset-s64-a0-7.c: Ditto.
>>     * testsuite/gcc.target/i386/memset-s64-a0-8.c: Ditto.
>>     * testsuite/gcc.target/i386/memset-s768-a0-5.c: Ditto.
>>     * testsuite/gcc.target/i386/memset-s768-a0-6.c: Ditto.
>>     * testsuite/gcc.target/i386/memset-s16-a1-3.c: Ditto.
>>     * testsuite/gcc.target/i386/memcpy-s16-a1-3.c: Ditto.
>>     * testsuite/gcc.target/i386/memset-s64-a0-9.c: Ditto.
>>     * testsuite/gcc.target/i386/memcpy-s64-a0-3.c: Ditto.
>>     * testsuite/gcc.target/i386/memset-s64-a1-3.c: Ditto.
>>     * testsuite/gcc.target/i386/memcpy-s64-a1-3.c: Ditto.
>>     * testsuite/gcc.target/i386/memset-s64-au-3.c: Ditto.
>>     * testsuite/gcc.target/i386/memcpy-s64-au-3.c: Ditto.
>>     * testsuite/gcc.target/i386/memset-s512-a0-3.c: Ditto.
>>     * testsuite/gcc.target/i386/memcpy-s512-a0-3.c: Ditto.
>>     * testsuite/gcc.target/i386/memset-s512-a1-3.c: Ditto.
>>     * testsuite/gcc.target/i386/memcpy-s512-a1-3.c: Ditto.
>>     * testsuite/gcc.target/i386/memset-s512-au-3.c: Ditto.
>>     * testsuite/gcc.target/i386/memcpy-s512-au-3.c: Ditto.
>>     * testsuite/gcc.target/i386/memset-s64-a0-10.c: Ditto.
>>     * testsuite/gcc.target/i386/memset-s64-a0-11.c: Ditto.
>>     * testsuite/gcc.target/i386/memset-s768-a0-7.c: Ditto.
>>     * testsuite/gcc.target/i386/memset-s768-a0-8.c: Ditto.
>>     * testsuite/gcc.target/i386/memset-s16-a1-4.c: Ditto.
>>     * testsuite/gcc.target/i386/memcpy-s16-a1-4.c: Ditto.
>>     * testsuite/gcc.target/i386/memset-s64-a0-12.c: Ditto.
>>     * testsuite/gcc.target/i386/memcpy-s64-a0-4.c: Ditto.
>>     * testsuite/gcc.target/i386/memset-s64-a1-4.c: Ditto.
>>     * testsuite/gcc.target/i386/memcpy-s64-a1-4.c: Ditto.
>>     * testsuite/gcc.target/i386/memset-s64-au-4.c: Ditto.
>>     * testsuite/gcc.target/i386/memcpy-s64-au-4.c: Ditto.
>>     * testsuite/gcc.target/i386/memset-s512-a0-4.c: Ditto.
>>     * testsuite/gcc.target/i386/memcpy-s512-a0-4.c: Ditto.
>>     * testsuite/gcc.target/i386/memset-s512-a1-4.c: Ditto.
>>     * testsuite/gcc.target/i386/memcpy-s512-a1-4.c: Ditto.
>>     * testsuite/gcc.target/i386/memset-s512-au-4.c: Ditto.
>>
>

Please don't use -m32/-m64 in testcases directly.
You should use

/* { dg-do compile { target { ! ia32 } } } */

for 64-bit insns and

/* { dg-do compile { target { ia32 } } } */

for 32-bit insns.
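
For illustration, a minimal sketch of a testcase that puts the target
selector in the dg-do line instead of passing -m64 in dg-options. The file
contents, the -O2 option and the name clear_buf are assumptions made for
this example; it is not one of the testcases from the patch.

```c
/* Hypothetical testcase sketch; not one of the files from the patch.  */
/* Compile only where the target is not 32-bit x86.  */
/* { dg-do compile { target { ! ia32 } } } */
/* -O2 is an assumed option; note there is no -m32/-m64 here.  */
/* { dg-options "-O2" } */

#include <string.h>

char buf[64];

/* Fixed-size memset on a buffer whose size is known at compile time.  */
void
clear_buf (void)
{
  memset (buf, 0, sizeof buf);
}
```

Written this way, the harness can skip the test on boards it does not apply
to, for example when the testsuite is run with
RUNTESTFLAGS="--target_board='unix{-m32,-m64}'", rather than having the
testcase force a mode that conflicts with the board flags.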


-- 
H.J.

