Re: Thoughts on memcmp expansion (PR43052)
- From: Richard Biener <richard.guenther@gmail.com>
- To: Bernd Schmidt <bschmidt@redhat.com>
- Cc: GCC Patches <gcc-patches@gcc.gnu.org>, Nick Clifton <nickc@redhat.com>
- Date: Fri, 13 May 2016 12:20:44 +0200
- Subject: Re: Thoughts on memcmp expansion (PR43052)
On Thu, May 12, 2016 at 7:14 PM, Bernd Schmidt <bschmidt@redhat.com> wrote:
> On 05/02/2016 03:14 PM, Richard Biener wrote:
>
>> I think it fits best in tree-ssa-strlen.c:strlen_optimize_stmt for the
>> moment.
>
>
> I've done this in this version. Note that this apparently means it won't
> be run at -O, unlike the previous version.
>
> The code moved to tree-ssa-strlen.c is nearly identical to the previous
> version, with the exception of a new line that copies the contents of
> gimple_call_use_set to the newly constructed call. Without this, I found a
> testcase where we can miss a DSE opportunity since we think memcmp_eq can
> access anything.
>
> In the by_pieces code, I've gone ahead and progressed the C++ conversion a
> little further by making subclasses of op_by_pieces_d. Now the
> {move,store,compare}_by_pieces functions generally just make an object of
> the appropriate type and call its run method. I've also converted
> store_by_pieces to this infrastructure to eliminate more duplication.
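>
> Schematically, each entry point is now just a thin wrapper (a rough
> sketch only; the constructor arguments are abbreviated here):
>
>   class compare_by_pieces_d : public op_by_pieces_d
>   {
>     /* Compare-specific code-generation hooks live here.  */
>   };
>
>   static rtx
>   compare_by_pieces (rtx arg0, rtx arg1, unsigned HOST_WIDE_INT len,
>                      unsigned int align, rtx target)
>   {
>     /* Construct the object and let the shared run loop emit the RTL.  */
>     compare_by_pieces_d data (arg0, arg1, len, align, target);
>     data.run ();
>     return target;
>   }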
>
> I've verified that if you disable the strlenopt code, you get identical code
> generation before and after the patch on x86_64-linux and arm-eabi. With the
> strlenopt enabled, it finds memcmp optimization opportunities on both
> targets (more on x86 due to !SLOW_UNALIGNED_ACCESS). On arm, there is a
> question mark over whether the autoinc support in the by_pieces code is
> really a good idea, but that's unchanged from before and can be addressed
> later.
> Microbenchmarks on x86 suggest that the new by-pieces expansion is a whole
> lot faster than calling memcmp (about a factor of 3 in my tests).
>
> Bootstrapped and tested on x86_64-linux. The testcase failed at -O (now
> changed to -O2), and there was a failure in vrp47.c, which also seems to
> appear in gcc-testresults, so I think it's unrelated. Still, I'll make
> another run against a new baseline. Ok for trunk if that all comes back
> clean?
@@ -6081,9 +6056,16 @@ expand_builtin (tree exp, rtx target, rt
     case BUILT_IN_BCMP:
     case BUILT_IN_MEMCMP:
-      target = expand_builtin_memcmp (exp, target);
+    case BUILT_IN_MEMCMP_EQ:
+      target = expand_builtin_memcmp (exp, target,
+                                      fcode == BUILT_IN_MEMCMP_EQ);
       if (target)
         return target;
+      if (fcode == BUILT_IN_MEMCMP_EQ)
+        {
+          exp = copy_node (exp);
+          tree newdecl = builtin_decl_explicit (BUILT_IN_MEMCMP);
+          TREE_OPERAND (exp, 1) = build_fold_addr_expr (newdecl);
+        }
you can modify 'exp' in-place (yeah, we still build a GENERIC call from GIMPLE
just because nobody "fixed" downstream routines to accept a GIMPLE call).
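That is, something like (untested sketch):

      if (fcode == BUILT_IN_MEMCMP_EQ)
        {
          /* Patch the callee of the existing CALL_EXPR instead of
             copying the node first.  */
          tree newdecl = builtin_decl_explicit (BUILT_IN_MEMCMP);
          TREE_OPERAND (exp, 1) = build_fold_addr_expr (newdecl);
        }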
I'm not much of a fan of C++-ification (in this case it makes review
harder) but well ...
+  if (tree_fits_uhwi_p (len)
+      && (leni = tree_to_uhwi (len)) <= GET_MODE_SIZE (word_mode)
+      && exact_log2 (leni) != -1
+      && (align1 = get_pointer_alignment (arg1)) >= leni * BITS_PER_UNIT
+      && (align2 = get_pointer_alignment (arg2)) >= leni * BITS_PER_UNIT)
I think * BITS_PER_UNIT has to be * 8 here as the C standard defines
it that way.
I think to increase coverage you want
  || ! SLOW_UNALIGNED_ACCESS (TYPE_MODE (), align1)
here and build properly mis-aligned access types for the MEM_REF like
  if (src_align < GET_MODE_ALIGNMENT (TYPE_MODE (type)))
    srctype = build_aligned_type (type, src_align);
(copied from gimple_fold_builtin_memory_op).
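Concretely, folding in the leni * 8 change from above, something along
these lines (just a sketch; I'm assuming the type is built before the
check so its mode is available):

  /* Allow the transform when the pointers are aligned enough, or when
     the target handles the unaligned load cheaply.  */
  type = build_nonstandard_integer_type (leni * 8, 1);
  align1 = get_pointer_alignment (arg1);
  if (align1 < leni * 8
      && SLOW_UNALIGNED_ACCESS (TYPE_MODE (type), align1))
    return false;
  tree type1 = type;
  if (align1 < GET_MODE_ALIGNMENT (TYPE_MODE (type)))
    /* Build a properly mis-aligned access type for the MEM_REF.  */
    type1 = build_aligned_type (type, align1);
  arg1 = build2_loc (loc, MEM_REF, type1, arg1, off);

and likewise for arg2/align2.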
+ type = build_nonstandard_integer_type (leni * BITS_PER_UNIT, 1);
+ gcc_assert (GET_MODE_SIZE (TYPE_MODE (type)) == leni);
Likewise leni * 8 here, and the assert would have to check
GET_MODE_SIZE () * BITS_PER_UNIT == leni * 8 (or use GET_MODE_BITSIZE).
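That is:

  gcc_assert (GET_MODE_BITSIZE (TYPE_MODE (type)) == leni * 8);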
+ arg1 = build2_loc (loc, MEM_REF, type, arg1, off);
+ arg2 = build2_loc (loc, MEM_REF, type, arg2, off);
you want to add
  tree tem = fold_const_aggregate_ref (arg1);
  if (tem)
    arg1 = tem;
and same for arg2. That avoids leaving unfolded memory refs in the IL
after the transform.
+ tree newfn = builtin_decl_explicit (BUILT_IN_MEMCMP_EQ);
+ gcall *repl = gimple_build_call (newfn, 3, arg1, arg2, len);
+ gimple_call_set_lhs (repl, res);
+ gimple_set_location (repl, gimple_location (stmt2));
+ gimple_set_vuse (repl, gimple_vuse (stmt2));
+ *gimple_call_use_set (repl) = *gimple_call_use_set (stmt2);
+ gcc_assert (!gimple_vdef (stmt2));
+ gsi_replace (gsi, repl, false);
+ return false;
I think you can avoid all this by just doing
gimple_call_set_fndecl (stmt2, builtin_decl_explicit (BUILT_IN_MEMCMP_EQ));
as MEMCMP and MEMCMP_EQ have equal signatures.
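That is, the whole block above shrinks to something like this
(update_stmt being my guess at the remaining bookkeeping):

  /* Same arguments, same vuse -- only the callee changes.  */
  gimple_call_set_fndecl (stmt2, builtin_decl_explicit (BUILT_IN_MEMCMP_EQ));
  update_stmt (stmt2);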
Ok with those changes.
Thanks,
Richard.
>
> Bernd