This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



Re: Thoughts on memcmp expansion (PR43052)


On Thu, May 12, 2016 at 7:14 PM, Bernd Schmidt <bschmidt@redhat.com> wrote:
> On 05/02/2016 03:14 PM, Richard Biener wrote:
>
>> I think it fits best in tree-ssa-strlen.c:strlen_optimize_stmt for the
>> moment.
>
>
> I've done this in this version. Note that this apparently means it won't be
> run at -O unlike the previous version.
>
> The code moved to tree-ssa-strlen.c is nearly identical to the previous
> version, with the exception of a new line that copies the contents of
> gimple_call_use_set to the newly constructed call. Without this, I found a
> testcase where we can miss a DSE opportunity since we think memcmp_eq can
> access anything.
>
> In the by_pieces_code, I've gone ahead and progressed the C++ conversion a
> little further by making subclasses of op_by_pieces_d. Now the
> {move,store,compare}_by_pieces functions generally just make an object of
> the appropriate type and call its run method. I've also converted
> store_by_pieces to this infrastructure to eliminate more duplication.
>
> I've verified that if you disable the strlenopt code, you get identical code
> generation before and after the patch on x86_64-linux and arm-eabi. With the
> strlenopt enabled, it finds memcmp optimization opportunities on both
> targets (more on x86 due to !SLOW_UNALIGNED_ACCESS). On arm, there is a
> question mark over whether the autoinc support in the by_pieces code is
> really a good idea, but that's unchanged from before and can be addressed
> later.
> Microbenchmarks on x86 suggest that the new by-pieces expansion is a whole
> lot faster than calling memcmp (about a factor of 3 in my tests).
>
> Bootstrapped and tested on x86_64-linux. The testcase failed at -O (now
> changed to -O2), and there was a failure in vrp47.c which also seems to
> appear in gcc-testresults, so I think it's unrelated. Still, I'll make
> another run against a new baseline. Ok for trunk if that all comes back
> clean?
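[For context, the equality-only expansion being discussed can be sketched in plain C. This is a hypothetical illustration of the technique, not code from the patch: when the length is a compile-time constant no larger than the word size and the operands are sufficiently aligned, memcmp (a, b, 8) == 0 can be expanded as a single word load-and-compare instead of a library call.]

```c
#include <stdint.h>
#include <string.h>

/* Equality-only comparison of two 8-byte buffers, roughly as a by-pieces
   expansion might emit it.  memcpy expresses the load portably; with known
   alignment (or fast unaligned access) it compiles to one word load each.
   Returns nonzero iff memcmp (a, b, 8) == 0.  */
static int
memcmp_eq8 (const void *a, const void *b)
{
  uint64_t wa, wb;
  memcpy (&wa, a, 8);
  memcpy (&wb, b, 8);
  return wa == wb;
}
```

[Unlike memcmp itself, no byte order fix-up is needed here: only equality of the two words matters, not which operand is greater, which is why the relaxed _EQ variant is cheaper to expand.]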

@@ -6081,9 +6056,16 @@ expand_builtin (tree exp, rtx target, rt

     case BUILT_IN_BCMP:
     case BUILT_IN_MEMCMP:
-      target = expand_builtin_memcmp (exp, target);
+    case BUILT_IN_MEMCMP_EQ:
+      target = expand_builtin_memcmp (exp, target, fcode == BUILT_IN_MEMCMP_EQ);
       if (target)
        return target;
+      if (fcode == BUILT_IN_MEMCMP_EQ)
+       {
+         exp = copy_node (exp);
+         tree newdecl = builtin_decl_explicit (BUILT_IN_MEMCMP);
+         TREE_OPERAND (exp, 1) = build_fold_addr_expr (newdecl);
+       }

you can modify 'exp' in-place (yeah, we still build a GENERIC call from GIMPLE
just because nobody "fixed" downstream routines to accept a GIMPLE call).

I'm not much of a fan of C++-ification (in this case it makes review
harder) but well ...

+  if (tree_fits_uhwi_p (len)
+      && (leni = tree_to_uhwi (len)) <= GET_MODE_SIZE (word_mode)
+      && exact_log2 (leni) != -1
+      && (align1 = get_pointer_alignment (arg1)) >= leni * BITS_PER_UNIT
+      && (align2 = get_pointer_alignment (arg2)) >= leni * BITS_PER_UNIT)

I think * BITS_PER_UNIT has to be * 8 here as the C standard defines
it that way.
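[The distinction matters because memcmp's length argument counts bytes (unsigned chars) regardless of the target's BITS_PER_UNIT; a minimal host-side reminder of the byte-count semantics, using a hypothetical wrapper name:]

```c
#include <string.h>

/* memcmp's third argument counts bytes; exactly that many bytes
   participate in the comparison.  Thin wrapper for clarity only.  */
static int
bytes_equal (const void *a, const void *b, size_t n)
{
  return memcmp (a, b, n) == 0;
}
```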

I think to increase coverage you want
|| !SLOW_UNALIGNED_ACCESS (TYPE_MODE (), align1) here
and build properly mis-aligned access types for the MEM_REF like

                  if (src_align < GET_MODE_ALIGNMENT (TYPE_MODE (type)))
                    srctype = build_aligned_type (type, src_align);

(copied from gimple_fold_builtin_memory_op).

+      type = build_nonstandard_integer_type (leni * BITS_PER_UNIT, 1);
+      gcc_assert (GET_MODE_SIZE (TYPE_MODE (type)) == leni);

Likewise leni * 8 here, and the assert would have to check
GET_MODE_SIZE () * BITS_PER_UNIT == leni * 8 (or use GET_MODE_BITSIZE).

+      arg1 = build2_loc (loc, MEM_REF, type, arg1, off);
+      arg2 = build2_loc (loc, MEM_REF, type, arg2, off);

you want to add

        tree tem = fold_const_aggregate_ref (arg1);
        if (tem)
          arg1 = tem;

and same for arg2.  That avoids leaving unfolded memory refs in the IL
after the transform.

+  tree newfn = builtin_decl_explicit (BUILT_IN_MEMCMP_EQ);
+  gcall *repl = gimple_build_call (newfn, 3, arg1, arg2, len);
+  gimple_call_set_lhs (repl, res);
+  gimple_set_location (repl, gimple_location (stmt2));
+  gimple_set_vuse (repl, gimple_vuse (stmt2));
+  *gimple_call_use_set (repl) = *gimple_call_use_set (stmt2);
+  gcc_assert (!gimple_vdef (stmt2));
+  gsi_replace (gsi, repl, false);
+  return false;

I think you can avoid all this by just doing

   gimple_call_set_fndecl (stmt2, builtin_decl_explicit (BUILT_IN_MEMCMP_EQ));

as MEMCMP and MEMCMP_EQ have equal signatures.
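[The reason the one-line fndecl swap is safe can be illustrated outside GCC with a function pointer standing in for the call's fndecl; my_memcmp_eq is a hypothetical stand-in for the builtin, agreeing with memcmp on the signature and on the zero/nonzero meaning of the result:]

```c
#include <string.h>
#include <stddef.h>

/* Hypothetical stand-in for __builtin_memcmp_eq: same signature as
   memcmp, but only the zero/nonzero distinction is meaningful.  */
static int
my_memcmp_eq (const void *a, const void *b, size_t n)
{
  return memcmp (a, b, n) != 0;
}

typedef int (*cmp_fn) (const void *, const void *, size_t);

/* A "call site" whose callee can be retargeted: the operands and the
   (const void *, const void *, size_t) -> int signature are unchanged,
   only the function changes -- the analogue of swapping the fndecl.  */
static int
equal_via (cmp_fn callee, const void *a, const void *b, size_t n)
{
  return callee (a, b, n) == 0;
}
```

[Because an equality test only consumes the zero/nonzero result, either callee gives the same answer at such a call site, which is what makes rewriting the statement in place valid.]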

Ok with those changes.

Thanks,
Richard.


>
> Bernd

