This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.



Re: inlined memcpy/memset degradation in gcc 4.6 or later


Hi Walter,
I faced a similar problem when I worked on optimizing memcpy
expansion for x86.
The x86-specific expander also needed alignment info, and that info was
also incorrect (i.e. too conservative). The routine get_mem_align_offset ()
is used there to determine alignment, but at some point it started to
report 1-byte alignment instead of the 16-byte (or whatever) alignment
I expected.
I made a small fix for it, and it seemed to work well again:
diff --git a/gcc/emit-rtl.c b/gcc/emit-rtl.c
index 9565c61..9108022 100644
--- a/gcc/emit-rtl.c
+++ b/gcc/emit-rtl.c
@@ -1516,6 +1516,14 @@ get_mem_align_offset (rtx mem, unsigned int align)
       if (TYPE_ALIGN (TREE_TYPE (expr)) < (unsigned int) align)
        return -1;
     }
+  else if (TREE_CODE (expr) == MEM_REF)
+    {
+      int al, off;
+      get_object_alignment_1 (expr, &al, &offset);
+      offset /= BITS_PER_UNIT;
+      if (al < align)
+       return -1;
+    }
   else if (TREE_CODE (expr) == COMPONENT_REF)
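
To make the intent of the added branch concrete, here is a minimal
stand-alone sketch in plain C. It is not GCC code: model_align_offset and
its parameters are hypothetical names, and the final masking step is an
assumption about what the rest of the routine does with the offset. It
only models the two things the patch does: convert a bit position into
bytes, and give up with -1 when the known alignment is weaker than the
one requested.

```c
#include <assert.h>

#define BITS_PER_UNIT 8  /* assumed: 8-bit bytes */

/* Hypothetical model, not the real get_mem_align_offset.
   known_align_bits: alignment proven for the object, in bits.
   bitpos:           misalignment relative to that alignment, in bits.
   want_align_bits:  alignment the caller asks about, in bits.
   Returns the byte offset from a want_align_bits boundary, or -1 if
   the known alignment is too weak to tell.  */
static int
model_align_offset (unsigned int known_align_bits, unsigned int bitpos,
                    unsigned int want_align_bits)
{
  unsigned int offset = bitpos / BITS_PER_UNIT;

  /* The requested alignment must be a power of two of at least a byte.  */
  assert (want_align_bits >= BITS_PER_UNIT
          && (want_align_bits & (want_align_bits - 1)) == 0);

  /* Mirrors the "if (al < align) return -1;" test in the patch.  */
  if (known_align_bits < want_align_bits)
    return -1;

  /* Assumption: the offset is reduced modulo the requested alignment.  */
  return (int) (offset & (want_align_bits / BITS_PER_UNIT - 1));
}
```

Under this model, model_align_offset (128, 32, 128) gives 4 (the access
starts 4 bytes past a 16-byte boundary), while
model_align_offset (8, 0, 128) gives -1, which is the overly conservative
answer I was seeing for MEM_REF before the fix.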

So, returning to your problem: the routines you mentioned probably
don't handle MEM_REF either (and before some commit they didn't have to).
You could also look into the routine I mentioned; you may
find something useful there.
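
To illustrate why a too-conservative alignment matters so much for the
expansion you describe, here is a rough model in plain C. It is only a
sketch under stated assumptions, not GCC's expander: model_copy_pairs is
a hypothetical name, the widest access is passed in as a parameter, and
every access is assumed to need natural alignment (the
SLOW_UNALIGNED_ACCESS case).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model, not GCC's expander.  Counts the load/store pairs
   needed to copy SIZE bytes when every access must be naturally aligned
   and the pointers are only known to be ALIGN_BYTES aligned.  MAX_WORD
   is the widest access the target offers, in bytes.  */
static size_t
model_copy_pairs (size_t size, size_t align_bytes, size_t max_word)
{
  size_t pairs = 0;
  size_t width;

  /* Both the alignment and the access width must be powers of two.  */
  assert (align_bytes != 0 && (align_bytes & (align_bytes - 1)) == 0);
  assert (max_word != 0 && (max_word & (max_word - 1)) == 0);

  /* Start with the widest access the known alignment permits.  */
  width = align_bytes < max_word ? align_bytes : max_word;

  /* Greedily copy with the current width, then halve it for the tail.  */
  while (size > 0)
    {
      pairs += size / width;
      size %= width;
      width /= 2;
    }
  return pairs;
}
```

For the 4-byte struct in your example, model_copy_pairs (4, 4, 8) is 1
and model_copy_pairs (4, 1, 8) is 4, which matches the single ld4s/st4
pair versus the four byte-sized pairs in your two listings. The 8-byte
maximum width is my assumption for a 64-bit target.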

---
Thanks, Michael

On 2 October 2012 18:19, Walter Lee <walt@tilera.com> wrote:
>
> On TILE-Gx, I'm observing a degradation in inlined memcpy/memset in
> gcc 4.6 and later versus gcc 4.4.  Though I find the problem on
> TILE-Gx, I think this is a problem for any architectures with
> SLOW_UNALIGNED_ACCESS set to 1.
>
> Consider the following program:
>
> #include <string.h>
>
> struct foo {
>   int x;
> };
>
> void copy(struct foo* f0, struct foo* f1)
> {
>   memcpy (f0, f1, sizeof(struct foo));
> }
>
> In gcc 4.4, I get the desired inline memcpy:
>
> copy:
>         ld4s    r1, r1
>         st4     r0, r1
>         jrp     lr
>
> In gcc 4.7, however, I get inlined byte-by-byte copies:
>
> copy:
>         ld1u_add r10, r1, 1
>         st1_add  r0, r10, 1
>         ld1u_add r10, r1, 1
>         st1_add  r0, r10, 1
>         ld1u_add r10, r1, 1
>         st1_add  r0, r10, 1
>         ld1u     r10, r1
>         st1      r0, r10
>         jrp      lr
>
> The inlining of memcpy is done in expand_builtin_memcpy in builtins.c.
> Tracing through that, I see that the alignment of src_align and
> dest_align, which is computed by get_pointer_alignment, has degraded:
> in gcc 4.4 they are 32 bits, but in gcc 4.7 they are 8 bits.  This
> causes the loads generated by the inlined memcpy to be per-byte
> instead of per-4-byte.
>
> Looking further, gcc 4.7 uses the "align" field in "struct
> ptr_info_def" to compute the alignment.  This field appears to be
> initialized in get_ptr_info in tree-ssanames.c but it is always
> initialized to 1 byte and does not appear to change.  gcc 4.4 computes
> its alignment information differently.
>
> I get the same byte-copies with gcc 4.8 and gcc 4.6.
>
> I see a couple of related open PRs: 50417, 53535, but no suggested fixes
> for them yet.  Can anyone advise on how this can be fixed?  Should I
> file a new bug, or add this info to one of the existing PRs?
>
> Thanks,
>
> Walter
>



-- 
---
Best regards,
Michael V. Zolotukhin,
Software Engineer
Intel Corporation.

