This is the mail archive of the
gcc@gcc.gnu.org
mailing list for the GCC project.
Re: inlined memcpy/memset degradation in gcc 4.6 or later
- From: Michael Zolotukhin <michael dot v dot zolotukhin at gmail dot com>
- To: Walter Lee <walt at tilera dot com>
- Cc: gcc at gcc dot gnu dot org
- Date: Tue, 2 Oct 2012 21:23:24 +0400
- Subject: Re: inlined memcpy/memset degradation in gcc 4.6 or later
- References: <201210021419.q92EJ4LC005220@farm-0038.internal.tilera.com>
Hi Walter,
I ran into a similar problem while working on optimizing memcpy
expansion for x86.
The x86-specific expander also needed alignment info, and that info was
also incorrect (i.e. too conservative). The routine
get_mem_align_offset () is used there to determine alignment, but at
some point it started to return 1-byte alignment instead of the 16-byte
(or whatever) alignment I expected.
I made a small fix for it, and it seemed to work well again:
diff --git a/gcc/emit-rtl.c b/gcc/emit-rtl.c
index 9565c61..9108022 100644
--- a/gcc/emit-rtl.c
+++ b/gcc/emit-rtl.c
@@ -1516,6 +1516,14 @@ get_mem_align_offset (rtx mem, unsigned int align)
       if (TYPE_ALIGN (TREE_TYPE (expr)) < (unsigned int) align)
         return -1;
     }
+  else if (TREE_CODE (expr) == MEM_REF)
+    {
+      int al, off;
+      get_object_alignment_1 (expr, &al, &offset);
+      offset /= BITS_PER_UNIT;
+      if (al < align)
+        return -1;
+    }
   else if (TREE_CODE (expr) == COMPONENT_REF)
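To illustrate what the new branch is meant to compute, here is a small standalone sketch. It is not GCC code: the struct known_align and the function model_align_offset are hypothetical names made up for this example, and the sketch assumes alignments are powers of two expressed in bits. It models the idea of "known alignment plus misalignment": when the known alignment is weaker than the requested one, the answer is -1 (unknown), otherwise it is the byte offset from the requested boundary.

```c
#include <assert.h>

/* Hypothetical model, not a GCC type: the address is known to be
   congruent to misalign_bits modulo align_bits.  */
struct known_align
{
  unsigned int align_bits;    /* power of two, at least 8 */
  unsigned int misalign_bits; /* multiple of 8, less than align_bits */
};

/* Return how many bytes the address lies past a want_bits-aligned
   boundary, or -1 when the known alignment is too weak to tell.  */
static int
model_align_offset (struct known_align k, unsigned int want_bits)
{
  assert (want_bits >= 8 && (want_bits & (want_bits - 1)) == 0);
  assert (k.align_bits >= 8 && (k.align_bits & (k.align_bits - 1)) == 0);
  assert (k.misalign_bits < k.align_bits);

  /* Too little is known: this is the overly conservative case.  */
  if (k.align_bits < want_bits)
    return -1;

  /* Reduce the known misalignment modulo the requested alignment
     and convert from bits to bytes.  */
  return (int) ((k.misalign_bits % want_bits) / 8);
}
```

With only 8-bit (1-byte) alignment known, a query for 128-bit alignment gives -1, which is the situation that made the expander fall back to the slow path.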
So, returning to your problem: the routines you mentioned probably
don't handle MEM_REF either (and before some commit they didn't have to).
You could also look into the routine I mentioned; you may find
something useful there.
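As a rough illustration of why the reported alignment matters so much on a strict-alignment target, here is a simplified model. It is not GCC's actual expansion logic: widest_access and count_pieces are hypothetical helpers, and the model assumes the widest usable access is limited by the smaller of the two known alignments.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: widest access, in bytes, that is safe when
   unaligned accesses are slow.  It is bounded by the weaker of the
   two known alignments and by the target's maximum access size.  */
static size_t
widest_access (unsigned int src_align_bits, unsigned int dst_align_bits,
               size_t max_bytes)
{
  unsigned int a = src_align_bits < dst_align_bits
                   ? src_align_bits : dst_align_bits;
  size_t w = a / 8;

  assert (w >= 1 && max_bytes >= 1);
  return w < max_bytes ? w : max_bytes;
}

/* Hypothetical helper: number of load/store pairs needed to copy
   len bytes, using the widest pieces first and narrower pieces for
   the tail.  width must be a power of two.  */
static size_t
count_pieces (size_t len, size_t width)
{
  size_t n = 0;

  assert (width >= 1 && (width & (width - 1)) == 0);
  while (len > 0)
    {
      /* Shrink the piece until it fits in what is left.  */
      while (width > len)
        width /= 2;
      n += len / width;
      len %= width;
    }
  return n;
}
```

For the 4-byte struct in your example, 32-bit alignment gives one load/store pair, while 8-bit alignment gives four, which matches the difference between the gcc 4.4 and gcc 4.7 output you quoted.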
---
Thanks, Michael
On 2 October 2012 18:19, Walter Lee <walt@tilera.com> wrote:
>
> On TILE-Gx, I'm observing a degradation in inlined memcpy/memset in
> gcc 4.6 and later versus gcc 4.4. Though I found the problem on
> TILE-Gx, I think this is a problem for any architecture with
> SLOW_UNALIGNED_ACCESS set to 1.
>
> Consider the following program:
>
> struct foo {
>   int x;
> };
>
> void copy(struct foo* f0, struct foo* f1)
> {
>   memcpy (f0, f1, sizeof(struct foo));
> }
>
> In gcc 4.4, I get the desired inline memcpy:
>
> copy:
>         ld4s r1, r1
>         st4 r0, r1
>         jrp lr
>
> In gcc 4.7, however, I get inlined byte-by-byte copies:
>
> copy:
>         ld1u_add r10, r1, 1
>         st1_add r0, r10, 1
>         ld1u_add r10, r1, 1
>         st1_add r0, r10, 1
>         ld1u_add r10, r1, 1
>         st1_add r0, r10, 1
>         ld1u r10, r1
>         st1 r0, r10
>         jrp lr
>
> The inlining of memcpy is done in expand_builtin_memcpy in builtins.c.
> Tracing through that, I see that src_align and dest_align, which are
> computed by get_pointer_alignment, have degraded: in gcc 4.4 they are
> 32 bits, but in gcc 4.7 they are 8 bits. This causes the loads and
> stores generated by the inlined memcpy to be per-byte instead of
> per-4-byte.
>
> Looking further, gcc 4.7 uses the "align" field in "struct
> ptr_info_def" to compute the alignment. This field appears to be
> initialized in get_ptr_info in tree-ssanames.c but it is always
> initialized to 1 byte and does not appear to change. gcc 4.4 computes
> its alignment information differently.
>
> I get the same byte-copies with gcc 4.8 and gcc 4.6.
>
> I see a couple of related open PRs: 50417, 53535, but no suggested fixes
> for them yet. Can anyone advise on how this can be fixed? Should I
> file a new bug, or add this info to one of the existing PRs?
>
> Thanks,
>
> Walter
>
--
---
Best regards,
Michael V. Zolotukhin,
Software Engineer
Intel Corporation.