[Bug tree-optimization/50417] regression: memcpy with known alignment

rguenther at suse dot de gcc-bugzilla@gcc.gnu.org
Tue Jul 12 09:11:00 GMT 2016


https://gcc.gnu.org/bugzilla/show_bug.cgi?id=50417

--- Comment #23 from rguenther at suse dot de <rguenther at suse dot de> ---
On Tue, 12 Jul 2016, npl at chello dot at wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=50417
> 
> --- Comment #22 from npl at chello dot at ---
> > > 00000014 <fixme>:
> > >   14:	e3a03000 	mov	r3, #0
> > >   18:	e5803000 	str	r3, [r0]
> > >   1c:	e5812000 	str	r2, [r1]
> > >   20:	e5900000 	ldr	r0, [r0] // The load that's missing above
> > >   24:	e24dd010 	sub	sp, sp, #16 // Time for another
> > >   28:	e28dd010 	add	sp, sp, #16 // bug report?
> > >   2c:	e12fff1e 	bx	lr
> > 
> > It's not done on STRICT_ALIGNMENT platforms because not all of those expand
> > misaligned moves correctly (IIRC).  Looking at RTL expansion, at least a
> > misaligned destination will be handled correctly.  The question that remains
> > is what happens at -Os, and for example with both source and destination
> > misaligned.  Or on x86, where a simple rep; movsb; is possible (plus the
> > register setup, of course).
> 
> Not sure what you mean; x86 supports unaligned accesses and shouldn't be
> affected.  Also, I doubt there are many cases where the function call (and
> the register and stack shuffling) will use less code than the aligned access.
> 
> The generated code for unaligned access could be improved in many cases (on
> ARM at least), and possibly fixed outright.  But that's not generally an
> argument against improving the builtins, is it?

Sure.  It is just that I am usually (overly?) cautious when introducing
unaligned accesses, because historically they were handled incorrectly, and
nowadays they can still be handled very inefficiently on strict-align targets.
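
For concreteness, the kind of access under discussion is the usual memcpy
idiom for a possibly misaligned load (just a sketch; the function name is
made up):

    #include <string.h>
    #include <stdint.h>

    /* Read a 32-bit value from a possibly misaligned address.  Targets
       with cheap unaligned loads can expand this to a single load; a
       strict-align target may instead emit four byte loads plus shifts,
       or fall back to a library call.  */
    static inline uint32_t
    load_u32 (const void *p)
    {
      uint32_t v;
      memcpy (&v, p, sizeof v);
      return v;
    }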

I am going to propose the patch and add a testcase covering most misalignment
cases so we can see the runtime fallout on the weirder targets.
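
Something along these lines, exercising all four combinations of aligned and
misaligned source and destination (only a sketch of the idea, not the actual
testcase; sizes and offsets are arbitrary):

    #include <string.h>

    /* Aligned bases so the +1 offsets give a known misalignment.  */
    static char src[32] __attribute__ ((aligned (8)));
    static char dst[32] __attribute__ ((aligned (8)));

    void
    copy_cases (void)
    {
      memcpy (dst,     src,     8); /* aligned    -> aligned    */
      memcpy (dst,     src + 1, 8); /* misaligned -> aligned    */
      memcpy (dst + 1, src,     8); /* aligned    -> misaligned */
      memcpy (dst + 1, src + 1, 8); /* misaligned -> misaligned */
    }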

