[Bug middle-end/102877] missed optimization: memcpy produces lots more asm than otherwise
rguenth at gcc dot gnu.org
gcc-bugzilla@gcc.gnu.org
Thu Oct 21 13:03:40 GMT 2021
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102877
Richard Biener <rguenth at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|UNCONFIRMED |NEW
Component|c++ |middle-end
Ever confirmed|0 |1
Keywords| |missed-optimization
Last reconfirmed| |2021-10-21
--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
Confirmed. With memcpy we expand from
MEM <vector(2) unsigned char> [(unsigned char *)&value] = { 0, 0 };
MEM <unsigned char[6]> [(char * {ref-all})&value + 2B] = MEM <unsigned
char[6]> [(char * {ref-all})&gc];
value.0_1 = value;
_2 = __builtin_bswap64 (value.0_1); [tail call]
value ={v} {CLOBBER};
return _2;
thus we expand 'value' on the stack. Without memcpy we manage to do
MEM <unsigned short> [(unsigned char *)&value] = 0;
_19 = MEM <unsigned short> [(unsigned char *)&gc];
MEM <unsigned short> [(unsigned char *)&value + 2B] = _19;
_21 = MEM <unsigned int> [(unsigned char *)&gc + 2B];
MEM <unsigned int> [(unsigned char *)&value + 4B] = _21;
value.0_7 = value;
_8 = __builtin_bswap64 (value.0_7); [tail call]
which also expands 'value' on the stack but is apparently nicer to later
passes. This means the way we expand the aggregate copy of type char[6]
is highly sub-optimal (we emit six single-byte loads & stores).
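A minimal reproducer consistent with the GIMPLE above might look like the
following sketch. The function and parameter names are assumptions; the
shape (zero-initialize a 64-bit value, memcpy 6 bytes into it at offset 2,
then byte-swap) is inferred from the dumps:

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical reconstruction: copy 6 bytes of 'gc' into the middle of a
   zero-initialized 64-bit value, then byte-swap the whole thing.  The
   memcpy is the char[6] aggregate copy discussed in the comment. */
uint64_t load48_be(const unsigned char *gc)
{
    uint64_t value = 0;
    memcpy((unsigned char *)&value + 2, gc, 6);  /* 6-byte aggregate copy */
    return __builtin_bswap64(value);             /* expands to the bswap64 call above */
}
```

On a little-endian target this reads 6 bytes as a big-endian 48-bit
integer; the variant without memcpy would do the same copy with explicit
2-byte and 4-byte loads/stores, matching the second GIMPLE dump.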