[Bug middle-end/102877] missed optimization: memcpy produces lots more asm than otherwise

rguenth at gcc dot gnu.org gcc-bugzilla@gcc.gnu.org
Thu Oct 21 13:03:40 GMT 2021


https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102877

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
          Component|c++                         |middle-end
     Ever confirmed|0                           |1
           Keywords|                            |missed-optimization
   Last reconfirmed|                            |2021-10-21

--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
Confirmed.  With memcpy we expand from

  MEM <vector(2) unsigned char> [(unsigned char *)&value] = { 0, 0 };
  MEM <unsigned char[6]> [(char * {ref-all})&value + 2B] = MEM <unsigned char[6]> [(char * {ref-all})&gc];
  value.0_1 = value;
  _2 = __builtin_bswap64 (value.0_1); [tail call]
  value ={v} {CLOBBER};
  return _2;

thus we expand 'value' on the stack.  Without memcpy we manage to do

  MEM <unsigned short> [(unsigned char *)&value] = 0;
  _19 = MEM <unsigned short> [(unsigned char *)&gc];
  MEM <unsigned short> [(unsigned char *)&value + 2B] = _19;
  _21 = MEM <unsigned int> [(unsigned char *)&gc + 2B];
  MEM <unsigned int> [(unsigned char *)&value + 4B] = _21;
  value.0_7 = value;
  _8 = __builtin_bswap64 (value.0_7); [tail call]

which also expands 'value' to the stack but is apparently nicer to later
passes, which means the way we expand the aggregate copy of type char[6]
is highly sub-optimal (we emit six single-byte loads & stores).
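For comparison, the variant that gets the nicer 2-byte/4-byte expansion plausibly looks like this in source (again a hedged reconstruction; the byte-wise loop is an assumption, chosen because GCC's store merging would combine it into exactly the `unsigned short` and `unsigned int` accesses shown in the dump):

```c
#include <stdint.h>

/* Hypothetical source for the non-memcpy variant: copy the six bytes
   individually.  Store merging turns the byte stores into one 2-byte
   and one 4-byte access instead of a char[6] aggregate copy. */
static uint64_t load48_bytewise(const unsigned char gc[6])
{
    uint64_t value;
    unsigned char *p = (unsigned char *)&value;
    p[0] = 0;
    p[1] = 0;
    for (int i = 0; i < 6; i++)
        p[i + 2] = gc[i];
    return __builtin_bswap64(value);
}
```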
