[Bug tree-optimization/78821] GCC7: Copying whole 32 bits structure field by field not optimised into copying whole 32 bits at once

jakub at gcc dot gnu.org gcc-bugzilla@gcc.gnu.org
Mon Nov 20 11:28:00 GMT 2017


https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78821

--- Comment #19 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
(In reply to Uroš Bizjak from comment #17)
> Hm, even with the latest patch, the testcase from comment #5
> still compiles to:
> 
>         movl    %esi, %eax
>         movw    %si, (%rdi)
>         notl    %esi
>         notl    %eax
>         movb    %sil, 3(%rdi)
>         movb    %ah, 2(%rdi)
>         ret

The reason for that is that the IL has a shape the bswap framework can't
handle.  Let's look at just the simplified testcase:
void baz (char *buf, unsigned int data)
{
  buf[2] = ~data >> 8;
  buf[3] = ~data;
}

  _1 = ~data_6(D);
  _2 = _1 >> 8;
  _3 = (char) _2;
  MEM[(char *)buf_7(D) + 2B] = _3;
  _4 = (char) data_6(D);
  _5 = ~_4;
  MEM[(char *)buf_7(D) + 3B] = _5;

If it were instead:
  _1 = ~data_6(D);
  _2 = _1 >> 8;
  _3 = (char) _2;
  MEM[(char *)buf_7(D) + 2B] = _3;
  _4 = (char) _1;
  MEM[(char *)buf_7(D) + 3B] = _4;
then the bswap framework would handle it.  Since ~((char) x) is the same as
(char) ~x for a truncating conversion, the two complements could be
value-numbered together; so I think this is a missed optimization in FRE (or
whatever else does SCCVN), or something match.pd should handle.
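For example, the following source-level variant (just a sketch, the name baz2
is only illustrative) already has the desired shape, because a bitwise NOT
commutes with a truncating conversion, so hoisting it into the wide type
stores exactly the same bytes:

void baz2 (char *buf, unsigned int data)
{
  /* Sketch: computing ~data once in the wide type matches the desired
     IL above, with a single _1 = ~data_6(D) feeding both stores.  */
  unsigned int ndata = ~data;
  buf[2] = ndata >> 8;  /* same byte as (~data) >> 8 */
  buf[3] = ndata;       /* same byte as ~(char) data */
}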

As for:
> void baz (char *buf, unsigned int data)
> {
>   buf[0] = data >> 8;
>   buf[1] = data;
> }
not using movbew, that is something that should be done in the backend.
For the middle-end, we don't have bswap16 and consider {L,R}ROTATE_EXPR by 8
as the canonical 16-bit byte swap.  Please also have a look at:
unsigned short
baz (unsigned short *buf)
{
  unsigned short a = buf[0];
  return ((unsigned short) (a >> 8)) | (unsigned short) (a << 8);
}
where we could also emit movbew instead of movw + rolw (if that is actually a
win).  Thus, I think i386.md should provide combine patterns (or peephole2
patterns, if combine doesn't work for some reason) for this.
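To see why the rotate is a correct canonical form: a 16-bit rotate by 8 in
either direction exchanges the two bytes, i.e. it is exactly the byte swap.
A standalone sanity check (just an illustration, not a GCC testcase):

#include <assert.h>

/* Sketch: a 16-bit rotate by 8 (either direction) is exactly a byte
   swap, which is why {L,R}ROTATE_EXPR by 8 can serve as the canonical
   16-bit bswap in the middle-end.  */
static unsigned short
rot8 (unsigned short a)
{
  return (unsigned short) ((a >> 8) | (a << 8));
}

int
main (void)
{
  assert (rot8 (0x1234) == 0x3412);         /* bytes exchanged */
  assert (rot8 (rot8 (0x1234)) == 0x1234);  /* swap twice = identity */
  return 0;
}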

