[Bug target/81763] Issues with BMI on 32bit x86 apps on GCC 7.1+

Fri Jan 26 09:00:00 GMT 2018

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81763

--- Comment #40 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
(In reply to Uroš Bizjak from comment #37)
> (In reply to Jakub Jelinek from comment #33)
> 
> > and it should work.  The last case would be right now:
> >   SI:N+1 = SI:N &~ SI:N+2; SI:N+2 = SI:N+1 &~ SI:N+3;
> > and is again wrong, but we could again swap:
> >   SI:N+2 = SI:N+1 &~ SI:N+3; SI:N+1 = SI:N &~ SI:N+2;
> > and all is fine.
> 
> Whoops, it looks that SI:N+2 is clobbered in the swapped case.

You're right.  So the question is whether IRA/LRA can ever allow that case,
where the destination partially overlaps both source registers.  I've tried
hard to simulate that case with:
unsigned long long
foo (unsigned long long x, unsigned long long y)
{
  unsigned long long z;
  asm ("" : "+A" (x), "+Q" (y));
  z = x & ~y;
  asm ("" : "+Q" (z) : "a" (0), "b" (0));
  return z;
}
where IRA indeed allocates the used pseudos such that x is in ax:dx, y in cx:bx
and z in dx:cx.  Now, if I try this testcase, and a variant with ~x & y instead
of x & ~y, with GCC patched with #c36, I get:
        andn    %eax, %ecx, %ecx
        xorl    %eax, %eax
        andn    %edx, %ebx, %ebx
        movl    %ecx, %edx
        movl    %ebx, %ecx
        movl    %eax, %ebx
resp.
        andn    %ecx, %eax, %ecx
        xorl    %eax, %eax
        andn    %ebx, %edx, %ebx
        movl    %ecx, %edx
        movl    %ebx, %ecx
        movl    %eax, %ebx
between the two inline asms.  If I leave just the =r <- (r, r) alternative
and nothing else, LRA ICEs on it (on both variants).  All of this is with
-O2 -m32 -mbmi -mstv -msse2.  So, is there something in LRA that prevents these
partial overlaps?
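
For reference, the doubly overlapping case quoted above (destination in
N+1:N+2, first input in N:N+1, second input in N+2:N+3) can be modelled in
plain C.  This is only a sketch of the register dataflow, not GCC code: the
array r[] standing in for the hard registers, the function names and the demo
values are all made up for illustration.  It shows that either order of the two
SImode halves reads a register that the other half has already overwritten.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: r[0..3] stand for hard registers N..N+3.
   x lives in N:N+1, y in N+2:N+3 (low word first), and the
   result z = x & ~y is to be placed in N+1:N+2.  */

static uint64_t andn64_reference(const uint32_t in[4])
{
    uint64_t x = ((uint64_t) in[1] << 32) | in[0];
    uint64_t y = ((uint64_t) in[3] << 32) | in[2];
    return x & ~y;
}

/* Low half first: SI:N+1 = SI:N &~ SI:N+2; SI:N+2 = SI:N+1 &~ SI:N+3.  */
static uint64_t andn64_low_first(const uint32_t in[4])
{
    uint32_t r[4] = { in[0], in[1], in[2], in[3] };
    r[1] = r[0] & ~r[2];   /* overwrites the high word of x */
    r[2] = r[1] & ~r[3];   /* reads the clobbered high word of x */
    return ((uint64_t) r[2] << 32) | r[1];
}

/* High half first: SI:N+2 = SI:N+1 &~ SI:N+3; SI:N+1 = SI:N &~ SI:N+2.  */
static uint64_t andn64_high_first(const uint32_t in[4])
{
    uint32_t r[4] = { in[0], in[1], in[2], in[3] };
    r[2] = r[1] & ~r[3];   /* overwrites the low word of y */
    r[1] = r[0] & ~r[2];   /* reads the clobbered low word of y */
    return ((uint64_t) r[2] << 32) | r[1];
}

/* Demo values chosen so that both clobbers are visible:
   x = 0x0f0f0f0fffffffff, y = 0.  */
static const uint32_t demo_regs[4] = {
    0xffffffffu, 0x0f0f0f0fu, 0x00000000u, 0x00000000u
};

/* Returns 1 if neither split order matches the 64-bit reference.  */
static int both_orders_wrong(const uint32_t in[4])
{
    uint64_t ref = andn64_reference(in);
    return andn64_low_first(in) != ref && andn64_high_first(in) != ref;
}

/* Self-check on the demo values.  */
static void check_demo(void)
{
    assert(both_orders_wrong(demo_regs));
}
```

Calling check_demo () from a small main () is enough to confirm it.  In this
model only a temporary, or an allocation that avoids the double overlap, gives
the right answer, which is why it matters whether LRA can produce such an
allocation at all.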