46479 – "+m" (*regs) : "a" (regs) doesn't use (%eax) for the MEM

Status:	RESOLVED FIXED

Alias:	None

Product:	gcc
Classification:	Unclassified
Component:	inline-asm
Version:	4.4.6

Importance:	P3 normal
Target Milestone:	4.8.0
Assignee:	Not yet assigned to anyone

URL:
Keywords:	rejects-valid

Depends on:
Blocks:

Reported:	2010-11-15 09:17 UTC by Jakub Jelinek
Modified:	2021-08-12 05:37 UTC
CC List:	7 users

See Also:
Host:
Target:
Build:
Known to work:	4.8.0
Known to fail:	4.7.4
Last reconfirmed:

Description Jakub Jelinek 2010-11-15 09:17:31 UTC

/* { dg-do compile { target { { i686-*-* x86_64-*-* } && ilp32 } } } */
/* { dg-options "-Os -fno-omit-frame-pointer" } */

struct S { long c[64]; };
int foo (struct S *regs)
{
  int rc;
  asm volatile ("" : "=a" (rc), "+m" (*regs) : "a" (regs) : "ebx", "ecx", "edx", "esi", "edi", "memory");
  return rc;
}

This no longer compiles, starting with 4.4.  Even under lower register pressure, GCC before IRA used to reuse the register holding the address; wasting another one is unnecessary.  If there is just "m" (*regs) as an input operand, then even with IRA the eax register is used as the address for the memory operand.

I guess what is confusing IRA here is that the MEM appears as an output operand and the asm clobbers the %eax register (as it sets it to something else), so IRA thinks it must give the output operand an address that is still valid at the end of the inline asm rather than just at the start.  But for memory addresses that is not true unless there is an earlyclobber: it is enough if the address of the output MEM is valid at the beginning of the inline asm.

I'd say this is an important bug, not because we fail to compile this very high register pressure asm, but because

int bar (struct S *regs)
{
  int rc;
  asm volatile ("" : "=a" (rc), "+m" (*regs) : "a" (regs));
  return rc;
}

is quite common: the "+m" there is just to tell GCC what side effects the asm has, it is never used in the asm, and it is desirable that it doesn't introduce runtime overhead.

Comment 1 Jeffrey A. Law 2010-11-15 14:34:08 UTC

Isn't the "+m" (*regs) an in/out operand, and doesn't it have to be valid throughout the entire asm?  If so, its memory address can't be held by %eax, because %eax is used elsewhere in the asm as an input and an output.

I'm not aware of a way to handle the second case, where we want to show a memory read/write effect but not consume any resources.  I can see how that would be valuable.

Comment 2 Jakub Jelinek 2010-11-15 15:00:37 UTC

Perhaps you're right.  If
asm ("movl $0, %0; movl $1, %1" : "=g" (x), "=g" (y))
were allowed to use a register for x and memory for y, and to use the register chosen for x as the address of the memory chosen for y, then the above wouldn't work without earlyclobbers, even though no inputs are consumed after the first output is written.
That would mean GCCs before 4.0 were buggy.

Now, perhaps we could check whether the output operand for the MEM is ever referenced in the asm template, but then we are getting into the territory of different code generation depending on what actually appears in the template.  I'm not sure we want to go there (well, we already base inlining decisions etc. on the guessed count of insns in the template).

In any case, having a way to express that some memory is clobbered without actually forcing its address to be passed to the inline asm might be useful too.

Comment 3 Jeffrey A. Law 2010-11-15 18:17:26 UTC

On 11/15/10 08:07, jakub at gcc dot gnu.org wrote:
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46479
>
> --- Comment #2 from Jakub Jelinek<jakub at gcc dot gnu.org>  2010-11-15 15:00:37 UTC ---
> Perhaps you're right, if
> asm ("movl $0, %0; movl $1, %1" : "=g" (x), "=g" (y))
> would be allowed to use register for x and memory for y and use the register
> chosen for x as address of memory chosen for y, then the above one wouldn't
> work without early clobbers, even though no inputs are consumed after first
> output is written.
> That would mean gccs before 4.0 were buggy.
In general, I wouldn't think a register used in an output address would 
be valid in another output.    There may be cases where it works, 
depending on what gets reloaded and the ordering of operands, etc, but 
writing code which depended on allowing the same reg to be used in both 
circumstances, knowing reload would DTRT because of how the asm was 
written would be, umm, bad.


> Now, perhaps we could see if the output operand for the MEM is ever referenced
> in the asm template, but then we are jumping into the territory of different
> code generation depending on what actually appears in the template, not sure
> if we want to go there (well, we already base inlining decisions etc. on the
> guessed count of insns in the template).
I really don't like the idea of peeking in the template.


> In any case, having a way to express some memory is clobbered without actually
> forcing its address to be passed to the inline asm might be useful too.
Yea.  I'm assuming that clobbers still force generation of the address?  
And presumably we can't model the use of a memory location in the 
clobber which might argue we need a "uses" argument to asms...

Jeff

Comment 4 Jakub Jelinek 2010-11-15 19:54:23 UTC

(In reply to comment #3)
> > In any case, having a way to express some memory is clobbered without actually
> > forcing its address to be passed to the inline asm might be useful too.
> Yea.  I'm assuming that clobbers still force generation of the address?

Clobbers are just strings, so they don't force generation of an address, but they can't express that some particular memory is read or written.
  
> And presumably we can't model the use of a memory location in the 
> clobber which might argue we need a "uses" argument to asms...

Perhaps.

Comment 5 Andrew Pinski 2021-08-12 05:37:21 UTC

LRA fixed this in GCC 4.8.0.