Bug 87733 - local register variable not honored with earlyclobber
Summary: local register variable not honored with earlyclobber
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: inline-asm (show other bugs)
Version: 9.0
: P3 normal
Target Milestone: 9.0
Assignee: Not yet assigned to anyone
URL:
Keywords: ra
Depends on:
Blocks:
 
Reported: 2018-10-24 13:22 UTC by Alexander Monakov
Modified: 2020-03-16 19:03 UTC (History)
5 users (show)

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed: 2018-12-06 00:00:00


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Alexander Monakov 2018-10-24 13:22:24 UTC
GCC fails to allocate %rsi for the first operand of the asm at -O2:

int test()
{
    register int r0 asm("rsi");
    asm("# %0 %1" : "=&r"(r0) : "r"(0));
    return r0;
}

test:
        .cfi_startproc
        xorl    %esi, %esi
#APP
        # %eax %esi
#NO_APP
        ret
Comment 1 Segher Boessenkool 2018-12-06 16:26:19 UTC
This is fixed on trunk.  Peter, does that need a backport still?
Comment 2 Peter Bergner 2018-12-06 16:54:10 UTC
None of my recent IRA or LRA patches were back ported and I wasn't planning on it, given the fallout they caused.  Do we know for sure this doesn't work on GCC 8?

Given when Alexander opened this bug, I'm guessing he just got caught in the middle of all of my changes.  If this works on GCC 8, we can probably just close that as already fixed.
Comment 3 Segher Boessenkool 2018-12-06 17:11:01 UTC
It fails on everything at least as far back as GCC 5.  So it is not a
regression.  I'd say we don't backport your changes, given how hard it
is to get it right, and no doubt it will need changes to backport, too.
Comment 4 Peter Bergner 2018-12-06 17:28:42 UTC
Ok, so we can mark this as already fixed then and leave it at that.  That's fine with me.
Comment 5 Segher Boessenkool 2018-12-06 17:36:20 UTC
Fixed on trunk.
Comment 6 Alexander Monakov 2018-12-06 17:57:09 UTC
If this is fixed then the testcase should be added to the testsuite to make sure we catch it if we regress again (this worked before e.g. on gcc-4.1).

Peter, can you please clarify, is this actually fixed (as in, one of your patches addressed this particular issue) or just happens to be not reproducible anymore?
Comment 7 Segher Boessenkool 2018-12-06 18:05:15 UTC
Feel free to send a patch.

Yes it is fixed.
Comment 8 Peter Bergner 2018-12-06 19:08:37 UTC
(In reply to Alexander Monakov from comment #6)
> Peter, can you please clarify, is this actually fixed (as in, one of your
> patches addressed this particular issue) or just happens to be not
> reproducible anymore?

Segher and I had patches to combine and RA respectively that addressed this specific issue.
Comment 9 Kaz Kylheku 2019-03-16 02:13:34 UTC
Can someone kindly point out where this is fixed? What commit(s)?
Comment 10 Rich Felker 2020-03-14 00:11:36 UTC
This is a rather huge bug to have been fixed silently. Could someone who knows the commit that fixed it and information on what versions are affected attach the info to the tracker here? And ideally some information on working around it for older GCCs?

From what I can tell experimenting so far, adding a dummy "0"(r0) constraint, or using + instead of =, makes the problem go away, but potentially has other ill effects from use of an uninitialized object..?
Comment 11 Segher Boessenkool 2020-03-14 08:13:35 UTC
(In reply to Rich Felker from comment #10)
> This is a rather huge bug to have been fixed silently. Could someone who
> knows the commit that fixed it and information on what versions are affected
> attach the info to the tracker here? And ideally some information on working
> around it for older GCCs?
> 
> From what I can tell experimenting so far, adding a dummy "0"(r0)
> constraint, or using + instead of =, makes the problem go away, but
> potentially has other ill effects from use of an uninitialized object..?

This wasn't silent.  There was a whole bunch of commits, all before
this PR was filed.  You can start looking at PR87600 maybe, and from
there follow the trails.

AFAIR all older versions (well, of this decade, anyway) have these
problems, but the patches are much to involved to backport.

You can work around it on older GCC by simply not using a register var
for more than one asm operand, I think?
Comment 12 Rich Felker 2020-03-14 15:09:28 UTC
> You can work around it on older GCC by simply not using a register var
> for more than one asm operand, I think?

Nope. Making a syscall inherently requires binding specific registers for all of the inputs/outputs, unless you want to spill everything to an explicit structure in memory and load them all explicitly in the asm block. So it really is a big deal.

In particular, all mips variants need an earlyclobber constraint for the output register $2 because the old Linux kernel syscall contract was that, when restartable syscalls were interrupted, the syscall number passed in through $2 was lost, and the kernel returned to $pc-8 and expected a userspace instruction to reload $2 with the syscall number from an immediate or another register. If the input to load into $2 were itself passed in $2 (possible without earlyclobber), the reloading would be ineffective and restarting syscalls would execute the wrong syscall.

The original mips port of musl had undocumented and seemingly useless "0"(r2) input constraints that were suppressing this bug, using the input to bind the register where the earlyclobber output failed to do so. After some recent changes broke compatibility with older kernels requiring the above contract, I manually reverted them (due to intervening conflicting diffs) and omitted the seemingly useless constraint, and it broke horribly. Eventually I found this bug searching the tracker. My plan for now is just to add back the "0"(r2) constraint, but since r2 is uninitialized, it's not clear that having it as an input constraint is even well-defined. Is this the best thing to do?
Comment 13 Peter Bergner 2020-03-14 16:39:41 UTC
(In reply to Rich Felker from comment #12)
> > You can work around it on older GCC by simply not using a register var
> > for more than one asm operand, I think?
> 
> Nope. Making a syscall inherently requires binding specific registers for
> all of the inputs/outputs, unless you want to spill everything to an
> explicit structure in memory and load them all explicitly in the asm block.
> So it really is a big deal.

As Segher said, there were a lot of dependent patches that fixed latent bugs that the previous patches had exposed.  My best guess of the patch that fixed this specific problem (ie, LRA breaking the contract of using the user defined/assigned register in inline asm) would be:

commit 2f0b80c7a4ab4254f57ba63de26ebb7896e3742d
Author:     Peter Bergner <bergner@linux.ibm.com>
AuthorDate: Thu Nov 8 22:39:45 2018 +0000
Commit:     Peter Bergner <bergner@gcc.gnu.org>
CommitDate: Thu Nov 8 16:39:45 2018 -0600

    re PR rtl-optimization/87600 (Fix for PRs 86939 and 87479 causes build issues for several targets)
    
    gcc/
            PR rtl-optimization/87600
            * cfgexpand.c (expand_asm_stmt): Catch illegal asm constraint usage.
            * lra-constraints.c (process_alt_operands): Skip illegal hard
            register usage.  Prefer reloading non hard register operands.
    
    gcc/testsuite/
            PR rtl-optimization/87600
            * gcc.dg/pr87600.h: New file.
            * gcc.dg/pr87600-1.c: New test.
            * gcc.dg/pr87600-2.c: Likewise.
    
    From-SVN: r265942


Specifically, the "Prefer reloading non hard register operands." part of the patch.  Previous versions of GCC would sometimes silently spill user defined/assigned hard registers used in inline asm, which is a no-no.
Comment 14 Alexander Monakov 2020-03-14 16:54:05 UTC
Just to clarify, the two testcases added in the quoted commit don't try to catch the issue discussed here: that the operand is passed in a wrong register.
Comment 15 Segher Boessenkool 2020-03-14 17:50:40 UTC
(In reply to Rich Felker from comment #12)
> > You can work around it on older GCC by simply not using a register var
> > for more than one asm operand, I think?
> 
> Nope. Making a syscall inherently requires binding specific registers for
> all of the inputs/outputs, unless you want to spill everything to an
> explicit structure in memory and load them all explicitly in the asm block.
> So it really is a big deal.

I didn't say this very well...  The only issue is using the same hard
register for two different operands.  You don't need to do this for
syscalls (and you do not *need* that *ever*, of course).

Can you post some code that fails?  If you think this is a GCC bug (in
some older branch?) that we should fix, please open a new PR for it.
Comment 16 Rich Felker 2020-03-14 18:06:45 UTC
> I didn't say this very well...  The only issue is using the same hard
> register for two different operands.  You don't need to do this for
> syscalls (and you do not *need* that *ever*, of course).

I hit the bug without using the same hard register for two operands. At least I'm pretty sure it's the same bug because the behavior matches and it's present in 6.3.0 but not 9.2.0.

> Can you post some code that fails?  If you think this is a GCC bug (in
> some older branch?) that we should fix, please open a new PR for it.

Here's the relevant code extracted out of musl:

#define SYSCALL_CLOBBERLIST \
	"$1", "$3", "$11", "$12", "$13", \
	"$14", "$15", "$24", "$25", "hi", "lo", "memory"

long syscall6(long n, long a, long b, long c, long d, long e, long f)
{
	register long r4 __asm__("$4") = a;
	register long r5 __asm__("$5") = b;
	register long r6 __asm__("$6") = c;
	register long r7 __asm__("$7") = d;
	register long r8 __asm__("$8") = e;
	register long r9 __asm__("$9") = f;
	register long r2 __asm__("$2");
	__asm__ __volatile__ (
		"subu $sp,$sp,32 ; sw $8,16($sp) ; sw $9,20($sp) ; "
		"addu $2,$0,%4 ; syscall ;"
		"addu $sp,$sp,32"
		: "=&r"(r2), "+r"(r7), "+r"(r8), "+r"(r9)
		: "ir"(n), "r"(r4), "r"(r5), "r"(r6)
		: SYSCALL_CLOBBERLIST, "$10");
	return r7 && r2>0 ? -r2 : r2;
}

Built with gcc 6.3.0, %4 ends up expanding to $2, violating the earlyclobber, and %0 gets bound to $16 rather than $2 (which is why the violation is allowed, it seems).

With "0"(r2) added to input constraints, the bug goes away.

I don't particularly think this bug is something that needs to be fixed in older branches, especially if doing so is hard, but I do think it's something we need a solid reliable workaround for.
Comment 17 Peter Bergner 2020-03-14 18:52:51 UTC
(In reply to Rich Felker from comment #16)
> long syscall6(long n, long a, long b, long c, long d, long e, long f)
> {
> 	register long r4 __asm__("$4") = a;
> 	register long r5 __asm__("$5") = b;
> 	register long r6 __asm__("$6") = c;
> 	register long r7 __asm__("$7") = d;
> 	register long r8 __asm__("$8") = e;
> 	register long r9 __asm__("$9") = f;
> 	register long r2 __asm__("$2");
> 	__asm__ __volatile__ (
> 		"subu $sp,$sp,32 ; sw $8,16($sp) ; sw $9,20($sp) ; "
> 		"addu $2,$0,%4 ; syscall ;"
> 		"addu $sp,$sp,32"
> 		: "=&r"(r2), "+r"(r7), "+r"(r8), "+r"(r9)
> 		: "ir"(n), "r"(r4), "r"(r5), "r"(r6)
> 		: SYSCALL_CLOBBERLIST, "$10");
> 	return r7 && r2>0 ? -r2 : r2;
> }

This looks like bad inline asm.  You seem to be using $2, $8, $9 and $sp explicitly and not letting the compiler know you are using them.  I think you want to change those to %0, %2 and %3 and adding one for $sp?   I could guess the compiler might ignore your inputs/outputs that you specify if you don't have any %<N> usages for them.
Comment 18 Peter Bergner 2020-03-14 18:58:57 UTC
(In reply to Rich Felker from comment #16)
> long syscall6(long n, long a, long b, long c, long d, long e, long f)
[snip]
> 		: "ir"(n), "r"(r4), "r"(r5), "r"(r6)

...and "n" is an argument register, so why use "ir" for n's constraint?  Shouldn't that just be "r"?  Maybe that is confusing IRA/LRA/reload?
Comment 19 Rich Felker 2020-03-14 19:21:04 UTC
> This looks like bad inline asm.  You seem to be using $2, $8, $9 and $sp explicitly and not letting the compiler know you are using them.

$2, $8, and $9 are all explicitly outputs. All changes to $sp are reversed before the asm ends and there are no memory operands which could be sp-based and thereby invalidated by temp changes to it.

> I think you want to change those to %0, %2 and %3 and adding one for $sp?

All that does it make the code harder to read and more fragile against changes to the order the constraints are written in.

> ...and "n" is an argument register, so why use "ir" for n's constraint? Shouldn't that just be "r"?  Maybe that is confusing IRA/LRA/reload?

The code has been reduced as a standalone example that still reproduced the bug, from a static inline function that was inlined into a function with exactly the same signature. The static inline has a constant n after constant propagation for almost all places it gets inlined, so it "ir" constraint makes sense there. However, removing the "i" does not make the problem go away anyway.
Comment 20 Segher Boessenkool 2020-03-14 19:27:24 UTC
Confirmed with various 7 and 8 (I don't have a mips 6 around).

Don't reopen this bug, it is not necessarily related.  Instead, please
open a new PR.  Thanks!
Comment 21 Alexander Monakov 2020-03-14 19:29:32 UTC
> I could guess the compiler might ignore your inputs/outputs that you specify if you don't have any %<N> usages for them.

Are you seriously suggesting that examples in the GCC manual are invalid and every such usage out there should go and add mentions of referenced registers in the comment in the inline asm template?

https://gcc.gnu.org/onlinedocs/gcc/Local-Register-Variables.html
Comment 22 Rich Felker 2020-03-14 19:42:51 UTC
What should I call the new bug? The description sounds the same as this one, and it's fixed in gcc 9.x, just not earlier versions, so it seems to be the same bug.
Comment 23 Segher Boessenkool 2020-03-14 20:06:35 UTC
It is harder for us to track it in an older bug with many older patches
for it, including stuff that needed fixups later.  And, the "correct"
older bug for it would not be this one, anyway!


Before RA we have asm inputs
                    [
                        (reg/v:SI 196 [ n ])
                        (reg/v:SI 4 $4 [ r4 ])
                        (reg/v:SI 5 $5 [ r5 ])
                        (reg/v:SI 6 $6 [ r6 ])
                        (reg/v:SI 7 $7 [ r7 ])
                        (reg/v:SI 8 $8 [ r8 ])
                        (reg/v:SI 9 $9 [ r9 ])
                    ]
(and $2 an earlyclobber).  But then IRA decides

Disposition:
    0:r195 l0     2    1:r196 l0     2


Peter, do you know which patch fixed this (is that the one you quoted above?),
and if it could feasibly be backported separately?

Only GCC 8 and later branches are still open, fwiw (but some distros still
have older maintained versions).


Rich, forcing "n" to be in "$r10" seems to do the trick?  Is that a reasonable
solution for you?
Comment 24 Rich Felker 2020-03-14 20:18:09 UTC
The reasons I was hesitant to force n to a particular register through an extra register __asm__ temp var was that I was unsure how it would interact with the "i" constraint (maybe prevent it from being used?) and that this is code that needs to be inlined all over the place, and adding more specific-register constraints usually hurts register allocation in all functions where it's used.

If the "0"(r2) input constraint seems unsafe to rely on with r2 being uninitialized (is this a real concern I should have?) just writing 0 or n to r2 before the asm would only waste one instruction and shouldn't really hurt.
Comment 25 Peter Bergner 2020-03-14 20:49:43 UTC
(In reply to Segher Boessenkool from comment #23)
> Before RA we have asm inputs
>                     [
>                         (reg/v:SI 196 [ n ])
>                         (reg/v:SI 4 $4 [ r4 ])
>                         (reg/v:SI 5 $5 [ r5 ])
>                         (reg/v:SI 6 $6 [ r6 ])
>                         (reg/v:SI 7 $7 [ r7 ])
>                         (reg/v:SI 8 $8 [ r8 ])
>                         (reg/v:SI 9 $9 [ r9 ])
>                     ]
> (and $2 an earlyclobber).  But then IRA decides
> 
> Disposition:
>     0:r195 l0     2    1:r196 l0     2

Vlad can verify, but I don't think IRA takes early clobber constraints into consideration when computing conflicts, so I'm not surprised pseudo 196 (ie, n) is assigned hard register $2.  In that case, LRA will notice the illegal constraint usage and will insert reloads. 


> Peter, do you know which patch fixed this (is that the one you quoted
> above?),
> and if it could feasibly be backported separately?

Yes, I think the patch I mentioned above should stop LRA from spilling the hard register assigned by the user in %0 and should instead spill pseudo 196 by reassigning it to another hard register.  Whether it's feasible to backport, maybe?  :-)   The lra-constraints.c hunk applies to gcc-8 (with fuzz), so maybe that's enough to fix this?  Can someone try applying that hunk and does it fix this?
Comment 26 Rich Felker 2020-03-14 22:04:32 UTC
Indeed, I just confirmed that binding the n input to a particular register prevents the "i" part of the "ir" alternative from working.
Comment 27 Rich Felker 2020-03-14 23:36:10 UTC
Also just realized:

> Rich, forcing "n" to be in "$r10" seems to do the trick?  Is that a reasonable
solution for you?

It doesn't even work, because the syscall clobbers basically all call-clobbered registers. Current kernels are preserving at least $25 (t9) and $28 (gp) and the syscall argument registers, so $25 may be usable, but it was deemed not clear in 2012. I'm looking back through musl git history, and this is actually why the "i" alternative was wanted -- in basically all uses, "i" is satisfiable, and avoids needing to setup a stack frame and spill a call-saved register to the stack in order to use it to hold the syscall number to reload on restart.
Comment 28 Rich Felker 2020-03-14 23:40:26 UTC
And it looks like I actually hit this exact bug back in 2012 but misattributed it:

https://git.musl-libc.org/cgit/musl/commit/?id=4221f154ff29ab0d6be1e7beaa5ea2d1731bc58e

I assumed things went haywire from using two separate "r" constraints, rather than "r" and "0", to bind the same register, but it seems the real problem was that the "=&r"(r2) was not binding at all, and the "0"(r2) served to fix that.
Comment 29 Segher Boessenkool 2020-03-15 00:20:57 UTC
(In reply to Rich Felker from comment #27)
> Also just realized:
> 
> > Rich, forcing "n" to be in "$r10" seems to do the trick?  Is that a reasonable
> solution for you?
> 
> It doesn't even work, because the syscall clobbers basically all
> call-clobbered registers. Current kernels are preserving at least $25 (t9)
> and $28 (gp) and the syscall argument registers, so $25 may be usable, but
> it was deemed not clear in 2012. I'm looking back through musl git history,
> and this is actually why the "i" alternative was wanted -- in basically all
> uses, "i" is satisfiable, and avoids needing to setup a stack frame and
> spill a call-saved register to the stack in order to use it to hold the
> syscall number to reload on restart.

You need to make $r10 not a clobber but an inout, of course.  And not
allowing the "i" just costs one more register move, not so bad imo.
So you do have a workaround now.  Of course we should see if this can
actually be fixed instead ;-)
Comment 30 Rich Felker 2020-03-15 00:27:30 UTC
> You need to make $r10 not a clobber but an inout, of course.  And not

That's not a correct constraint, because it's clobbered by the kernel between the first syscall instruction's execution and the second execution of the addu instruction after the kernel returns to restart it. $10 absolutely needs to be a clobber because the kernel clobbers it. The asm block can't use any registers the kernel clobbers.

> allowing the "i" just costs one more register move, not so bad imo.
> So you do have a workaround now.  Of course we should see if this can
> actually be fixed instead ;-)

I don't follow. As long as the "i" gets chosen, the asm inlines nicely. If not, it forces a gratuitous stack frame to spill a non-clobberlisted register to use as the input.

The code has been working for the past 8 years with the "0"(r2) input constraint added, and would clearly be valid if r2 were pre-initialized with something.
Comment 31 Segher Boessenkool 2020-03-15 00:36:22 UTC
An asm clobber just means "may be an output", and no operand will be assigned
a register mentioned in a clobber.  There is no magic.
Comment 32 Segher Boessenkool 2020-03-15 00:47:27 UTC
===
#define SYSCALL_CLOBBERLIST \
        "$1", "$3", "$11", "$12", "$13", \
        "$14", "$15", "$24", "$25", "hi", "lo", "memory"

long syscall6(long n, long a, long b, long c, long d, long e, long f)
{
        register long r4 __asm__("$4") = a;
        register long r5 __asm__("$5") = b;
        register long r6 __asm__("$6") = c;
        register long r7 __asm__("$7") = d;
        register long r8 __asm__("$8") = e;
        register long r9 __asm__("$9") = f;
        register long r10 __asm__("$10") = n;
        register long r2 __asm__("$2");
        __asm__ __volatile__ (
                "subu $sp,$sp,32 ; sw $8,16($sp) ; sw $9,20($sp) ; "
                "addu $2,$0,%4 ; syscall ;"
                "addu $sp,$sp,32   # %0 %1 %2 %3 %4 %5 %6 %7"
                : "=&r"(r2), "+r"(r7), "+r"(r8), "+r"(r9)
                : "ir"(r10), "r"(r4), "r"(r5), "r"(r6)
                : SYSCALL_CLOBBERLIST);
        return r7 && r2>0 ? -r2 : r2;
}

long syscall69(long a, long b, long c, long d, long e, long f) { return syscall6(9, a, b, c, d, e, f); }
===

This is inlined just fine?
Comment 33 Rich Felker 2020-03-15 00:54:05 UTC
> An asm clobber just means "may be an output", and no operand will be assigned
> a register mentioned in a clobber.  There is no magic.

This, plus the compiler cannot assume the value in any of the clobbered registers is preserved across the asm statement.

> This is inlined just fine?

It produces *wrong code* so it doesn't matter if it inlines fine. $10 is modified by the kernel in the event the syscall is restarted, so the wrong value will be loaded on restart.
Comment 34 Segher Boessenkool 2020-03-15 18:29:57 UTC
Oh, your real code is different, and $10 doesn't work for that?  I see.
Comment 35 Rich Felker 2020-03-15 18:44:55 UTC
> Oh, your real code is different, and $10 doesn't work for that?  I see.

No, the real code is exactly that. What you're missing is that the kernel, entered through syscall, has a jump back to the addu after it's clobbered all the registers in the clobberlist if the syscall is interrupted and needs to be restarted.
Comment 36 Segher Boessenkool 2020-03-15 22:55:09 UTC
The kernel jumps back to an instruction *before* the syscall?!?

(I do not want to know about that any more, then, I am sure :-) )
Comment 37 Peter Bergner 2020-03-16 19:03:38 UTC
(In reply to Rich Felker from comment #30)
> The code has been working for the past 8 years with the "0"(r2) input
> constraint added, and would clearly be valid if r2 were pre-initialized with
> something.

I think using "0"(r2) is a valid workaround for older compilers and I can't think of a reason why there should be a problem with it.  Currently, IRA doesn't look at early clobbers when computing conflicts, so "r2" and "n" do not conflict, since "n" is an input operand of the asm and never used again and "r2" is an output operand of the asm (ie, their lifetimes don't overlap...ignoring the early clobber).  Since they don't conflict (in IRA), it's possible they can be assigned the same hard register depending on the phase of the moon, etc., which they are here.  LRA which does look at early clobbers sees the problem and emits reloads to correct it, but with the old unfixed bug, it spills the wrong thing.

Adding that extra input operand will make the pseudo for "r2" conflict with the pseudo for "n", even in IRA.  That will force them to get different hard registers and since "r2" is preassigned by you to $2, "n" will be assigned to something else.  Problem solved.

Now the extra input operand usage will cause the lifetime of "r2" to reach back to the start of the function this is inlined in, so it won't be able to share the same hard register with any other pseudo that is used then (maybe just the basic block it's in?).  You could mitigate that by inserting a simple "r2 = 0;" just before the inline asm, which will truncate its live range.  Yes, it's an extra insn, but it is a simple/inexpensive one.