This is the mail archive of the
mailing list for the GCC project.
Re: [PATCH] MIPS: Fix GCC `noreorder' for undefined R5900 short loops
- From: Fredrik Noring <noring at nocrew dot org>
- To: "Maciej W. Rozycki" <macro at linux-mips dot org>
- Cc: Paul Burton <paul dot burton at mips dot com>, Matthew Fortune <mfortune at gmail dot com>, Jürgen Urban <JuergenUrban at gmx dot de>, gcc-patches at gcc dot gnu dot org
- Date: Tue, 23 Jul 2019 18:54:46 +0200
- Subject: Re: [PATCH] MIPS: Fix GCC `noreorder' for undefined R5900 short loops
- References: <20190721171726.GA47580@sx9> <alpine.LFD.email@example.com> <20190722155506.GA2726@sx9> <alpine.LFD.firstname.lastname@example.org>
> > How can one reasonably prevent the bug when compiling a whole Linux
> > distribution with thousands of packages if GAS merely warns and proceeds
> > in many cases?
> I think the tools must not set a policy. By using `.set noreorder' the
> user told the toolchain he knows better and asked to keep its hands away.
> As I say you can use `-Werror' for code auditing. And in most cases
> manually scheduled delay slots in handcoded assembly are a sign of a
> problem with the piece of code affected, regardless of the R5900 erratum.
Well, it seems practical to use unofficial tools and a patched GAS to fix
this assembly bug, then. It's a grave problem for the R5900 and it needs to
be reliably detected.
> > > > [ In theory, GAS could actually insert NOPs prior to the noreorder directive,
> > > > to make the loop longer that six instructions, but GAS does not have that
> > > > kind of capability. Another option that GCC prevents is to place a NOP in
> > > > the delay slot. ]
> > >
> > > Well, GAS does have that capability, although of course it is not enabled
> > > for `noreorder' code.
> > Hmm? I'm unable to make sense of that, since such NOPs are unaffected by
> > noreorder. This is what I meant:
> > loop: addiu $5,$5,1
> > addiu $4,$4,1
> > lb $2,-1($5)
> > nop /* These two NOPs would prevent the */
> > nop /* bug while preserving noreorder. */
> > .set noreorder
> > .set nomacro
> > bne $2,$3,loop
> > sb $2,-1($4)
> > .set macro
> > .set reorder
> See `nops_for_insn'. However again, as I noted, `.set noreorder' tells
> GAS not to analyse the dependencies for code within. There is no need to
> schedule this delay slot manually, and if this is generated code, then the
> producer (GCC) should have taken care of the hazards, be it architectural
> or errata. If this is manually written code, then the author asked for
I'm using the principle above to unobtrusively adjust problematic kernel
code, via a short_loop_war macro. Here is one patched instance:
@@ -13,6 +13,7 @@
@@ -29,6 +30,7 @@ start:
PTR_LA a0, _edata
PTR_LA a2, _end
1: sw zero, 0(a0)
bne a2, a0, 1b
addiu a0, a0, 4
@@ -48,6 +50,7 @@ start:
The short_loop_war macro is designed to be placed just before the branch,
and it inserts NOPs as necessary for the R5900:
.macro short_loop_war loop_target
.set instruction_count,2 + (. - \loop_target) / 4
.ifgt 7 - instruction_count
.rept 7 - instruction_count
.macro short_loop_war loop_target
> Well, BogoMIPS is just an arbitrary number.
So presumably the noreorder BogoMIPS variant can be replaced with a
single reorder variant that works with all MIPS implementations?
> There is a data dependency here, so manual delay slot scheduling was
> unavoidable. I'd suggest using this as a functional equivalent for the
> addiu a0,a0,1
> 1: addiu a0,a0,-1
> bnez a0,1b
> and avoiding the irregularity for a0==0.
> > I used the suggested patch, and recompiled the kernel with it, and verified
> > with the machine code tool that there were no undefined loops. Wouldn't that
> > be enough?
> It would break if GAS was invoked without `-mfix-r5900' (e.g. by using an
> explicit `-Wa,-mno-fix-r5900' option).
I see, hmm.
> > I tried to limit the case to branch delays only, which is a rough
> > approximation. Call and jump delay slots are acceptable. Are you referring
> > to this piece?
> > ;; Can the instruction be put into a delay slot?
> > (define_attr "can_delay" "no,yes"
> > (if_then_else (and (eq_attr "type" "!branch,call,jump")
> > (eq_attr "hazard" "none")
> > (match_test "get_attr_insn_count (insn) == 1"))
> > (const_string "yes")
> > (const_string "no")))
> Yes. My suggestion does not preclude limiting the workaround to branches
> only while being precise about what the situation is, i.e. branches still
> require a delay slot instruction (unlike MIPS16 or uMIPS compact branches)
> but no real instruction is suitable. I'd prefer if GCC actually scheduled
> the required dummy NOP instruction explicitly.