This is the mail archive of the mailing list for the GCC project.

Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH] MIPS: Fix GCC `noreorder' for undefined R5900 short loops

Hi Maciej,

I'm glad to hear from you again!

>  I think that should be a GAS warning really (similarly to macros that 
> expand to multiple instructions in a delay slot) as people ought to be 
> allowed to do what they wish, and then `-Werror' can be used for code 
> quality enforcement (and possibly disabled on a case-by-case basis).

How can one reasonably prevent the bug when compiling a whole Linux
distribution with thousands of packages if GAS merely warns and proceeds
in many cases?

> > [ In theory, GAS could actually insert NOPs prior to the noreorder directive,
> > to make the loop longer that six instructions, but GAS does not have that
> > kind of capability. Another option that GCC prevents is to place a NOP in
> > the delay slot. ]
>  Well, GAS does have that capability, although of course it is not enabled 
> for `noreorder' code.

Hmm? I'm unable to make sense of that, since such NOPs are unaffected by
noreorder. This is what I meant:

loop:	addiu	$5,$5,1
	addiu	$4,$4,1
	lb	$2,-1($5)
	nop			/* These two NOPs would prevent the */
	nop			/* bug while preserving noreorder. */
	.set	noreorder
	.set	nomacro
	bne	$2,$3,loop
	sb	$2,-1($4)
	.set	macro
	.set	reorder

> For generated code I think however that usually it 
> will be cheaper performance-wise if a non-trivial delay-slot instruction 
> is never produced in the affected cases (i.e. a dummy NOP is always used).

I suppose so too. One observation is that R5900 BogoMIPS went down from
584 to 195 when switching from the generic kernel variant

	.set	noreorder
1:	bnez	a0,1b
	addiu	a0,a0,-1
	.set	reorder

that is undefined for R5900 in arch/mips/lib/delay.c, to a special variant

	beqz	a0,2f
1:	addiu	a0,a0,-1
	bnez	a0,1b

for the R5900 where GAS will place a NOP in the branch delay slot. With
loop unrolling BogoMIPS could be inflated again, of course. In general,
unrolling is a bit better for the R5900 than general MIPS. Perhaps GCC
could be informed about that, possibly via profile-guided optimisation.

> > A reasonable fix for GCC is perhaps to update gcc/config/mips/ to not
> > make explicit use of the branch delay slot, as suggested by the patch below?
> > Then GCC will emit
> > 
> > loop:	addiu	$5,$5,1
> > 	addiu	$4,$4,1
> > 	lb	$2,-1($5)
> > 	sb	$2,-1($4)
> > 	bne	$2,$3,loop
> > 
> > that GAS will adjust in the ELF object to
> > 
> >    4:	24a50001 	addiu	a1,a1,1
> >    8:	24840001 	addiu	a0,a0,1
> >    c:	80a2ffff 	lb	v0,-1(a1)
> >   10:	a082ffff 	sb	v0,-1(a0)
> >   14:	1443fffb 	bne	v0,v1,4
> >   18:	00000000 	nop
> > 
> > where a NOP is placed in the delay slot to avoid the bug. For longer loops,
> > this relies on GAS making appropriate use of the delay slot. I'm not sure if
> > GAS is good at that, though?
>  I'm sort-of surprised that GCC has produced `reorder' code here, making 
> it rely on GAS for delay slot scheduling.  Have you used an unusual set of 
> options that prevents GCC from making `noreorder' code, which as I recall 
> is the usual default (not for the MIPS16 mode IIRC, as well as some 
> obscure corner cases)?

I used the suggested patch, and recompiled the kernel with it, and verified
with the machine code tool that there were no undefined loops. Wouldn't that
be enough?

> > diff --git a/gcc/config/mips/ b/gcc/config/mips/
> > index e17b1d522f0..acd31a8960c 100644
> > --- a/gcc/config/mips/
> > +++ b/gcc/config/mips/
> > @@ -1091,6 +1091,7 @@
> >  
> >  (define_delay (and (eq_attr "type" "branch")
> >  		   (not (match_test "TARGET_MIPS16"))
> > +		   (not (match_test "TARGET_FIX_R5900"))
> >  		   (eq_attr "branch_likely" "yes"))
> >    [(eq_attr "can_delay" "yes")
> >     (nil)
> > @@ -1100,6 +1101,7 @@
> >  ;; not annul on false.
> >  (define_delay (and (eq_attr "type" "branch")
> >  		   (not (match_test "TARGET_MIPS16"))
> > +		   (not (match_test "TARGET_FIX_R5900"))
> >  		   (ior (match_test "TARGET_CB_NEVER")
> >  			(and (eq_attr "compact_form" "maybe")
> >  			     (not (match_test "TARGET_CB_ALWAYS")))
> > 
>  I think you need to modify the default `can_delay' attribute definition 
> instead (in the same way).

I tried to limit the case to branch delays only, which is a rough
approximation. Call and jump delay slots are acceptable. Are you referring
to this piece?

;; Can the instruction be put into a delay slot?
(define_attr "can_delay" "no,yes"
  (if_then_else (and (eq_attr "type" "!branch,call,jump")
		     (eq_attr "hazard" "none")
		     (match_test "get_attr_insn_count (insn) == 1"))
		(const_string "yes")
		(const_string "no")))

> An improved future version might determine the 
> exact conditions as noted in your proposed commit description, however I'd 
> suggest making this simple change first.

Learning the exact conditions, in a hardware sense, would be interesting
indeed, as some short loops actually do appear to work despite being labeled
as undefined. A fairly deep understanding of the R5900 implementation is
essential for such an undertaking. :)


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]