82150 – Produces a branch prefetch which causes a hang

Status:	RESOLVED WONTFIX

Product:	gcc
Classification:	Unclassified
Component:	target
Version:	10.2.0

Importance:	P3 normal
Target Milestone:	---
Assignee:	Not yet assigned to anyone

Reported:	2017-09-08 17:14 UTC by david.welch
Modified:	2021-01-29 12:24 UTC

Target:	arm-none-eabi
Last reconfirmed:	2021-01-29 00:00:00

Description david.welch 2017-09-08 17:14:33 UTC

export TARGET=arm-none-eabi

../gcc-$GCCVER/configure --target=$TARGET --prefix=$PREFIX --without-headers --with-newlib  --with-gnu-as --with-gnu-ld --enable-languages='c'

but we first found this on 4.8.3; I don't have a reason to assume it applies to all versions.

take something like this

unsigned int more_fun ( unsigned int );
unsigned int fun ( void )
{
    return(more_fun(0x12344700)+1);
}

arm-none-eabi-gcc -mthumb -march=armv6 -O2 -c so.c -o so.o

00000000 <fun>:
   0:	b510      	push	{r4, lr}
   2:	4802      	ldr	r0, [pc, #8]	; (c <fun+0xc>)
   4:	f7ff fffe 	bl	0 <more_fun>
   8:	3001      	adds	r0, #1
   a:	bd10      	pop	{r4, pc}
   c:	12344700 	eorsne	r4, r4, #0, 14

And there is the problem.
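To make the problem concrete, here is a minimal C sketch (not part of the original report) of why the literal resembles an instruction. It assumes a little-endian target, so the low halfword of the pool word is the first halfword after the pop, and it uses the Thumb BX Rm encoding (0100 0111 0mmm m000). The helper names are hypothetical.

```c
#include <stdint.h>

/* Thumb "bx Rm" is encoded as 0100 0111 0mmm m000. */
static int is_thumb_bx(uint16_t hw)
{
    return (hw & 0xFF87u) == 0x4700u;
}

/* Register number named by a Thumb BX halfword. */
static unsigned thumb_bx_reg(uint16_t hw)
{
    return (hw >> 3) & 0xFu;
}

/* Assumption: little-endian layout, so the low 16 bits of a literal-pool
 * word are the first halfword in memory after the preceding instruction. */
static uint16_t first_halfword(uint32_t pool_word)
{
    return (uint16_t)(pool_word & 0xFFFFu);
}
```

With these helpers, `is_thumb_bx(first_halfword(0x12344700))` is true and `thumb_bx_reg(0x4700)` is 0, which matches the `bx r0` reading of the data at offset 0xc.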

This is not limited to thumb mode

unsigned int more_fun ( unsigned int );
unsigned int fun ( void )
{
    return(more_fun(0xe12fff10)+1);
}

00000000 <fun>:
   0:	e92d4010 	push	{r4, lr}
   4:	e59f0008 	ldr	r0, [pc, #8]	; 14 <fun+0x14>
   8:	ebfffffe 	bl	0 <more_fun>
   c:	e2800001 	add	r0, r0, #1
  10:	e8bd8010 	pop	{r4, pc}
  14:	e12fff10 	bx	r0

same problem.
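The ARM-mode case can be checked the same way. This is a small illustrative C sketch, not code from the report: it tests a 32-bit word against the ARM BX encoding (cond 0001 0010 1111 1111 1111 0001 Rm). The function names are hypothetical.

```c
#include <stdint.h>

/* ARM "bx Rm": cond 0001 0010 1111 1111 1111 0001 mmmm.
 * A condition field of 0xF is a different instruction space, so reject it. */
static int is_arm_bx(uint32_t w)
{
    if ((w >> 28) == 0xFu)
        return 0;
    return (w & 0x0FFFFFF0u) == 0x012FFF10u;
}

/* Register number named by an ARM BX word. */
static unsigned arm_bx_reg(uint32_t w)
{
    return w & 0xFu;
}
```

The constant 0xe12fff10 passes `is_arm_bx` with register 0, while the real instructions in the listing (for example 0xe8bd8010, the pop) do not.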

Found on an arm11 mpcore but assume that the older arm11s and possibly even armv7s have this issue, will see when I get there.

The core does not see pop pc as an unconditional branch; it continues to process the instructions in the pipe while the pop is finishing. It prefetches the address in r0 in both of the above cases, because the DATA that follows the pop happens to resemble an instruction, specifically bx, but I wonder if other instructions are a problem as well.  The prefetch reads the fetch line at whatever address is in the register that happens to be encoded.  This can cause a read of peripherals that are clear-on-read, pull a byte out of a UART, or, in our case, touch an address that doesn't answer on the AXI bus and hang the processor.

Because armv4t didn't support mode switching with a pop, using -march=armv4t produces code that doesn't cause the processor to fail.

00000000 <fun>:
   0:	b510      	push	{r4, lr}
   2:	4803      	ldr	r0, [pc, #12]	; (10 <fun+0x10>)
   4:	f7ff fffe 	bl	0 <more_fun>
   8:	3001      	adds	r0, #1
   a:	bc10      	pop	{r4}
   c:	bc02      	pop	{r1}
   e:	4708      	bx	r1
  10:	12344700 	eorsne	r4, r4, #0, 14

I can't possibly be the first person to see this after all of these years (and although I can't think offhand of another instruction set where the pc is also treated like a GPR, there are other targets that are affected), so I am hoping there is already a command line switch other than downgrading wholesale to armv4t.  If not, can we add a command line switch to avoid this problem?  I would think a branch-to-self instruction following the pop would work, or, like armv4t, don't pop into the pc: in arm mode pop to lr and then bx lr, or in thumb mode, as you do for armv4t, pop to r0-r3 and bx to that.

Comment 1 Andrew Pinski 2017-09-08 17:31:05 UTC

This sounds like an erratum in the core you are using.  It seems the best way to fix this is via an option which works around this erratum, if there is not one already.

Comment 2 david.welch 2017-09-08 17:44:01 UTC

ARM does not have an erratum on this for this core, from what I was given.  I don't know why they would; at best it would fall into the "unpredictable results" category.  Erratum or not, I was hoping there could be an option, if there is not one already.  The armv4t one is an option, but I would assume it affects more than just this one thing (I don't know gcc internals), so it is too big of a hammer.

Comment 3 david.welch 2017-09-12 11:38:13 UTC

The problem exists as well with ldr pc,[something].  I have not dug through gcc but did some compilation experiments, not nearly enough to be 100% sure, but for switch statements the code generated always appears to do a comparison (perhaps after a subtract or other modification), then an ldrls pc,[], then an unconditional branch to deal with the last item (or a default).  If that is always the rule, that is safe.  And for a function table (an array of function pointers), it did the math using gprs and then a mov lr,pc ; bx rn.

an 

ldr pc,[]
literal pool data

will cause this undesired prefetch.
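As an illustration of the distinction drawn here, the following C sketch (an assumption-laden example, not taken from GCC or from the report) recognizes an ARM-mode word load into the pc and reports whether it is unconditional. It relies on the ARM single-data-transfer layout (bits 27:26 = 01, B = 0, L = 1, Rd = 15); the function names are hypothetical, and only the immediate and register offset forms are considered.

```c
#include <stdint.h>

/* ARM word load with Rd = pc:
 * cond 01 I P U 0 W 1 Rn 1111 offset
 * Reject cond = 0xF (a different instruction space) and the
 * register-offset form with bit 4 set (media instruction space). */
static int is_arm_ldr_pc(uint32_t w)
{
    if ((w >> 28) == 0xFu)
        return 0;
    if ((w & 0x02000010u) == 0x02000010u)
        return 0;
    return (w & 0x0C50F000u) == 0x0410F000u;
}

/* Condition field 0xE means "always". */
static int is_arm_unconditional(uint32_t w)
{
    return (w >> 28) == 0xEu;
}
```

For example, 0xE51FF004 (`ldr pc, [pc, #-4]`) is an unconditional load into the pc, so data placed right after it would be exposed; 0x979FF100 (`ldrls pc, [pc, r0, lsl #2]`) is conditional, which is the switch-table pattern described above as being followed by a real branch.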

Comment 4 mgretton 2017-09-19 15:01:38 UTC

As you suggest in your original comment this hang could be coming from the instruction pre-fetch going to some place in memory that is mapped (and executable) but the memory system is not giving a response to memory accesses to that location.

As a general point all read sensitive devices must be marked as XN to prevent speculative corruption of those read sensitive devices by instruction fetch (this is true on future versions of the architecture as well).

Can you ensure that the XN-bit is set on memory pages mapped to read-sensitive devices?

(XN description ARM11 MP Core TRM: http://infocenter.arm.com/help/topic/com.arm.doc.ddi0360f/CACHFICI.html)

Comment 5 david.welch 2017-09-19 18:50:26 UTC

It is definitely prefetching because it does not realize those instructions are unconditional branches.  Most likely going with strongly ordered rather than the XN bit, but noted as a workaround.  Since armv4t does not support the pop pc and there are runtime flags, I wanted to first know what options are there or whether they would have to be added.  What other cores have been reported as having this issue, and were there any compiler additions made for them?

Comment 6 Ramana Radhakrishnan 2017-09-20 09:28:21 UTC

To answer the open question: no options have been added to "avoid" this behaviour.

The code generated by the compiler is as per the architecture specification, there's nothing wrong here and thus this is not a valid report. In order to prevent such speculation by the ARM11 MPCore, one needs to set the XN bit as Matt referred to above.

Look also at the execute-never bit in the ARM-ARM - ARMv7-AR Section B3.7.2 (Issue C - DDI0406C, page B3-1351), which covers what happens with speculation and no-execute; given that's in the architecture, implementations have to follow it.

 
regards
Ramana

Comment 7 david.welch 2017-09-20 11:31:47 UTC

This is an armv6 not an armv7.

So far I have not seen that the mmu or cache or branch prediction is required for proper operation of the core.  I have so far not seen this on other cores, but I am still working on that; it is very much present on this core.  I would rather not have to use the mmu as a kludge.

Comment 8 david.welch 2017-09-20 11:36:27 UTC

gcc is treating these instructions as unconditional branches, but the core does NOT treat these instructions as unconditional branches.  The disconnect between the code produced and the core behavior is quite clear. Kludges and workarounds are interesting, but given the volume of other similar situations that gcc has responded to in its code generation, the response here is confusing.  Why generate code that works for the core in one case but not in another?  Can you please elaborate?

Comment 9 david.welch 2017-09-20 17:07:10 UTC

Basically gcc is generating a sequence where data starts to execute in the pipe.  I can't imagine it is a good idea to let the processor execute data when you can avoid it.

Instead of a pop {...pc} followed by some data, a pop {...lr} ; bx lr creates a data hazard: the bx doesn't execute until the register change has resolved.  Other cores might not execute the words after a pop in the pipeline if pc is one of the popped values, but this core does.  Patching this instruction sequence after the execution has started is just a kludge.

Comment 10 david.welch 2017-10-10 14:59:56 UTC

How do I get some feedback on this?  Do I need to create a new ticket?  This is not about a system hang; this is about GCC output that causes data to be executed as code in the pipeline.  It was detected through a hang, but perfectly valid address spaces are affected.  Quite clearly a gcc bug.  The root cause is that GCC is feeding data into the pipeline to be executed.  Just because ARM didn't publish it doesn't mean their core is without other undocumented problems.  The MMU is too late: the data has already started to execute, so that at best is a hack, not a solution.

Comment 11 david.welch 2021-01-26 14:58:55 UTC

I wish I had known this when I filed this ticket: there is an ARM erratum for this issue that was issued in or before 2009.

720247: Speculative Instruction fetches can be made anywhere in the memory map

I have researched this bug on this core and provided a workaround that ARM was not able or willing to provide.  (Put a nop after unconditional branch instructions that modify the pc, like pop {r4,pc} but not bx lr; anything other than another branch instruction that causes a speculative fetch.)

So if you require an ARM erratum in order to fix something, there you go, it exists.

It is still present in gcc 10 (has been present all this time).  I have not examined gcc 11 yet as it has not been formally released.

unsigned int more_fun ( unsigned int );
unsigned int fun ( void )
{
    return(more_fun(0x12344700)+1);
}

Disassembly of section .text:

00000000 <fun>:
   0:	b510      	push	{r4, lr}
   2:	4802      	ldr	r0, [pc, #8]	; (c <fun+0xc>)
   4:	f7ff fffe 	bl	0 <more_fun>
   8:	3001      	adds	r0, #1
   a:	bd10      	pop	{r4, pc}
   c:	12344700 	.word	0x12344700


.thumb
.inst.n 0x4700


Disassembly of section .text:

00000000 <.text>:
   0:	4700      	bx	r0

and there is the speculative execution that causes a read (that can be anywhere in the address space)


arm-none-eabi-gcc --version
arm-none-eabi-gcc (GCC) 10.2.0
Copyright (C) 2020 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

One could examine everything after a branch like this for another branch, either as a real instruction or embedded in the top of the pool; a nop after each of the at-risk instructions may be simpler.

Comment 12 david.welch 2021-01-26 15:02:29 UTC

In my case this was found with a hang, but the problem exists as a read, which means it can cause a read of a read-sensitive peripheral, with adverse effects.

Comment 13 david.welch 2021-01-26 21:06:52 UTC

Very sorry, it has been years since I did this research: a simple nop won't fix it, but a branch to self will.

bad

TEST:
	push {r4,lr}
	pop {r4,pc}
	bx r0 /*.hword 0x4700*/
	nop
	nop

bad

TEST:
	push {r4,lr}
	pop {r4,pc}
	nop
	bx r0 /*.hword 0x4700*/
	nop
	nop


good

TEST:
	push {r4,lr}
	pop {r4,pc}
	b .
	bx r0 /*.hword 0x4700*/
	nop
	nop
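The three sequences above can be told apart mechanically. The following C sketch is only an illustration under stated assumptions: it treats the input as a flat array of 16-bit Thumb halfwords (so it would misread 32-bit BL pairs and ordinary data), and the `window` parameter is a made-up bound on how far past the pop the core might look, not a documented property of the core. The function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Thumb pop with the pc bit set, pop {..., pc}: 1011 1101 rrrr rrrr. */
static int is_thumb_pop_pc(uint16_t hw) { return (hw & 0xFF00u) == 0xBD00u; }

/* Thumb "bx Rm": 0100 0111 0mmm m000. */
static int is_thumb_bx(uint16_t hw) { return (hw & 0xFF87u) == 0x4700u; }

/* Thumb unconditional "b": 1110 0iii iiii iiii ("b ." is 0xE7FE). */
static int is_thumb_b(uint16_t hw) { return (hw & 0xF800u) == 0xE000u; }

/* Return the index of the first BX-like halfword that follows a
 * pop {..., pc} within `window` halfwords with no unconditional "b"
 * in between, or -1 if no such halfword is found. */
static long find_exposed_bx(const uint16_t *hw, size_t n, size_t window)
{
    for (size_t i = 0; i < n; i++) {
        if (!is_thumb_pop_pc(hw[i]))
            continue;
        for (size_t j = i + 1; j < n && j <= i + window; j++) {
            if (is_thumb_b(hw[j]))
                break;              /* guarded by a real branch */
            if (is_thumb_bx(hw[j]))
                return (long)j;     /* BX-like halfword is exposed */
        }
    }
    return -1;
}
```

Encoded as halfwords (push {r4,lr} = 0xB510, pop {r4,pc} = 0xBD10, nop = 0x46C0, b . = 0xE7FE, bx r0 = 0x4700), the two "bad" sequences are flagged and the "good" one is not. A scan like this is a diagnostic aid only; it does not change what the compiler emits.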

Comment 14 Richard Earnshaw 2021-01-28 18:44:30 UTC

The code generated is architecturally correct.  If your core is prefetching from addresses that are not valid then this is indicative that the MMU is incorrectly configured for your system.  Prefetches will NOT be attempted from unmapped pages, or pages that are mapped as device memory.

So you need to find out why your memory system has not been correctly set up.

There's no bug in GCC here.

Comment 15 david.welch 2021-01-28 20:12:59 UTC

Please read the erratum and do not blow off this ticket.

The MMU is not being used. This is a verified problem, acknowledged by ARM as well as being independently discovered.  The problem has been present and known by ARM for years, as well as being reported a while ago to gnu/gcc.

Using the MMU is not a valid solution to fix a known, demonstrable bug in the compiler.

Comment 16 Richard Earnshaw 2021-01-29 11:47:34 UTC

What's the erratum number?

Comment 17 Richard Earnshaw 2021-01-29 12:24:57 UTC

OK, I've found the erratum number, it's 720247; but it's specific to the 11MPcore and is fixed in r2p1 silicon.

The erratum workaround description states that there is *no* robust software fix when the MMU is disabled.  Some hardware fixes on your platform might be possible, but are off-topic here.  Also, using r2p1 silicon fixes the problem.

Either way, there's nothing we can do in GCC to address this, given the nature of the problem.