Created attachment 44731 [details] Minimum Test Case Hi, I've faced a weird behavior on my ARM MCU about a year ago and reported it on the launchpad page. In the meantime I tried to report the issue at here but your register process is longer than expected and here I'm reporting this issue almost a year later. We'd discussed the issue on this page: https://bugs.launchpad.net/gcc-arm-embedded/+bug/1738730 I think this looks like a bug, but with my limited internal GCC knowledge, I don't want to be rude and call it like that directly. If I should summarize the discussion; The GCC versions from 5.2 to 7, the compiler generates unexpected code while handling packed structs. Normally the behavior should be like 4.9 but somehow it got broken since then. Misaligned access should help but this does not seem to be the case. I've attached a minimum example which reproduces the issue. About the seriousness; I can not update my compiler and still use 4.9 for our products. This is a annoying drawback for me because I know GCC has received many improvements since then. The IDE which I use is still deployed with 4.9 and this is unacceptable from my point of view. MCU Core: ARM Cortex M3
with trunk i see below output for the attached case . main: @ args = 0, pretend = 0, frame = 8 @ frame_needed = 0, uses_anonymous_args = 0 @ link register save eliminated. sub sp, sp, #8 movw r3, #1025 str r3, [sp] ldr r0, [sp] ldr r3, .L4 uxtb r0, r0 str sp, [r3] add sp, sp, #8 @ sp needed bx lr $ arm-windriver-linux-gnueabi-gcc -mcpu=cortex-m3 -mthumb '-DDEBUG=1' -O2 -pedantic -Wall -Wextra -fmessage-length=0 -mno-sched-prolog -fno-builtin -fshort-enums test.c -S
Hi Umesh, Could you test it with the following options: -g3 -gdwarf-2 -mcpu=cortex-m3 -mthumb -std=c++1y '-DDEBUG=1' -O0 -pedantic -Wall -Wextra -c -fmessage-length=0 -fno-rtti -fno-exceptions -mno-sched-prolog -fno-builtin -fpack-struct -fshort-enums -ffunction-sections -fdata-sections Thanks in advance.
With -O0 , i see the byte load /store like push {r7} sub sp, sp, #12 add r7, sp, #0 ldr r2, .L3 mov r3, r7 str r3, [r2] ldr r3, .L3 ldr r3, [r3] ldrb r2, [r3] movs r2, #0 orr r2, r2, #1 strb r2, [r3] ldrb r2, [r3, #1] movs r2, #0 orr r2, r2, #4 strb r2, [r3, #1] ldrb r2, [r3, #2] movs r2, #0 strb r2, [r3, #2] ldrb r2, [r3, #3] movs r2, #0 strb r2, [r3, #3] ldr r3, .L3 ldr r3, [r3] ldr r2, [r3] @ unaligned ldr r3, .L3 ldr r3, [r3] uxtb r2, r2 strb r2, [r3, #4] ldr r3, .L3 ldr r3, [r3] ldrb r3, [r3, #4] @ zero_extendqisi2 mov r0, r3 adds r7, r7, #12 mov sp, r7 @ sp needed pop {r7} bx lr but ,why you are compiling with -O0 , any reason such ?
(In reply to Murat Ursavaş from comment #0) > I've faced a weird behavior on my ARM MCU about a year ago and reported it > on the launchpad page. In the meantime I tried to report the issue at here > but your register process is longer than expected and here I'm reporting > this issue almost a year later. Umm, you got a bugzilla account here on Wed, 20 Dec 2017, so any delay since then is not because of our process :-)
It's not clear what behaviour you think is 'proper' for a packed struct with a volatile member. Since packed is a GNU extension, there's nothing in the C (or C++) standards that you can call upon to reach a reasonable expectation, and anything that is left must be down to either the GNU manuals or good luck. In this case I don't think the GNU manuals say anything specific about this case either (since that would imply knowing how each supported architecture handles misaligned memory accesses.
Hi Jonathan, I just wanted a dramatic entrance :) (There was a discussion about GCC bugzilla on reddit recently) Of course it hasn't took that long. But this is like missing a call. You would answer that at that time but if you miss it, it becomes difficult to call back :)
(In reply to Murat Ursavaş from comment #6) > Hi Jonathan, > > I just wanted a dramatic entrance :) (There was a discussion about GCC > bugzilla on reddit recently) Of course it hasn't took that long. But this is > like missing a call. You would answer that at that time but if you miss it, > it becomes difficult to call back :) Unfortunately we had to disable automatic account creation due to spammers abusing it. This is a volunteer project and we can't have admins sitting around just waiting for real people to ask for accounts so that they can respond instantly.
Yes, I read the discussion on reddit, it was annoying. It took 7h18m for your account to be created after you sent an email requesting it. Then it took 9 months for you to use the account. On the other hand, since we started the manual registration process we've had no spam attacks. The process is working IMHO.
Umesh, The reason is step-by-step debugging. I'd like to debug it first with -O0, than pack it with -Os for the release. Otherwise with a low resource MCU, things become messy really fast.
Jonathan, I don't blame any of you and very well aware of the volunteering effort. Please don't get me wrong. It's just me attempted multiple times to open the case but get distracted with something else. Let's concentrate on the case, OK? :) Thanks for your efforts again, I appreciate that and try to make more people appreciate it.
Richard, I don't know about the standards as you are and please accept me as a newbie. The peripheral parameters of the manufacturer library are all defined as volatile structs and accessed with pointers. This is working till 4.9 but not after 5.2. So point me to the right direction, who should fix what and how? I can't fix my software or use a workaround because I need packed structs. I tried many other options but none of them worked. To understand the matter I'd like to ask, the trunk is creating the code Umesh just sent and this doesn't look like I expect. Is the code correct? As far as I know I should see a shorter one with enabled unaligned access, which looks default on Cortex-M3 architecture. Of course I would like to have shorter image but longer image was not helping and assigning wrong values to the variables. Minimum test case has been written to show that.
Richard, Ok I remembered things with reading the old posts on launchpad. The compiler was generating normal code if I use the struct variable directly. But if I use a pointer to access it, it assigns not what I try to assign. Isn't that a compiler bug?
Richard, Also as far as I remember GNU manual was indeed saying something on this case. It was saying that "if the struct is not packed, it would access to members word by word. But if unaligned access is disabled, it would access the variables byte by byte and create longer and slower images. With unaligned access, it should access to the struct members again word by word." I can't show where it's written but I remember it. With the direct access, it is indeed accessing word by word no matter -pack-struct option is. But if I access it indirectly with a pointer, it tries to access byte by byte and still can't assign the values developer want.
(In reply to Murat Ursavaş from comment #13) > Richard, > > Also as far as I remember GNU manual was indeed saying something on this > case. It was saying that "if the struct is not packed, it would access to > members word by word. But if unaligned access is disabled, it would access > the variables byte by byte and create longer and slower images. With > unaligned access, it should access to the struct members again word by word." > > I can't show where it's written but I remember it. > > With the direct access, it is indeed accessing word by word no matter > -pack-struct option is. But if I access it indirectly with a pointer, it > tries to access byte by byte and still can't assign the values developer > want. That's because in the direct case it can see that the object is on the stack and aligned, so the value can be read simply without an unaligned object. Once the object is accessed through the pointer the knowledge of the alignment is lost and accesses have to be done more conservatively.
(In reply to Murat Ursavaş from comment #12) > Richard, > > Ok I remembered things with reading the old posts on launchpad. The compiler > was generating normal code if I use the struct variable directly. But if I > use a pointer to access it, it assigns not what I try to assign. > > Isn't that a compiler bug? I'm still not sure what you're saying is incorrect, based on your testcase. Are you talking about the value returned from the function, or the types of memory access being used? ie, if we lay aside for the moment that this is accessing a device, does the program behave correctly in terms of the values read and written?
OK I understand conservative action and not wait for word by word access. But the resulting value is not 0x401 on the test case, but it should be. In my original case this was effecting a USART peripheral register and it was activating different switches and making the peripheral useless. To make the case worse, if you assign some important physical pins on the same port, this bug can make them work differently. Let's say, it can pull a trigger without intention, because GPIO peripherals simply waits for bit assignments.
(In reply to Murat Ursavaş from comment #16) > OK I understand conservative action and not wait for word by word access. > But the resulting value is not 0x401 on the test case, but it should be. Is not 0x401 at what point? This part of the sequence: ldr r3, .L3 < &testStructPtr ldr r3, [r3] < testStructPtr ldrb r2, [r3] < testStructPtr->one[byte 0] (dead) movs r2, #0 < 0 orr r2, r2, #1 < 1 strb r2, [r3] < testStructPtr->one[byte 0] = 0x01 ldrb r2, [r3, #1] < testStructPtr->one[byte 1] (dead) movs r2, #0 < 0 orr r2, r2, #4 < 4 strb r2, [r3, #1] < testStructPtr->one[byte 1] = 0x04 ldrb r2, [r3, #2] movs r2, #0 strb r2, [r3, #2] < testStructPtr->one[byte 2] = 0x00 ldrb r2, [r3, #3] movs r2, #0 strb r2, [r3, #3] < testStructPtr->one[byte 3] = 0x00 so successive bytes starting at &(testStructPtr->one) contain 0x01, 0x04, 0x00, 0x00 which is the value in the source code.
BTW, are you really trying to say that your hardware has a register that isn't naturally aligned? That's really weird if so. If not, there are much easier ways to handle this sort of stuff.
Hi Richard, This source code had been designed to see word by word access and may create expected results. I'm not sure about that. Let me use latest stable and see what happens. It wasn't plug and play last time but like I said I have to make sure I'm not the root cause. (Looks like I'm the usual suspect at this :) Thanks for verifying that this is the expected code, just longer due to more conservative access style. This could take a while as I'm struggling to find some extra time to pursue these.
By the way, the hardware peripheral registers are aligned to 32bits.
(In reply to Murat Ursavaş from comment #20) > By the way, the hardware peripheral registers are aligned to 32bits. So why don't you define your struct as struct TestStructType { volatile unsigned one; unsigned char two; unsigned short three __attribute__((packed)); }; And get rid of the pragma entirely?
Or #pragma pack(push, 1) struct TestStructType { volatile unsigned one; unsigned char two; unsigned short three; } __attribute__((aligned(32))); #pragma pack(pop)
(In reply to Richard Earnshaw from comment #22) > Or > #pragma pack(push, 1) > > struct TestStructType > { > volatile unsigned one; > unsigned char two; > unsigned short three; > } __attribute__((aligned(32))); Err, that should be aligned(4), the alignment is in bytes! > > #pragma pack(pop)
(In reply to Murat Ursavaş from comment #6) > Hi Jonathan, > > I just wanted a dramatic entrance :) (There was a discussion about GCC > bugzilla on reddit recently) Link to the reddit discussion? I searched and can't seem to find it.
(In reply to Eric Gallager from comment #24) > (In reply to Murat Ursavaş from comment #6) > > Hi Jonathan, > > > > I just wanted a dramatic entrance :) (There was a discussion about GCC > > bugzilla on reddit recently) > > Link to the reddit discussion? I searched and can't seem to find it. I've found it via twitter. Here; https://twitter.com/blelbach/status/1032586866162196481
(In reply to Richard Earnshaw from comment #21) > (In reply to Murat Ursavaş from comment #20) > > By the way, the hardware peripheral registers are aligned to 32bits. > > So why don't you define your struct as > > struct TestStructType > { > volatile unsigned one; > unsigned char two; > unsigned short three __attribute__((packed)); > }; > > And get rid of the pragma entirely? Richard, Some of the structs are not under my control, since they belong to manufacturer libraries. I need pack-1 for some due to storage and communication needs. And I didn't know that I could pack individual struct members. Please correct me if I'm wrong. This structs size is total 10 bytes, one:4, two:4 and three:2, right?
There doesn't seem to be any bug here just discussion belonging to gcc-help.
Hi, I've created the time and added 7.3.1 to my toolchain list. (It is really annoyingly hard to add a new toolchain in my configuration due to a bug in the IDE). Anyway right now I can compare 4.9.3 and 7.3.1 side by side and my application is not working with the 7 series. That is exactly how it started at the beginning. I've cleared some issues that could interfere with this issue but now I can reproduce the issue on my target. I'm not sure this is due to packed structs or not but I've found a difference which should not happen. Please bear with me on this. Here's the disassembly of a problematic part: 4.9.3 121 NVM_SPI->ROUTE = USART_ROUTE_TXPEN | USART_ROUTE_RXPEN | USART_ROUTE_CLKPEN | NVM_SPI_LOCATION; 00029e38: ldr r3,[pc,#0x4c] ; 0x29e84 00029e3a: ldr r2,[r3,#0x54] 00029e3c: movs r2,#0x0 00029e3e: orr r2,r2,#0xb 00029e42: str r2,[r3,#0x54] 7.3.1 121 NVM_SPI->ROUTE = USART_ROUTE_TXPEN | USART_ROUTE_RXPEN | USART_ROUTE_CLKPEN | NVM_SPI_LOCATION; 0000572e: ldr r3,[pc,#0x70] ; 0x579c 00005730: ldrb.w r2,[r3,#0x54] 00005734: movs r2,#0x0 00005736: orr r2,r2,#0xb 0000573a: strb.w r2,[r3,#0x54] 0000573e: ldrb.w r2,[r3,#0x55] 00005742: movs r2,#0x0 00005744: strb.w r2,[r3,#0x55] 00005748: ldrb.w r2,[r3,#0x56] 0000574c: movs r2,#0x0 0000574e: strb.w r2,[r3,#0x56] 00005752: ldrb.w r2,[r3,#0x57] 00005756: movs r2,#0x0 00005758: strb.w r2,[r3,#0x57] 4.9.3 sets the ROUTE register as 0xB correctly. But 7.3.1 sets it as 0x30B. The correct value is 0xB (calculated from the bit values). This maps the USART to the wrong pins and makes the peripheral physically useless and also cripples other pins. Like I said, this may not be a bug, could be my error or vendor libraries but something doesn't look right. Please let me know if you need further info. I may need some guidance to collect more data. P.S: I'm trying to improve GCC, otherwise I'm just fine with 4.9.3.
And just out of curiosity, why the compiler loads zero to the register and then OR's with the value? 00029e3c: movs r2,#0x0 00029e3e: orr r2,r2,#0xb Why doesn't it load directly the necessary value? Like, 00029e3c: movs r2,#0xb I know ARM arch needs load/store mechanism for the RAM but why this additional task for a register?
OK, looks like it is possible like this: ldr r2, =0x0000000b Source: https://stackoverflow.com/questions/38689886/loading-32-bit-values-to-a-register-in-arm-assembly
(In reply to Murat Ursavaş from comment #28) > > Here's the disassembly of a problematic part: > > 4.9.3 > > 121 NVM_SPI->ROUTE = USART_ROUTE_TXPEN | USART_ROUTE_RXPEN | > USART_ROUTE_CLKPEN | NVM_SPI_LOCATION; > 00029e38: ldr r3,[pc,#0x4c] ; 0x29e84 > 00029e3a: ldr r2,[r3,#0x54] > 00029e3c: movs r2,#0x0 > 00029e3e: orr r2,r2,#0xb > 00029e42: str r2,[r3,#0x54] > > 7.3.1 > > 121 NVM_SPI->ROUTE = USART_ROUTE_TXPEN | USART_ROUTE_RXPEN | > USART_ROUTE_CLKPEN | NVM_SPI_LOCATION; > 0000572e: ldr r3,[pc,#0x70] ; 0x579c > 00005730: ldrb.w r2,[r3,#0x54] > 00005734: movs r2,#0x0 > 00005736: orr r2,r2,#0xb > 0000573a: strb.w r2,[r3,#0x54] > 0000573e: ldrb.w r2,[r3,#0x55] > 00005742: movs r2,#0x0 > 00005744: strb.w r2,[r3,#0x55] > 00005748: ldrb.w r2,[r3,#0x56] > 0000574c: movs r2,#0x0 > 0000574e: strb.w r2,[r3,#0x56] > 00005752: ldrb.w r2,[r3,#0x57] > 00005756: movs r2,#0x0 > 00005758: strb.w r2,[r3,#0x57] My limited assembler knowledge says new one is byte by byte access and should set the register correctly, but somehow it's not. Could actual object code be different than what I see in the disassembly? I'll try to verify it via inspecting the code space.
OK, dug down into Thumb-2 reference manual and verified. The code space shows correct instructions. For this line; strb.w r2,[r3,#0x54] It shows; 0xF883 0x2054 If I've read correctly, this is exactly what the disassembly says. I guess from GCC perspective, this is a perfectly valid situation and this ticket should be closed. If you have any additional ideas what could cause this, I'm all ears (Peripheral, Core, GDB). Otherwise, thanks for your time. It was enlightening for me to chase this issue.
One thing though. Would you accept this a regression and get back to 4.9 style? Yes, GCC is doing everything by the book but the result is not perfect (due to other undocumented issues not related to GNU team).
I think I've got what's going on. (I know this case turned to a monologue, but I would like to improve it for the future search references. In ARM architecture we have one simple linear address space for everything, flash, RAM and other hardware like peripherals. This makes many things quite easy. If you would like to setup a USART, you just write some information on this address space and you get what you want, like in this case. But If you are like me, trying to make everything deterministic, you may want to enable packed-structs. No problem with that. GCC takes care of the rest. It can access the RAM unaligned anyway as default. But, one thing can stay under the radar. We see peripheral registers as usual RAM addresses, but they are not. They may have limitations like no unaligned access. In this case with GCC 4.9.3, if I access to the register it uses this: ldr r3,[pc,#0x4c] ldr r2,[r3,#0x54] movs r2,#0x0 orr r2,r2,#0xb str r2,[r3,#0x54] ;<<<< Important instruction This part easily sets a 32bit register and everything works as expected. But after GCC 5+, It uses byte by byte access an uses the instructions below; ldr r3,[pc,#0x70] ldrb.w r2,[r3,#0x54] movs r2,#0x0 orr r2,r2,#0xb strb.w r2,[r3,#0x54] ;<<<< Important instruction ldrb.w r2,[r3,#0x55] movs r2,#0x0 strb.w r2,[r3,#0x55] ldrb.w r2,[r3,#0x56] movs r2,#0x0 strb.w r2,[r3,#0x56] ldrb.w r2,[r3,#0x57] movs r2,#0x0 strb.w r2,[r3,#0x57] There is nothing wrong, if it was a normal RAM location. It would set the register as 0x0000000b. But since this is a peripheral location, and has to be accessed as aligned, it takes just the first strb.w instruction into consideration, and leaves further ones useless. 0 - 7bits are OK, but 8-31 bits are left to decide by entropy. In my case the entropy wants to move the physical pins to a different location. I'm not sure whether this is a GCC regression, or must be taken care by the hardware manufacturer, but this is my conclusion at the end. So what will be my workaround; Project wide packed structs are dangerous, I'll remove it from the project settings and limit it down to necessary structs, leaving others relaxed. This should make the peripheral access aligned. P.S: I'm reopening this record for one final evaluation by the GNU team. From my perspective, this looks like a regression, but it's up to you guys.
I haven't researched the history of this change, but it was probably deliberate. Device memory on processors (a memory classification that implies that the underlying contents may contain memory mapped devices that have side effects) is generally intolerant of misaligned accesses. As such, emitting an unaligned access to it may cause the system to fault. If that was the reason for the change to GCC, it won't be reversed. (The fact that in your example the optimizers discover the alignment and change the code back to an aligned access does not alter this general observation.)