User account creation filtered due to spam.

Bug 45813 - [4.5 Regression] alias analysis problem with -mthumb
Summary: [4.5 Regression] alias analysis problem with -mthumb
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: rtl-optimization (show other bugs)
Version: 4.4.4
: P2 normal
Target Milestone: 4.6.0
Assignee: Not yet assigned to anyone
URL:
Keywords: alias, wrong-code
Depends on:
Blocks:
 
Reported: 2010-09-28 01:50 UTC by Darren Jenkins
Modified: 2012-07-02 10:46 UTC (History)
1 user (show)

See Also:
Host:
Target: arm-elf, arm-linux-gnueabi
Build:
Known to work: 4.3.5, 4.6.0
Known to fail:
Last reconfirmed: 2010-09-30 09:56:46


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Darren Jenkins 2010-09-28 01:50:31 UTC
I think I am having some wrong code generated due to a compiler bug.
I am compiling for ARM/Thumb.

USB_INT16U  ReadLE16U ( volatile USB_INT08U  *pmem )
{
    USB_INT16U  val;
    USB_INT08U *bytes = (USB_INT08U *)&val;

    bytes[0] = pmem[0];
    bytes[1] = pmem[1];

    return (val);
}

gives me

    B580        push {r7, lr}
    B081        sub sp, #4
    AF00        add r7, sp, #0
    1CBB        adds r3, r7, #2
    7802        ldrb r2, [r0, #0]
    701A        strb r2, [r3, #0]
    7842        ldrb r2, [r0, #1]
    8818        ldrh r0, [r3, #0]
    46BD        mov sp, r7
    B001        add sp, #4
    BC80        pop {r7}
    BC02        pop {r1}
    4708        bx r1

Which reads the second byte into r2 but does not store it on the stack, so the second byte of val is returned uninitialized.
It looks like the alias analysis knows that bytes points to val but doesn't know that bytes+1 points to a part of val also, so optimizes it away.

Yell out if you need any more information
Comment 1 Richard Biener 2010-09-28 09:17:33 UTC
I can't reproduce this on x86_64-linux with GCC 4.4.4 and -O2 (what options
did you use?).  A cross to arm I still have around for GCC 4.5.0 also doesn't
show the problem.

I used

unsigned short ReadLE16U ( volatile unsigned char *pmem)
{
  unsigned short val;
  unsigned char *bytes = (unsigned char *)&val;
  bytes[0] = pmem[0];
  bytes[1] = pmem[1];
  return val;
}

if that is not enough to reproduce the problem please provide
complete preprocessed source that can be compiled.
Comment 2 Mikael Pettersson 2010-09-28 11:24:18 UTC
I can't reproduce this with a freshly-built vanilla 4.4.4 for arm-linux-gnueabi.
Please tell us (a) exactly how your compiler was configured, and (b) exactly how you invoked it on the test case.
Comment 3 Darren Jenkins 2010-09-28 23:07:08 UTC
unsigned short ReadLE16U( volatile unsigned char * ptr )
{
    unsigned short value;
    unsigned char * bytes = (unsigned char *)&value;

    bytes[0] = ptr[0];
    bytes[1] = ptr[1];

    return value;
}

Gives me the same erroneous results.

The compiler I am using is part of Rowley "CrossWorks for ARM" 2.0.7 which claims to be an unmodified GCC 4.4.4 http://www.rowley.co.uk/crossworks/gpl_sources.htm

cc1 --version
GNU C (GCC) version 4.4.4 (arm-unknown-elf)
        compiled by GNU C version 3.4.4 (mingw special), GMP version 4.3.2
 version 2.4.2.
GGC heuristics: --param ggc-min-expand=100 --param ggc-min-heapsize=131072

What gets passed to the compiler seems to be :

"C:/Program Files (x86)/Rowley Associates Limited/CrossWorks for ARM 2.0/gcc/bin/cc1" -fmessage-length=0 -mcpu=arm7tdmi-s -mthumb -mthumb-interwork -mlittle-endian -mfpu=vfp -mfloat-abi=soft -nostdinc "-isystemC:/Program Files (x86)/Rowley Associates Limited/CrossWorks for ARM 2.0/include" "-isystemC:/Users/DarrenJenkins/AppData/Local/Rowley Associates Limited/CrossWorks for ARM/packages/include" -I. -I../.. -I../../include -I../../system -I../../LwIP -I../../LwIP/include -I../../LwIP/include/lwip -I../../LwIP/include/ipv4 -I../../NXP_Lib -I../../fat_file_system -I../../usb -I../../usb/UsbHost/Include -I../../Macro -D__ARM_ARCH_4T__ -D__CROSSWORKS_ARM -D__CROSSWORKS_MAJOR_VERSION=2 -D__CROSSWORKS_MINOR_VERSION=0 -D__CROSSWORKS_REVISION=7 -D__TARGET_PROCESSOR=LPC2468 -DNESTED_INTERRUPTS -DSRAM_EXCEPTIONS -D__THUMB -D__FLASH_BUILD -DOSCILLATOR_CLOCK_FREQUENCY=12000000 -DDEBUG -MD "THUMB Flash Debug/usbhost_lpc2468.d" -MQ "THUMB Flash Debug/usbhost_lpc2468.o" -quiet -Wall -fno-omit-frame-pointer -fno-schedule-insns2 -gdwarf-2 -Os -fno-dwarf2-cfi-asm -fno-builtin -ffunction-sections -fdata-sections C:/Users/DarrenJenkins/Documents/NXP_darren/src/projects/Darren/../../usb/UsbHost/Host/usbhost_lpc2468.c -o "C:/Users/DarrenJenkins/Documents/NXP_darren/src/projects/Darren/THUMB Flash Debug/usbhost_lpc2468.asm"

and 

"C:/Program Files (x86)/Rowley Associates Limited/CrossWorks for ARM 2.0/gcc/bin/as" --traditional-format -mcpu=arm7tdmi-s -mthumb -mthumb-interwork -EL -mfpu=vfp -mfloat-abi=soft "C:/Users/DarrenJenkins/Documents/NXP_darren/src/projects/Darren/THUMB Flash Debug/usbhost_lpc2468.asm" -o "C:/Users/DarrenJenkins/Documents/NXP_darren/src/projects/Darren/THUMB Flash Debug/usbhost_lpc2468.o"


Yell out if you would like to know anything else.
Comment 4 Mikael Pettersson 2010-09-28 23:21:27 UTC
Please show us the output of "gcc -v", we want to see how the compiler was configured (wrt certain defaults etc).  Also, when you invoke the compiler we want to see the exact options YOU supply to it; right now it's difficult to separate your options from what the compiler driver passes to the backend.
Comment 5 Darren Jenkins 2010-09-29 03:33:27 UTC
> Please show us the output of "gcc -v", we want to see how the compiler was
> configured (wrt certain defaults etc).  Also, when you invoke the compiler we
> want to see the exact options YOU supply to it; right now it's difficult to
> separate your options from what the compiler driver passes to the backend.

This is a little difficult as the compiler is part of a whole "development/build environment" thing, and I can't find an executable called "gcc" in it.

I will investigate how it is put together and see what I can find out.
Comment 6 Darren Jenkins 2010-09-29 03:55:39 UTC
Also if I don't use -fno-omit-frame-pointer the code seems to be generated correctly
Comment 7 Darren Jenkins 2010-09-29 07:46:54 UTC
OK CrossWorks don't use/ship the GCC compiler driver.
So I guess all the usefull information I can give is from the c compiler:

cc1 --version
GNU C (GCC) version 4.4.4 (arm-unknown-elf)
        compiled by GNU C version 3.4.4 (mingw special), GMP version 4.3.2
 version 2.4.2.
GGC heuristics: --param ggc-min-expand=100 --param ggc-min-heapsize=131072
Comment 8 Richard Biener 2010-09-29 09:14:46 UTC
Should be enough information for arm people to confirm.  I still suggest
to report the problem to Rowley CrossWorks instead.
Comment 9 Mikael Pettersson 2010-09-29 19:41:08 UTC
I can reproduce it with crosses to arm-elf and arm-linux-gnueabi.  The combination of -mthumb (1 not 2) and -fno-omit-frame-pointer is the trigger. GCC 4.3.5 works, 4.4 and 4.5 generate broken code, 4.6 works.

gcc-4.3.5 -march=armv5te -mthumb -fno-omit-frame-pointer -Os generates:

ReadLE16U:
        push    {r7, lr}
        ldrb    r3, [r0]
        sub     sp, sp, #8
        add     r7, sp, #0
        add     r2, r7, #6
        strb    r3, [r2]
        ldrb    r3, [r0, #1]
        mov     sp, r7
        add     sp, sp, #8
        strb    r3, [r2, #1]
        ldrh    r0, [r2]
        @ sp needed for prologue
        pop     {r7, pc}

The second byte is correctly stored before the short is read and returned.

gcc-4.4-20100928 and gcc-4.5-20100923 generate:

ReadLE16U:
        push    {r7, lr}
        ldrb    r2, [r0]
        sub     sp, sp, #8
        add     r7, sp, #0
        add     r3, r7, #6
        mov     sp, r7
        strb    r2, [r3]
        add     sp, sp, #8
        ldrb    r2, [r0, #1]
        @ sp needed for prologue
        ldrh    r0, [r3]
        pop     {r7, pc}

The second byte is read but not written before the short is read.

gcc-4.6-20100925 generates:

ReadLE16U:
        push    {r0, r1, r7, lr}
        ldrb    r3, [r0]
        add     r7, sp, #0
        mov     sp, r7
        strb    r3, [r7, #6]
        ldrb    r3, [r0, #1]
        @ sp needed for prologue
        strb    r3, [r7, #7]
        ldrh    r0, [r7, #6]
        pop     {r1, r2, r7, pc}

This is correct.
Comment 10 Mikael Pettersson 2010-09-29 20:59:17 UTC
This was fixed for 4.6 by r160260:
http://gcc.gnu.org/viewcvs?view=revision&revision=160260

Applying that to 4.5 changes 4.5's output to:

ReadLE16U:
        push    {r7, lr}
        ldrb    r3, [r0]
        sub     sp, sp, #8
        add     r7, sp, #0
        strb    r3, [r7, #6]
        ldrb    r3, [r0, #1]
        mov     sp, r7
        add     sp, sp, #8
        strb    r3, [r7, #7]
        ldrh    r0, [r7, #6]
        @ sp needed for prologue
        pop     {r7, pc}

which is much better.  (It still runs into PR38644, but that can be avoided by -fno-schedule-insns2 or by applying my proposed patch for PR38644.)
Comment 11 Richard Biener 2010-09-30 09:56:46 UTC
Thus, confirmed.
Comment 12 Richard Biener 2011-04-19 09:06:03 UTC
The quoted revision fixing this for 4.6 doesn't look like a wrong-code fix
but a cost adjustment.
Comment 13 Jakub Jelinek 2012-03-13 12:47:13 UTC
4.4 branch is being closed, moving to 4.5.4 target.
Comment 14 Bernd Schmidt 2012-03-13 14:47:02 UTC
This really just needs -fno-strict-aliasing, doesn't it? Removing myself from Cc.
Comment 15 Mikael Pettersson 2012-03-15 08:10:55 UTC
-fno-strict-aliasing makes no difference to the code generated by 4.5-20120308.  Replacing the type-punning with a union results in both correct-looking and much tighter code:

unsigned short ReadLE16U_union( volatile unsigned char * ptr )
{
    union {
        unsigned short value;
        unsigned char bytes[2];
    } u;

    u.bytes[0] = ptr[0];
    u.bytes[1] = ptr[1];

    return u.value;
}

ReadLE16U_union:
        push    {r7, lr}
        ldrb    r3, [r0]
        ldrb    r0, [r0, #1]
        add     r7, sp, #0
        mov     sp, r7
        lsl     r0, r0, #8
        orr     r0, r0, r3
        @ sp needed for prologue
        pop     {r7, pc}
        .size   ReadLE16U_union, .-ReadLE16U_union
        .ident  "GCC: (GNU) 4.5.4 20120308 (prerelease)"
Comment 16 Richard Biener 2012-07-02 10:46:12 UTC
Fixed in 4.6.0, the 4.5 branch is being closed.