Bug 67507 - Code size increase with -Os from GCC 4.8.x to GCC 4.9.x for ARM thumb1
Status: UNCONFIRMED
Alias: None
Product: gcc
Classification: Unclassified
Component: target
Version: 4.9.4
Importance: P3 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2015-09-08 21:33 UTC by Fredrik Hederstierna
Modified: 2015-09-14 11:06 UTC

See Also:
Host:
Target: arm
Build:
Known to work:
Known to fail:
Last reconfirmed:


Attachments
Example code (3.93 KB, application/zip)
2015-09-08 21:33 UTC, Fredrik Hederstierna

Description Fredrik Hederstierna 2015-09-08 21:33:43 UTC
Created attachment 36308
Example code

Starting with GCC 4.9.x, code size increases for arm-eabi Thumb with the attached example code. It seems related to alignment and is still present in GCC 5.2.0.

Example C code (see the attachment for more examples and the listings):
The code can cause an alignment data abort, but it should still compile consistently and correctly, assuming the user supplies aligned data. Example 5 gives a type-punning warning with every compiler version; none of the other examples gives a warning.

extern void func(int data);
char global_data_unaligned[4];
void test_unaligned_1(void) {
  int *idata = (int*)global_data_unaligned;
  func(*idata);
}

Compiled with GCC 4.8.5 arm-none-eabi cortex-m0 -Os:

00000000 <test_unaligned_1>:
   0:   b508            push    {r3, lr}
   2:   4b07            ldr     r3, [pc, #28]   ; (20 <test_unaligned_1+0x20>)
   4:   7858            ldrb    r0, [r3, #1]
   6:   781a            ldrb    r2, [r3, #0]
   8:   0200            lsls    r0, r0, #8
   a:   4310            orrs    r0, r2
   c:   789a            ldrb    r2, [r3, #2]
   e:   78db            ldrb    r3, [r3, #3]
  10:   0412            lsls    r2, r2, #16
  12:   4310            orrs    r0, r2
  14:   061b            lsls    r3, r3, #24
  16:   4318            orrs    r0, r3
  18:   f7ff fffe       bl      0 <func>
  1c:   bd08            pop     {r3, pc}
  1e:   46c0            nop                     ; (mov r8, r8)
  20:   00000000        .word   0x00000000

Compiled with GCC 5.2.0 arm-none-eabi cortex-m0 -Os, +4 bytes:

00000000 <test_unaligned_1>:
   0:   b510            push    {r4, lr}
   2:   4c08            ldr     r4, [pc, #32]   ; (24 <test_unaligned_1+0x24>)
   4:   7863            ldrb    r3, [r4, #1]
   6:   7821            ldrb    r1, [r4, #0]
   8:   78a0            ldrb    r0, [r4, #2]
   a:   021b            lsls    r3, r3, #8
   c:   430b            orrs    r3, r1
   e:   0400            lsls    r0, r0, #16
  10:   001a            movs    r2, r3      // ???
  12:   0003            movs    r3, r0      // ???
  14:   78e0            ldrb    r0, [r4, #3]
  16:   4313            orrs    r3, r2
  18:   0600            lsls    r0, r0, #24
  1a:   4318            orrs    r0, r3
  1c:   f7ff fffe       bl      0 <func>
  20:   bd10            pop     {r4, pc}
  22:   46c0            nop                     ; (mov r8, r8)
  24:   00000000        .word   0x00000000
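
For reference, both -Os listings above build the 32-bit value from four byte loads combined with shifts and ORs, in little-endian order. A minimal C sketch of that operation follows; the function name load_le32_bytewise is hypothetical and does not come from the attachment.

```c
#include <stdint.h>

/* Hypothetical helper: assemble a 32-bit word from four byte loads,
 * mirroring the ldrb / lsls / orrs sequence in the -Os listings.
 * Byte 0 is the least significant byte (little-endian order). */
uint32_t load_le32_bytewise(const unsigned char *p)
{
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}
```

This form is safe at any address, which is presumably why the compiler falls back to it when it cannot assume word alignment.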

With GCC 4.8.5 arm-none-eabi cortex-m0 -O2 the code gets shorter.
Is alignment not taken into account when compiling for speed?

00000000 <test_unaligned_1>:
   0:   b508            push    {r3, lr}
   2:   4b02            ldr     r3, [pc, #8]    ; (c <test_unaligned_1+0xc>)
   4:   6818            ldr     r0, [r3, #0]
   6:   f7ff fffe       bl      0 <func>
   a:   bd08            pop     {r3, pc}
   c:   00000000        .word   0x00000000

------------------------------
  
Example 3 compiled with GCC 4.8.5 arm-none-eabi cortex-m0 -Os:
  
00000048 <test_unaligned_3>:
  48:   b508            push    {r3, lr}
  4a:   4b03            ldr     r3, [pc, #12]   ; (58 <test_unaligned_3+0x10>)
  4c:   2201            movs    r2, #1
  4e:   4393            bics    r3, r2
  50:   6818            ldr     r0, [r3, #0]
  52:   f7ff fffe       bl      0 <func>
  56:   bd08            pop     {r3, pc}
  58:   00000000        .word   0x00000000

Same code compiled with GCC 5.2.0 arm-none-eabi cortex-m0 -Os:

00000028 <test_unaligned_3>:
  28:   2201            movs    r2, #1
  2a:   4b05            ldr     r3, [pc, #20]   ; (40 <test_unaligned_3+0x18>)
  2c:   b510            push    {r4, lr}
  2e:   4393            bics    r3, r2
  30:   8858            ldrh    r0, [r3, #2]    // ?? why ldrh
  32:   881a            ldrh    r2, [r3, #0]    // ?? why ldrh
  34:   0400            lsls    r0, r0, #16
  36:   4310            orrs    r0, r2
  38:   f7ff fffe       bl      0 <func>
  3c:   bd10            pop     {r4, pc}
  3e:   46c0            nop                     ; (mov r8, r8)
  40:   00000000        .word   0x00000000
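
One possible reading of the ldrh pair, stated as an assumption because the Example 3 source is only in the attachment: the bics with #1 clears bit 0 of the address, so the compiler may be relying on 2-byte alignment only and splitting the word load into two halfword loads. In C terms the GCC 5.2.0 sequence corresponds roughly to the sketch below; the name load_from_halfwords is hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper: combine two 16-bit loads into one 32-bit value,
 * mirroring the ldrh / ldrh / lsls #16 / orrs sequence in the listing.
 * p only needs 2-byte alignment; p[0] supplies the low half. */
uint32_t load_from_halfwords(const uint16_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 16);
}
```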

There seems to be some issue with assumptions about alignment, causing larger code size.
I checked the IRA dumps for IRA coloring/build, and some examples seem to be assigned more hard registers by the new threads code added in GCC 4.9, though I haven't dug further into this yet.

The toolchain was built with GNU Build Buddy for arm-none-eabi, soft-float; see the scripts at https://github.com/fredrikhederstierna/buildbuddy

/Fredrik
Comment 1 Richard Biener 2015-09-14 11:06:24 UTC
IIRC, even though the C standard says the access has to be aligned according to int, when GCC knows better (here you access the char-aligned global_data_unaligned) it
will use the smaller alignment.

This is done to be friendlier to "legacy" code.
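
If the data really is word-aligned, one way to tell the compiler so is to declare the alignment on the object itself. The following is a minimal sketch, not a confirmed fix for this report: the names global_data_aligned and read_aligned_int are hypothetical, the aligned attribute is a GCC extension, and whether a given GCC version then emits a single ldr at -Os would need to be checked in the disassembly.

```c
#include <string.h>

/* Hypothetical variant of the report's buffer: the GCC-specific aligned
 * attribute makes the 4-byte alignment visible to the compiler, so it no
 * longer has to assume char alignment for this object. */
char global_data_aligned[4] __attribute__((aligned(4)));

/* Read the buffer as an int. memcpy avoids the type-punned pointer cast
 * used in the report; with known alignment the compiler may lower it to
 * one word load (an expectation, not a verified result). */
int read_aligned_int(void)
{
    int value;
    memcpy(&value, global_data_aligned, sizeof value);
    return value;
}
```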