Bug 42895 - Low registers are preferred than register ip in thumb2 mode
Summary: Low registers are preferred than register ip in thumb2 mode
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: target (show other bugs)
Version: 4.5.0
: P3 enhancement
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords: missed-optimization
Depends on:
Blocks: 16996
  Show dependency treegraph
 
Reported: 2010-01-29 00:13 UTC by Carrot
Modified: 2010-06-11 22:36 UTC (History)
2 users (show)

See Also:
Host:
Target: arm-eabi
Build:
Known to work:
Known to fail:
Last reconfirmed: 2010-01-29 07:53:09


Attachments
test case (285 bytes, text/x-csrc)
2010-01-29 00:14 UTC, Carrot
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Carrot 2010-01-29 00:13:30 UTC
Compile the attached source code with options -march=armv7-a -mthumb -Os, gcc generates:

        ldr     r3, [r0, #0]
        push    {r4, r5, lr}
        ldrb    r0, [r3, #0]    @ zero_extendqisi2
        cmp     r0, #127
        bgt     .L2
        ldrb    ip, [r3, #1]    @ zero_extendqisi2
        adds    r4, r3, #1
        sbfx    r0, r0, #0, #7
        cmp     ip, #127
        ble     .L2
        add     ip, r4, #1
        ldrb    r4, [r4, #1]    @ zero_extendqisi2
        and     r5, r4, #127
        cmp     r4, #127
        orr     r0, r0, r5, lsl #14
        ble     .L2
        add     r4, ip, #1
        ldrb    ip, [ip, #1]    @ zero_extendqisi2
        cmp     ip, #127
        and     r5, ip, #127
        it      gt
        ldrbgt  ip, [r4, #1]    @ zero_extendqisi2
        orr     r0, r0, r5, lsl #21
        it      gt
        orrgt   r0, r0, ip, lsl #28
.L2:
        cbz     r1, .L3
        cmp     r3, r1
        bls     .L3
        movs    r3, #0
        str     r3, [r2, #0]
.L3:
        pop     {r4, r5, pc}

Register ip is used in many instructions, usually these instructions are 32 bit. If we use a low register(r0 - r7), many of these instructions can be 16 bit. In this code snippet r6 and r7 are available, so we should use them first.
Comment 1 Carrot 2010-01-29 00:14:04 UTC
Created attachment 19744 [details]
test case
Comment 2 Ramana Radhakrishnan 2010-01-29 07:53:09 UTC
Confirmed - I'm marking this as an enhancement.
Comment 3 Steven Bosscher 2010-01-29 08:42:56 UTC
See arm.c:#define REG_ALLOC_ORDER. Comment before it says: " It is good to use ip since no saving is required (though calls clobber it) and it never contains function parameters. It is quite good to use lr since other calls may clobber it anyway."

So in REG_ALLOC_ORDER, ip (reg 12) and lr (reg 14) come before before r4-r7:

#define REG_ALLOC_ORDER                         \
{                                               \
     3,  2,  1,  0, 12, 14,  4,  5,             \
     6,  7,  8, 10,  9, 11, 13, 15,             \
    16, 17, 18, 19, 20, 21, 22, 23,             \
    27, 28, 29, 30, 31, 32, 33, 34,             \
    35, 36, 37, 38, 39, 40, 41, 42,             \
    43, 44, 45, 46, 47, 48, 49, 50,             \
    51, 52, 53, 54, 55, 56, 57, 58,             \
    59, 60, 61, 62,                             \
    24, 25, 26,                                 \

So the register allocator faithfully does what the IBTK has asked it to :-)
Comment 4 Carrot 2010-01-29 19:42:28 UTC
(In reply to comment #3)
> See arm.c:#define REG_ALLOC_ORDER. Comment before it says: " It is good to use
> ip since no saving is required (though calls clobber it) and it never contains
> function parameters. It is quite good to use lr since other calls may clobber
> it anyway."
> 
> So in REG_ALLOC_ORDER, ip (reg 12) and lr (reg 14) come before before r4-r7:
> 
> #define REG_ALLOC_ORDER                         \
> {                                               \
>      3,  2,  1,  0, 12, 14,  4,  5,             \
>      6,  7,  8, 10,  9, 11, 13, 15,             \
>     16, 17, 18, 19, 20, 21, 22, 23,             \
>     27, 28, 29, 30, 31, 32, 33, 34,             \
>     35, 36, 37, 38, 39, 40, 41, 42,             \
>     43, 44, 45, 46, 47, 48, 49, 50,             \
>     51, 52, 53, 54, 55, 56, 57, 58,             \
>     59, 60, 61, 62,                             \
>     24, 25, 26,                                 \
> 
> So the register allocator faithfully does what the IBTK has asked it to :-)
> 

When I set the option -march=armv5te, gcc can generate the expected result, use r6 and r7 instead of ip. Then I noticed the function arm_order_regs_for_local_alloc which over writes the default REG_ALLOC_ORDER with

/* Order of allocation of core registers for Thumb: this allocation is
   written over the corresponding initial entries of the array
   initialized with REG_ALLOC_ORDER.  We allocate all low registers
   first.  Saving and restoring a low register is usually cheaper than
   using a call-clobbered high register.  */

static const int thumb_core_reg_alloc_order[] =
{
   3,  2,  1,  0,  4,  5,  6,  7,
  14, 12,  8,  9, 10, 11, 13, 15
};

But the confusion part is:

1. It rewrites the order when the target is TARGET_THUMB, it should also impact THUMB2.
2. When I set a breakpoint to arm_order_regs_for_local_alloc and try to find when the order is changed, the breakpoint is never triggered, even with option -march=armv5te.
Comment 5 Steven Bosscher 2010-01-29 19:49:13 UTC
This isn't confusing at all. ORDER_REGS_FOR_LOCAL_ALLOC is unused. All targets that define it should be fixed.

Comment 6 Carrot 2010-01-29 22:47:03 UTC
I tried to change the register order in REG_ALLOC_ORDER, moved ip and lr after r4/r5/r6/r7, but I still got the same result.
Comment 7 Bernd Schmidt 2010-04-29 21:37:28 UTC
Subject: Bug 42895

Author: bernds
Date: Thu Apr 29 21:37:01 2010
New Revision: 158911

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=158911
Log:
	PR target/42895
	* doc/tm.texi (ADJUST_REG_ALLOC_ORDER): Renamed from
	ORDER_REGS_FOR_LOCAL_ALLOC.  All instances of this macro changed.
	(HONOR_REG_ALLOC_ORDER): Describe new macro.
	* ira.c (setup_alloc_regs): Use ADJUST_REG_ALLOC_ORDER if defined.
	* ira-color.c (assign_hard_reg): Take prologue/epilogue costs into
	account only if HONOR_REG_ALLOC_ORDER is not defined.
	* config/arm/arm.h (HONOR_REG_ALLOC_ORDER): Define.
	* system.h (ORDER_REGS_FOR_LOCAL_ALLOC): Poison.


Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/config/arm/arm.h
    trunk/gcc/config/avr/avr.h
    trunk/gcc/config/i386/i386.h
    trunk/gcc/config/mips/mips.h
    trunk/gcc/config/picochip/picochip.h
    trunk/gcc/config/sparc/sparc.h
    trunk/gcc/config/xtensa/xtensa.h
    trunk/gcc/doc/tm.texi
    trunk/gcc/ira-color.c
    trunk/gcc/ira.c
    trunk/gcc/system.h

Comment 8 Bernd Schmidt 2010-06-11 22:36:00 UTC
Fixed.