18560 – better optimalization of EOR/MOV block.

Bug 18560 - better optimalization of EOR/MOV block.

Summary: better optimalization of EOR/MOV block.

Status:	RESOLVED FIXED

Alias:	None

Product:	gcc
Classification:	Unclassified
Component:	rtl-optimization (show other bugs)
Version:	3.4.3

Importance:	P2 enhancement
Target Milestone:	---
Assignee:	Not yet assigned to anyone

URL:
Keywords:	missed-optimization, ra

Depends on:	18427
Blocks:
	Show dependency tree / graph

Reported:	2004-11-19 08:13 UTC by Pawel Sikora
Modified:	2007-11-09 20:40 UTC (History)
CC List:	1 user (show)

See Also:
Host:
Target:	arm-pld-linux
Build:
Known to work:	4.2.0 4.3.0
Known to fail:
Last reconfirmed:	2005-11-13 16:43:52

Attachments
Add an attachment (proposed patch, testcase, etc.)

Note You need to log in before you can comment on or make changes to this bug.

Description Pawel Sikora 2004-11-19 08:13:23 UTC

/* 
    This function reverses the bytes in a word. The method was discovered in 
    1986 following a competition between ARM programmers; it requires just 4 
    instructions and 1 work register. A method using only 3 instructions per 
    word reversed was also found, but it has some set-up overhead and uses 
    a 2nd register. 
*/ 
 
unsigned long reverse(unsigned long v) 
{ 
    unsigned long t; 
    t = v ^ ((v << 16) | (v >> 16));    //      EOR r1,r0,r0,ROR #16    [1] 
    t &= ~0xff0000;                     //      BIC r1,r1,#&ff0000 
    v = (v << 24) | (v >> 8);           //      MOV r0,r0,ROR #8 
    return v ^ (t >> 8);                //      EOR r0,r0,r1,LSR #8 
} 
 
The gcc-3.4.3 with -O2 produces: 
 
reverse: 
        mov     r3, r0                  [2] 
        mov     r0, r0, ror #16         [2] 
        eor     r0, r0, r3              [2] 
        bic     r0, r0, #16711680 
        mov     r3, r3, ror #8 
        eor     r0, r3, r0, lsr #8 
        mov     pc, lr 
 
Itc doesn't seem to produce [2] optimal code [1].

Comment 1 Andrew Pinski 2004-11-19 14:50:52 UTC

Confirmed, the mainline produces better code already:
        mov     r3, r0
        eor     r0, r0, r0, ror #16
        bic     r0, r0, #16711680
        mov     r0, r0, lsr #8
        eor     r0, r0, r3, ror #8
        @ lr needed for prologue
        bx      lr

Comment 2 Richard Earnshaw 2004-12-14 15:16:32 UTC

As Andrew pointed out, the merge of the eor and the rotate is now done on
mainline in 4.0.  The initial redundant MOV is a register allocation artifact. 
This particular testcase compiles optimally with the new register allocator:

reverse:
        eor     r3, r0, r0, ror #16
        bic     r3, r3, #16711680
        mov     r3, r3, lsr #8
        eor     r0, r3, r0, ror #8
        bx      lr

but for other reasons, I wouldn't recommend you use that option (too many other
bugs).

Comment 3 Pawel Sikora 2005-02-10 20:17:51 UTC

(In reply to comment #2) 
> As Andrew pointed out, the merge of the eor and the rotate is now done on 
> mainline in 4.0. 
 
Hmm, it doesn't work on my gcc. 
 
# arm-pld-linux-gcc reversing_the_bytes_in_word.c -s -S -O2 
 
        .file   "reversing_the_bytes_in_word.c" 
        .text 
        .align  2 
        .global reverse 
        .type   reverse, %function 
reverse: 
        @ args = 0, pretend = 0, frame = 0 
        @ frame_needed = 0, uses_anonymous_args = 0 
        @ link register save eliminated. 
        mov     r3, r0 
        eor     r0, r0, r0, ror #16 
        bic     r0, r0, #16711680 
        mov     r0, r0, lsr #8 
        eor     r0, r0, r3, ror #8 
        @ lr needed for prologue 
        mov     pc, lr 
        .size   reverse, .-reverse 
        .ident  "GCC: (GNU) 4.0.0 20050130 (experimental)" 
 
> The initial redundant MOV is a register allocation artifact.  
> This particular testcase compiles optimally with the new register allocator: 
 
Is there a special option I need to set?

Comment 4 Richard Earnshaw 2005-02-11 10:15:04 UTC

The new register allocator (new-ra) has been removed because it was buggy and
there were no plans to fix it.  I was using it to show that the initial MOV was
an unrelated issue.

Your code snippet shows that the rotate and the EOR are being merged, which is
what this PR was about.

Comment 5 Pawel Sikora 2005-02-11 10:47:12 UTC

(In reply to comment #4)
> The new register allocator (new-ra) has been removed because it was buggy and
> there were no plans to fix it.
> I was using it to show that the initial MOV was an unrelated issue.
> Your code snippet shows that the rotate and the EOR are being merged,
> which is what this PR was about.

The PR is about initial eor/mov opt., not eor/rotate ;-)

At this moment the comercial IAR compiler generates good code.
The GCC should be good too :)

reverse:
 0x00000D04 E0201860  EOR          R1, R0, R0, ROR #16
 0x00000D08 E3C118FF  BIC          R1, R1, #0xFF0000
 0x00000D0C E1A01421  MOV          R1, R1, LSR #8
 0x00000D10 E0210460  EOR          R0, R1, R0, ROR #8
 0x00000D14 E12FFF1E  BX           LR

Comment 6 Richard Earnshaw 2005-02-11 11:13:12 UTC

I've linked this to the register-allocator meta-bug

Comment 7 Rask Ingemann Lambertsen 2007-11-09 20:40:45 UTC

This has been fixed for more than a year:

reverse:
	@ args = 0, pretend = 0, frame = 0
	@ frame_needed = 0, uses_anonymous_args = 0
	@ link register save eliminated.
	eor	r3, r0, r0, ror #16	@ 12	*arith_shiftsi	[length = 4]
	bic	r3, r3, #16711680	@ 13	*arm_andsi3_insn/2	[length = 4]
	mov	r0, r0, ror #8		@ 15	*arm_shiftsi3	[length = 4]
	eor	r0, r0, r3, lsr #8	@ 23	*arith_shiftsi	[length = 4]
	@ lr needed for prologue	@ 36	prologue_use	[length = 4]
	bx	lr	@ 39	return	[length = 12]
	.size	reverse, .-reverse
	.ident	"GCC: (GNU) 4.2.0 20060729 (experimental)"

4.3.0 (revision 129967) generates the same code.