This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.

[Bug middle-end/27663] missed-optimization transforming a byte array to unsigned long

From: "gjl at gcc dot gnu.org" <gcc-bugzilla at gcc dot gnu dot org>
To: gcc-bugs at gcc dot gnu dot org
Date: Mon, 16 May 2011 15:15:29 +0000
Subject: [Bug middle-end/27663] missed-optimization transforming a byte array to unsigned long
Auto-submitted: auto-generated
References: <bug-27663-4@http.gcc.gnu.org/bugzilla/>

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=27663

Georg-Johann Lay <gjl at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |gjl at gcc dot gnu.org
      Known to fail|                            |

--- Comment #7 from Georg-Johann Lay <gjl at gcc dot gnu.org> 2011-05-16 15:05:41 UTC ---
The patch tries to fix the middle-end flaw in the back end by introducing
combine patterns that recognize byte inserts.

Wouldn't it be possible to recognize such situations in the middle-end and map
them to something like (set (zero_extract:QI (reg:SI) ...)) or (set (subreg:QI
(reg:SI) ...) ...)?

Even if the bytes inserted do not come from consecutive memory locations, such
a recognition would help.
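For illustration, the byte-insert situation can be written in C in two equivalent ways: as the usual OR of zero-extended, shifted bytes, or as four stores into individual byte lanes of a 32-bit word. The second form is what a (set (zero_extract ...)) or QI subreg destination would express directly. This is a minimal sketch, assuming a 32-bit unsigned long as on AVR (uint32_t is used so it behaves the same on any host); the names insert_byte, combine_or and combine_insert are hypothetical. The bytes are passed as plain values, so nothing requires them to come from consecutive memory.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: replace byte lane `lane` (0 = least significant)
   of `word` with `b`.  In C this needs a mask and shifts; a target-level
   byte insert (a QI-mode set of one byte of an SI register) can be a
   single byte move.  */
uint32_t insert_byte(uint32_t word, unsigned lane, uint8_t b)
{
    assert(lane < 4);
    uint32_t shift = 8u * lane;
    return (word & ~((uint32_t)0xFF << shift)) | ((uint32_t)b << shift);
}

/* Source-level idiom: OR together zero-extended, shifted bytes.  */
uint32_t combine_or(uint8_t b3, uint8_t b2, uint8_t b1, uint8_t b0)
{
    return ((uint32_t)b3 << 24) | ((uint32_t)b2 << 16)
         | ((uint32_t)b1 << 8)  |  (uint32_t)b0;
}

/* The same value built as four byte inserts into a zeroed word.  */
uint32_t combine_insert(uint8_t b3, uint8_t b2, uint8_t b1, uint8_t b0)
{
    uint32_t w = 0;
    w = insert_byte(w, 3, b3);
    w = insert_byte(w, 2, b2);
    w = insert_byte(w, 1, b1);
    w = insert_byte(w, 0, b0);
    return w;
}
```

For example, combine_or(0x12, 0x34, 0x56, 0x78) and combine_insert(0x12, 0x34, 0x56, 0x78) both yield 0x12345678.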

The patch does not yield optimal code; there is still room for improvement:

With -Os -mmcu=atmega8:

f:
    push r16
    push r17
/* prologue: function */
/* frame size = 0 */
/* stack size = 2 */
.L__stack_usage = 2
    movw r30,r24
    ldd r24,Z+1
    ldd r16,Z+2
    ldi r17,lo8(0)
    ldi r18,lo8(0)
    ldi r19,hi8(0)
    movw r18,r16
    clr r17
    clr r16
    or r19,r24
    ldd r24,Z+4
    or r16,r24
    ldd r24,Z+3
    or r17,r24
    movw r22,r16
    movw r24,r18
/* epilogue start */
    pop r17
    pop r16
    ret
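Reading the listing back: avr-gcc passes the pointer argument in r24:r25 (copied to Z here) and returns a 32-bit value in r22 (least significant byte) through r25. Tracing the moves, P[1] ends up in r25, P[2] in r24, P[3] in r23 and P[4] in r22, so f appears to compute a big-endian 32-bit value from P[1]..P[4]. The test case insert-byte.c is not quoted in this comment, so the following is only a hypothetical reconstruction consistent with the listing; the parameter name P is taken from the RTL dump below.

```c
/* Hypothetical reconstruction of f, consistent with the -Os listing;
   not the actual insert-byte.c.  unsigned long is 32 bits on AVR.  */
unsigned long f(const unsigned char *P)
{
    return ((unsigned long)P[1] << 24)   /* ends up in r25 (via r19) */
         | ((unsigned long)P[2] << 16)   /* ends up in r24 (via r18) */
         | ((unsigned long)P[3] << 8)    /* ends up in r23 (via r17) */
         |  (unsigned long)P[4];         /* ends up in r22 (via r16) */
}
```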

The use of r16/r17 might be an artifact of IRA: the SI value lives in r16..r19,
of which only half (r16/r17) is call-saved while the other half (r18/r19) is
call-used. There is the following comment in ira-color.c:

    /* We need to save/restore the hard register in
       epilogue/prologue.  Therefore we increase the cost.  */
    {
      /* ??? If only part is call clobbered.  */

Despite subreg lowering, the call-used r26/r27 are not used.

You might also try disabling subreg lowering with -fno-split-wide-types.
For the code in question, that gives:

With -Os -mmcu=atmega8 -fno-split-wide-types:

f:
/* prologue: function */
/* frame size = 0 */
/* stack size = 0 */
.L__stack_usage = 0
    movw r30,r24
    ldd r18,Z+1
    ldd r22,Z+2
    mov r24,r22
    ldi r25,lo8(0)
    ldi r26,lo8(0)
    ldi r27,hi8(0)
    clr r23
    clr r22
    or r25,r18
    ldd r18,Z+4
    or r22,r18
    ldd r18,Z+3
    or r23,r18
/* epilogue start */
    ret

What I do not understand are the insns that clear r26/r27: they are dead, but
this is not detected. It is an HI insn that looks like this:

; (insn 32 34 42 (set (reg:HI 26 r26 [ MEM[(unsigned char *)P_1(D) + 2B]+2 ])
;         (const_int 0 [0])) insert-byte.c:5 10 {*movhi}
;      (nil))
    ldi r26,lo8(0)     ;  32    *movhi/1    [length = 2]
    ldi r27,hi8(0)
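One way to see why insn 32 is dead: in the -fno-split-wide-types listing, r24..r27 first hold P[2] zero-extended to SI, so r26:r27 is its upper half (the "+2" in the dump). The next step is a left shift by 16 into r22..r25, and a 32-bit shift by 16, written on 16-bit halves, never reads the upper half of its input. This is a minimal C sketch of that observation; the struct si_halves type and the shl16 and to_u32 functions are hypothetical names used only for illustration.

```c
#include <stdint.h>

/* Hypothetical model of an SI value split into two HI halves,
   as in r24..r27: lo = r24:r25, hi = r26:r27.  */
struct si_halves {
    uint16_t lo;
    uint16_t hi;
};

/* x << 16 on halves: the result's high half is the input's low half
   and the result's low half is zero.  x.hi is never read, so an insn
   that only sets x.hi (like insn 32 setting r26:r27 to 0) is dead.  */
struct si_halves shl16(struct si_halves x)
{
    struct si_halves r;
    r.lo = 0;
    r.hi = x.lo;
    return r;
}

/* Reassemble the halves into a 32-bit value for checking.  */
uint32_t to_u32(struct si_halves x)
{
    return ((uint32_t)x.hi << 16) | x.lo;
}
```

Whatever the upper half holds before the shift, the result is the same, which is why clearing r26:r27 has no effect.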