This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.
[Bug middle-end/27663] missed-optimization transforming a byte array to unsigned long
- From: "gjl at gcc dot gnu.org" <gcc-bugzilla at gcc dot gnu dot org>
- To: gcc-bugs at gcc dot gnu dot org
- Date: Mon, 16 May 2011 15:15:29 +0000
- Subject: [Bug middle-end/27663] missed-optimization transforming a byte array to unsigned long
- Auto-submitted: auto-generated
- References: <bug-27663-4@http.gcc.gnu.org/bugzilla/>
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=27663
Georg-Johann Lay <gjl at gcc dot gnu.org> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |gjl at gcc dot gnu.org
      Known to fail|                            |
--- Comment #7 from Georg-Johann Lay <gjl at gcc dot gnu.org> 2011-05-16 15:05:41 UTC ---
The patch tries to fix the middle-end flaw in the back end (BE) by introducing
some combine patterns that recognize a byte insert.
Wouldn't it be possible to recognize such situations in the middle-end and map
them to something like (set (zero_extract:QI (reg:SI) ...)) or (set (subreg:QI
(reg:SI) ...))?
Even if the bytes inserted do not come from consecutive memory locations, such
a recognition would help.
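
To make the byte-insert idea concrete, here is a minimal C sketch. The shape of
f_shift_or is an assumption reconstructed from the loads at Z+1..Z+4 in the
listings in this comment; it is not copied from insert-byte.c. insert_byte and
f_insert are hypothetical names that only illustrate what a zero_extract or
subreg destination would express: replacing one byte of a 32-bit value.

```c
#include <stdint.h>

/* Shift-and-OR form. Assumed shape of the test case, reconstructed from
   the byte loads at offsets 1..4 seen in the assembly listings. */
uint32_t f_shift_or(const unsigned char *p)
{
    return ((uint32_t)p[1] << 24)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 8)
         |  (uint32_t)p[4];
}

/* Hypothetical helper: replace byte number pos (0 = least significant)
   of word with b. This is the "byte insert" operation. */
uint32_t insert_byte(uint32_t word, unsigned pos, unsigned char b)
{
    uint32_t mask = (uint32_t)0xff << (8 * pos);
    return (word & ~mask) | ((uint32_t)b << (8 * pos));
}

/* The same value built from four byte inserts. On an 8-bit target each
   insert is ideally a single load into the right byte register. */
uint32_t f_insert(const unsigned char *p)
{
    uint32_t c = 0;
    c = insert_byte(c, 3, p[1]);
    c = insert_byte(c, 2, p[2]);
    c = insert_byte(c, 1, p[3]);
    c = insert_byte(c, 0, p[4]);
    return c;
}
```

Both functions return the same value for any input buffer of at least five
bytes; the second form just makes explicit that no shifting or ORing of wide
values is needed.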
The patch does not lead to optimal code; there is still room for improvement.
With -Os -mmcu=atmega8:
f:
    push r16
    push r17
/* prologue: function */
/* frame size = 0 */
/* stack size = 2 */
.L__stack_usage = 2
    movw r30,r24
    ldd r24,Z+1
    ldd r16,Z+2
    ldi r17,lo8(0)
    ldi r18,lo8(0)
    ldi r19,hi8(0)
    movw r18,r16
    clr r17
    clr r16
    or r19,r24
    ldd r24,Z+4
    or r16,r24
    ldd r24,Z+3
    or r17,r24
    movw r22,r16
    movw r24,r18
/* epilogue start */
    pop r17
    pop r16
    ret
The use of r16/r17 might be an artifact of IRA, because only half of an SI reg
is call-saved while the other half is call-used. There is the following comment
in ira-color.c:
/* We need to save/restore the hard register in
epilogue/prologue. Therefore we increase the cost. */
{
/* ??? If only part is call clobbered. */
Despite subreg lowering, the call-used r26/r27 are not used.
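
The "only half is call-saved" remark can be checked with a small C model of the
avr-gcc register conventions (r2..r17, r28 and r29 call-saved; r18..r27, r30
and r31 call-used). The function names are hypothetical and this is only a
sketch of the ABI split, not code from IRA.

```c
#include <stdbool.h>

/* avr-gcc calling convention: r2..r17 and r28/r29 are call-saved.
   r18..r27 and r30/r31 are call-used; r0/r1 are fixed registers. */
bool avr_call_saved(int regno)
{
    return (regno >= 2 && regno <= 17) || regno == 28 || regno == 29;
}

/* Count the call-saved registers in a group of nregs consecutive hard
   registers starting at first (an SI value occupies 4 registers). */
int saved_regs_in_group(int first, int nregs)
{
    int n = 0;
    for (int i = 0; i < nregs; i++)
        if (avr_call_saved(first + i))
            n++;
    return n;
}
```

For an SI value in r16..r19 the count is 2 out of 4, which matches the two
pushes in the listing above; for r24..r27 it is 0, so that group would need no
prologue or epilogue at all.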
Maybe you should also try to disable subreg lowering by means of
-fno-split-wide-types. For the code in question that gives:
With -Os -mmcu=atmega8 -fno-split-wide-types:
f:
/* prologue: function */
/* frame size = 0 */
/* stack size = 0 */
.L__stack_usage = 0
    movw r30,r24
    ldd r18,Z+1
    ldd r22,Z+2
    mov r24,r22
    ldi r25,lo8(0)
    ldi r26,lo8(0)
    ldi r27,hi8(0)
    clr r23
    clr r22
    or r25,r18
    ldd r18,Z+4
    or r22,r18
    ldd r18,Z+3
    or r23,r18
/* epilogue start */
    ret
What I do not understand are the insns that clear r26/r27: they are dead, but
this is not detected. It is an HI insn that looks like this:
; (insn 32 34 42 (set (reg:HI 26 r26 [ MEM[(unsigned char *)P_1(D) + 2B]+2 ])
; (const_int 0 [0])) insert-byte.c:5 10 {*movhi}
; (nil))
ldi r26,lo8(0) ; 32 *movhi/1 [length = 2]
ldi r27,hi8(0)