This is GCC Bugzilla Version 2.20+
Hi, this code:

    unsigned long f (unsigned char *P)
    {
      unsigned long C;
      C = ((unsigned long)P[1] << 24)
        | ((unsigned long)P[2] << 16)
        | ((unsigned long)P[3] << 8)
        | ((unsigned long)P[4] << 0);
      return C;
    }

compiles to this:

    00000000 <f>:
       0: f9 2f    mov r31, r25
       2: e8 2f    mov r30, r24
       4: 61 81    ldd r22, Z+1 ; 0x01
       6: 77 27    eor r23, r23
       8: 88 27    eor r24, r24
       a: 99 27    eor r25, r25
       c: 96 2f    mov r25, r22
       e: 88 27    eor r24, r24
      10: 77 27    eor r23, r23
      12: 66 27    eor r22, r22
      14: 22 81    ldd r18, Z+2 ; 0x02
      16: 33 27    eor r19, r19
      18: 44 27    eor r20, r20
      1a: 55 27    eor r21, r21
      1c: 53 2f    mov r21, r19
      1e: 42 2f    mov r20, r18
      20: 33 27    eor r19, r19
      22: 22 27    eor r18, r18
      24: 62 2b    or r22, r18
      26: 73 2b    or r23, r19
      28: 84 2b    or r24, r20
      2a: 95 2b    or r25, r21
      2c: 24 81    ldd r18, Z+4 ; 0x04
      2e: 33 27    eor r19, r19
      30: 44 27    eor r20, r20
      32: 55 27    eor r21, r21
      34: 62 2b    or r22, r18
      36: 73 2b    or r23, r19
      38: 84 2b    or r24, r20
      3a: 95 2b    or r25, r21
      3c: 23 81    ldd r18, Z+3 ; 0x03
      3e: 33 27    eor r19, r19
      40: 44 27    eor r20, r20
      42: 55 27    eor r21, r21
      44: 54 2f    mov r21, r20
      46: 43 2f    mov r20, r19
      48: 32 2f    mov r19, r18
      4a: 22 27    eor r18, r18
      4c: 62 2b    or r22, r18
      4e: 73 2b    or r23, r19
      50: 84 2b    or r24, r20
      52: 95 2b    or r25, r21
      54: 08 95    ret

using this cmd line:

    avr-gcc -c -Os f.c

IMO, most of the or, eor and mov instructions are unnecessary.
Bernd, what mcu type was this compiled for?
Confirmed on 4.2.1.
Confirmed on 4.3.2 - it's a bit different and actually worse (longer). (Please add 4.3.2 to known to fail - I can't.)

    00000000 <f>:
       0: e8 2f    mov r30, r24
       2: f9 2f    mov r31, r25
       4: 21 81    ldd r18, Z+1 ; 0x01
       6: 30 e0    ldi r19, 0x00 ; 0
       8: 40 e0    ldi r20, 0x00 ; 0
       a: 50 e0    ldi r21, 0x00 ; 0
       c: 52 2f    mov r21, r18
       e: 44 27    eor r20, r20
      10: 33 27    eor r19, r19
      12: 22 27    eor r18, r18
      14: 82 81    ldd r24, Z+2 ; 0x02
      16: 90 e0    ldi r25, 0x00 ; 0
      18: a0 e0    ldi r26, 0x00 ; 0
      1a: b0 e0    ldi r27, 0x00 ; 0
      1c: a8 2f    mov r26, r24
      1e: b9 2f    mov r27, r25
      20: 99 27    eor r25, r25
      22: 88 27    eor r24, r24
      24: 28 2b    or r18, r24
      26: 39 2b    or r19, r25
      28: 4a 2b    or r20, r26
      2a: 5b 2b    or r21, r27
      2c: 84 81    ldd r24, Z+4 ; 0x04
      2e: 90 e0    ldi r25, 0x00 ; 0
      30: a0 e0    ldi r26, 0x00 ; 0
      32: b0 e0    ldi r27, 0x00 ; 0
      34: 28 2b    or r18, r24
      36: 39 2b    or r19, r25
      38: 4a 2b    or r20, r26
      3a: 5b 2b    or r21, r27
      3c: 83 81    ldd r24, Z+3 ; 0x03
      3e: 90 e0    ldi r25, 0x00 ; 0
      40: a0 e0    ldi r26, 0x00 ; 0
      42: b0 e0    ldi r27, 0x00 ; 0
      44: ba 2f    mov r27, r26
      46: a9 2f    mov r26, r25
      48: 98 2f    mov r25, r24
      4a: 88 27    eor r24, r24
      4c: 28 2b    or r18, r24
      4e: 39 2b    or r19, r25
      50: 4a 2b    or r20, r26
      52: 5b 2b    or r21, r27
      54: 62 2f    mov r22, r18
      56: 73 2f    mov r23, r19
      58: 84 2f    mov r24, r20
      5a: 95 2f    mov r25, r21
      5c: 08 95    ret
It's a target-independent issue. On non-strict-alignment targets we can do one 32-bit load instead of four byte loads. This is what you are asking for, correct? Unfortunately, combine does not see enough insns to catch it. Within a single stmt we can teach fold to do it; otherwise forwprop is our tree-level combiner.
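To make the equivalence concrete, here is a minimal C sketch of the two forms: the byte-by-byte assembly from the report and a single 32-bit load followed by a byte swap. It is an illustration only, not what GCC emits or how any pass is implemented. The function names are hypothetical, `uint32_t` stands in for AVR's 32-bit `unsigned long`, and the sketch assumes a GCC-compatible compiler that provides `__builtin_bswap32` and the `__BYTE_ORDER__` macros.

```c
#include <stdint.h>
#include <string.h>

/* Byte-by-byte form, as in the report.
   uint32_t is an assumption standing in for AVR's 32-bit unsigned long. */
static uint32_t load_be32_bytes(const unsigned char *p)
{
    return ((uint32_t)p[1] << 24)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 8)
         | ((uint32_t)p[4] << 0);
}

/* Hypothetical equivalent: one (possibly unaligned) 32-bit load from p + 1.
   memcpy keeps the access well-defined; on a little-endian host the
   big-endian value then needs a byte swap. */
static uint32_t load_be32_word(const unsigned char *p)
{
    uint32_t v;
    memcpy(&v, p + 1, sizeof v);
#if defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
    v = __builtin_bswap32(v);   /* GCC/Clang builtin, assumed available */
#endif
    return v;
}
```

Both functions should return the same value for any buffer of at least five bytes; whether a compiler actually turns the first form into the second depends on the target and on the optimization passes discussed here.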
Maybe it fits within the byteswap recognition pass (the loads certainly may, because what is requested amounts to an on-the-fly byteswap).