lower-subreg.c causes extreme code bloat for some memory accesses because it does not take into account the additional costs caused by the split. Example code for AVR address spaces: long readx (const __memx long *p) { return *p; } long read1 (const __flash1 long *p) { return *p; } long read0 (const __flash long *p) { return *p; } Reason is that read from these address spaces need one preparation sequence (set up segment register) per read. Thus: For one 4-byte read there will be one preparation sequence. For 4 one-byte reads there will be 4 preparation sequences. For the __flash address space no preparation is needed, but a 32-bit read can use post-increment addressing whereas a split to 4 byte moves won't use post-increment because GCC is completely afraid of pre-/post-modify addressing. The only place to hook in could be mode_dependent_address, however, that hook just passes the address down to the backend but omits the address space in use. As all 16-bit address spaces (including generic) use HImode as pointer mode, the target cannot take a decision based on the address alone. Configured with: ../../gcc.gnu.org/trunk/configure --target=avr --prefix=/local/gnu/install/gcc-4.7 --disable-nls --with-dwarf2 --enable-checking=yes,rtl --enable-languages=c,c++ gcc version 4.8.0 20120307 (experimental) (GCC)
Created attachment 26862 [details] lower.c: C source
Created attachment 26863 [details] lower.s: Assembler output Compiled with avr-gcc lower.c -Os -S -dp -mmcu=atmega128
Created attachment 26864 [details] lower.s: Assembler output with -fno-split-wide-types Compiled with avr-gcc lower.c -Os -S -dp -mmcu=atmega128 -fno-split-wide-types The code size for this module is 38 bytes whereas with lobreg lowering the code size is 142 bytes, which is a factor of 3.7
Author: gjl Date: Wed Mar 21 13:20:20 2012 New Revision: 185605 URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=185605 Log: PR rtl-optimization/52543 PR target/52461 * config/avr/avr-protos.h (avr_load_lpm): New prototype. * config/avr/avr.c (avr_mode_dependent_address_p): New function. (TARGET_MODE_DEPENDENT_ADDRESS_P): New define. (avr_load_libgcc_p): Restrict to __flash loads. (avr_out_lpm): Only handle 1-byte loads from __flash. (avr_load_lpm): New function. (avr_find_unused_d_reg): Remove. (avr_out_lpm_no_lpmx): Remove. (adjust_insn_length): Handle ADJUST_LEN_LOAD_LPM. * config/avr/avr.md (unspec): Add UNSPEC_LPM. (load_<mode>_libgcc): Use UNSPEC_LPM instead of MEM. (load_<mode>, load_<mode>_clobber): New insns. (mov<mode>): For multi-byte move from non-generic 16-bit address spaces: Expand to load_<mode> resp. load_<mode>_clobber. (load<mode>_libgcc): Remove expander. (split-lpmx): Remove split. Modified: trunk/gcc/ChangeLog trunk/gcc/config/avr/avr-protos.h trunk/gcc/config/avr/avr.c trunk/gcc/config/avr/avr.md
Author: gjl Date: Thu Mar 22 15:06:57 2012 New Revision: 185697 URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=185697 Log: libgcc/ Backport from 2012-03-07 mainline r185033. PR target/52507 * config/avr/lib1funcs.S (__movmemx_hi): Fix loop label in RAM-part. Backport from 2012-03-07 mainline r185031. PR target/52505 * config/avr/lib1funcs.S (__xload_1): Don't read unintentionally from RAM. Backport from 2012-03-07 mainline r185030. PR target/52461 PR target/52508 * config/avr/lib1funcs.S (__do_copy_data): Clear RAMPZ after usage if RAMPZ affects reading from RAM. (__tablejump_elpm__): Ditto. (.xload): Ditto. (__movmemx_hi): Ditto. (__do_global_ctors): Right condition for RAMPZ usage is "have ELPM". (__do_global_dtors): Ditto. (__xload_1, __xload_2, __xload_3, __xload_4): Ditto. (__movmemx_hi): Ditto. gcc/ Backport from 2012-03-22 mainline r185692. PR target/52496 * config/avr/avr.md (unspec): Remove UNSPEC_MEMORY_BARRIER. (unspecv): Add UNSPECV_MEMORY_BARRIER. (cli_sei): Use unspec_volatile instead of unspec for memory barrier. (delay_cycles_1, delay_cycles_2): Ditto. (delay_cycles_3, delay_cycles_4): Ditto. (nopv, *nopv): Ditto. (sleep, *sleep): Ditto. (wdr, *wdr): Ditto. Backport from 2012-03-21 mainline r185605. PR rtl-optimization/52543 PR target/52461 * config/avr/avr-protos.h (avr_load_lpm): New prototype. * config/avr/avr.c (avr_mode_dependent_address_p): New function. (TARGET_MODE_DEPENDENT_ADDRESS_P): New define. (avr_load_libgcc_p): Restrict to __flash loads. (avr_out_lpm): Only handle 1-byte loads from __flash. (avr_load_lpm): New function. (avr_find_unused_d_reg): Remove. (avr_out_lpm_no_lpmx): Remove. (adjust_insn_length): Handle ADJUST_LEN_LOAD_LPM. * config/avr/avr.md (unspec): Add UNSPEC_LPM. (load_<mode>_libgcc): Use UNSPEC_LPM instead of MEM. (load_<mode>, load_<mode>_clobber): New insns. (mov<mode>): For multi-byte move from non-generic 16-bit address spaces: Expand to load_<mode> resp. load_<mode>_clobber. (load<mode>_libgcc): Remove expander. (split-lpmx): Remove split. Backport from 2012-03-13 mainline r185329. PR target/52488 * config/avr/avr.c (avr_prologue_setup_frame): Cut down stack offset (size) to a value the insns can deal with. (expand_epilogue): Ditto. Backport from 2012-03-12 mainline r185256. PR target/52499 * config/avr/avr.c (avr_mode_code_base_reg_class): Change return type from reg_class_t to enum reg_class. * config/avr/avr-protos.h (avr_mode_code_base_reg_class): Ditto. Backport from 2012-03-12 mainline r185253. PR target/52148 * config/avr/avr.c (avr_out_movmem): Fix typo in output template for the case ADDR_SPACE_FLASH and AVR_HAVE_LPMX introduced in r184615 from 2012-02-28. Backport from 2012-03-08 mainline r185105. * config/avr/avr.md (*addhi3, addhi3_clobber): Add "w" alternative for constants in [-63,63]. Backport from 2012-03-08 mainline r185100. PR target/52496 * config/avr/avr.c (avr_mem_clobber): New static function. (avr_expand_delay_cycles): Add memory clobber operand to delay_cycles_1, delay_cycles_2, delay_cycles_3, delay_cycles_4. * config/avr/avr.md (unspec): Add UNSPEC_MEMORY_BARRIER. (enable_interrupt, disable_interrupt): New expander. (nopv, sleep, wdr): New expanders. (delay_cycles_1): Add memory clobber. (delay_cycles_2): Add memory clobber. (delay_cycles_3): Add memory clobber. (delay_cycles_4): Add memory clobber. (cli_sei): New insn from former "enable_interrupt", "disable_interrupt" with memory clobber. (*wdt): New insn from former "wdt" with memory clobber. (*nopv): Similar, but for "nopv". (*sleep): Similar, but for "sleep". Backport from 2012-03-07 mainline r185043. PR target/52484 * config/avr/avr.md (xload<mode>_A): Add R22... to register footprint. Backport from 2012-03-07 mainline r185032. PR target/52506 * gcc/config/avr/avr.c (expand_epilogue): Fix order of restoration to: RAMPZ, RAMPY, RAMPX, RAMPD. (expand_prologue): Only clear RAMPZ if it has effect on RAM-read. Backport from 2012-03-07 mainline r185031. PR target/52505 * config/avr/avr.c (avr_out_xload): Don't read unintentionally from RAM. * config/avr/avr.md (xload_8): Adjust insn length. Backport from 2012-03-07 mainline r185030. PR target/52461 * gcc/config/avr/avr.c (avr_out_lpm): Clear RAMPZ after usage if RAMPZ affects reading from RAM. Backport from 2012-03-05 mainline r184919. * config/avr/avr.md (*umaddqihi4.2): New insn-and-split. Modified: branches/gcc-4_7-branch/gcc/ChangeLog branches/gcc-4_7-branch/gcc/config/avr/avr-protos.h branches/gcc-4_7-branch/gcc/config/avr/avr.c branches/gcc-4_7-branch/gcc/config/avr/avr.md branches/gcc-4_7-branch/libgcc/ChangeLog branches/gcc-4_7-branch/libgcc/config/avr/lib1funcs.S
FIXME: It works for avr now, but just because that backend hacks around by using UNSPEC instead of MEM. This is legitimate because the affected address-spaces read from Flash, i.e. from non-changing memory. For respective FIXMEs see avr.c and avr.md as introduced by the patches above.
Author: rsandifo Date: Thu May 3 17:04:41 2012 New Revision: 187110 URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=187110 Log: Add PR rtl-optimization/52543 to changelog. Modified: trunk/gcc/ChangeLog
lower-subreg patched committed here: http://gcc.gnu.org/ml/gcc-cvs/2012-05/msg00011.html
Fixed.
(In reply to comment #9) > Fixed. NACK. I cannot confirm that anything changed: If I revert the Hack from comment #4 that represents a memory access as UNSPEC instead of as MEM (so that lower-subreg won't split), the code bloat returns. Just suppose the examples given in comment #0 long readx (const __memx long *p) { return *p; } long read0 (const __flash long *p) { return *p; } with "GCC: (GNU) 4.8.0 20120802 (experimental)" and -S -Os -dp -mmcu=avr5 -fno-split-wide-types the result is fine and there are 32-bit accesses: readx: movw r30,r22 ; 17 *movhi/1 [length = 1] mov r21,r24 ; 18 movqi_insn/1 [length = 1] call __xload_4 ; 19 xload_si_libgcc [length = 2] ret ; 24 return [length = 1] read0: movw r30,r24 ; 19 *movhi/1 [length = 1] lpm r22,Z+ ; 20 *movsi/3 [length = 4] lpm r23,Z+ lpm r24,Z+ lpm r25,Z+ ret ; 24 return [length = 1] Without -fno-split-wide-types the bloat is still there: readx: push r12 ; 54 pushqi1/1 [length = 1] push r13 ; 55 pushqi1/1 [length = 1] push r14 ; 56 pushqi1/1 [length = 1] mov r26,r24 ; 2 *movpsi/1 [length = 2] movw r24,r22 movw r30,r24 ; 35 *movhi/1 [length = 1] sbrc r26,7 ; 37 xload_8/1 [length = 4] ld r22,Z sbrs r26,7 lpm r22,Z ldi r18,lo8(1) ; 49 *movpsi/5 [length = 3] ldi r19,0 ldi r20,0 add r18,r24 ; 19 addpsi3/1 [length = 3] adc r19,r25 adc r20,r26 movw r30,r18 ; 38 *movhi/1 [length = 1] sbrc r20,7 ; 40 xload_8/1 [length = 4] ld r23,Z sbrs r20,7 lpm r23,Z ldi r18,lo8(2) ; 64 *reload_inpsi [length = 4] mov r12,r18 mov r13,__zero_reg__ mov r14,__zero_reg__ add r12,r24 ; 22 addpsi3/1 [length = 3] adc r13,r25 adc r14,r26 ldi r18,lo8(3) ; 51 *movpsi/5 [length = 3] ldi r19,0 ldi r20,0 add r18,r24 ; 25 addpsi3/1 [length = 3] adc r19,r25 adc r20,r26 movw r30,r12 ; 41 *movhi/1 [length = 1] sbrc r14,7 ; 43 xload_8/1 [length = 4] ld r24,Z sbrs r14,7 lpm r24,Z movw r30,r18 ; 44 *movhi/1 [length = 1] sbrc r20,7 ; 46 xload_8/1 [length = 4] ld r25,Z sbrs r20,7 lpm r25,Z pop r14 ; 59 popqi [length = 1] pop r13 ; 60 popqi [length = 1] pop r12 ; 61 popqi [length = 1] ret ; 62 return_from_epilogue [length = 1] read0: movw r30,r24 ; 35 *movhi/1 [length = 1] lpm r22,Z+ ; 17 movqi_insn/4 [length = 1] lpm r23,Z ; 20 movqi_insn/4 [length = 1] movw r18,r24 ; 38 *movhi/1 [length = 1] subi r18,-2 ; 22 addhi3_clobber/2 [length = 2] sbci r19,-1 movw r20,r24 ; 39 *movhi/1 [length = 1] subi r20,-3 ; 25 addhi3_clobber/2 [length = 2] sbci r21,-1 movw r30,r18 ; 40 *movhi/1 [length = 1] lpm r24,Z ; 33 movqi_insn/4 [length = 1] movw r30,r20 ; 41 *movhi/1 [length = 1] lpm r25,Z ; 34 movqi_insn/4 [length = 1] ret ; 44 return [length = 1] Each 32-bit move is still split into 4 8-bit moves even though that is extremely expensive. This happens both for PSImode and for HImode addresses.
Created attachment 27928 [details] /local/gnu/patches/pr52543-undo-185605.diff Just for reference: Here is the patch that undoes the MEM->UNSPEC workaround from SVN 185605, comment #4 that I used.
CCing Andrew.
I'm not sure, but this looks similar to what is happening in PR 54386, although there the mem load splitting happens way before lower-subreg.
(In reply to comment #0) > > For the __flash address space no preparation is needed, but a 32-bit read can > use post-increment addressing whereas a split to 4 byte moves won't use > post-increment because GCC is completely afraid of pre-/post-modify addressing. > BTW, the reason for the auto-inc-dec problem is described in PR 50749. I've started writing a replacement for the auto-inc-dec pass, but it will take some time.
Author: gjl Date: Fri Sep 28 08:21:06 2012 New Revision: 191820 URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=191820 Log: PR rtl-optimization/52543 * config/avr/avr.c (avr_mode_dependent_address_p): Return true for all non-generic address spaces. (TARGET_SECONDARY_RELOAD): New hook define to... (avr_secondary_reload): ...this new static function. * config/avr/avr.md (reload_in<mode>): New insns. Undo r185605 (mostly): * config/avr/avr-protos.h (avr_load_lpm): Remove. * config/avr/avr.c (avr_load_libgcc_p): Don't restrict to __flash loads. (avr_out_lpm): Also handle loads > 1 byte. (avr_load_lpm): Remove. (avr_find_unused_d_reg): New static function. (avr_out_lpm_no_lpmx): New static function. (adjust_insn_length): Remove ADJUST_LEN_LOAD_LPM. * config/avr/avr.md (unspec): Remove UNSPEC_LPM. (load_<mode>_libgcc): Use MEM instead of UNSPEC_LPM. (load_<mode>, load_<mode>_clobber): Remove. (mov<mode>): For multi-byte move from non-generic 16-bit address spaces: Expand to *mov<mode> again. (load<mode>_libgcc): New expander. (split-lpmx): Remove split. Modified: trunk/gcc/ChangeLog trunk/gcc/config/avr/avr-protos.h trunk/gcc/config/avr/avr.c trunk/gcc/config/avr/avr.md