Each data access through the constant pool requires two instructions: (1) one to load the constant pool element (the data address) and (2) one to access the data itself. Keeping the constant pool base address in a register could save one instruction on each subsequent data access. (See attached files.)

Release:
gcc version 3.3 20030210 (prerelease)

Environment:
BUILD & HOST: Linux 2.4.20 i686 unknown
TARGET: arm-unknown-elf

How-To-Repeat:
Use the command line below to compile the specified source code:

arm-elf-gcc -S -g0 -Os -o 02.s ./02.c

Attached is the original assembly output, where accessing globvar1 can be done in one fewer instruction, as shown in the manually edited assembly file.

// 01.c:
# 1 "01.c"
# 1 "<built-in>"
# 1 "<command line>"
# 1 "01.c"
unsigned int globvar1=0;
unsigned int globvar2=0;

int f1()
{
  globvar1=11;
  globvar2=12;
  return globvar1;
}

int main ()
{
  return f1();
}
Hello,

with gcc 3.3 branch and mainline (20030509) I get the following code. Could you confirm whether this still exhibits the problem you noted? Thanks,

Dara

        .file   "01.i"
        .global globvar1
        .bss
        .global globvar1
        .align  2
        .type   globvar1, %object
        .size   globvar1, 4
globvar1:
        .space  4
        .global globvar2
        .global globvar2
        .align  2
        .type   globvar2, %object
        .size   globvar2, 4
globvar2:
        .space  4
        .text
        .align  2
        .global f1
        .type   f1, %function
f1:
        @ args = 0, pretend = 0, frame = 0
        @ frame_needed = 0, uses_anonymous_args = 0
        @ link register save eliminated.
        ldr     r3, .L2
        mov     r2, #12
        str     r2, [r3, #0]
        ldr     r3, .L2+4
        mov     r0, #11
        @ lr needed for prologue
        str     r0, [r3, #0]
        mov     pc, lr
.L3:
        .align  2
.L2:
        .word   globvar2
        .word   globvar1
        .size   f1, .-f1
        .align  2
        .global main
        .type   main, %function
main:
        @ args = 0, pretend = 0, frame = 0
        @ frame_needed = 0, uses_anonymous_args = 0
        @ link register save eliminated.
        @ lr needed for prologue
        b       f1
        .size   main, .-main
        .ident  "GCC: (GNU) 3.3 20030508 (prerelease)"
See Dara's question.
Subject: Re: [arm] Accessing data through constant pool more times could be solved in less instructions

Hello!

Thanks for your question! The problem still exists. I have commented your code below; the relevant part is function f1. The ldr insn at (*) is unnecessary if the str at (**) uses an offset from the address already held in r3. With sequential cpool accesses, one insn per access could be dropped.

Regards,
Laszlo

On Mon, 26 May 2003, dhazeghi@yahoo.com wrote:

> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=9703
>
> ------- Additional Comments From dhazeghi@yahoo.com 2003-05-26 20:59 -------
> Hello,
>
> with gcc 3.3 branch and mainline (20030509) I get the following code. Could you confirm whether
> this still exhibits the problem you noted? Thanks,
>
> Dara
>
>         .file   "01.i"
>         .global globvar1
>         .bss
>         .global globvar1
>         .align  2
>         .type   globvar1, %object
>         .size   globvar1, 4
> globvar1:
>         .space  4
>         .global globvar2
>         .global globvar2
>         .align  2
>         .type   globvar2, %object
>         .size   globvar2, 4
> globvar2:
>         .space  4
>         .text
>         .align  2
>         .global f1
>         .type   f1, %function
> f1:
>         @ args = 0, pretend = 0, frame = 0
>         @ frame_needed = 0, uses_anonymous_args = 0
>         @ link register save eliminated.
>         ldr     r3, .L2
>         mov     r2, #12
>         str     r2, [r3, #0]
>         ldr     r3, .L2+4       ; (*) can be removed
>         mov     r0, #11
>         @ lr needed for prologue
>         str     r0, [r3, #0]    ; (**) with offset instead of #0
>         mov     pc, lr
> .L3:
>         .align  2
> .L2:
>         .word   globvar2
>         .word   globvar1
>         .size   f1, .-f1
>         .align  2
>         .global main
>         .type   main, %function
> main:
>         @ args = 0, pretend = 0, frame = 0
>         @ frame_needed = 0, uses_anonymous_args = 0
>         @ link register save eliminated.
>         @ lr needed for prologue
>         b       f1
>         .size   main, .-main
>         .ident  "GCC: (GNU) 3.3 20030508 (prerelease)"
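The saving described at (*) and (**) can be modelled at the source level. The following is only a minimal sketch, not something proposed in this report: it assumes the two variables sit next to each other in memory, and it forces that by placing them in a struct, which the original test case does not do. The names struct globals, g and f1_shared_base are hypothetical.

```c
/* Hypothetical layout: the struct stands in for the two adjacent .bss
   words in the listing above.  It is an assumption of this sketch.  */
struct globals {
    unsigned int globvar1;
    unsigned int globvar2;
};

static struct globals g;

/* Both stores go through the same base pointer, so a compiler only needs
   one address in a register and can reach each member with a constant
   offset from it, instead of loading a second address.  */
static unsigned int f1_shared_base(struct globals *base)
{
    base->globvar1 = 11;
    base->globvar2 = 12;
    return base->globvar1;
}
```

Calling f1_shared_base(&g) behaves like f1 in the test case. Whether a given compiler version actually emits a single address load for it has to be checked in the generated assembly.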
I was working on some code to help with this sometime last year. I'll try to look at it again.
This is still exhibited on mainline gcc.
The section anchor work should help with this, or come close to fixing it: http://gcc.gnu.org/wiki/Section%20Anchor%20Optimisations
Patch posted here: http://gcc.gnu.org/ml/gcc-patches/2006-02/msg00133.html
Subject: Bug 9703

Author: rsandifo
Date: Sat Feb 18 22:06:53 2006
New Revision: 111254

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=111254
Log:
    * cselib.c (cselib_init): Change RTX_SIZE to RTX_CODE_SIZE.
    * emit-rtl.c (copy_rtx_if_shared_1): Use shallow_copy_rtx.
    (copy_insn_1): Likewise. Don't copy each field individually. Reindent.
    * read-rtl.c (apply_macro_to_rtx): Use RTX_CODE_SIZE instead of RTX_SIZE.
    * reload1.c (eliminate_regs): Use shallow_copy_rtx.
    * rtl.c (rtx_size): Rename variable to...
    (rtx_code_size): ...this.
    (rtx_size): New function.
    (rtx_alloc_stat): Use RTX_CODE_SIZE instead of RTX_SIZE.
    (copy_rtx): Use shallow_copy_rtx. Don't copy each field individually. Reindent.
    (shallow_copy_rtx_stat): Use rtx_size instead of RTX_SIZE.
    * rtl.h (rtx_code_size): New variable.
    (rtx_size): Change from a variable to a function.
    (RTX_SIZE): Rename to...
    (RTX_CODE_SIZE): ...this.

    PR target/9703
    PR tree-optimization/17106
    * doc/tm.texi (TARGET_USE_BLOCKS_FOR_CONSTANT_P): Document.
    (Anchored Addresses): New section.
    * doc/invoke.texi (-fsection-anchors): Document.
    * doc/rtl.texi (SYMBOL_REF_IN_BLOCK_P, SYMBOL_FLAG_IN_BLOCK): Likewise.
    (SYMBOL_REF_ANCHOR_P, SYMBOL_FLAG_ANCHOR): Likewise.
    (SYMBOL_REF_BLOCK, SYMBOL_REF_BLOCK_OFFSET): Likewise.
    * hooks.c (hook_bool_mode_rtx_false): New function.
    * hooks.h (hook_bool_mode_rtx_false): Declare.
    * gengtype.c (create_optional_field): New function.
    (adjust_field_rtx_def): Add the "block_sym" field for SYMBOL_REFs when SYMBOL_REF_IN_BLOCK_P is true.
    * target.h (output_anchor, use_blocks_for_constant_p): New hooks.
    (min_anchor_offset, max_anchor_offset): Likewise.
    (use_anchors_for_symbol_p): New hook.
    * toplev.c (compile_file): Call output_object_blocks.
    (target_supports_section_anchors_p): New function.
    (process_options): Check that -fsection-anchors is only used on targets that support it and when -funit-at-a-time is in effect.
    * tree-ssa-loop-ivopts.c (prepare_decl_rtl): Only create DECL_RTL if the decl doesn't have one.
    * dwarf2out.c: Remove instantiations of VEC(rtx,gc).
    * expr.c (emit_move_multi_word, emit_move_insn): Pass the result of force_const_mem through use_anchored_address.
    (expand_expr_constant): New function.
    (expand_expr_addr_expr_1): Call it. Use the same modifier when calling expand_expr for INDIRECT_REF.
    (expand_expr_real_1): Pass DECL_RTL through use_anchored_address for all modifiers except EXPAND_INITIALIZER. Use expand_expr_constant.
    * expr.h (use_anchored_address): Declare.
    * loop-unroll.c: Don't declare rtx vectors here.
    * explow.c: Include output.h.
    (validize_mem): Call use_anchored_address.
    (use_anchored_address): New function.
    * common.opt (-fsection-anchors): New switch.
    * varasm.c (object_block_htab, anchor_labelno): New variables.
    (hash_section, object_block_entry_eq, object_block_entry_hash)
    (use_object_blocks_p, get_block_for_section, create_block_symbol)
    (use_blocks_for_decl_p, change_symbol_section): New functions.
    (get_variable_section): New function, split out from assemble_variable.
    (make_decl_rtl): Create a block symbol if use_object_blocks_p and use_blocks_for_decl_p say so. Use change_symbol_section if the symbol has already been created.
    (assemble_variable_contents): New function, split out from...
    (assemble_variable): ...here. Don't output any code for block symbols; just pass them to place_block_symbol. Use get_variable_section and assemble_variable_contents.
    (get_constant_alignment, get_constant_section, get_constant_size): New functions, split from output_constant_def_contents.
    (build_constant_desc): Create a block symbol if use_object_blocks_p says so. Or into SYMBOL_REF_FLAGS.
    (assemble_constant_contents): New function, split from...
    (output_constant_def_contents): ...here. Don't output any code for block symbols; just pass them to place_section_symbol. Use get_constant_section and get_constant_alignment.
    (force_const_mem): Create a block symbol if use_object_blocks_p and use_blocks_for_constant_p say so. Or into SYMBOL_REF_FLAGS.
    (output_constant_pool_1): Add an explicit alignment argument. Don't switch sections here.
    (output_constant_pool): Adjust call to output_constant_pool_1. Switch sections here instead. Don't output anything for block symbols; just pass them to place_block_symbol.
    (init_varasm_once): Initialize object_block_htab.
    (default_encode_section_info): Keep the old SYMBOL_FLAG_IN_BLOCK.
    (default_asm_output_anchor, default_use_anchors_for_symbol_p)
    (place_block_symbol, get_section_anchor, output_object_block)
    (output_object_block_htab, output_object_blocks): New functions.
    * target-def.h (TARGET_ASM_OUTPUT_ANCHOR): New macro.
    (TARGET_ASM_OUT): Include it.
    (TARGET_USE_BLOCKS_FOR_CONSTANT_P): New macro.
    (TARGET_MIN_ANCHOR_OFFSET, TARGET_MAX_ANCHOR_OFFSET): New macros.
    (TARGET_USE_ANCHORS_FOR_SYMBOL_P): New macro.
    (TARGET_INITIALIZER): Include them.
    * rtl.c (rtl_check_failed_block_symbol): New function.
    * rtl.h: Include vec.h. Declare heap and gc rtx vectors.
    (block_symbol, object_block): New structures.
    (rtx_def): Add a block_symbol field to the union.
    (BLOCK_SYMBOL_CHECK): New macro.
    (rtl_check_failed_block_symbol): Declare.
    (SYMBOL_FLAG_IN_BLOCK, SYMBOL_FLAG_ANCHOR): New SYMBOL_REF flags.
    (SYMBOL_REF_IN_BLOCK_P, SYMBOL_REF_ANCHOR_P): New predicates.
    (SYMBOL_FLAG_MACH_DEP_SHIFT): Bump by 2.
    (SYMBOL_REF_BLOCK, SYMBOL_REF_BLOCK_OFFSET): New accessors.
    * output.h (output_section_symbols): Declare.
    (object_block): Name structure.
    (place_section_symbol, get_section_anchor, default_asm_output_anchor)
    (default_use_anchors_for_symbol_p): Declare.
    * Makefile.in (RTL_BASE_H): Add vec.h.
    (explow.o): Depend on output.h.
    * config/rs6000/rs6000.c (TARGET_MIN_ANCHOR_OFFSET): Override default.
    (TARGET_MAX_ANCHOR_OFFSET): Likewise.
    (TARGET_USE_BLOCKS_FOR_CONSTANT_P): Likewise.
    (rs6000_use_blocks_for_constant_p): New function.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/Makefile.in
    trunk/gcc/common.opt
    trunk/gcc/config/rs6000/rs6000.c
    trunk/gcc/cselib.c
    trunk/gcc/doc/invoke.texi
    trunk/gcc/doc/rtl.texi
    trunk/gcc/doc/tm.texi
    trunk/gcc/dwarf2out.c
    trunk/gcc/emit-rtl.c
    trunk/gcc/explow.c
    trunk/gcc/expr.c
    trunk/gcc/expr.h
    trunk/gcc/gengtype.c
    trunk/gcc/hooks.c
    trunk/gcc/hooks.h
    trunk/gcc/loop-unroll.c
    trunk/gcc/output.h
    trunk/gcc/read-rtl.c
    trunk/gcc/reload1.c
    trunk/gcc/rtl.c
    trunk/gcc/rtl.h
    trunk/gcc/target-def.h
    trunk/gcc/target.h
    trunk/gcc/toplev.c
    trunk/gcc/tree-ssa-loop-ivopts.c
    trunk/gcc/varasm.c
The patch I committed should provide the general infrastructure, but an ARM patch will be needed to make use of it. ARM code would also benefit if we tried to reuse addresses that the function had to calculate anyway, rather than use arbitrary anchors for everything. (I wrote a message about this that I was supposed to send to gcc-patches@. However, it isn't in either the web archives or gmane, so I suspect I sent it privately by accident.) I'm not intending to do the ARM bits myself right now, so I'll return this PR to unassigned.
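To make the anchor idea concrete, here is a rough model of how an offset range limits what one anchor can reach. It is a simplified sketch and not the code in varasm.c: the struct anchor_range type and both functions are hypothetical, the range only stands in for the min_anchor_offset and max_anchor_offset hooks named in the commit log, and placing a new anchor at the first unreachable symbol is an assumption made to keep the example short.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a target's anchor offset limits.  */
struct anchor_range {
    long min_offset;
    long max_offset;
};

/* True if a symbol at sym_offset can be addressed as anchor + constant.  */
static bool reachable_from_anchor(long sym_offset, long anchor_offset,
                                  struct anchor_range range)
{
    long delta = sym_offset - anchor_offset;
    return delta >= range.min_offset && delta <= range.max_offset;
}

/* Count the address loads needed for a sequence of accesses when a new
   anchor is loaded only if the current one cannot reach the next symbol.
   Without anchors the count would simply be n, one load per access.  */
static size_t address_loads_with_anchors(const long *sym_offsets, size_t n,
                                         struct anchor_range range)
{
    size_t loads = 0;
    bool have_anchor = false;
    long anchor = 0;

    for (size_t i = 0; i < n; i++) {
        if (!have_anchor
            || !reachable_from_anchor(sym_offsets[i], anchor, range)) {
            anchor = sym_offsets[i];   /* simplification: anchor at the symbol */
            have_anchor = true;
            loads++;
        }
    }
    return loads;
}
```

With the two 4-byte variables from this PR at offsets 0 and 4 and an illustrative range of -4095 to 4095, the model needs one address load instead of two, which is the saving the original report asks for.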
This might have been implemented for 4.4 already. Section anchors have now been enabled for ARM.
(In reply to comment #10)
> This might have been implemented for 4.4 already. Section anchors now have
> been enabled for ARM.

4.4 seems to do this with section anchors turned on. Here is the code generated for the function reported:

        ldr     r3, .L3
        mov     r0, #11
        mov     r2, #12
        stmia   r3, {r0, r2}    @ phole stm
        bx      lr

I suspect this can now be closed.
Fixed.