This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.
This is the main part for the 4.2 section anchor project:

    http://gcc.gnu.org/wiki/Section%20Anchor%20Optimisations

The idea is to introduce anchor symbols that can be used to access
several nearby objects.  For example, if we have:

    static int a, b, c;

    int foo (void) { return a + b + c; }

gcc will normally perform separate symbolic address calculations for
"a", "b" and "c".  The idea of this patch is to introduce a new anchor
symbol and access "a", "b" and "c" relative to that anchor.  The main
motivation is to reduce GOT size, but the patch also has the potential
to make some code faster.  SPEC results are attached and described
below.

Managing the relative positions of objects
==========================================

Introduction
------------

We can only access two objects from the same anchor if we know how far
apart those two objects are.  gcc doesn't really have any
infrastructure for this at the moment; it simply writes out each
static object in isolation.  A great deal of the patch is therefore
about allowing objects to be grouped together so that their relative
positions are fixed.

The aim has been to make this infrastructure as general as possible,
so that it could be used for other optimisations besides section
anchors.  Another goal was to support the placement of every kind of
static data -- decls, tree constants, and rtx constants -- and to
allow these different kinds of data to be grouped together in the same
block.  With that constraint, the best approach seemed to be to group
objects based on their SYMBOL_REF.

One of the main design decisions here was: when should we decide
whether to put a SYMBOL_REF in a block, rather than treat it as a
stand-alone entity?  The ideal answer might seem to be "whenever we
know we'll need positional information".  Unfortunately, in the case
of section anchors, that will be part way through compiling a
function, and by that time we might already have written the object
out.  (Note that this is true even in unit-at-a-time mode.)
The approach I took was therefore to put objects into blocks if (1) we
knew enough information about them to do so and (2) the current
compilation mode might make use of positional information.  In other
words, we now have a compilation mode in which objects are grouped
together into blocks whenever possible.  This mode is selected if one
of the active optimisations might find it useful.

Creating block symbols
----------------------

Block symbols (i.e., symbols that are put into blocks) can be created
in three places: make_decl_rtl (for data decls), build_constant_desc
(for tree constants) and force_const_mem (for rtx constants).  A new
function -- use_blocks_for_decl_p -- says whether a decl can be put
into a block, while a new target hook -- use_blocks_for_constant_p --
says the same about rtx constants.  Tree constants can always be
grouped into blocks.

We often don't know at this stage whether the object will be needed or
not.  For example, local data decls (C statics) might be removed or
constants might not be marked.  A block symbol therefore starts with
an offset of -1 to indicate that its position in the block is not yet
known.

Writing out block symbols
-------------------------

The corresponding assembly output routines are assemble_variable (for
data decls), output_constant_def_contents (for tree constants) and
output_constant_pool (for rtx constants).  The patch makes sure these
functions do not emit the definitions of block symbols; instead, they
simply make sure that the symbols have been assigned a position.  The
whole block is then written out at the end of compilation by a new
function called "output_object_blocks".

Placing block symbols
---------------------

The main function for placing an object is place_block_symbol.  This
function is called by the output functions listed above, and by any
client optimisation that wants to know the position of an unplaced
object.
Data structures
---------------

The patch introduces a new structure called "object_block" for
representing a group of objects.  The block is associated with a
particular section and stores its contents as a sorted vector of
SYMBOL_REFs.

The main data structure decision here is: how should we represent a
SYMBOL_REF's position within a block?  Two considerations are:

  (1) We don't want to allocate any extra data for symbols that
      aren't going to be put in blocks.

  (2) Every object we decide to put into a block will have this
      information, and it will live as long as the object itself
      does.

(1) means we can't unconditionally grow the SYMBOL_REF rtx.  If the
new compilation mode is not selected, the extra memory will never be
used at all, while if the mode _is_ selected, the memory will only be
useful for a subset of symbols.

IMO, (2) argues against the use of hash tables.  Hash tables would
have an overhead of at least two pointers per symbol compared to
something directly attached to the SYMBOL_REF; there would be one
pointer for the hash table chain and one to identify the symbol.
There's no temporal advantage either, since the table entry will live
as long as the symbol does.

Because we're deciding up-front whether or not to put objects in
blocks, we know when creating a SYMBOL_REF whether we want the extra
information.  I therefore created an extended SYMBOL_REF structure
called block_symbol:

------------------------------------------------------------------------
/* This structure remembers the position of a SYMBOL_REF within
   an object_block structure.  A SYMBOL_REF only provides this
   information if SYMBOL_REF_IN_BLOCK_P is true.  */
struct block_symbol GTY(()) {
  /* The usual SYMBOL_REF fields.  */
  rtunion GTY ((skip)) fld[3];

  /* The block that contains this object.  */
  struct object_block *block;

  /* The offset of this object from the start of its block.  It is
     negative if the symbol has not yet been assigned an offset.  */
  HOST_WIDE_INT offset;
};
------------------------------------------------------------------------

and added block_symbol to the rtx union.  A new SYMBOL_REF flag called
SYMBOL_FLAG_IN_BLOCK indicates whether this information is available.

[ The usual way of adding data to an rtx is to add new rtunion fields,
  so using a structure here might seem a little odd.  The problem is
  that rtunion no longer has a HOST_WIDE_INT field that we can use for
  the offset: it used to, but it led to unnecessarily-bloated rtxes on
  ILP32 hosts compiling for need_64bit_hwint targets. ]

With this approach, the memory overhead per block symbol is one
pointer (the block) and one HOST_WIDE_INT (the offset within the
block).  The patch adds two new accessors, SYMBOL_REF_BLOCK and
SYMBOL_REF_BLOCK_OFFSET, for getting this information.

One potentially controversial aspect of this approach is that the size
of an rtx no longer depends entirely on its code.  SYMBOL_REFs with
SYMBOL_FLAG_IN_BLOCK set are larger than those without.  This led to
the following changes:

  - RTX_SIZE becomes RTX_CODE_SIZE, to emphasise that it provides the
    base size for a particular code.

  - A new function, rtx_size, provides the size of an existing rtx.

  - shallow_copy_rtx uses rtx_size (x) instead of
    RTX_SIZE (GET_CODE (x)).

  - Code that allocates new rtxes uses RTX_CODE_SIZE instead of
    RTX_SIZE.

  - Code that copies old rtxes uses shallow_copy_rtx instead of
    inline copies.

This is pretty trivial, and I see the last point as a clean-up.  I've
split out these changes to ease review, although rtx_size's use of
SYMBOL_REF_IN_BLOCK_P means that it isn't a stand-alone patch.

Also, the garbage collector must see the new SYMBOL_REF fields iff
SYMBOL_FLAG_IN_BLOCK is true.  This too is pretty trivial.

A note on rtx constant pools
----------------------------

We currently maintain a separate constant pool for each function.
There's not really any point doing this if use_blocks_for_constant_p
returns true; if the constant is going to be written out at the end of
compilation anyway, we might as well allow it to be shared between
functions.

However, the same is sometimes true even if !use_blocks_for_constant_p.
E.g. powerpc TOC entries can be reused by different functions, even
though we'd never want to put them into blocks.  The powerpc backend
gets around the current per-function pools by using aliases for
duplicate TOC entries.  I plan to submit a follow-on patch that adds a
shared constant pool and that gives backends a chance to choose
whether a constant goes in this shared pool or in the function-local
one.

Control of section anchors
==========================

The use of section anchors is controlled by a new switch,
-fsection-anchors.  It isn't activated by any of the -O options since
its benefit is so target-dependent.  If it does turn out to be a
consistent win for some configurations, I think the right thing to do
would be to set flag_section_anchors in the backend's
OPTIMIZATION_OPTIONS.

There are some new hooks for controlling the use of section anchors:

    /* Output the definition of a section anchor.  */
    void (*output_anchor) (rtx);

    /* The minimum and maximum byte offsets for anchored addresses.  */
    HOST_WIDE_INT min_anchor_offset;
    HOST_WIDE_INT max_anchor_offset;

    /* True if section anchors can be used to access the given symbol.  */
    bool (* use_anchors_for_symbol_p) (rtx);

The texinfo documentation describes these hooks in more detail.

Using anchored addresses
========================

A new function, use_anchored_address, converts a MEM whose address is
a block symbol into a MEM whose address uses an anchor.  The rtl
expanders call this function after reading a DECL_RTL, after calling
force_const_mem, or after calling output_constant_def.  Most addresses
are passed through validate_mem, which acts as a convenient (and, I
hope, logical) point to call use_anchored_address.
Only a few places need to call use_anchored_address directly.

If an anchor A is used several times in a function, we would prefer to
calculate A's address once, store it in a register, and reuse that
register for all accesses involving A.  (It probably isn't worth using
-fsection-anchors on targets where this isn't true.)  However, if A is
only used once in a function, and is used for a symbol at offset X
from the anchor, some targets may prefer to allow CSE or combine to
create an address of the form:

    (const (plus (symbol_ref A) (const_int X)))

Other targets might not (such as powerpc with -mminimal-toc).
use_anchored_address therefore forces the anchor into a register, but
makes no attempt to hoist the register initialisation code, or to
reuse registers between accesses.  The rtl optimisers can then decide
whether it's worth hoisting and reusing the registers, just as they
would for ordinary symbolic accesses.

Aliasing
========

I was worried that using anchors might adversely affect the alias.c
base-analysis code.  If we have indirect accesses like the following:

    int x[4], y[4];
                               r := &anchor
    int *r1 = x;    ---->      r1 := r + (x - &anchor);
    int *r2 = y;    ---->      r2 := r + (y - &anchor);
    ...indirect accesses to x and y using r1 and r2...

we can no longer tell that accesses based on r1 are always to "x" and
those based on r2 are always to "y".

[ Note 1: I'm only talking about indirect accesses here.  Direct
  accesses aren't a problem, because the MEM_ATTRs provide the
  information we need.

  Note 2: We can't use the offsets to work out the underlying object.
  Consider accesses like:

      r1[10000 - i]

  If r1 is only used once, we might rewrite this access to use the rtl
  equivalent of:

      r3 := r + (x - anchor) + 10000
      r3[-i]

  But r3 has no relation to the variable at offset (x - anchor) + 10000.
  It would be difficult to stop this sort of transformation from
  happening, even if we wanted to.
]

Fortunately, this doesn't seem to have much effect on SPEC, and hasn't
had a noticeable impact in the examples I've looked at in detail.
Perhaps it makes more difference on targets with no offset addressing?

Other random points
===================

- The tree-ssa-loop-ivopts.c change was discussed here:
  http://gcc.gnu.org/ml/gcc/2005-09/msg00850.html

- I moved the rtx vector definitions into rtl.h so that the new
  structures there can use them.

- The idea of writing out blocks of objects at the end of compilation
  is really doing things unit-at-a-time.  -fsection-anchors is
  therefore conditional on -funit-at-a-time.

- Only the powerpc backend uses section anchors at the moment.  (I
  have plans to add MIPS at some point.)

SPEC results
============

I ran SPEC on powerpc64 with the options:

    -mcpu=970 -O3 -fomit-frame-pointer -fno-reorder-blocks

(-fno-reorder-blocks because block reordering gives worse code in
several cases).  The base results had "-fno-section-anchors" and the
peak ones had "-fsection-anchors".
As I said at the beginning, the main aim was to reduce GOT size, so
the results for that are as follows:

-----------------------------------------------------------------------
                       .got size in bytes

Benchmark        Before     After     Delta   After/Before
=========        ======     =====     =====   ============
168.wupwise         944       392      -552         41.53%
171.swim            328       272       -56         82.93%
172.mgrid           352       264       -88         75.00%
173.applu          1104       912      -192         82.61%
177.mesa          11240     10984      -256         97.72%
178.galgel         5264      3456     -1808         65.65%
179.art             824       592      -232         71.84%
183.equake         1192       920      -272         77.18%
187.facerec        1216       520      -696         42.76%
188.ammp           5096      4880      -216         95.76%
189.lucas           688       264      -424         38.37%
191.fma3d         27888     10672    -17216         38.27%
200.sixtrack      53376     20120    -33256         37.69%
301.apsi           2576      1608      -968         62.42%
164.gzip           2688      1968      -720         73.21%
175.vpr            7888      7400      -488         93.81%
176.gcc           43376     33688     -9688         77.67%
181.mcf             456       408       -48         89.47%
186.crafty        16872     16360      -512         96.97%
197.parser         6008      4864     -1144         80.96%
252.eon           25296     10112    -15184         39.97%
253.perlbmk       21056     20800      -256         98.78%
254.gap           22848     21104     -1744         92.37%
255.vortex        25848     23216     -2632         89.82%
256.bzip2          1448      1120      -328         77.35%
300.twolf         14208     14096      -112         99.21%

Total            300080    210992    -89088         70.31%
-----------------------------------------------------------------------

The actual SPEC results are attached below, but the summary is (each
row gives the reference time, run time and ratio for base, then the
same for peak):

-----------------------------------------------------------------------
164.gzip          1400   187    747*    1400   184    762*
175.vpr           1400   303    462*    1400   303    462*
176.gcc           1100   138    800*    1100   127    868*
181.mcf           1800   540    334*    1800   540    334*
186.crafty        1000  85.3   1173*    1000  84.9   1178*
197.parser        1800   320    563*    1800   328    550*
252.eon           1300   102   1272*    1300  95.4   1363*
253.perlbmk       1800   264    683*    1800   259    696*
254.gap           1100   150    732*    1100   156    705*
255.vortex        1900   231    822*    1900   225    843*
256.bzip2         1500   248    606*    1500   242    619*
300.twolf         3000   602    499*    3000   602    498*
SPECint_base2000         679
SPECint2000                              689

168.wupwise       1600   154   1042*    1600   157   1020*
171.swim          3100  1174    264*    3100  1173    264*
172.mgrid         1800   312    576*    1800   312    577*
173.applu         2100   286    733*    2100   286    734*
177.mesa          1400   141    993*    1400   138   1016*
178.galgel        2900   230   1262*    2900   231   1254*
179.art           2600   401    648*    2600   400    650*
183.equake        1300   128   1014*    1300   127   1024*
187.facerec       1900   234    811*    1900   237    800*
188.ammp          2200   558    394*    2200   559    394*
189.lucas         2000   240    833*    2000   241    830*
191.fma3d         2100   219    959*    2100   223    943*
200.sixtrack      1100   211    520*    1100   211    521*
301.apsi          2600   453    574*    2600   453    574*
SPECfp_base2000          704
SPECfp2000                               703
-----------------------------------------------------------------------

Testing CSiBE with -Os -fno-section-anchors vs. -Os -fsection-anchors
suggests it's a minor size win for -Os.  Again, the comparison is
attached below.

Testing
=======

Bootstrapped & regression tested on powerpc64-linux-gnu:

  - once as-is
  - once with -fsection-anchors enabled by default and with normal
    checking options
  - once with -fsection-anchors enabled by default and with rtl
    checking enabled (i.e. --enable-checking=yes,rtl)

The only extra failures were in objc:

    FAIL: objc/execute/class-13.m compilation
    FAIL: objc/execute/class-6.m compilation
    FAIL: objc/execute/object_is_class.m compilation
    FAIL: objc/execute/object_is_meta_class.m compilation

(options snipped).  These failures are caused by objc creating two
decls for _OBJC_METACLASS_* objects.  I think it would be better to
modify the old decl once the initialiser is known, but I don't know
enough about objc to do that.

Also bootstrapped & regression tested on i686-linux-gnu.

OK for trunk?

Richard

* cselib.c (cselib_init): Change RTX_SIZE to RTX_CODE_SIZE.
* emit-rtl.c (copy_rtx_if_shared_1): Use shallow_copy_rtx.
  (copy_insn_1): Likewise.  Don't copy each field individually.
  Reindent.
* read-rtl.c (apply_macro_to_rtx): Use RTX_CODE_SIZE instead of
  RTX_SIZE.
* reload1.c (eliminate_regs): Use shallow_copy_rtx.
* rtl.c (rtx_size): Rename variable to...
  (rtx_code_size): ...this.
  (rtx_size): New function.
  (rtx_alloc_stat): Use RTX_CODE_SIZE instead of RTX_SIZE.
  (copy_rtx): Use shallow_copy_rtx.  Don't copy each field
  individually.  Reindent.
  (shallow_copy_rtx_stat): Use rtx_size instead of RTX_SIZE.
* rtl.h (rtx_code_size): New variable.
  (rtx_size): Change from a variable to a function.
  (RTX_SIZE): Rename to...
  (RTX_CODE_SIZE): ...this.
Attachment: rtx-code-size.diff (text document)
* doc/tm.texi (TARGET_USE_BLOCKS_FOR_CONSTANT_P): Document.
  (Anchored Addresses): New section.
* doc/invoke.texi (-fsection-anchors): Document.
* doc/rtl.texi (SYMBOL_REF_IN_BLOCK_P, SYMBOL_FLAG_IN_BLOCK): Likewise.
  (SYMBOL_REF_ANCHOR_P, SYMBOL_FLAG_ANCHOR): Likewise.
  (SYMBOL_REF_BLOCK, SYMBOL_REF_BLOCK_OFFSET): Likewise.
* hooks.c (hook_bool_mode_rtx_false): New function.
* hooks.h (hook_bool_mode_rtx_false): Declare.
* gengtype.c (create_optional_field): New function.
  (adjust_field_rtx_def): Add the "block_sym" field for SYMBOL_REFs
  when SYMBOL_REF_IN_BLOCK_P is true.
* target.h (output_anchor, use_blocks_for_constant_p): New hooks.
  (min_anchor_offset, max_anchor_offset): Likewise.
  (use_anchors_for_symbol_p): New hook.
* toplev.c (compile_file): Call output_object_blocks.
  (target_supports_section_anchors_p): New function.
  (process_options): Check that -fsection-anchors is only used on
  targets that support it and when -funit-at-a-time is in effect.
* tree-ssa-loop-ivopts.c (prepare_decl_rtl): Only create DECL_RTL
  if the decl doesn't have one.
* dwarf2out.c: Remove instantiations of VEC(rtx,gc).
* expr.c (emit_move_multi_word, emit_move_insn): Pass the result of
  force_const_mem through use_anchored_address.
  (expand_expr_constant): New function.
  (expand_expr_addr_expr_1): Call it.  Use the same modifier when
  calling expand_expr for INDIRECT_REF.
  (expand_expr_real_1): Pass DECL_RTL through use_anchored_address
  for all modifiers except EXPAND_INITIALIZER.  Use
  expand_expr_constant.
* expr.h (use_anchored_address): Declare.
* loop-unroll.c: Don't declare rtx vectors here.
* explow.c: Include output.h.
  (validize_mem): Call use_anchored_address.
  (use_anchored_address): New function.
* common.opt (-fsection-anchors): New switch.
* varasm.c (object_block_htab, anchor_labelno): New variables.
  (hash_section, object_block_entry_eq, object_block_entry_hash)
  (use_object_blocks_p, get_block_for_section, create_block_symbol)
  (use_blocks_for_decl_p, change_symbol_section): New functions.
  (get_variable_section): New function, split out from
  assemble_variable.
  (make_decl_rtl): Create a block symbol if use_object_blocks_p and
  use_blocks_for_decl_p say so.  Use change_symbol_section if the
  symbol has already been created.
  (assemble_variable_contents): New function, split out from...
  (assemble_variable): ...here.  Don't output any code for block
  symbols; just pass them to place_block_symbol.  Use
  get_variable_section and assemble_variable_contents.
  (get_constant_alignment, get_constant_section, get_constant_size):
  New functions, split from output_constant_def_contents.
  (build_constant_desc): Create a block symbol if use_object_blocks_p
  says so.  Or into SYMBOL_REF_FLAGS.
  (assemble_constant_contents): New function, split from...
  (output_constant_def_contents): ...here.  Don't output any code for
  block symbols; just pass them to place_block_symbol.  Use
  get_constant_section and get_constant_alignment.
  (force_const_mem): Create a block symbol if use_object_blocks_p and
  use_blocks_for_constant_p say so.  Or into SYMBOL_REF_FLAGS.
  (output_constant_pool_1): Add an explicit alignment argument.
  Don't switch sections here.
  (output_constant_pool): Adjust call to output_constant_pool_1.
  Switch sections here instead.  Don't output anything for block
  symbols; just pass them to place_block_symbol.
  (init_varasm_once): Initialize object_block_htab.
  (default_encode_section_info): Keep the old SYMBOL_FLAG_IN_BLOCK.
  (default_asm_output_anchor, default_use_anchors_for_symbol_p)
  (place_block_symbol, get_section_anchor, output_object_block)
  (output_object_block_htab, output_object_blocks): New functions.
* target-def.h (TARGET_ASM_OUTPUT_ANCHOR): New macro.
  (TARGET_ASM_OUT): Include it.
  (TARGET_USE_BLOCKS_FOR_CONSTANT_P): New macro.
  (TARGET_MIN_ANCHOR_OFFSET, TARGET_MAX_ANCHOR_OFFSET): New macros.
  (TARGET_USE_ANCHORS_FOR_SYMBOL_P): New macro.
  (TARGET_INITIALIZER): Include them.
* rtl.c (rtl_check_failed_block_symbol): New function.
* rtl.h: Include vec.h.  Declare heap and gc rtx vectors.
  (block_symbol, object_block): New structures.
  (rtx_def): Add a block_symbol field to the union.
  (BLOCK_SYMBOL_CHECK): New macro.
  (rtl_check_failed_block_symbol): Declare.
  (SYMBOL_FLAG_IN_BLOCK, SYMBOL_FLAG_ANCHOR): New SYMBOL_REF flags.
  (SYMBOL_REF_IN_BLOCK_P, SYMBOL_REF_ANCHOR_P): New predicates.
  (SYMBOL_FLAG_MACH_DEP_SHIFT): Bump by 2.
  (SYMBOL_REF_BLOCK, SYMBOL_REF_BLOCK_OFFSET): New accessors.
* output.h (output_section_symbols): Declare.
  (object_block): Name structure.
  (place_section_symbol, get_section_anchor, default_asm_output_anchor)
  (default_use_anchors_for_symbol_p): Declare.
* Makefile.in (RTL_BASE_H): Add vec.h.
  (explow.o): Depend on output.h.
* config/rs6000/rs6000.c (TARGET_MIN_ANCHOR_OFFSET): Override default.
  (TARGET_MAX_ANCHOR_OFFSET): Likewise.
  (TARGET_USE_BLOCKS_FOR_CONSTANT_P): Likewise.
  (rs6000_use_blocks_for_constant_p): New function.
Attachment: section-anchor.diff (text document)
Attachment: CINT2000.txt (text document)
Attachment: CFP2000.txt (text document)
Attachment: CSiBE.txt (text document)