This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.
We're still spending an unusually large amount of time in the GCSE code.  Consider, for example, the profile when we compile fold-const.i with the mainline sources:

  CPROP 1 :   0.12 ( 1%) usr   0.00 ( 0%) sys   0.08 ( 1%)
  PRE     :   0.81 ( 8%) usr   0.02 ( 2%) sys   0.86 ( 7%)
  CPROP 2 :   0.10 ( 1%) usr   0.01 ( 1%) sys   0.11 ( 1%)
  TOTAL   :  10.45

Most of that 0.81 seconds is burned in compute_transp doing some list walking and lots of unnecessary load/test sequences.  This change significantly improves the situation with fold-const.i:

  CPROP 1 :   0.10 ( 1%) usr   0.00 ( 0%) sys   0.13 ( 1%)
  PRE     :   0.21 ( 2%) usr   0.02 ( 2%) sys   0.27 ( 2%)
  CPROP 2 :   0.10 ( 1%) usr   0.00 ( 0%) sys   0.08 ( 1%)
  TOTAL   :   9.90

Not bad at all.  On a larger sample of code our total speedup is just shy of 1%.

How do we get this nice little speedup....  :-)

gcse.c has an array indexed by block number which contains a list of all the insns which might modify memory in the given block.  When we compute the local transparency property we walk those lists to determine which MEM expressions are transparent through each block, with code that looks like this:

  FOR_EACH_BB (bb)
    {
      rtx list_entry = canon_modify_mem_list[bb->index];

      while (list_entry)
        {
          rtx dest, dest_addr;

          if (CALL_P (XEXP (list_entry, 0)))
            {
              if (set_p)
                SET_BIT (bmap[bb->index], indx);
              else
                RESET_BIT (bmap[bb->index], indx);
              break;
            }

          /* LIST_ENTRY must be an INSN of some kind that sets memory.
             Examine each hunk of memory that is modified.  */
          dest = XEXP (list_entry, 0);
          list_entry = XEXP (list_entry, 1);
          dest_addr = XEXP (list_entry, 0);

          if (canon_true_dependence (dest, GET_MODE (dest), dest_addr,
                                     x, rtx_addr_varies_p))
            {
              if (set_p)
                SET_BIT (bmap[bb->index], indx);
              else
                RESET_BIT (bmap[bb->index], indx);
              break;
            }
          list_entry = XEXP (list_entry, 1);
        }
    }
  x = XEXP (x, 0);
  goto repeat;

Note carefully that we stop the list walk as soon as we hit a call or a conflicting memory write on our list.
These walks actually get pretty expensive when you consider that we do them for every MEM expression we're tracking.  We can significantly improve this with a couple of simple changes.

First, we keep a bitmap of blocks which have calls.  We can test that bitmap before we walk down CANON_MODIFY_MEM_LIST.  This gives a small but measurable speedup.

Then we can use bitmap iterators to iterate over the blocks with calls, and then over the blocks which write memory but do not have calls (instead of using FOR_EACH_BB).  That gets us to just under a 1% compile-time speedup.

Bootstrapped and regression tested on i686-pc-linux-gnu.
Attachment:
PPP
Description: Text document