This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



[tree-ssa] PATCH: Feedback-based prefetching


This patch adds a new optimization to the tree-ssa branch: feedback-based prefetching. The
optimization assumes two compilation passes with a training run in between. During the first
pass, we guess which places are likely to benefit from prefetching and insert prefetches in
all of them (mostly based on induction variables, using the current gcc induction-variable
based prefetching mechanism). The first pass also inserts profile data collection code, which
records how effective each inserted prefetch is during the training run. The training run
writes this data to a file, "TRACE_FILE", in the current working directory. During the second
compilation pass, we read the prefetch profile data in TRACE_FILE to determine which prefetch
instructions were really beneficial. We then emit only those prefetches in the final code and
omit all the others.
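
A minimal sketch of how the second pass might decide which prefetches to keep, assuming each
profiled prefetch carries an invocation count and a count of useful ("correct") executions.
This is not the code from the patch: the struct, the function names, and the decision rule
are hypothetical. Only the names num_invoked, num_correct, and pref_insn_id, and the 0.24
value of CORRECT_RATIO_THRESHOLD, come from the ChangeLog below.

```c
#include <stddef.h>

/* Value taken from the ChangeLog entry.  How loop.c actually applies it
   is not shown in this message, so its use here is an assumption.  */
#define CORRECT_RATIO_THRESHOLD 0.24

/* Hypothetical per-prefetch profile record.  The field names echo
   variables listed in the ChangeLog.  */
struct prefetch_profile
{
  int pref_insn_id;          /* which candidate prefetch this describes */
  unsigned long num_invoked; /* times the prefetch ran during training */
  unsigned long num_correct; /* times it was judged useful */
};

/* Return nonzero if the candidate looks worth keeping.  */
static int
keep_prefetch (const struct prefetch_profile *p)
{
  if (p->num_invoked == 0)
    return 0; /* never executed during training: no evidence, drop it */
  return (double) p->num_correct / (double) p->num_invoked
         >= CORRECT_RATIO_THRESHOLD;
}

/* Compact CANDS in place so that only the kept records remain, and
   return how many were kept.  */
static size_t
filter_prefetches (struct prefetch_profile *cands, size_t n)
{
  size_t kept = 0;
  for (size_t i = 0; i < n; i++)
    if (keep_prefetch (&cands[i]))
      cands[kept++] = cands[i];
  return kept;
}
```

A candidate that never ran during training is dropped here because there is no evidence in
its favor; a real implementation could reasonably choose otherwise.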

When running this optimization on the SPEC2000 benchmarks, we found that performance
improved anywhere from 0% to 9%, with an average improvement across all the benchmarks
of about 2%.

This optimization has been built and tested on Apple G4 and G5 machines, running Mac OS X.

It bootstraps and correctly runs all the SPEC2000 C integer benchmarks that the tree-ssa
branch currently builds and runs (NOT perlbmk or vortex). I have not run the DejaGnu
tests on it, because it requires two compilation passes with an intervening training run, and
most DejaGnu tests are not set up that way.

This optimization can be controlled by either of two new flags: "-fprefetch-loop-strides" and
"-fprefetch-stores-only". The former inserts both load and store prefetches; the latter
inserts only store prefetches.
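
To illustrate the difference between a load prefetch and a store prefetch, here is a
source-level sketch using GCC's __builtin_prefetch, whose second argument is 0 for a read and
1 for a write. This is only an analogy: the patch emits its prefetches inside the loop
optimizer, not through this builtin, and the function name and the AHEAD distance are
hypothetical.

```c
#include <stddef.h>

/* Distance, in elements, to prefetch ahead of the current index.
   Purely illustrative; not a value from the patch.  */
#define AHEAD 16

/* Copy SRC to DST, hinting one load prefetch and one store prefetch
   per iteration.  Requires GCC or a compiler with the same builtin.  */
static void
copy_with_prefetch (double *dst, const double *src, size_t n)
{
  for (size_t i = 0; i < n; i++)
    {
      if (i + AHEAD < n)
        {
          __builtin_prefetch (&src[i + AHEAD], 0); /* rw = 0: load prefetch */
          __builtin_prefetch (&dst[i + AHEAD], 1); /* rw = 1: store prefetch */
        }
      dst[i] = src[i];
    }
}
```

In these terms, "-fprefetch-stores-only" would correspond to keeping only the second hint.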

Below are the ChangeLog entry for this optimization, the cvs diff of the sources I changed,
and a new file, trace_strides.c, which contains the new code for collecting and writing the
prefetch profiling data.


Is this okay to commit to the tree-ssa branch?

-- Caroline Tice
ctice@apple.com

---------------------------
Development Technologies
Apple Computer, Inc.
(408) 974-1656


Thu Dec 11 13:17:56 PST 2003 Caroline Tice <ctice@apple.com>
* Makefile.in (LIB2FUNCS_ST): Add _pm to the list.
(LIBGCC_DEPS): Add trace_strides.c to the list.
* common.opt: Add fprefetch-loop-strides and fprefetch-stores-only.
* flags.h: Add variables flag_prefetch_loop_strides and
flag_prefetch_stores_only.
* libgcc2.c: Add ifdef to include trace_strides.c if "L_pm" is
defined.
* opts.c (common_handle_option): Add cases OPT_fprefetch_loop_strides
and OPT_fprefetch_stores_only.
* toplev.c: Add variables flag_prefetch_loop_strides,
flag_prefetch_stores_only.
(f_options): Add options prefetch-loop-strides and
prefetch-stores-only.
(rest_of_handle_loop_optimize): Add and initialize variable
do_prefetch_loop_strides, and add it to the flags passed to the
second call to loop_optimize.
* trace_strides.c: New file. Contains code for collecting and writing
prefetch profiling data.
* loop.h (LOOP_PREFETCH_STRIDES): Define a new global constant (for
loop flags), value is 16.
* loop.c (PREFETCH_BLOCK): Change value from 32 to 128.
(PREFETCH_BLOCKS_BEFORE_LOOP_MAX): Change value from 6 to 3.
(PREFETCH_ONLY_DENSE_MEM): Change value from 1 to 0.
(PREFETCH_DENSE_MEM_PM): Add global constant, value is 110.
(PREFETCH_NO_EXTREME_STRIDE): Change value from 1 to 0.
(PREFETCH_EXTREME_DIFFERENCE): Change value from 4096 to 2048.
(PREFETCH_NO_REVERSE_ORDER): Change value from 1 to 0.
(MAX_CANDIDATE_PM_INSNS): New global constant, value is 3000.
(CORRECT_RATIO_THRESHOLD): New global constant, value is 0.24.
(DENSITY_THRESHOLD): New global constant, value is 0.24.
(GLOBAL_DENSITY_THRESHOLD): New global constant, value is 0.05.
(profile_info_file): New global variable.
(line_buf): New global variable.
(do_once): New global variable.
(strides_pm_info_available): New global variable.
(my_fn_id): New global variable.
(pm_cand_cnt): New global variable.
(name_idx): New global variable.
(name_length): New global variable.
(tf_fn_id): New global variable.
(pref_insn_id): New global variable.
(mem_insn_id): New global variable.
(num_invoked): New global variable.
(num_correct): New global variable.
(pm_idx): New global variable.
(candidate_mem_insns): New global variable.
(correct_count_insns): New global variable.
(pm_num_invoked): New global variable.
(this_mem_insn): New global variable.
(rtx pm_init_call): New global variable.
(LT_LDT): New macro, used to help determine if a profiled prefetch
was "good enough" to keep or not.
(LT_ACCURACY_THRESHOLD): New macro, used to help determine if a
profiled prefetch was "good enough" to keep or not.
(LT_GDT): New macro, used to help determine if a profiled prefetch
was "good enough" to keep or not.
(emit_library_call_pm_collect_stats_before): New macro, used to
insert prefetch profiling code into code being compiled.
(emit_prefetch_instructions): Add "flags" parameter; add code to
check for TRACE_FILE, and parse/use its contents for generating
prefetches; add code to emit profiling code for prefetches if
TRACE_FILE doesn't exist. All this is predicated on the new flags
being used and turned on. Otherwise works in the original manner.
(strength_reduce): Add (flags & LOOP_PREFETCH_STRIDES) as an "or"
condition for calling emit_prefetch_instructions, and add "flags"
as an argument to the call.
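
As a small illustration of the strength_reduce entry, the new loop flag is a single bit, so
it can be tested with a mask and combined with the existing prefetch condition. The sketch
below is not the patch's code: should_emit_prefetches is a hypothetical helper, and the value
4 for LOOP_PREFETCH is an assumption about the existing loop.h. Only the value 16 for
LOOP_PREFETCH_STRIDES comes from the entry above.

```c
/* Value given in the ChangeLog entry for loop.h.  */
#define LOOP_PREFETCH_STRIDES 16

/* Assumed value of the pre-existing flag; check loop.h in your tree.  */
#define LOOP_PREFETCH 4

/* Hypothetical helper mirroring the "or" condition described for
   strength_reduce: emit prefetches if either flag bit is set.  */
static int
should_emit_prefetches (int flags)
{
  return (flags & LOOP_PREFETCH) || (flags & LOOP_PREFETCH_STRIDES);
}
```

Because 16 is a distinct power of two, setting the new flag does not disturb any of the
lower loop flag bits.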


Attachment: gcc5-prefetch-phase1.txt
Description: Text document




Attachment: trace_strides.c
Description: Text document


