This is the mail archive of the
mailing list for the GCC project.
Re: [RFC] [PATCH, i386] Adjust unroll factor for bdver3 and bdver4
- From: Uros Bizjak <ubizjak at gmail dot com>
- To: "Gopalasubramanian, Ganesh" <Ganesh dot Gopalasubramanian at amd dot com>
- Cc: "gcc-patches at gcc dot gnu dot org" <gcc-patches at gcc dot gnu dot org>, "Richard Guenther <richard dot guenther at gmail dot com> (richard dot guenther at gmail dot com)" <richard dot guenther at gmail dot com>, "borntraeger at de dot ibm dot com" <borntraeger at de dot ibm dot com>, "H.J. Lu (hjl dot tools at gmail dot com)" <hjl dot tools at gmail dot com>, "Jakub Jelinek (jakub at redhat dot com)" <jakub at redhat dot com>
- Date: Fri, 22 Nov 2013 09:15:42 +0100
- Subject: Re: [RFC] [PATCH, i386] Adjust unroll factor for bdver3 and bdver4
- Authentication-results: sourceware.org; auth=none
- References: <EB4625145972F94C9680D8CADD6516155E73BF13 at SATLEXDAG02 dot amd dot com>
On Wed, Nov 20, 2013 at 7:26 PM, Gopalasubramanian, Ganesh wrote:
> Steamroller processors contain a loop predictor and a loop buffer, which may make unrolling small loops less important.
> When unrolling small loops for steamroller, making the unrolled loop fit in the loop buffer should be a priority.
> This patch uses a heuristic approach (number of memory references) to decide the unrolling factor for small loops.
> This patch has some noise in SPEC 2006 results.
> Bootstrapping passes.
> I would like to know your comments before committing.
Please split the patch into target-dependent and target-independent
parts, and get the target-independent part reviewed first.
+ if (ix86_tune != PROCESSOR_BDVER3 && ix86_tune != PROCESSOR_BDVER4)
+ return nunroll;
is wrong. You should introduce a tune variable (as H.J. suggested) and
check that variable here. Target-dependent tuning options should be in
x86-tune.def, so everything regarding tuning can be found in one place.
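To illustrate the idea of keeping tuning knobs in one table, here is a minimal standalone sketch of the X-macro pattern that a .def file supports. It is not GCC code: the flag name TUNE_ADJUST_UNROLL, the processor enum and the helper tune_enabled are all made up for the example, and the table is an inline macro instead of an included file.

```c
/* Hypothetical processor list; the real one lives in GCC's i386 headers.  */
enum processor { PROC_GENERIC, PROC_BDVER3, PROC_BDVER4, PROC_MAX };

#define m_BDVER3 (1u << PROC_BDVER3)
#define m_BDVER4 (1u << PROC_BDVER4)

/* Stand-in for a tuning table such as x86-tune.def: one line per knob.
   The flag name below is invented for illustration.  */
#define TUNE_LIST \
  DEF_TUNE (TUNE_ADJUST_UNROLL, "adjust_unroll_factor", m_BDVER3 | m_BDVER4)

/* First expansion: an enum of flag indices.  */
#define DEF_TUNE(id, name, mask) id,
enum tune_index { TUNE_LIST TUNE_LAST };
#undef DEF_TUNE

/* Second expansion: the processor mask for each flag.  */
#define DEF_TUNE(id, name, mask) mask,
static const unsigned tune_mask[TUNE_LAST] = { TUNE_LIST };
#undef DEF_TUNE

/* Return nonzero when FLAG is enabled for processor TUNE.  */
static int
tune_enabled (enum tune_index flag, enum processor tune)
{
  return (tune_mask[flag] & (1u << tune)) != 0;
}
```

With a table like this, the check in the hook becomes a single flag test, and enabling the behaviour for another processor only means editing the mask in the table.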
+ if (INSN_P (insn) && INSN_CODE (insn) != -1)
+ for_each_rtx (&insn, (rtx_function) ix86_loop_memcount,
Please use:
  if (NONDEBUG_INSN_P (insn))
    for_each_rtx (&PATTERN (insn), ...);
Otherwise your heuristics will depend on the -g compile option.
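The reason is that INSN_P is also true for debug insns, which exist only when compiling with -g and can mention memory locations. The following toy model (plain C, not GCC's RTL; struct toy_insn and both counting functions are invented for the example) shows how a count that includes debug insns changes with -g, while one that skips them does not.

```c
#include <stddef.h>

/* Toy stand-ins for GCC's insn kinds; not the real RTL data structures.  */
enum toy_kind { TOY_INSN, TOY_DEBUG_INSN, TOY_NOTE };

struct toy_insn
{
  enum toy_kind kind;
  unsigned mem_refs;   /* Memory references mentioned by this insn.  */
};

/* Count memory references in every insn, debug insns included.  */
static unsigned
count_mems_all (const struct toy_insn *insns, size_t n)
{
  unsigned count = 0;
  for (size_t i = 0; i < n; i++)
    if (insns[i].kind != TOY_NOTE)
      count += insns[i].mem_refs;
  return count;
}

/* Count memory references in real insns only, skipping debug insns.  */
static unsigned
count_mems_nondebug (const struct toy_insn *insns, size_t n)
{
  unsigned count = 0;
  for (size_t i = 0; i < n; i++)
    if (insns[i].kind == TOY_INSN)
      count += insns[i].mem_refs;
  return count;
}

/* The same loop body as it might look without and with -g.  */
static const struct toy_insn loop_plain[] = {
  { TOY_INSN, 2 }, { TOY_INSN, 1 }
};
static const struct toy_insn loop_debug[] = {
  { TOY_INSN, 2 }, { TOY_DEBUG_INSN, 1 }, { TOY_INSN, 1 }
};
```

Here count_mems_all gives different totals for loop_plain and loop_debug, so any unroll decision based on it could differ between a -g build and a non -g build. count_mems_nondebug gives the same total for both.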
+ if ( (mem_count*nunroll) <= 32)
+ix86_loop_memcount (rtx *x, unsigned *mem_count)
+ if (*x != NULL_RTX && MEM_P (*x))
*x will never be null for active insns.
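For readers following the heuristic itself: the quoted fragment only shows the test (mem_count * nunroll) <= 32, and what the patch does when that test fails is not visible in this message. Purely as an illustration, the sketch below assumes the unroll factor is halved until the product fits. The name cap_unroll, the macro MEM_REF_BUDGET and the halving policy are assumptions, not the patch's code.

```c
/* Budget taken from the constant 32 in the quoted patch; treating it
   as a limit on total memory references is an assumption.  */
#define MEM_REF_BUDGET 32

/* Return an unroll factor no larger than NUNROLL such that
   MEM_COUNT * factor stays within the budget when possible.
   Assumption: the factor is halved until it fits or reaches 1.  */
static unsigned
cap_unroll (unsigned nunroll, unsigned mem_count)
{
  while (nunroll > 1 && mem_count * nunroll > MEM_REF_BUDGET)
    nunroll /= 2;
  return nunroll;
}
```

A loop with 4 memory references keeps an unroll factor of 8 under this policy, while a loop with 5 is cut back to 4.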