This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
RE: [Patch] Fix PR53397
Hi Richi,
(Snip)
> + (!cst_and_fits_in_hwi (step))
> + {
> + if( loop->inner != NULL)
> + {
> + if (dump_file && (dump_flags & TDF_DETAILS))
> + {
> + fprintf (dump_file, "Reference %p:\n", (void *) ref);
> + fprintf (dump_file, "(base " );
> + print_generic_expr (dump_file, base, TDF_SLIM);
> + fprintf (dump_file, ", step ");
> + print_generic_expr (dump_file, step, TDF_TREE);
> + fprintf (dump_file, ")\n");
No need to repeat this - all references are dumped when we gather them.
(Snip)
The dumping happens at "record_ref" which is called after these statements to record these references.
When the step is invariant we return from the function without recording the references.
so I thought of dumping the references here.
Is there a cleaner way to dump the references at one place?
Regards,
Venkat.
-----Original Message-----
From: Richard Guenther [mailto:rguenther@suse.de]
Sent: Tuesday, October 02, 2012 5:42 PM
To: Kumar, Venkataramanan
Cc: gcc-patches@gcc.gnu.org
Subject: Re: [Patch] Fix PR53397
On Mon, 1 Oct 2012, venkataramanan.kumar@amd.com wrote:
> Hi,
>
> The below patch fixes the FFT/Scimark regression caused by useless
> prefetch generation.
>
> This fix tries to make prefetch less aggressive by prefetching arrays
> in the inner loop, when the step is invariant in the entire loop nest.
>
> GCC currently tries to prefetch invariant steps when they are in the
> inner loop. But does not check if the step is variant in outer loops.
>
> In the scimark FFT case, the trip count of the inner loop varies by a
> non constant step, which is invariant in the inner loop.
> But the step variable is varying in outer loop. This makes inner loop
> trip count small (at run time varies sometimes as small as 1
> iteration)
>
> Prefetching ahead x iteration when the inner loop trip count is
> smaller than x leads to useless prefetches.
>
> Flag used: -O3 -march=amdfam10
>
> Before
> ** **
> ** SciMark2 Numeric Benchmark, see http://math.nist.gov/scimark **
> ** for details. (Results can be submitted to pozo@nist.gov) **
> ** **
> Using 2.00 seconds min time per kenel.
> Composite Score: 550.50
> FFT Mflops: 38.66 (N=1024)
> SOR Mflops: 617.61 (100 x 100)
> MonteCarlo: Mflops: 173.74
> Sparse matmult Mflops: 675.63 (N=1000, nz=5000)
> LU Mflops: 1246.88 (M=100, N=100)
>
>
> After
> ** **
> ** SciMark2 Numeric Benchmark, see http://math.nist.gov/scimark **
> ** for details. (Results can be submitted to pozo@nist.gov) **
> ** **
> Using 2.00 seconds min time per kenel.
> Composite Score: 639.20
> FFT Mflops: 479.19 (N=1024)
> SOR Mflops: 617.61 (100 x 100)
> MonteCarlo: Mflops: 173.18
> Sparse matmult Mflops: 679.13 (N=1000, nz=5000)
> LU Mflops: 1246.88 (M=100, N=100)
>
> GCC regression "make check -k" passes with x86_64-unknown-linux-gnu
> New tests that PASS:
>
> gcc.dg/pr53397-1.c scan-assembler prefetcht0 gcc.dg/pr53397-1.c
> scan-tree-dump aprefetch "Issued prefetch"
> gcc.dg/pr53397-1.c (test for excess errors) gcc.dg/pr53397-2.c
> scan-tree-dump aprefetch "loop variant step"
> gcc.dg/pr53397-2.c scan-tree-dump aprefetch "Not prefetching"
> gcc.dg/pr53397-2.c (test for excess errors)
>
>
> Checked CPU2006 and polyhedron on latest AMD processor, no regressions noted.
>
> Ok to commit in trunk?
>
> regards,
> Venkat
>
> gcc/ChangeLog
> +2012-10-01 Venkataramanan Kumar <venkataramanan.kumar@amd.com>
> +
> + * tree-ssa-loop-prefetch.c (gather_memory_references_ref):$
> + Perform non constant step prefetching in inner loop, only $
> + when it is invariant in the entire loop nest. $
> + * testsuite/gcc.dg/pr53397-1.c: New test case $
> + Checks we are prefecthing for loop invariant steps$
> + * testsuite/gcc.dg/pr53397-2.c: New test case$
> + Checks we are not prefecthing for loop variant steps
> +
>
>
> Index: gcc/testsuite/gcc.dg/pr53397-1.c
> ===================================================================
> --- gcc/testsuite/gcc.dg/pr53397-1.c (revision 0)
> +++ gcc/testsuite/gcc.dg/pr53397-1.c (revision 0)
> @@ -0,0 +1,28 @@
> +/* Prefetching when the step is loop invariant. */
> +
> +/* { dg-do compile } */
> +/* { dg-options "-O3 -fprefetch-loop-arrays
> +-fdump-tree-aprefetch-details --param min-insn-to-prefetch-ratio=3
> +--param simultaneous-prefetches=10 -fdump-tree-aprefetch-details" }
> +*/
> +
> +
> +double data[16384];
> +void prefetch_when_non_constant_step_is_invariant(int step, int n) {
> + int a;
> + int b;
> + for (a = 1; a < step; a++) {
> + for (b = 0; b < n; b += 2 * step) {
> +
> + int i = 2*(b + a);
> + int j = 2*(b + a + step);
> +
> +
> + data[j] = data[i];
> + data[j+1] = data[i+1];
> + }
> + }
> +}
> +
> +/* { dg-final { scan-tree-dump "Issued prefetch" "aprefetch" } } */
> +/* { dg-final { scan-assembler "prefetcht0" } } */
This (and the case below) needs to be adjusted to only run on the appropriate hardware. See for example gcc.dg/tree-ssa/prefetch-8.c for how to do this.
> +/* { dg-final { cleanup-tree-dump "aprefetch" } } */
> Index: gcc/testsuite/gcc.dg/pr53397-2.c
> ===================================================================
> --- gcc/testsuite/gcc.dg/pr53397-2.c (revision 0)
> +++ gcc/testsuite/gcc.dg/pr53397-2.c (revision 0)
> @@ -0,0 +1,29 @@
> +/* Not prefetching when the step is loop variant. */
> +
> +/* { dg-do compile } */
> +/* { dg-options "-O3 -fprefetch-loop-arrays
> +-fdump-tree-aprefetch-details --param min-insn-to-prefetch-ratio=3
> +--param simultaneous-prefetches=10 -fdump-tree-aprefetch-details" }
> +*/
> +
> +
> +double data[16384];
> +void donot_prefetch_when_non_constant_step_is_variant(int step, int
> +n) {
> + int a;
> + int b;
> + for (a = 1; a < step; a++,step*=2) {
> + for (b = 0; b < n; b += 2 * step) {
> +
> + int i = 2*(b + a);
> + int j = 2*(b + a + step);
> +
> +
> + data[j] = data[i];
> + data[j+1] = data[i+1];
> + }
> + }
> +}
> +
> +/* { dg-final { scan-tree-dump "Not prefetching" "aprefetch" } } */
> +/* { dg-final { scan-tree-dump "loop variant step" "aprefetch" } } */
> +
> +/* { dg-final { cleanup-tree-dump "aprefetch" } } */
> +
> Index: gcc/tree-ssa-loop-prefetch.c
> ===================================================================
> --- gcc/tree-ssa-loop-prefetch.c (revision 191642)
> +++ gcc/tree-ssa-loop-prefetch.c (working copy)
> @@ -523,6 +523,7 @@
> tree base, step;
> HOST_WIDE_INT delta;
> struct mem_ref_group *agrp;
> + loop_p ploop;
>
> if (get_base_address (ref) == NULL)
> return false;
> @@ -537,10 +538,50 @@
> if (may_be_nonaddressable_p (base))
> return false;
>
> - /* Limit non-constant step prefetching only to the innermost loops.
> */
> - if (!cst_and_fits_in_hwi (step) && loop->inner != NULL)
> - return false;
> + /* Limit non-constant step prefetching only to the innermost loops and
> + only when the step is invariant in the entire loop nest. */ if
> + (!cst_and_fits_in_hwi (step))
> + {
> + if( loop->inner != NULL)
> + {
> + if (dump_file && (dump_flags & TDF_DETAILS))
> + {
> + fprintf (dump_file, "Reference %p:\n", (void *) ref);
> + fprintf (dump_file, "(base " );
> + print_generic_expr (dump_file, base, TDF_SLIM);
> + fprintf (dump_file, ", step ");
> + print_generic_expr (dump_file, step, TDF_TREE);
> + fprintf (dump_file, ")\n");
No need to repeat this - all references are dumped when we gather them.
> + fprintf (dump_file, "Ignoring %p, non-constant step prefetching\
> + is limited to inner most loops \n",(void *) ref);
> + }
> + return false;
> + }
> + else
> + {
> + ploop = loop;
>
> + while (loop_outer (ploop))
> + {
> + if (!expr_invariant_in_loop_p (ploop , step))
> + {
> + if (dump_file && (dump_flags & TDF_DETAILS))
> + {
> + fprintf (dump_file, "Reference %p:\n", (void *) ref);
> + fprintf (dump_file, "(base " );
> + print_generic_expr (dump_file, base, TDF_SLIM);
> + fprintf (dump_file, ", step ");
> + print_generic_expr (dump_file, step, TDF_TREE);
> + fprintf (dump_file, ")\n");
Likewise.
> + fprintf (dump_file, "Not prefetching, ignoring %p due to loop variant step\n",(void *) ref);
> + }
> + return false;
> + }
> + ploop = loop_outer (ploop);
> + }
Instead of the loop checking expr_invariant_in_loop_p for each sub-loop you can query expr_invariant_in_loop_p once, for the outermost loop.
Please add a new function to cfgloop.h to access this outermost loop:
/* Returns the outermost loop of the loop nest that contains LOOP. */
static inline struct loop *
loop_outermost (const struct loop *loop) {
unsigned n = VEC_length (loop_p, loop->superloops);
if (n <= 1)
return loop;
return VEC_index (loop_p, loop->superloops, 1); }
Thanks,
Richard.