This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: [PATCH GCC][12/13]Workaround reduction statements for distribution
On Tue, Jun 20, 2017 at 12:36 PM, Richard Biener
<richard.guenther@gmail.com> wrote:
> On Tue, Jun 20, 2017 at 11:20 AM, Bin.Cheng <amker.cheng@gmail.com> wrote:
>> On Fri, Jun 16, 2017 at 6:15 PM, Bin.Cheng <amker.cheng@gmail.com> wrote:
>>> On Fri, Jun 16, 2017 at 11:21 AM, Richard Biener
>>> <richard.guenther@gmail.com> wrote:
>>>> On Mon, Jun 12, 2017 at 7:03 PM, Bin Cheng <Bin.Cheng@arm.com> wrote:
>>>>> Hi,
>>>>> For now, loop distribution handles variables used outside of the loop as
>>>>> reductions. This is inaccurate because all partitions contain the statements
>>>>> defining induction vars.
>>>>
>>>> But final induction values are usually not used outside of the loop...
>>> This is actually for an induction variable that is used outside of the loop.
>>>>
>>>> What is missing is loop distribution trying to change partition order. In fact
>>>> we somehow assume we can move a reduction across a detected builtin
>>>> (I don't remember if we ever check for validity of that...).
>>> Hmm, I am not sure when we can't. If there is any dependence between
>>> builtin/reduction partitions, it should be captured by RDG or PG,
>>> otherwise the partitions are independent and can be freely ordered as
>>> long as reduction partition is scheduled last?
>>>>
>>>>> Ideally we should factor out scev-propagation as a standalone interface
>>>>> which can be called when necessary. Before that, this patch simply works around
>>>>> the reduction issue by checking if the statement belongs to all partitions. If yes,
>>>>> the reduction must be computed in the last partition no matter how the loop is
>>>>> distributed.
>>>>> Bootstrapped and tested on x86_64 and AArch64. Is it OK?
>>>>
>>>> stmt_in_all_partitions is not kept up-to-date during partition merging and if
>>>> merging makes the reduction partition(s) pass the stmt_in_all_partitions
>>>> test your simple workaround doesn't work ...
>>> I think it doesn't matter because:
>>> A) it's really a workaround for induction variables. In general,
>>> induction variables are included by all partitions.
>>> B) After classifying partitions, we immediately fuse all reduction
>>> partitions. More stmt_in_all_partitions means we are fusing a
>>> non-reduction partition with a reduction partition, so the newly
>>> generated (stmt_in_all_partitions) are actually not reduction
>>> statements. The workaround won't work anyway even if the bitmap is
>>> maintained.
>>>>
>>>> As written it's a valid optimization but can you please note its limitation in
>>>> some comment?
>>> Yeah, I will add comment explaining it.
>> Comment added in the new version of the patch. It also computes the bitmap
>> outside of classify_partition now, is it OK?
>
> Ok. Can you add a testcase for this as well please? I think the
> series up to this is now fully reviewed. I deferred 1/n (the new IFN)
> to the last one containing the runtime versioning. Can you re-post
> that (you can merge with the IFN patch) to apply after the series has
> been applied up to this?
Test case added.
Thanks,
bin
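
The rule discussed in this thread can be illustrated with a small standalone sketch. It is not the GCC implementation: it assumes statement sets fit in a 64-bit mask instead of a GCC bitmap, and the names stmt_set, part_kind, stmts_in_all_partitions and classify are hypothetical. It only mirrors the decision the patch makes: a statement with uses outside the loop marks its partition as a reduction unless that statement belongs to every partition, in which case the partition is merely kept from being classified as a builtin.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a GCC bitmap: bit k set means statement k
   belongs to the set.  */
typedef uint64_t stmt_set;

/* Hypothetical outcome of classifying one partition.  */
enum part_kind
{
  PART_REDUCTION,         /* must be fused with other reduction partitions */
  PART_NO_BUILTIN,        /* normal partition, not eligible for a builtin */
  PART_BUILTIN_CANDIDATE  /* may still be matched against memset etc. */
};

/* Intersect the statement sets of all N partitions, analogous to the
   bitmap_copy / bitmap_and_into loop the patch adds to distribute_loop.  */
stmt_set
stmts_in_all_partitions (const stmt_set *parts, size_t n)
{
  assert (n > 0);
  stmt_set common = parts[0];
  for (size_t i = 1; i < n; ++i)
    common &= parts[i];
  return common;
}

/* Classify PART.  USED_OUTSIDE holds the statements whose results are
   used after the loop; IN_ALL holds the statements shared by every
   partition.  */
enum part_kind
classify (stmt_set part, stmt_set used_outside, stmt_set in_all)
{
  stmt_set outside = part & used_outside;

  /* A use outside the loop that is not shared by all partitions is a
     real reduction.  */
  if (outside & ~in_all)
    return PART_REDUCTION;

  /* Only shared statements (typically induction variables) are used
     outside: not a reduction, but do not turn the partition into a
     builtin either.  */
  if (outside)
    return PART_NO_BUILTIN;

  return PART_BUILTIN_CANDIDATE;
}
```

For example, with partitions {0,1,2} and {0,1,3}, the shared set is {0,1}. If only statement 0 is used after the loop, neither partition is marked as a reduction; if statement 2 is used after the loop, only the first partition is.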
2017-06-20 Bin Cheng <bin.cheng@arm.com>
* tree-loop-distribution.c (classify_partition): New parameter and
better handle reduction statement.
(rdg_build_partitions): Revise comment.
(distribute_loop): Compute statements in all partitions and pass it
to classify_partition.
gcc/testsuite/ChangeLog
2017-06-20 Bin Cheng <bin.cheng@arm.com>
* gcc.dg/tree-ssa/ldist-26.c: New test.
From b16a4839f3211737dccc3ff92ab2c4f325907cd3 Mon Sep 17 00:00:00 2001
From: Bin Cheng <binche01@e108451-lin.cambridge.arm.com>
Date: Thu, 22 Jun 2017 17:16:58 +0100
Subject: [PATCH 11/13] reduction-workaround-20170607.txt
---
gcc/testsuite/gcc.dg/tree-ssa/ldist-26.c | 36 ++++++++++++++++++++++++++
gcc/tree-loop-distribution.c | 43 ++++++++++++++++++++++++--------
2 files changed, 68 insertions(+), 11 deletions(-)
create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ldist-26.c
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ldist-26.c b/gcc/testsuite/gcc.dg/tree-ssa/ldist-26.c
new file mode 100644
index 0000000..3a69884
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ldist-26.c
@@ -0,0 +1,36 @@
+/* { dg-do run } */
+/* { dg-options "-O3 -ftree-loop-distribution -fdump-tree-ldist-details" } */
+
+extern void abort (void);
+
+int a[130], b[128], c[128];
+
+int __attribute__((noinline,noclone))
+foo (int len, int x)
+{
+ int i;
+ for (i = 1; i <= len; ++i)
+ {
+ a[i] = a[i + 2] + 1;
+ b[i] = 0;
+ a[i + 1] = a[i] - 3;
+ if (i < x)
+ c[i] = a[i];
+ }
+ return i;
+}
+
+int main()
+{
+ int i;
+ for (i = 0; i < 130; ++i)
+ a[i] = i;
+ foo (127, 67);
+ if (a[0] != 0 || a[1] != 4 || a[127] != 130)
+ abort ();
+ return 0;
+}
+
+/* { dg-final { scan-tree-dump "distributed: split to 2 loops and 0 library calls" "ldist" } } */
+/* { dg-final { scan-tree-dump "distributed: split to 1 loops and 1 library calls" "ldist" } } */
+/* { dg-final { scan-tree-dump "generated memset zero" "ldist" } } */
diff --git a/gcc/tree-loop-distribution.c b/gcc/tree-loop-distribution.c
index 87fdc15..b15ec04 100644
--- a/gcc/tree-loop-distribution.c
+++ b/gcc/tree-loop-distribution.c
@@ -1254,17 +1254,18 @@ build_rdg_partition_for_vertex (struct graph *rdg, int v)
}
/* Classifies the builtin kind we can generate for PARTITION of RDG and LOOP.
- For the moment we detect only the memset zero pattern. */
+ For the moment we detect memset, memcpy and memmove patterns. Bitmap
+ STMT_IN_ALL_PARTITIONS contains statements belonging to all partitions. */
static void
-classify_partition (loop_p loop, struct graph *rdg, partition *partition)
+classify_partition (loop_p loop, struct graph *rdg, partition *partition,
+ bitmap stmt_in_all_partitions)
{
bitmap_iterator bi;
unsigned i;
tree nb_iter;
data_reference_p single_load, single_store;
- bool volatiles_p = false;
- bool plus_one = false;
+ bool volatiles_p = false, plus_one = false, has_reduction = false;
partition->kind = PKIND_NORMAL;
partition->main_dr = NULL;
@@ -1279,16 +1280,31 @@ classify_partition (loop_p loop, struct graph *rdg, partition *partition)
if (gimple_has_volatile_ops (stmt))
volatiles_p = true;
- /* If the stmt has uses outside of the loop mark it as reduction. */
+ /* If the stmt is not included by all partitions and there are uses
+ outside of the loop, then mark the partition as reduction. */
if (stmt_has_scalar_dependences_outside_loop (loop, stmt))
{
- partition->reduction_p = true;
- return;
+ /* Due to a limitation in the transform phase we have to fuse all
+ reduction partitions. As a result, this could cancel valid
+ loop distribution, especially for loops whose induction variable
+ is used outside of the loop. To work around this issue, we skip
+ marking the partition as reduction if the reduction stmt belongs
+ to all partitions. In such a case, the reduction will be computed
+ correctly no matter how partitions are fused/distributed. */
+ if (!bitmap_bit_p (stmt_in_all_partitions, i))
+ {
+ partition->reduction_p = true;
+ return;
+ }
+ has_reduction = true;
}
}
/* Perform general partition disqualification for builtins. */
if (volatiles_p
+ /* Simple workaround to prevent classifying the partition as builtin
+ if it contains any use outside of loop. */
+ || has_reduction
|| !flag_tree_loop_distribute_patterns)
return;
@@ -1461,9 +1477,9 @@ share_memory_accesses (struct graph *rdg,
return false;
}
-/* Aggregate several components into a useful partition that is
- registered in the PARTITIONS vector. Partitions will be
- distributed in different loops. */
+/* For each seed statement in STARTING_STMTS, this function builds a
+ partition for it by adding dependent statements according to RDG.
+ All partitions are recorded in PARTITIONS. */
static void
rdg_build_partitions (struct graph *rdg,
@@ -1731,10 +1747,15 @@ distribute_loop (struct loop *loop, vec<gimple *> stmts,
auto_vec<struct partition *, 3> partitions;
rdg_build_partitions (rdg, stmts, &partitions);
+ auto_bitmap stmt_in_all_partitions;
+ bitmap_copy (stmt_in_all_partitions, partitions[0]->stmts);
+ for (i = 1; partitions.iterate (i, &partition); ++i)
+ bitmap_and_into (stmt_in_all_partitions, partitions[i]->stmts);
+
any_builtin = false;
FOR_EACH_VEC_ELT (partitions, i, partition)
{
- classify_partition (loop, rdg, partition);
+ classify_partition (loop, rdg, partition, stmt_in_all_partitions);
any_builtin |= partition_builtin_p (partition);
}
--
1.9.1
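
As an illustration of what the new test exercises, the sketch below distributes the loop from ldist-26.c by hand. It is an assumption about the shape of the result, not the compiler's actual output: the names foo_original and foo_distributed are hypothetical, and the zero-store loop is left as a plain loop rather than a memset call. The point is that the induction variable i, which is used after the loop by the return statement, is defined in every partition, so the last loop yields the same final value as the original.

```c
#include <assert.h>

int a[130], b[128], c[128];

/* The loop from ldist-26.c, unchanged apart from the function name.  */
int
foo_original (int len, int x)
{
  int i;
  assert (len <= 127);  /* keeps a[i + 2] and b[i] in bounds */
  for (i = 1; i <= len; ++i)
    {
      a[i] = a[i + 2] + 1;
      b[i] = 0;
      a[i + 1] = a[i] - 3;
      if (i < x)
        c[i] = a[i];
    }
  return i;
}

/* Hand-distributed version (hypothetical): the store to b[] is
   independent of the rest and gets its own loop.  Each loop carries its
   own copy of the induction variable, and the value returned comes from
   the last loop.  */
int
foo_distributed (int len, int x)
{
  int i;
  assert (len <= 127);

  /* Partition 1: zero-store only, the kind of loop that pattern
     detection can treat as a memset candidate.  */
  for (i = 1; i <= len; ++i)
    b[i] = 0;

  /* Partition 2: the dependent accesses to a[] and the conditional
     store to c[].  */
  for (i = 1; i <= len; ++i)
    {
      a[i] = a[i + 2] + 1;
      a[i + 1] = a[i] - 3;
      if (i < x)
        c[i] = a[i];
    }
  return i;
}
```

Calling both functions with the test's arguments (127, 67) on identically initialised arrays should give the same return value and the same array contents.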