Ping: [PATCH v7 2/2] Don't move cold code out of loop by checking bb count
Xionghu Luo
luoxhu@linux.ibm.com
Wed Nov 24 05:15:23 GMT 2021
Gentle ping and is this patch still suitable for stage 3? Thanks.
[PATCH v7 2/2] Don't move cold code out of loop by checking bb count
https://gcc.gnu.org/pipermail/gcc-patches/2021-November/583911.html
On 2021/11/10 11:08, Xionghu Luo via Gcc-patches wrote:
>
>
> On 2021/11/4 21:00, Richard Biener wrote:
>> On Wed, Nov 3, 2021 at 2:29 PM Xionghu Luo <luoxhu@linux.ibm.com> wrote:
>>>
>>>
>>>> + while (outmost_loop != loop)
>>>> + {
>>>> + if (bb_colder_than_loop_preheader (loop_preheader_edge
>>>> (outmost_loop)->src,
>>>> + loop_preheader_edge (cold_loop)->src))
>>>> + cold_loop = outmost_loop;
>>>> + outmost_loop = superloop_at_depth (loop, loop_depth (outmost_loop) + 1);
>>>> + }
>>>>
>>>> could be instead written as
>>>>
>>>> coldest_loop = coldest_outermost_loop[loop->num];
>>>> if (loop_depth (coldest_loop) < loop_depth (outermost_loop))
>>>> return outermost_loop;
>>>> return coldest_loop;
>>>>
>>>> ? And in the usual case coldest_outermost_loop[L] would be the loop tree root.
>>>> It should be possible to compute such cache in a DFS walk of the loop tree
>>>> (the loop iterator by default visits in such order).
>>>
>>>
>>> Thanks. Updated the patch with your suggestion. Not sure whether it strictly
>>> conforms to your comments. Though the patch passed all my added tests(coverage not enough),
>>> I am still a bit worried if pre-computed coldest_loop is outside of outermost_loop, but
>>> outermost_loop is not the COLDEST LOOP, i.e. (outer->inner)
>>>
>>> [loop tree root, coldest_loop, outermost_loop,..., second_coldest_loop, ..., loop],
>>>
>>> then function find_coldest_out_loop will return a loop NOT accord with our
>>> expectation, that should return second_coldest_loop instead of outermost_loop?
>> Hmm, interesting - yes. I guess the common case will be that the pre-computed
>> outermost loop will be the loop at depth 1 since outer loops tend to
>> be colder than
>> inner loops? That would then defeat the whole exercise.
>
> It is not easy to construct such cases, But finally I got below results,
>
> 1) many cases inner loop is hotter than outer loop, for example:
>
> loop 1's coldest_outermost_loop is 1, colder_than_inner_loop is NULL
> loop 2's coldest_outermost_loop is 1, colder_than_inner_loop is 1
> loop 3's coldest_outermost_loop is 1, colder_than_inner_loop is 2
> loop 4's coldest_outermost_loop is 1, colder_than_inner_loop is 2
>
>
> 2) But there are also cases inner loop is colder than outer loop, like:
>
> loop 1's coldest outermost loop is 1, colder_than_inner_loop is NULL
> loop 2's coldest outermost loop is 2, colder_than_inner_loop is NULL
> loop 3's coldest outermost loop is 3, colder_than_inner_loop is NULL
>
>
>>
>> To optimize the common case but not avoiding iteration in the cases we care
>> about we could instead cache the next outermost loop that is _not_ colder
>> than loop. So for your [ ... ] example above we'd have> hotter_than_inner_loop[loop] == outer (second_coldest_loop), where the
>> candidate would then be 'second_coldest_loop' and we'd then iterate
>> to hotter_than_inner_loop[hotter_than_inner_loop[loop]] to find the next
>> cold candidate we can compare against? For the common case we'd
>> have hotter_than_inner_loop[looo] == NULL (no such loop) and we then
>> simply pick 'outermost_loop'.
>
> Thanks. It was difficult to understand, but finally I got to know what you
> want to express :)
>
> We should cache the next loop that is *colder* than loop instead of '_not_ colder
> than loop', and 'hotter_than_inner_loop' should be 'colder_than_inner_loop',
> then it makes sense if the coldest loop is outside of outermost loop, continue to
> find a colder loop between outermost loop and current loop in
> colder_than_inner_loop[loop->num]? Hope I understood you correctly...
>
>>
>> One comment on the patch itself below.
>>
>
> The loop in fill_cold_out_loop is also removed in the updated v7 patch.
>
>
>
> [PATCH v7 2/2] Don't move cold code out of loop by checking bb count
>
> From: Xiong Hu Luo <luoxhu@linux.ibm.com>
>
> v7 changes:
> 1. Refine get_coldest_out_loop to replace loop with checking
> pre-computed coldest_outermost_loop and colder_than_inner_loop.
> 2. Add function fill_cold_out_loop, compute coldest_outermost_loop and
> colder_than_inner_loop recursively without loop.
>
> v6 changes:
> 1. Add function fill_coldest_out_loop to pre compute the coldest
> outermost loop for each loop.
> 2. Rename find_coldest_out_loop to get_coldest_out_loop.
> 3. Add testcase ssa-lim-22.c to differentiate with ssa-lim-19.c.
>
> v5 changes:
> 1. Refine comments for new functions.
> 2. Use basic_block instead of count in bb_colder_than_loop_preheader
> to align with function name.
> 3. Refine with simpler implementation for get_coldest_out_loop and
> ref_in_loop_hot_body::operator for better understanding.
>
> v4 changes:
> 1. Sort out profile_count comparision to function bb_cold_than_loop_preheader.
> 2. Update ref_in_loop_hot_body::operator () to find cold_loop before compare.
> 3. Split RTL invariant motion part out.
> 4. Remove aux changes.
>
> v3 changes:
> 1. Handle max_loop in determine_max_movement instead of outermost_invariant_loop.
> 2. Remove unnecessary changes.
> 3. Add for_all_locs_in_loop (loop, ref, ref_in_loop_hot_body) in can_sm_ref_p.
> 4. "gsi_next (&bsi);" in move_computations_worker is kept since it caused
> infinite loop when implementing v1 and the iteration is missed to be
> updated actually.
>
> v1: https://gcc.gnu.org/pipermail/gcc-patches/2021-August/576488.html
> v2: https://gcc.gnu.org/pipermail/gcc-patches/2021-September/579086.html
> v3: https://gcc.gnu.org/pipermail/gcc-patches/2021-September/580211.html
> v4: https://gcc.gnu.org/pipermail/gcc-patches/2021-October/581231.html
> v5: https://gcc.gnu.org/pipermail/gcc-patches/2021-October/581961.html
>
> There was a patch trying to avoid move cold block out of loop:
>
> https://gcc.gnu.org/pipermail/gcc/2014-November/215551.html
>
> Richard suggested to "never hoist anything from a bb with lower execution
> frequency to a bb with higher one in LIM invariantness_dom_walker
> before_dom_children".
>
> In gimple LIM analysis, add get_coldest_out_loop to move invariants to
> expected target loop, if profile count of the loop bb is colder
> than target loop preheader, it won't be hoisted out of loop.
> Likely for store motion, if all locations of the REF in loop is cold,
> don't do store motion of it.
>
> SPEC2017 performance evaluation shows 1% performance improvement for
> intrate GEOMEAN and no obvious regression for others. Especially,
> 500.perlbench_r +7.52% (Perf shows function S_regtry of perlbench is
> largely improved.), and 548.exchange2_r+1.98%, 526.blender_r +1.00%
> on P8LE.
>
> gcc/ChangeLog:
>
> * tree-ssa-loop-im.c (bb_colder_than_loop_preheader): New
> function.
> (get_coldest_out_loop): New function.
> (determine_max_movement): Use get_coldest_out_loop.
> (move_computations_worker): Adjust and fix iteration udpate.
> (class ref_in_loop_hot_body): New functor.
> (ref_in_loop_hot_body::operator): New.
> (can_sm_ref_p): Use for_all_locs_in_loop.
> (fill_cold_out_loop): New.
> (tree_ssa_lim_finalize): Free coldest_outermost_loop and
> colder_than_inner_loop.
> (loop_invariant_motion_in_fun): Call fill_cold_out_loop.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.dg/tree-ssa/recip-3.c: Adjust.
> * gcc.dg/tree-ssa/ssa-lim-18.c: New test.
> * gcc.dg/tree-ssa/ssa-lim-19.c: New test.
> * gcc.dg/tree-ssa/ssa-lim-20.c: New test.
> * gcc.dg/tree-ssa/ssa-lim-21.c: New test.
> * gcc.dg/tree-ssa/ssa-lim-22.c: New test.
> ---
> gcc/tree-ssa-loop-im.c | 140 ++++++++++++++++++++-
> gcc/testsuite/gcc.dg/tree-ssa/recip-3.c | 2 +-
> gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-18.c | 20 +++
> gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-19.c | 29 +++++
> gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-20.c | 25 ++++
> gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-21.c | 35 ++++++
> gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-22.c | 32 +++++
> 7 files changed, 280 insertions(+), 3 deletions(-)
> create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-18.c
> create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-19.c
> create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-20.c
> create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-21.c
> create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-22.c
>
> diff --git a/gcc/tree-ssa-loop-im.c b/gcc/tree-ssa-loop-im.c
> index 4b187c2cdaf..e9b3d0cba93 100644
> --- a/gcc/tree-ssa-loop-im.c
> +++ b/gcc/tree-ssa-loop-im.c
> @@ -146,6 +146,11 @@ public:
> enum dep_kind { lim_raw, sm_war, sm_waw };
> enum dep_state { dep_unknown, dep_independent, dep_dependent };
>
> +/* coldest outermost loop for given loop. */
> +class loop **coldest_outermost_loop;
> +/* colder outer loop nerest to given loop. */
> +class loop **colder_than_inner_loop;
> +
> /* Populate the loop dependence cache of REF for LOOP, KIND with STATE. */
>
> static void
> @@ -417,6 +422,53 @@ movement_possibility (gimple *stmt)
> return ret;
> }
>
> +/* Compare the profile count inequality of bb and preheader, it is three-state
> + as stated in profile-count.h, FALSE is returned if inequality cannot be
> + decided. */
> +bool bb_colder_than_loop_preheader (basic_block bb, basic_block preheader)
> +{
> + gcc_assert (bb && preheader);
> + return bb->count < preheader->count;
> +}
> +
> +/* Check coldest loop between OUTERMOST_LOOP and LOOP by comparing profile
> + count.
> + It does three steps check:
> + 1) Check whether CURR_BB is cold in it's own loop_father, if it is cold, just
> + return NULL which means it should not be moved out at all;
> + 2) CURR_BB is NOT cold, check if pre-computed COLDEST_LOOP is outside of
> + OUTERMOST_LOOP, if it is inside of OUTERMOST_LOOP, return the COLDEST_LOOP;
> + 3) If COLDEST_LOOP is outside of OUTERMOST_LOOP, check whether there is a
> + colder loop between OUTERMOST_LOOP and loop in pre-computed
> + COLDER_THAN_INNER_LOOP, return it, otherwise return OUTERMOST_LOOP. */
> +
> +static class loop *
> +get_coldest_out_loop (class loop *outermost_loop, class loop *loop,
> + basic_block curr_bb)
> +{
> + gcc_assert (outermost_loop == loop
> + || flow_loop_nested_p (outermost_loop, loop));
> +
> + /* If bb_colder_than_loop_preheader returns false due to three-state
> + comparision, OUTERMOST_LOOP is returned finally to preserve the behavior.
> + Otherwise, return the coldest loop between OUTERMOST_LOOP and LOOP. */
> + if (curr_bb
> + && bb_colder_than_loop_preheader (curr_bb,
> + loop_preheader_edge (loop)->src))
> + return NULL;
> +
> + class loop *coldest_loop = coldest_outermost_loop[loop->num];
> + if (loop_depth (coldest_loop) < loop_depth (outermost_loop))
> + {
> + if (colder_than_inner_loop[loop->num] != NULL
> + && loop_depth (outermost_loop)
> + < loop_depth (colder_than_inner_loop[loop->num]))
> + return colder_than_inner_loop[loop->num];
> + return outermost_loop;
> + }
> + return coldest_loop;
> +}
> +
> /* Suppose that operand DEF is used inside the LOOP. Returns the outermost
> loop to that we could move the expression using DEF if it did not have
> other operands, i.e. the outermost loop enclosing LOOP in that the value
> @@ -685,7 +737,9 @@ determine_max_movement (gimple *stmt, bool must_preserve_exec)
> level = ALWAYS_EXECUTED_IN (bb);
> else
> level = superloop_at_depth (loop, 1);
> - lim_data->max_loop = level;
> + lim_data->max_loop = get_coldest_out_loop (level, loop, bb);
> + if (!lim_data->max_loop)
> + return false;
>
> if (gphi *phi = dyn_cast <gphi *> (stmt))
> {
> @@ -1221,7 +1275,10 @@ move_computations_worker (basic_block bb)
> /* We do not really want to move conditionals out of the loop; we just
> placed it here to force its operands to be moved if necessary. */
> if (gimple_code (stmt) == GIMPLE_COND)
> - continue;
> + {
> + gsi_next (&bsi);
> + continue;
> + }
>
> if (dump_file && (dump_flags & TDF_DETAILS))
> {
> @@ -2887,6 +2944,26 @@ ref_indep_loop_p (class loop *loop, im_mem_ref *ref, dep_kind kind)
> return indep_p;
> }
>
> +class ref_in_loop_hot_body
> +{
> +public:
> + ref_in_loop_hot_body (loop *loop_) : l (loop_) {}
> + bool operator () (mem_ref_loc *loc);
> + class loop *l;
> +};
> +
> +/* Check the coldest loop between loop L and innermost loop. If there is one
> + cold loop between L and INNER_LOOP, store motion can be performed, otherwise
> + no cold loop means no store motion. get_coldest_out_loop also handles cases
> + when l is inner_loop. */
> +bool
> +ref_in_loop_hot_body::operator () (mem_ref_loc *loc)
> +{
> + basic_block curr_bb = gimple_bb (loc->stmt);
> + class loop *inner_loop = curr_bb->loop_father;
> + return get_coldest_out_loop (l, inner_loop, curr_bb);
> +}
> +
>
> /* Returns true if we can perform store motion of REF from LOOP. */
>
> @@ -2941,6 +3018,12 @@ can_sm_ref_p (class loop *loop, im_mem_ref *ref)
> if (!ref_indep_loop_p (loop, ref, sm_war))
> return false;
>
> + /* Verify whether the candidate is hot for LOOP. Only do store motion if the
> + candidate's profile count is hot. Statement in cold BB shouldn't be moved
> + out of it's loop_father. */
> + if (!for_all_locs_in_loop (loop, ref, ref_in_loop_hot_body (loop)))
> + return false;
> +
> return true;
> }
>
> @@ -3153,6 +3236,48 @@ fill_always_executed_in (void)
> fill_always_executed_in_1 (loop, contains_call);
> }
>
> +/* Find the coldest loop preheader for LOOP, also find the nearest colder loop
> + to LOOP. Then recursively iterate each inner loop. */
> +
> +void
> +fill_cold_out_loop (class loop *coldest_loop, class loop *outer_loop,
> + class loop *loop)
> +{
> + class loop *colder_loop = NULL;
> + if (outer_loop)
> + {
> + if (bb_colder_than_loop_preheader (loop_preheader_edge (outer_loop)->src,
> + loop_preheader_edge (loop)->src))
> + colder_loop = outer_loop;
> + else if (colder_than_inner_loop[outer_loop->num]
> + && bb_colder_than_loop_preheader (
> + loop_preheader_edge (colder_than_inner_loop[outer_loop->num])
> + ->src,
> + loop_preheader_edge (loop)->src))
> + colder_loop = colder_than_inner_loop[outer_loop->num];
> + }
> +
> + if (bb_colder_than_loop_preheader (loop_preheader_edge (loop)->src,
> + loop_preheader_edge (coldest_loop)->src))
> + coldest_loop = loop;
> +
> + coldest_outermost_loop[loop->num] = coldest_loop;
> + colder_than_inner_loop[loop->num] = colder_loop;
> + if (dump_enabled_p ())
> + {
> + dump_printf (MSG_NOTE, "loop %d's coldest_outermost_loop is %d, ",
> + loop->num, coldest_loop->num);
> + if (colder_loop)
> + dump_printf (MSG_NOTE, "colder_than_inner_loop is %d\n",
> + colder_loop->num);
> + else
> + dump_printf (MSG_NOTE, "colder_than_inner_loop is NULL\n");
> + }
> +
> + class loop *inner_loop;
> + for (inner_loop = loop->inner; inner_loop; inner_loop = inner_loop->next)
> + fill_cold_out_loop (coldest_loop, loop, inner_loop);
> +}
>
> /* Compute the global information needed by the loop invariant motion pass. */
>
> @@ -3237,6 +3362,9 @@ tree_ssa_lim_finalize (void)
> free_affine_expand_cache (&memory_accesses.ttae_cache);
>
> free (bb_loop_postorder);
> +
> + free (coldest_outermost_loop);
> + free (colder_than_inner_loop);
> }
>
> /* Moves invariants from loops. Only "expensive" invariants are moved out --
> @@ -3256,6 +3384,14 @@ loop_invariant_motion_in_fun (function *fun, bool store_motion)
> /* Fills ALWAYS_EXECUTED_IN information for basic blocks. */
> fill_always_executed_in ();
>
> + /* Pre-compute coldest outermost loop and nearest colder loop of each loop.
> + */
> + class loop *loop;
> + coldest_outermost_loop = XNEWVEC (class loop *, number_of_loops (cfun));
> + colder_than_inner_loop = XNEWVEC (class loop *, number_of_loops (cfun));
> + for (loop = current_loops->tree_root->inner; loop != NULL; loop = loop->next)
> + fill_cold_out_loop (loop, NULL, loop);
> +
> int *rpo = XNEWVEC (int, last_basic_block_for_fn (fun));
> int n = pre_and_rev_post_order_compute_fn (fun, NULL, rpo, false);
>
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/recip-3.c b/gcc/testsuite/gcc.dg/tree-ssa/recip-3.c
> index 638bf38db8c..641c91e719e 100644
> --- a/gcc/testsuite/gcc.dg/tree-ssa/recip-3.c
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/recip-3.c
> @@ -23,4 +23,4 @@ float h ()
> F[0] += E / d;
> }
>
> -/* { dg-final { scan-tree-dump-times " / " 1 "recip" } } */
> +/* { dg-final { scan-tree-dump-times " / " 5 "recip" } } */
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-18.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-18.c
> new file mode 100644
> index 00000000000..7326a230b3f
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-18.c
> @@ -0,0 +1,20 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-tree-lim2-details" } */
> +
> +volatile int x;
> +void
> +bar (int, char *, char *);
> +void
> +foo (int *a, int n, int k)
> +{
> + int i;
> +
> + for (i = 0; i < n; i++)
> + {
> + if (__builtin_expect (x, 0))
> + bar (k / 5, "one", "two");
> + a[i] = k;
> + }
> +}
> +
> +/* { dg-final { scan-tree-dump-not "out of loop 1" "lim2" } } */
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-19.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-19.c
> new file mode 100644
> index 00000000000..51c1913d003
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-19.c
> @@ -0,0 +1,29 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-tree-lim2-details" } */
> +
> +volatile int x;
> +void
> +bar (int, char *, char *);
> +void
> +foo (int *a, int n, int m, int s, int t)
> +{
> + int i;
> + int j;
> + int k;
> +
> + for (i = 0; i < m; i++) // Loop 1
> + {
> + if (__builtin_expect (x, 0))
> + for (j = 0; j < n; j++) // Loop 2
> + for (k = 0; k < n; k++) // Loop 3
> + {
> + bar (s / 5, "one", "two");
> + a[t] = s;
> + }
> + a[t] = t;
> + }
> +}
> +
> +/* { dg-final { scan-tree-dump-times "out of loop 2" 4 "lim2" } } */
> +/* { dg-final { scan-tree-dump-times "out of loop 1" 3 "lim2" } } */
> +
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-20.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-20.c
> new file mode 100644
> index 00000000000..bc60a040a70
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-20.c
> @@ -0,0 +1,25 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-tree-lim2-details" } */
> +
> +/* Test that `count' is not hoisted out of loop when bb is cold. */
> +
> +int count;
> +volatile int x;
> +
> +struct obj {
> + int data;
> + struct obj *next;
> +
> +} *q;
> +
> +void
> +func (int m)
> +{
> + struct obj *p;
> + for (int i = 0; i < m; i++)
> + if (__builtin_expect (x, 0))
> + count++;
> +
> +}
> +
> +/* { dg-final { scan-tree-dump-not "Executing store motion of" "lim2" } } */
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-21.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-21.c
> new file mode 100644
> index 00000000000..ffe6f8f699d
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-21.c
> @@ -0,0 +1,35 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-tree-lim2-details" } */
> +
> +/* Test that `data' and 'data1' is not hoisted out of inner loop and outer loop
> + when it is in cold loop. */
> +
> +int count;
> +volatile int x;
> +
> +struct obj {
> + int data;
> + int data1;
> + struct obj *next;
> +};
> +
> +void
> +func (int m, int n, int k, struct obj *a)
> +{
> + struct obj *q = a;
> + for (int j = 0; j < m; j++)
> + if (__builtin_expect (m, 0))
> + for (int i = 0; i < m; i++)
> + {
> + if (__builtin_expect (x, 0))
> + {
> + count++;
> + q->data += 3; /* Not hoisted out to inner loop. */
> + }
> + count += n;
> + q->data1 += k; /* Not hoisted out to outer loop. */
> + }
> +}
> +
> +/* { dg-final { scan-tree-dump-not "Executing store motion of" "lim2" } } */
> +
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-22.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-22.c
> new file mode 100644
> index 00000000000..16ba4ceb8ab
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-22.c
> @@ -0,0 +1,32 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-tree-lim2-details" } */
> +
> +volatile int x;
> +volatile int y;
> +void
> +bar (int, char *, char *);
> +void
> +foo (int *a, int n, int m, int s, int t)
> +{
> + int i;
> + int j;
> + int k;
> +
> + for (i = 0; i < m; i++) // Loop 1
> + {
> + if (__builtin_expect (x, 0))
> + for (j = 0; j < n; j++) // Loop 2
> + if (__builtin_expect (y, 0))
> + for (k = 0; k < n; k++) // Loop 3
> + {
> + bar (s / 5, "one", "two");
> + a[t] = s;
> + }
> + a[t] = t;
> + }
> +}
> +
> +/* { dg-final { scan-tree-dump-times "out of loop 3" 4 "lim2" } } */
> +/* { dg-final { scan-tree-dump-times "out of loop 1" 3 "lim2" } } */
> +
> +
>
--
Thanks,
Xionghu
More information about the Gcc-patches
mailing list