Re: PING: Re: [patch] implement simd loops in trunk (OMP_SIMD)
- From: Richard Henderson <rth@redhat.com>
- To: Aldy Hernandez <aldyh@redhat.com>
- Cc: Jakub Jelinek <jakub@redhat.com>, Richard Biener <richard.guenther@gmail.com>, "Iyer, Balaji V" <balaji.v.iyer@intel.com>, gcc-patches <gcc-patches@gcc.gnu.org>, jason merrill <jason@redhat.com>
- Date: Thu, 01 Aug 2013 09:38:31 -1000
- Subject: Re: PING: Re: [patch] implement simd loops in trunk (OMP_SIMD)
- References: <51DC3405.4070509@redhat.com> <51F5F9EB.2070205@redhat.com>
> + if (simd
> + /*
> + || (fd->sched_kind == OMP_CLAUSE_SCHEDULE_STATIC
> + && !fd->have_ordered)*/)
Debugging leftovers or what?
> + /* Enforce simdlen 1 in simd loops with data sharing clauses referencing
> + variable sized vars. That is unnecessarily hard to support and very
> + unlikely to result in vectorized code anyway. */
> + if (is_simd)
> + for (c = clauses; c ; c = OMP_CLAUSE_CHAIN (c))
> + switch (OMP_CLAUSE_CODE (c))
> + {
> + case OMP_CLAUSE_REDUCTION:
> + if (OMP_CLAUSE_REDUCTION_PLACEHOLDER (c))
> + max_vf = 1;
> + /* FALLTHRU */
> + case OMP_CLAUSE_PRIVATE:
> + case OMP_CLAUSE_FIRSTPRIVATE:
> + case OMP_CLAUSE_LASTPRIVATE:
> + case OMP_CLAUSE_LINEAR:
> + if (is_variable_sized (OMP_CLAUSE_DECL (c)))
> + max_vf = 1;
> + break;
> + default:
> + continue;
> + }
Comment is wrong, code is right: we're adjusting max_vf, not simdlen.
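For the record, the sort of construct this check guards against (my own
example, not from the patch) is a VLA named in a data sharing clause:

  void
  foo (int n)
  {
    int vla[n];
  #pragma omp simd private (vla)  /* variable sized decl: max_vf forced to 1 */
    for (int i = 0; i < n; i++)
      vla[0] = i;
  }

Privatizing a variable sized object per SIMD lane is exactly the case the
comment calls unnecessarily hard to support.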
> + /* If max_vf is non-NULL, then we can use only vectorization factor
> + up to the max_vf we chose. So stick it into safelen clause. */
> + if (max_vf)
> + {
> + tree c = find_omp_clause (gimple_omp_for_clauses (ctx->stmt),
> + OMP_CLAUSE_SAFELEN);
First, non-zero, not non-null -- max_vf is not a pointer.
Second, I don't understand why we're adjusting safelen. The VF we choose
for optimizing the loop (even 1) does not change the fact that there are
no dependencies between loop iterations larger than that.
Did you want to be adding a _max_vf_ attribute to record stuff that we
learned while examining the omp loop? E.g. the max_vf=1 reduction above?
Given the only adjustment we make to max_vf is to disable vectorization,
did we in fact want to discard the simd attribute instead, making this a
"regular" openmp loop?
> + if (__builtin_expect (N32 cond3 N31, 0)) goto ZERO_ITER_BB;
> + if (cond3 is <)
> + adj = STEP3 - 1;
> + else
> + adj = STEP3 + 1;
> + count3 = (adj + N32 - N31) / STEP3;
> + if (__builtin_expect (N22 cond2 N21, 0)) goto ZERO_ITER_BB;
> + if (cond2 is <)
> + adj = STEP2 - 1;
> + else
> + adj = STEP2 + 1;
> + count2 = (adj + N22 - N21) / STEP2;
> + if (__builtin_expect (N12 cond1 N11, 0)) goto ZERO_ITER_BB;
CEIL_DIV_EXPR, instead of TRUNC_DIV_EXPR and adjusting by hand? Unless we
can't generate the same code in the end because generically we don't know that
the values involved must be negative for GT?
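Concretely, a ceiling division helper (my sketch; gcc would of course just
emit CEIL_DIV_EXPR rather than open-code this) that handles both the < and
> cases without the manual adj:

  /* Divide rounding toward +infinity; b is a loop step, assumed nonzero.  */
  static long
  ceil_div (long a, long b)
  {
    long q = a / b;  /* C division truncates toward zero.  */
    /* Bump when there is a remainder and the exact quotient is positive,
       i.e. a and b have the same sign.  */
    return q + (a % b != 0 && (a ^ b) >= 0);
  }

so count3 = ceil_div (N32 - N31, STEP3) regardless of the direction of
cond3.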
I wonder if we do better mooshing all of the BBs together, creating one larger
BB with all the computation and the unexpected jump at the end. I.e.
bool zero3, zero2, zero1, zero;
zero3 = N32 c3 N31;
count3 = (N32 - N31) /[cl] STEP3;
zero2 = N22 c2 N21;
count2 = (N22 - N21) /[cl] STEP2;
zero1 = N12 c1 N11;
count1 = (N12 - N11) /[cl] STEP1;
zero = zero3 || zero2 || zero1;
count = count1 * count2 * count3;
if (__builtin_expect(zero, false)) goto zero_iter_bb;
After all, we expect zero=false, and thus we expect to have to evaluate all
of the comparison expressions, so short-circuiting oughtn't be a win. Since
the condition isn't protecting a denominator, we're not concerned about
divide-by-zero, so we can fully evaluate count even if a numerator turned out
to be wrong.
It seems like putting this all together would create much better scheduling
opportunities, and less pressure on the chip's branch predictor.
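Rendered as C for concreteness (my sketch, assuming all three conds are <
and the ceil_div above):

  static long
  compute_count (long n11, long n12, long step1,
                 long n21, long n22, long step2,
                 long n31, long n32, long step3)
  {
    /* Bitwise |, not ||: evaluate all three tests without branching so
       everything stays in one straight-line BB.  */
    int zero = (n32 < n31) | (n22 < n21) | (n12 < n11);
    long count = ceil_div (n12 - n11, step1)
                 * ceil_div (n22 - n21, step2)
                 * ceil_div (n32 - n31, step3);
    if (__builtin_expect (zero, 0))
      return 0;  /* i.e. goto zero_iter_bb */
    return count;
  }

One unlikely branch at the end instead of three.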
> @@ -291,6 +300,15 @@ vect_analyze_data_ref_dependence (struct data_dependence_re
You've two adjustments in this function. First hunk for "dont_know", which is
fine; second hunk for "data is bad", which I suppose is really a form of
dont_know and thus also fine.
But it seems like there should be a third hunk in the "know" section, in which
we perform some MAX(dist, loop->safelen) computation inside that loop.
Alternately, why are we patching this function at all, rather than whatever bit
of code creates the data_dependence_relation data?
> + hash_table <simduid_to_vf> simduid_to_vf_htab;
> + hash_table <decl_to_simduid> decl_to_simduid_htab;
Why two hashes? Seems like you can map from decl to vf directly. At what
point do you have a simduid integer, but not the decl from whence it came?
r~