This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: [PATCH v3 2/3] Add predict_doloop_p target hook
- From: "Kewen.Lin" <linkw at linux dot ibm dot com>
- To: Kugan Vivekanandarajah <kugan dot vivekanandarajah at linaro dot org>
- Cc: GCC Patches <gcc-patches at gcc dot gnu dot org>, Segher Boessenkool <segher at kernel dot crashing dot org>, wschmidt at linux dot ibm dot com, bin dot cheng at linux dot alibaba dot com, Richard Biener <rguenther at suse dot de>, Jakub Jelinek <jakub at redhat dot com>
- Date: Fri, 17 May 2019 14:15:37 +0800
- Subject: Re: [PATCH v3 2/3] Add predict_doloop_p target hook
- References: <1558064130-111037-1-git-send-email-linkw@linux.ibm.com> <CAELXzTNvysxpca7gp2eV9LFDqrQkKR8+YM4cn-Dc-k6ZcSWf-Q@mail.gmail.com>
on 2019/5/17 1:30 PM, Kugan Vivekanandarajah wrote:
> Hi,
>
> On Fri, 17 May 2019 at 13:37, <linkw@linux.ibm.com> wrote:
>>
>> From: Kewen Lin <linkw@linux.ibm.com>
>>
>> +/* Check whether number of iteration computation is too costly for doloop
>> + transformation. It expands the gimple sequence to equivalent RTL insn
>> + sequence, then evaluate the cost.
>> +
>> + Return true if it's costly, otherwise return false. */
>> +
>> +static bool
>> +costly_iter_for_doloop_p (struct loop *loop, tree niters)
>> +{
>> + tree type = TREE_TYPE (niters);
>> + unsigned cost = 0;
>> + bool speed = optimize_loop_for_speed_p (loop);
>> + int regno = LAST_VIRTUAL_REGISTER + 1;
>> + walk_tree (&niters, prepare_decl_rtl, &regno, NULL);
>> + start_sequence ();
>> + expand_expr (niters, NULL_RTX, TYPE_MODE (type), EXPAND_NORMAL);
>> + rtx_insn *seq = get_insns ();
>> + end_sequence ();
>> +
>> + for (; seq; seq = NEXT_INSN (seq))
>> + {
>> + if (!INSN_P (seq))
>> + continue;
>> + rtx body = PATTERN (seq);
>> + if (GET_CODE (body) == SET)
>> + {
>> + rtx set_val = XEXP (body, 1);
>> + enum rtx_code code = GET_CODE (set_val);
>> + enum rtx_class cls = GET_RTX_CLASS (code);
>> + /* For now, we only consider these two RTX classes, to match what we
>> + get in doloop_optimize, excluding operations like zero/sign extend. */
>> + if (cls == RTX_BIN_ARITH || cls == RTX_COMM_ARITH)
>> + cost += set_src_cost (set_val, GET_MODE (set_val), speed);
> Can't you have a PARALLEL with a SET here?
>
Thanks for catching that. I've updated it to use single_set, which also handles a SET inside a PARALLEL.
- if (!INSN_P (seq))
- continue;
- rtx body = PATTERN (seq);
- if (GET_CODE (body) == SET)
+ rtx set = single_set (seq);
+ if (set != NULL_RTX)
{
- rtx set_val = XEXP (body, 1);
+ rtx set_val = XEXP (set, 1);
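As a rough illustration of the difference, here is a standalone toy model, not GCC code. The `toy_*` names and the `struct toy_pat` layout are hypothetical stand-ins for RTL patterns. The real single_set is more thorough (for example, it can also ignore SETs whose results are unused), so treat this only as a sketch of the PARALLEL case:

```c
#include <stddef.h>

/* Toy pattern codes; real RTL has many more.  */
enum toy_code { TOY_SET, TOY_CLOBBER, TOY_PARALLEL };

/* Hypothetical, heavily simplified stand-in for an insn pattern.  */
struct toy_pat
{
  enum toy_code code;
  size_t nelts;                 /* Used by TOY_PARALLEL only.  */
  const struct toy_pat *elts;   /* Used by TOY_PARALLEL only.  */
};

/* Shape of the original check: only a bare SET pattern is recognized.  */
static const struct toy_pat *
toy_bare_set (const struct toy_pat *pat)
{
  return pat->code == TOY_SET ? pat : NULL;
}

/* single_set-like check: also accept a PARALLEL that holds exactly one
   SET, with every other element being a CLOBBER.  */
static const struct toy_pat *
toy_single_set (const struct toy_pat *pat)
{
  if (pat->code == TOY_SET)
    return pat;
  if (pat->code != TOY_PARALLEL)
    return NULL;

  const struct toy_pat *found = NULL;
  for (size_t i = 0; i < pat->nelts; i++)
    {
      const struct toy_pat *elt = &pat->elts[i];
      if (elt->code == TOY_SET)
        {
          if (found)
            return NULL;        /* More than one SET: not a single set.  */
          found = elt;
        }
      else if (elt->code != TOY_CLOBBER)
        return NULL;            /* Anything else disqualifies the insn.  */
    }
  return found;
}
```

With a PARALLEL of one SET plus one CLOBBER, the bare check finds nothing, while the single_set-like check returns the inner SET, so its cost is no longer skipped.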
>> + }
>> + }
>> + unsigned max_cost
>> + = COSTS_N_INSNS (PARAM_VALUE (PARAM_MAX_ITERATIONS_COMPUTATION_COST));
>> + if (cost > max_cost)
>> + return true;
> Maybe it is better to bailout early if the limit is reached instead of
> doing it outside the loop?
>
Good point. In the cases I've checked so far, most costs are below the max
cost, so most calls wouldn't return early anyway. Checking the limit on every
iteration therefore seems somewhat inefficient. Does that make sense?
We will certainly need to collect some statistics, though. :)
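To make the trade-off concrete, here is a minimal standalone sketch, not GCC code. The `costs` array is a hypothetical stand-in for the per-insn set_src_cost values and `max_cost` for the COSTS_N_INSNS limit. Both variants give the same answer. The early-exit one pays an extra compare per insn but stops as soon as the limit is exceeded:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in: sum all costs, then compare once after the loop
   (the shape used in the patch).  */
static bool
costly_check_at_end (const unsigned *costs, size_t n, unsigned max_cost)
{
  unsigned cost = 0;
  for (size_t i = 0; i < n; i++)
    cost += costs[i];
  return cost > max_cost;
}

/* Hypothetical stand-in: compare inside the loop and bail out early.
   *VISITED records how many entries were examined.  */
static bool
costly_early_exit (const unsigned *costs, size_t n, unsigned max_cost,
                   size_t *visited)
{
  unsigned cost = 0;
  for (size_t i = 0; i < n; i++)
    {
      cost += costs[i];
      if (cost > max_cost)
        {
          *visited = i + 1;
          return true;
        }
    }
  *visited = n;
  return false;
}
```

When most sequences stay under the limit, the early-exit variant visits every entry anyway and only adds compares. It helps only when the limit is exceeded well before the end of a long sequence.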
Thanks,
Kewen
> Thanks,
> Kugan
>
>> +
>> + return false;
>> +}
>> +
>> +/* Predict whether the given loop will be transformed in the RTL
>> + doloop_optimize pass. Attempt to duplicate as many doloop_optimize checks
>> + as possible. This is only for target independent checks, see
>> + targetm.predict_doloop_p for the target dependent ones.
>> +
>> + Some RTL-specific checks seem impossible to perform in gimple; if any new
>> + checks or easy checks _are_ missing here, please add them. */
>> +
>> +static bool
>> +generic_predict_doloop_p (struct ivopts_data *data)
>> +{
>> + struct loop *loop = data->current_loop;
>> +
>> + /* Call target hook for target dependent checks. */
>> + if (!targetm.predict_doloop_p (loop))
>> + {
>> + if (dump_file && (dump_flags & TDF_DETAILS))
>> + fprintf (dump_file, "predict doloop failure due to "
>> + "target specific checks.\n");
>> + return false;
>> + }
>> +
>> + /* Similar to doloop_optimize, check iteration description to know it's
>> + suitable or not. */
>> + edge exit = loop_latch_edge (loop);
>> + struct tree_niter_desc *niter_desc = niter_for_exit (data, exit);
>> + if (niter_desc == NULL)
>> + {
>> + if (dump_file && (dump_flags & TDF_DETAILS))
>> + fprintf (dump_file, "predict doloop failure due to "
>> + "unexpected niters.\n");
>> + return false;
>> + }
>> +
>> + /* Similar to doloop_optimize, check whether iteration count too small
>> + and not profitable. */
>> + HOST_WIDE_INT est_niter = get_estimated_loop_iterations_int (loop);
>> + if (est_niter == -1)
>> + est_niter = get_likely_max_loop_iterations_int (loop);
>> + if (est_niter >= 0 && est_niter < 3)
>> + {
>> + if (dump_file && (dump_flags & TDF_DETAILS))
>> + fprintf (dump_file,
>> + "predict doloop failure due to "
>> + "too few iterations (%u).\n",
>> + (unsigned int) est_niter);
>> + return false;
>> + }
>> +
>> + /* Similar to doloop_optimize, check whether number of iterations too costly
>> + to compute. */
>> + if (costly_iter_for_doloop_p (loop, niter_desc->niter))
>> + {
>> + if (dump_file && (dump_flags & TDF_DETAILS))
>> + fprintf (dump_file, "predict doloop failure due to "
>> + "costly niter computation.\n");
>> + return false;
>> + }
>> +
>> + return true;
>> +}
>> +
>> /* Determines cost of the computation of EXPR. */
>>
>> static unsigned
>> --
>> 2.7.4
>>
>