This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: [PATCH v2 3/9] Introduce can_vector_compare_p function
On Fri, Aug 23, 2019 at 1:35 PM Ilya Leoshkevich <iii@linux.ibm.com> wrote:
>
> > Am 23.08.2019 um 13:24 schrieb Richard Biener <richard.guenther@gmail.com>:
> >
> > On Fri, Aug 23, 2019 at 12:43 PM Richard Sandiford
> > <richard.sandiford@arm.com> wrote:
> >>
> >> Ilya Leoshkevich <iii@linux.ibm.com> writes:
> >>> @@ -3819,6 +3820,82 @@ can_compare_p (enum rtx_code code, machine_mode mode,
> >>> return 0;
> >>> }
> >>>
> >>> +/* can_vector_compare_p presents fake rtx binary operations to the the back-end
> >>> + in order to determine its capabilities. In order to avoid creating fake
> >>> + operations on each call, values from previous calls are cached in a global
> >>> + cached_binops hash_table. It contains rtxes, which can be looked up using
> >>> + binop_keys. */
> >>> +
> >>> +struct binop_key {
> >>> + enum rtx_code code; /* Operation code. */
> >>> + machine_mode value_mode; /* Result mode. */
> >>> + machine_mode cmp_op_mode; /* Operand mode. */
> >>> +};
> >>> +
> >>> +struct binop_hasher : pointer_hash_mark<rtx>, ggc_cache_remove<rtx> {
> >>> + typedef rtx value_type;
> >>> + typedef binop_key compare_type;
> >>> +
> >>> + static hashval_t
> >>> + hash (enum rtx_code code, machine_mode value_mode, machine_mode cmp_op_mode)
> >>> + {
> >>> + inchash::hash hstate (0);
> >>> + hstate.add_int (code);
> >>> + hstate.add_int (value_mode);
> >>> + hstate.add_int (cmp_op_mode);
> >>> + return hstate.end ();
> >>> + }
> >>> +
> >>> + static hashval_t
> >>> + hash (const rtx &ref)
> >>> + {
> >>> + return hash (GET_CODE (ref), GET_MODE (ref), GET_MODE (XEXP (ref, 0)));
> >>> + }
> >>> +
> >>> + static bool
> >>> + equal (const rtx &ref1, const binop_key &ref2)
> >>> + {
> >>> + return (GET_CODE (ref1) == ref2.code)
> >>> + && (GET_MODE (ref1) == ref2.value_mode)
> >>> + && (GET_MODE (XEXP (ref1, 0)) == ref2.cmp_op_mode);
> >>> + }
> >>> +};
> >>> +
> >>> +static GTY ((cache)) hash_table<binop_hasher> *cached_binops;
> >>> +
> >>> +static rtx
> >>> +get_cached_binop (enum rtx_code code, machine_mode value_mode,
> >>> + machine_mode cmp_op_mode)
> >>> +{
> >>> + if (!cached_binops)
> >>> + cached_binops = hash_table<binop_hasher>::create_ggc (1024);
> >>> + binop_key key = { code, value_mode, cmp_op_mode };
> >>> + hashval_t hash = binop_hasher::hash (code, value_mode, cmp_op_mode);
> >>> + rtx *slot = cached_binops->find_slot_with_hash (key, hash, INSERT);
> >>> + if (!*slot)
> >>> + *slot = gen_rtx_fmt_ee (code, value_mode, gen_reg_rtx (cmp_op_mode),
> >>> + gen_reg_rtx (cmp_op_mode));
> >>> + return *slot;
> >>> +}
> >>
> >> Sorry, I didn't mean anything this complicated. I just meant that
> >> we should have a single cached rtx that we can change via PUT_CODE and
> >> PUT_MODE_RAW for each new query, rather than allocating a new rtx each
> >> time.
> >>
> >> Something like:
> >>
> >> static GTY ((cache)) rtx cached_binop;
> >>
> >> rtx
> >> get_cached_binop (machine_mode mode, rtx_code code, machine_mode op_mode)
> >> {
> >> if (cached_binop)
> >> {
> >> PUT_CODE (cached_binop, code);
> >> PUT_MODE_RAW (cached_binop, mode);
> >> PUT_MODE_RAW (XEXP (cached_binop, 0), op_mode);
> >> PUT_MODE_RAW (XEXP (cached_binop, 1), op_mode);
> >> }
> >> else
> >> {
> >> rtx reg1 = gen_raw_REG (op_mode, LAST_VIRTUAL_REGISTER + 1);
> >> rtx reg2 = gen_raw_REG (op_mode, LAST_VIRTUAL_REGISTER + 2);
> >> cached_binop = gen_rtx_fmt_ee (code, mode, reg1, reg2);
> >> }
> >> return cached_binop;
> >> }
> >
> > Hmm, maybe we need auto_rtx (code) that constructs such
> > RTX on the stack instead of wasting a GC root (and causing
> > issues for future threading of GCC ;)).
>
> Do you mean something like this?
>
> union {
> char raw[rtx_code_size[code]];
> rtx rtx;
> } binop;
>
> Does this exist already (git grep auto.*rtx / rtx.*auto doesn't show
> anything useful), or should I implement this?
It doesn't exist AFAIK, I thought about using alloca like
rtx tem;
rtx_alloca (tem, PLUS);
and due to using alloca rtx_alloca has to be a macro like
#define rtx_alloca(r, code) r = (rtx)alloca (RTX_CODE_SIZE(code));
memset (r, 0, RTX_HDR_SIZE); PUT_CODE (r, code);
maybe C++ can help making this prettier but of course
since we use alloca we have to avoid opening new scopes.
I guess templates like with auto_vec doesn't work unless
we can make RTX_CODE_SIZE constant-evaluated.
Richard.