RFA: fix gcc.dg/tree-ssa/popcount4l.c 16 bit failure, improve 64 bit popcount expansion for 32 bit target
Richard Biener
richard.guenther@gmail.com
Tue Jan 26 07:47:44 GMT 2021
On Tue, Jan 26, 2021 at 1:25 AM Joern Wolfgang Rennecke
<joern.rennecke@riscy-ip.com> wrote:
>
> optabs.c:expand_unop_direct can expand a popcount builtin without a call
> under certain conditions even without a popcount pattern of the required
> data width:
>
> if (unoptab == popcount_optab
>     && is_a <scalar_int_mode> (mode, &int_mode)
>     && GET_MODE_SIZE (int_mode) == 2 * UNITS_PER_WORD
>     && optab_handler (unoptab, word_mode) != CODE_FOR_nothing
>     && optimize_insn_for_speed_p ())
>   {
>     temp = expand_doubleword_popcount (int_mode, op0, target);
>     if (temp)
>       return temp;
>   }
>
>
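In plain C, the identity that expand_doubleword_popcount relies on is that, on a target with 32-bit words, a 64-bit popcount equals the sum of the word-sized popcounts of the low and high halves. This is a minimal sketch that assumes a 32-bit unsigned int and GCC's __builtin_popcount. popcount_doubleword is a hypothetical name and does not exist in optabs.c.

```c
#include <stdint.h>

/* Hypothetical helper: models a double-word popcount on a target whose
   word is 32 bits, using only a word-sized popcount.  Assumes unsigned
   int is 32 bits so __builtin_popcount counts a whole word.  */
static int
popcount_doubleword (uint64_t x)
{
  uint32_t lo = (uint32_t) x;          /* low word */
  uint32_t hi = (uint32_t) (x >> 32);  /* high word */
  return __builtin_popcount (lo) + __builtin_popcount (hi);
}
```

For example, popcount_doubleword (0xF00000000000000FULL) returns 4 + 4 = 8.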
> However, the match.pd recognition of popcount arithmetic using & / + is
> tied to having an exactly matching operation. This causes a failure for
> gcc.dg/tree-ssa/popcount4l.c on 16-bit targets that have a 16-bit
> popcount operation (and no wider one).
> Likewise, failing to recognize a 64-bit popcount on a 32-bit target with
> a 32-bit popcount could be rectified by synthesizing the wide popcount
> operation from two narrower popcount operations.
> The attached patch implements this.
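To make the 16-bit case concrete, the sketch below uses the classic shift/mask/add popcount as an example of the kind of arithmetic the match.pd pattern looks for. On a 16-bit target, where long is typically 32 bits, such a sequence can only become an IFN_POPCOUNT if the 32-bit count is synthesized from two 16-bit counts. This is an illustration under assumptions, not the source of popcount4l.c. popcount16 is a hypothetical portable stand-in for the target's 16-bit popcount instruction.

```c
#include <stdint.h>

/* Classic SWAR popcount of a 32-bit value.  */
static unsigned
popcount32_swar (uint32_t b)
{
  b = b - ((b >> 1) & 0x55555555u);                 /* 2-bit counts */
  b = (b & 0x33333333u) + ((b >> 2) & 0x33333333u); /* 4-bit counts */
  b = (b + (b >> 4)) & 0x0f0f0f0fu;                 /* 8-bit counts */
  return (b * 0x01010101u) >> 24;                   /* sum the bytes */
}

/* Hypothetical stand-in for a target's 16-bit popcount instruction.  */
static unsigned
popcount16 (uint16_t h)
{
  unsigned n = 0;
  for (; h; h &= h - 1)   /* clear the lowest set bit each iteration */
    n++;
  return n;
}

/* The proposed rewrite: a 32-bit popcount as two 16-bit popcounts.  */
static unsigned
popcount32_split (uint32_t x)
{
  return popcount16 ((uint16_t) x) + popcount16 ((uint16_t) (x >> 16));
}
```

Both functions return 13 for 0x12345678 and 32 for 0xFFFFFFFF.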
A few comments.
+ (with { tree half_type = (prec <= BITS_PER_WORD || (prec & 1) ? NULL_TREE
+ : lang_hooks.types.type_for_size (prec/2, 1));
+ gcc_assert (prec > 2 || half_type == NULL_TREE);
+ }
+ (if (half_type != NULL_TREE
type_for_size can return a type with more than prec/2 bits of precision; I
suppose that's OK here. In the end we're probably looking for the "next"
narrower mode that has at least half as many bits as the type and supports
popcount. I'm not sure how likely it is for a target to have, say, only a
24-bit PSImode popcount, in which case we'd still fail to recognize a 32-bit
popcount since we only try 16-bit halves.
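The "next narrower mode" search can be modelled outside GCC as a scan over a table of integer modes. This is a sketch under assumptions: struct int_mode_model and find_half_precision are hypothetical. Inside GCC the walk would use FOR_EACH_MODE_IN_CLASS and direct_internal_fn_supported_p rather than a table.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical description of one integer mode of a target.  */
struct int_mode_model
{
  unsigned precision;   /* bits in the mode */
  bool has_popcount;    /* does the target have a popcount for it?  */
};

/* Return the precision of the narrowest mode with at least PREC / 2 bits
   that supports popcount.  MODES must be sorted narrowest first.  The scan
   gives up once a mode is as wide as the PREC-bit type itself, and the
   function returns 0 when no suitable half mode exists.  */
static unsigned
find_half_precision (const struct int_mode_model *modes, size_t n,
                     unsigned prec)
{
  for (size_t i = 0; i < n; i++)
    {
      if (modes[i].precision >= prec)
        break;                          /* no narrower mode left */
      if (modes[i].precision >= prec / 2 && modes[i].has_popcount)
        return modes[i].precision;      /* first usable half mode */
    }
  return 0;
}
```

With modes of 8, 16, 24, 32 and 64 bits where only the 24-bit mode has popcount, find_half_precision (modes, 5, 32) yields 24. Each 16-bit half of the 32-bit value fits in that mode, whereas a search restricted to exact 16-bit halves would find nothing.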
That said, I don't much like including langhooks here, and I'd prefer
something like this (given there's no GET_MODE_NARROWER):
FOR_EACH_MODE_IN_CLASS (m, MODE_INT)
  if (m == TYPE_MODE (type))
    break;
  else if (known_ge (GET_MODE_PRECISION (m), prec/2))
    {
      half_type = build_nonstandard_integer_type
                    (GET_MODE_PRECISION (m), 1);
      if (direct_internal_fn_supported_p (IFN_POPCOUNT, half_type,
                                          OPTIMIZE_FOR_SPEED))
        break;
      half_type = NULL_TREE;
^^^ IMHO we should be conservative with -Os and use OPTIMIZE_FOR_SPEED when
we need two popcount ops.
    }
}
(if (half_type)
(....
+ (IFN_POPCOUNT:half_type (convert (rshift @0
+ { wide_int_to_tree (half_type, prec/2); } )))))))))))
Please use build_int_cst (integer_type_node, prec/2) for the shift amount.
Otherwise this looks reasonable, but since it doesn't fix a regression it has
to wait for stage1 now.
Thanks,
Richard.