This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: [RFC][PR87528][PR86677] Disable builtin popcount detection when back-end does not define it
- From: Richard Biener <richard dot guenther at gmail dot com>
- To: Kugan Vivekanandarajah <kugan dot vivekanandarajah at linaro dot org>
- Cc: Jeff Law <law at redhat dot com>, Richard Sandiford <rdsandiford at googlemail dot com>, GCC Patches <gcc-patches at gcc dot gnu dot org>
- Date: Wed, 7 Nov 2018 14:03:43 +0100
- Subject: Re: [RFC][PR87528][PR86677] Disable builtin popcount detection when back-end does not define it
- References: <CAELXzTP6SOzBp-sZBk2TX0P_20gqyCM+yFOTmZjmkxrHDCZAtA@mail.gmail.com> <c57076c6-5e37-3522-3168-0b61028c2a35@redhat.com> <CAFiYyc1iic-871sYUOk_sOB1YC7y7YqW=UFwSYcJOa6X4Lkf+A@mail.gmail.com> <CAELXzTOnOM+=92xQjybn8u+_Mn=pifHhvvunpMOa6e-JDr+7Xw@mail.gmail.com> <CAFiYyc05iWuWdr-S5mpdN_vBgidS1pCJGByAAeD3zg8+PGOJMA@mail.gmail.com> <CAELXzTMbRrxoLb3i3s7KkLT4dBN2kWS3kd_F7pxxwOeedQu_zw@mail.gmail.com>
On Fri, Nov 2, 2018 at 10:02 AM Kugan Vivekanandarajah
<kugan.vivekanandarajah@linaro.org> wrote:
>
> Hi Richard,
> Thanks for the review.
> On Tue, 30 Oct 2018 at 01:25, Richard Biener <richard.guenther@gmail.com> wrote:
> >
> > On Mon, Oct 29, 2018 at 2:06 AM Kugan Vivekanandarajah
> > <kugan.vivekanandarajah@linaro.org> wrote:
> > >
> > > Hi Richard and Jeff,
> > >
> > > Thanks for your comments.
> > >
> > > On Fri, 26 Oct 2018 at 19:40, Richard Biener <richard.guenther@gmail.com> wrote:
> > > >
> > > > On Fri, Oct 26, 2018 at 4:55 AM Jeff Law <law@redhat.com> wrote:
> > > > >
> > > > > On 10/25/18 4:33 PM, Kugan Vivekanandarajah wrote:
> > > > > > Hi,
> > > > > >
> > > > > > PR87528 showed a case where libgcc generated popcount is causing
> > > > > > regression for Skylake.
> > > > > > We also have PR86677 where kernel build is failing because the kernel
> > > > > > does not use the libgcc (when backend is not defining popcount
> > > > > > pattern). While I agree that the kernel should implement its own
> > > > > > functionality when it is not using the libgcc, I am afraid that the
> > > > > > implementation can have the same performance issues reported for
> > > > > > Skylake in PR87528.
> > > > > >
> > > > > > Therefore, I would like to propose that we disable popcount detection
> > > > > > when we don't have a pattern for that. The attached patch (based on
> > > > > > previous discussions) does this.
> > > > > >
> > > > > > Bootstrapped and regression tested on x86_64-linux-gnu with no new
> > > > > > regressions. We need to disable the popcount* testcases. I will have
> > > > > > to define a effective_target_with_popcount in
> > > > > > gcc/testsuite/lib/target-supports.exp if this patch is OK?
> > > > > > Thanks,
> > > > > > Kugan
> > > > > >
> > > > > >
> > > > > > gcc/ChangeLog:
> > > > > >
> > > > > > 2018-10-25 Kugan Vivekanandarajah <kuganv@linaro.org>
> > > > > >
> > > > > > * tree-scalar-evolution.c (expression_expensive_p): Make BUILTIN POPCOUNT
> > > > > > as expensive when backend does not define it.
> > > > > >
> > > > > >
> > > > > > gcc/testsuite/ChangeLog:
> > > > > >
> > > > > > 2018-10-25 Kugan Vivekanandarajah <kuganv@linaro.org>
> > > > > >
> > > > > > * gcc.target/aarch64/popcount4.c: New test.
> > > > > >
> > > > > FWIW, I've been disabling by checking direct_optab_handler elsewhere
> > > > > (number_of_iterations_popcount) in my tester. It may in fact be an old
> > > > > patch from you.
> > > > >
> > > > > Richi argued that it's the kernel team's responsibility to provide a
> > > > > popcount since they don't link with libgcc. And I'm generally in
> > > > > agreement with that position, though it does tend to generate some
> > > > > friction with the kernel developers. We also run the real risk of GCC 9
> > > > > not being able to build the kernel which, IMHO, would be a disaster from
> > > > > a PR standpoint.
> > > > >
> > > > > I'd like to hear from others here. I fully realize we're beyond the
> > > > > realm of what is strictly technically correct here from a review standpoint.
> > > >
> > > > As said final value replacement to a library call is probably not wanted
> > > > for optimization purpose, so adjusting expression_expensive_p is OK with
> > > > me. It might not fully solve the (non-)issue in case another optimization pass
> > > > chooses to materialize niter computation result.
> > > >
> > > > Few comments on the patch:
> > > >
> > > > + tree fndecl = get_callee_fndecl (expr);
> > > > +
> > > > + if (fndecl && DECL_BUILT_IN_CLASS (fndecl) == BUILT_IN_NORMAL)
> > > > + {
> > > > + combined_fn cfn = as_combined_fn (DECL_FUNCTION_CODE (fndecl));
> > > >
> > > > combined_fn cfn = gimple_call_combined_fn (expr);
> > > > switch (cfn)
> > > > {
> > >
> > > Did you mean:
> > > combined_fn cfn = get_call_combined_fn (expr);
> >
> > Yes.
> >
> > > > ...
> > > >
> > > > cfn will be CFN_LAST for a non-builtin/internal call. I know Richard is mostly
> > > > offline but eventually he knows whether there is a better way to query
> > > >
> > > > + CASE_CFN_POPCOUNT:
> > > > + /* Check if opcode for popcount is available. */
> > > > + if (optab_handler (popcount_optab,
> > > > + TYPE_MODE (TREE_TYPE (CALL_EXPR_ARG
> > > > (expr, 0))))
> > > > + == CODE_FOR_nothing)
> > > > + return true;
> > > >
> > > > note that we currently generate builtin calls rather than IFN calls
> > > > (when a direct
> > > > optab is supported).
> > > >
> > > > Another comment on the patch is that you probably have to adjust existing
> > > > popcount testcases to add architecture specific flags enabling suport for
> > > > the instructions, otherwise you won't see loop replacement.
> > > Indeed.
> > > In lib/target-supports.exp, I will try to add support for
> > > check_effective_target_popcount_long.
> > > When I grep for the popcount pattern in md files, I see it is defined for:
> > >
> > > tilegx
> > > tilepro
> > > alpha
> > > aarch64 when TARGET_SIMD
> > > ia64
> > > rs6000
> > > i386 when TARGET_POPCOUNT
> > > popwerpcspce when TARGET_POPCNTB || TARGET_POPCNTD
> > > s390 when TARGET_Z916 && TARGET_64BIT
> > > sparc when TARGET_POPC
> > > arm when TARGET_NEON
> > > mips when ISA_HAS_POP
> > > spu
> > > avr
> > >
> > > I can check these targets with the condition.
> > > Another possibility is to check with a sample code and see if we are
> > > getting a libcall in the asm. Not sure if that is straightforward. Are
> > > there any example for such.
> >
> > You could try linking w/o libgcc ...
> I realized that there are some examples already and I have based it
> on that. Attached patch
> 0001-fix-kernel-build-v3.patch does this. Bootstrapped and regression
> tested on aarch64-linux-gnu with no new regression. Is this OK?
I suspect using sth like the hard-float test is better, thus
check_no_messages_and_pattern popcountl "!\\(call" rtl-expand
instead of looking for !__popcount in assembly since depending on
assembler syntax this might nor might not match spuriously...
@@ -3501,6 +3504,19 @@ expression_expensive_p (tree expr)
tree arg;
call_expr_arg_iterator iter;
+ combined_fn cfn = get_call_combined_fn (expr);
+ switch (cfn)
+ {
+ CASE_CFN_POPCOUNT:
+ /* Check if opcode for popcount is available. */
+ if (optab_handler (popcount_optab,
+ TYPE_MODE (TREE_TYPE (CALL_EXPR_ARG (expr, 0))))
+ == CODE_FOR_nothing)
+ return true;
+ default:
+ break;
+ }
+
if (!is_inexpensive_builtin (get_callee_fndecl (expr)))
return true;
FOR_EACH_CALL_EXPR_ARG (arg, iter, expr)
note that we can handle double-word popcount by emitting two single-word
popcount instructions. So if the mode is of 2 * UNITS_PER_WORD size
you want to check for mode == word_mode. See expand_unop.
I think you want to add a comment before the code explaining that even
though is_inexpensive_builtin says true we may get a library call which
we consider expensive.
> >
> > > We could also move these test to a primary target that is tested often
> > > tested which also defines popcount pattern. I dont think these tests
> > > change for targets and if we can test in one target that could be
> > > enough,
> > >
> > > I am happy to implement what is appropriate.
> >
> > How about the -Os idea?
> Attached patch 0002-allow-builtin-popcount-for-size_p.patch attempts
> this. As you mentioned, this will again break the kernel. While I am
> trying to provide the popcount implementation for the newer kernel, it
> will still be a problem for existing kernel versions. May I request
> that we provide a flag to disable this if we decide to go with the
> patch please?
Let's consider this only for GCC 10 and get the first patch ready.
expression_expensive_p shouldn't consider division as expensive
for -Os either so there's more churn here and expression_expensive_p
doesn't consider the overall size of the expression anyways.
So let's drop this for now.
Richard.
> Thanks,
> Kugan
>
> >
> > Richard.
> >
> > > Thanks,
> > > Kugan
> > >
> > >
> > >
> > > >
> > > > Also I think that the expression is only expensive (for final value
> > > > replacement!)
> > > > if you consider optimizing for speed. When optimizing for size getting rid of
> > > > the loop is probably beneificial unconditionally. That would leave the
> > > > possibility to switch said testcases to -Os. It would require adding a
> > > > bool size_p flag to expression_expensive and passing down
> > > > optimize_loop_for_size_p ().
> > > >
> > > > _NOTE_ that expression_expensive_p is also used by IVOPTs and there
> > > > replacing sth with an expression based on the niter analysis result doesn't
> > > > mean we get rid of the loop (but only of an IV), so maybe that reasoning
> > > > doesn't apply there.
> > > >
> > > > Richard.
> > > >
> > > > > Jeff
> > > > >