This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.
Re: [PATCH], PR target/81550, Rewrite PowerPC loop_align test so it still tests the original target hook
- From: Michael Meissner <meissner at linux dot vnet dot ibm dot com>
- To: Segher Boessenkool <segher at kernel dot crashing dot org>
- Cc: Michael Meissner <meissner at linux dot vnet dot ibm dot com>, GCC Patches <gcc-patches at gcc dot gnu dot org>, David Edelsohn <dje dot gcc at gmail dot com>, Bill Schmidt <wschmidt at linux dot vnet dot ibm dot com>
- Date: Wed, 24 Jan 2018 15:19:00 -0500
- Subject: Re: [PATCH], PR target/81550, Rewrite PowerPC loop_align test so it still tests the original target hook
- References: <20180124052755.GA8250@ibm-tiger.the-meissners.org> <20180124183538.GK21977@gate.crashing.org>
On Wed, Jan 24, 2018 at 12:35:38PM -0600, Segher Boessenkool wrote:
> Hi!
>
> On Wed, Jan 24, 2018 at 12:27:55AM -0500, Michael Meissner wrote:
> >
> > As Segher and I were discussing over private IRC, the root cause of this bug is
> > that the compiler no longer generates the BDNZ instruction for a count-down loop;
> > instead it decrements the index in a GPR and does a branch/comparison on it.
>
> Yes, ivopts makes a bad decision (it uses stride 8 for all IVs, it should
> keep one with stride -1 for the loop counter, for optimal code; it also
> does three separate increments for the three memory accesses, which is
> a bit excessive here).
>
> > In doing so, it now unrolls the loop twice, and the resulting loop is too
> > big for the target hook TARGET_ASM_LOOP_ALIGN_MAX_SKIP. This means the loop
> > isn't aligned to a 32 byte boundary.
>
> It's not really unrolling, it is bb-reorder copying an RTL block. However,
> even if you disable it you still get 9 insns on some configurations, so
> your patch does not hide the problem :-(
>
> Although, hrm, in your patch you also change "int i" to "long i"; that
> alone seems to be enough to fix everything? Could you check that please?
Changing i and n to either 'long' or 'long unsigned' makes the test work.
It is interesting that -mcpu=power7 -mbig does not seem to be able to generate
LFDU and STFDU, but when the cpu is set to power8/power9, or when -mbig is
changed to -mlittle or -m32, it can generate those instructions.
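
To illustrate the 'int' versus 'long' change discussed above, here is a minimal
sketch of the kind of loop involved. It is not the actual testsuite file: the
function names add_int and add_long, the 'const' qualifiers, and the exact loop
body are assumptions chosen to match the description of one loop with three
memory accesses.

```c
/* Hypothetical sketch, not the real loop_align test.
   Both functions compute a[i] = b[i] + c[i]; they differ only in the
   type of the index and of the bound.  */

/* Variant with 'int' index and bound, the form that stopped giving the
   expected loop alignment according to the thread.  */
void
add_int (double *a, const double *b, const double *c, int n)
{
  int i;

  for (i = 0; i < n; i++)
    a[i] = b[i] + c[i];
}

/* Variant with 'long' index and bound, the change reported to make
   the test work again.  */
void
add_long (double *a, const double *b, const double *c, long n)
{
  long i;

  for (i = 0; i < n; i++)
    a[i] = b[i] + c[i];
}
```

Compiling both variants on a PowerPC target with -O2 -mcpu=power7 -S and
comparing the assembly should show whether the loop is aligned and whether
BDNZ, LFDU and STFDU are used; the result depends on the compiler version and
on the endian/word-size options mentioned above.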
--
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meissner@linux.vnet.ibm.com, phone: +1 (978) 899-4797