This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



Re: [PATCH] rs6000: Make the ctr* patterns allow ints in vector regs (PR71763)


On Fri, Jul 08, 2016 at 01:28:05AM +0930, Alan Modra wrote:
> BTW, both pr70098 and pr71763 are triggered by combine, not
> loop-doloop as I was thinking earlier.  See rtl dumps for the
> testcases.  I doubt the "optimization" done by combine here is worth
> keeping, since loop-doloop.c ought to already handle the beneficial
> inner loop use of ctr.  Elsewhere we typically end up with an insn
> that needs splitting back to the original sequence.  So we could avoid
> creating trouble for ourselves with the following patch.
> 
> Bootstrap and regression test powerpc64le-linux and powerpc64-linux in
> progress.
> 
> 	* config/rs6000/rs6000.md (UNSPEC_DONT_COMBINE): New unspec.
> 	(ctr<mode>): Add unspec.
> 	(ctr<mode>_internal* and splitters): Likewise.  Renumber.

The regression tests passed.  I've been looking at differences in
gcc/*.o and find many cases like the following.

orig/combine.o
    1508:	01 00 3f 2c 	cmpdi   r31,1
    150c:	ff ff ff 3b 	addi    r31,r31,-1
    1510:	dc fe 82 41 	beq     13ec
patched/combine.o
    1508:	ff ff ff 37 	addic.  r31,r31,-1
    150c:	e0 fe 82 41 	beq     13ec

Combine transforms the first sequence to the second, then further
transforms that to a bdz (ctr<mode>).  When that fails to get ctr
allocated, the splitter takes us all the way back to the three-insn
sequence.
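
For anyone wanting to convince themselves that the two sequences are
interchangeable, here is a minimal C sketch.  It is not part of the
patch, and the function names are made up for illustration.  The first
function models "cmpdi; addi; beq" (test the old value against 1, then
decrement); the second models "addic.; beq" (decrement, then test the
new value against 0).

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of "cmpdi r31,1; addi r31,r31,-1; beq":
   compare the old value with 1, then decrement the register.  */
bool dec_and_test_three_insn(uint64_t *r)
{
    bool taken = (*r == 1);
    *r -= 1;                    /* unsigned arithmetic wraps, like the add */
    return taken;
}

/* Hypothetical model of "addic. r31,r31,-1; beq":
   decrement the register, then test the new value against 0.  */
bool dec_and_test_record_form(uint64_t *r)
{
    *r -= 1;
    return *r == 0;
}
```

Both functions leave the same value in the register and agree on
whether the branch is taken for every input, including 0, which wraps.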

There's also a quite interesting case involving this nested loop in
real.c:real_to_hexadecimal.

  for (i = SIGSZ - 1; i >= 0; --i)
    for (j = HOST_BITS_PER_LONG - 4; j >= 0; j -= 4)
      {
	*p++ = "0123456789abcdef"[(r->sig[i] >> j) & 15];
	if (--digits == 0)
	  goto out;
      }

With the patch we use ctr for the inner loop.  With unpatched gcc,
combine generates ctr<mode> for the outer loop, which of course uses
ctr and isn't profitable when an inner loop also wants ctr.  Vagaries
of the register allocator result in the outer loop getting ctr and the
inner one losing.  Oops, we generally want inner loops to be more
highly optimized.
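
To experiment with this outside of real.c, the loop can be pulled out
into a standalone function, roughly as below.  This is only a sketch:
sig_to_hex, its signature, the SIGSZ value, and the digits == 0 guard
are my own assumptions for illustration, not what real.c actually has.

```c
#include <limits.h>
#include <stddef.h>

/* Assumed stand-ins for GCC's own definitions; the real values come
   from real.h and the host configuration.  */
#define SIGSZ 3
#define HOST_BITS_PER_LONG ((int) (sizeof (long) * CHAR_BIT))

/* Hypothetical helper: write up to DIGITS hex digits of SIG to BUF,
   most significant first, NUL-terminate, and return the count.
   BUF must have room for the digits plus the terminator.  */
size_t sig_to_hex(const unsigned long sig[SIGSZ], size_t digits, char *buf)
{
    char *p = buf;
    int i, j;

    if (digits == 0)            /* guard added for the sketch only */
        goto out;

    for (i = SIGSZ - 1; i >= 0; --i)
        for (j = HOST_BITS_PER_LONG - 4; j >= 0; j -= 4)
        {
            *p++ = "0123456789abcdef"[(sig[i] >> j) & 15];
            if (--digits == 0)
                goto out;
        }

out:
    *p = '\0';
    return (size_t) (p - buf);
}
```

Compiling something like this with -O2 -S for a powerpc64 target is one
way to check which of the two loops, if either, ends up using ctr.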

-- 
Alan Modra
Australia Development Lab, IBM

