This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.
Re: [PATCH] rs6000: Use xori for HTM builtins and vector compares
- From: Segher Boessenkool <segher at kernel dot crashing dot org>
- To: Peter Bergner <bergner at vnet dot ibm dot com>
- Cc: David Edelsohn <dje dot gcc at gmail dot com>, GCC Patches <gcc-patches at gcc dot gnu dot org>
- Date: Wed, 10 Sep 2014 14:29:02 -0500
- References: <3970eebb20386837608385f3e276f7c33080e8c0 dot 1410289982 dot git dot segher at kernel dot crashing dot org> <CAGWvnyn=0=FPOwd5Dj=9_tYr1r3fPk5dVE7zo9GOwaRLZoagYA at mail dot gmail dot com> <1410374293 dot 19545 dot 4 dot camel at otta>
On Wed, Sep 10, 2014 at 01:38:13PM -0500, Peter Bergner wrote:
> On Tue, 2014-09-09 at 19:28 -0400, David Edelsohn wrote:
> > On Tue, Sep 9, 2014 at 3:29 PM, Segher Boessenkool
> > > 2014-09-09 Segher Boessenkool <segher@kernel.crashing.org>
> > >
> > > * config/rs6000/htm.md (tabort, tabortdc, tabortdci, tabortwc,
> > > tabortwci, tbegin, tcheck, tend, trechkpt, treclaim, tsr): Use xor
> > > instead of minus.
> > > * config/rs6000/vector.md (cr6_test_for_zero_reverse,
> > > cr6_test_for_lt_reverse): Ditto.
> >
> > This is okay with me, but let me give Peter a chance to comment if
> > there was a specific reason to use subfic instead of xori. This may
> > have been a carry-over from Z, which does not have the same CA clobber
> > issue.
>
> Actually, I just copied the usage in cr6_test_for_zero_reverse and
> cr6_test_for_lt_reverse, so I'm not against using xori...as long as
> compiling an "if (__builtin_tbegin (0)) {...}" still ends up with a
> tbegin. followed immediately by a branch (i.e., no interleaving copy
> from CR and compare instruction).
Huh, interesting. I assumed 1-(0_or_1) and (0_or_1)^1 would look the
same to combine, but no.
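
For reference, a minimal C sketch of why the two forms are interchangeable as far as the value goes: for an input that is known to be 0 or 1, subtracting it from 1 (what subfic with an immediate of 1 computes) and xoring it with 1 (what xori does) give the same result. The function names are made up for illustration and do not come from the patch.

```c
#include <assert.h>

/* "Minus" form: the value subfic computes with an immediate of 1.  */
static int
invert_with_minus (int bit)
{
  return 1 - bit;
}

/* "Xor" form: the value xori computes with an immediate of 1.  */
static int
invert_with_xor (int bit)
{
  return bit ^ 1;
}

/* Return 1 if both forms agree for the only two possible inputs.  */
static int
forms_agree (void)
{
  for (int bit = 0; bit <= 1; bit++)
    if (invert_with_minus (bit) != invert_with_xor (bit))
      return 0;
  return 1;
}
```

This only shows that the results match; it says nothing about how combine treats the two RTL forms, which is where the difference described here comes from.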
With subfic, combine optimises it all to a branch on cr0. With xori,
for some reason combine has a much easier job, and it optimises the lot
to a copy of cr0 to some cc, and then branch on that. The RA of course
gets rid of the copy. The extra freedom will more likely help than hurt.
The simple testcase ends up as just "tbegin. 0; beqlr 0" in either case.
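
A sketch of the kind of testcase meant here, following the usage pattern shown in the GCC documentation for the PowerPC HTM builtins. This is not the actual testcase from the thread: the function name is hypothetical, and the non-HTM fallback is an assumption added only so that the file still compiles on targets built without -mhtm.

```c
/* Hypothetical testcase; not the one used in this thread.  */
#if defined(__powerpc__) && defined(__HTM__)
int
try_transaction (void)
{
  if (__builtin_tbegin (0))
    {
      /* Transaction started; end it right away.  */
      __builtin_tend (0);
      return 1;
    }
  /* Transaction did not start.  */
  return 0;
}
#else
/* Assumed fallback so the sketch compiles without HTM support.  */
int
try_transaction (void)
{
  return 0;
}
#endif
```

Compiling something like this with -O2 -mhtm and reading the assembly is one way to check that the tbegin. is followed directly by a branch on cr0.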
So, okay?
Segher