This is the mail archive of the gcc@gcc.gnu.org
mailing list for the GCC project.
Re: Better info for combine results in worse code generated
- From: Segher Boessenkool <segher at kernel dot crashing dot org>
- To: gcc at gcc dot gnu dot org
- Date: Fri, 29 May 2015 07:58:38 -0500
- Subject: Re: Better info for combine results in worse code generated
- References: <20150528143941 dot GL14752 at bubble dot grove dot modra dot org> <20150528194222 dot GA12574 at gate dot crashing dot org> <20150529031120 dot GN14752 at bubble dot grove dot modra dot org>
On Fri, May 29, 2015 at 12:41:20PM +0930, Alan Modra wrote:
> I'll tell you one of the reasons why they are
> slower, as any decent hardware engineer could probably figure this out
> themselves anyway. The record form instructions are cracked into two
> internal ops, the basic arithmetic/logic op, and a compare. There's a
> limit to how much hardware can do in one clock cycle, or conversely,
> if you try to do more your clock must be slower.
Logical and simple arithmetic record-form ops aren't cracked, according
to our pipeline descriptions, and some simple testing (which could well
be flawed :-) ).
Of course I agree that cmp is better than and., but it will execute
pretty much the same in most code as far as I can tell.
> > > one of the aims of the wider patch I was working
> > > on was to remove patterns like rotlsi3_64, ashlsi3_64, lshrsi3_64 and
> > > ashrsi3_64.
> > We will need such patterns no matter what; the compiler cannot magically
> > know what machine insns set the high bits of a 64-bit reg to zero.
> No, not by magic. I define EXTEND_OP in rs6000.h and use it in
> record_value_for_reg. Full patch follows. I see enough code gen
> improvements on powerpc64le to make this patch worth pursuing,
> things like "rlwinm 0,5,6,0,25; extsw 0,0" being converted to
> "rldic 0,5,6,52". No doubt due to being able to prove an int var
> doesn't have the sign bit set. Hmm, in fact the 52 says it is
> known to be only 6 bits before shifting.
Ah, interesting. So you let reg_stat know about the full register
result in cases where the RTL instruction does not mention the full
register at all. That sounds like a worthwhile direction to explore :-)
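[Editor's sketch, not part of the original message.] The "rlwinm 0,5,6,0,25; extsw 0,0" to "rldic 0,5,6,52" conversion quoted above can be checked with a small C model of the two instruction sequences. The function names are hypothetical, and the mask constants are worked out by hand from the operands on the assumption that the usual Power ISA rotate-and-mask semantics apply. The model suggests the two sequences agree only when the input is known to fit in 6 bits, which is the extra knowledge combine would need.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate helpers; the shift count is reduced so we never shift by the
   full width, which would be undefined behaviour in C.  */
static uint32_t rotl32(uint32_t x, unsigned n)
{
    n &= 31;
    return n ? (x << n) | (x >> (32 - n)) : x;
}

static uint64_t rotl64(uint64_t x, unsigned n)
{
    n &= 63;
    return n ? (x << n) | (x >> (64 - n)) : x;
}

/* Hypothetical model of "rlwinm 0,5,6,0,25; extsw 0,0".
   Mask bits 0..25 (IBM numbering, bit 0 = MSB of the word) is
   0xFFFFFFC0; extsw then sign-extends the low word.  The
   uint32_t -> int32_t conversion assumes two's complement.  */
static uint64_t model_rlwinm_extsw(uint64_t r5)
{
    uint32_t w = rotl32((uint32_t)r5, 6) & 0xFFFFFFC0u;
    return (uint64_t)(int64_t)(int32_t)w;
}

/* Hypothetical model of "rldic 0,5,6,52": rotate the doubleword left
   by 6 and keep IBM bits 52..57, i.e. the mask 0xFC0.  */
static uint64_t model_rldic(uint64_t r5)
{
    return rotl64(r5, 6) & UINT64_C(0xFC0);
}

/* The two sequences agree for every input that fits in 6 bits.  */
static void check_6bit_inputs(void)
{
    for (uint64_t x = 0; x < 64; x++)
        assert(model_rlwinm_extsw(x) == model_rldic(x));
}
```

For an input outside that range, such as 0x02000000, the two models differ: the rlwinm/extsw pair yields a sign-extended 0x80000000 while the rldic model yields 0.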
> +/* Describe how rtl operations on registers behave on this target when
> +   operating on less than the entire register.  */
> +#define EXTEND_OP(OP) \
> +  (GET_MODE (OP) != SImode \
> +   || !TARGET_POWERPC64 \
> +   ? UNKNOWN \
> +   : (GET_CODE (OP) == AND \
> +      || GET_CODE (OP) == ZERO_EXTEND \
> +      || GET_CODE (OP) == ASHIFT \
> +      || GET_CODE (OP) == ROTATE \
> +      || GET_CODE (OP) == LSHIFTRT) \
> +   ? ZERO_EXTEND \
> +   : (GET_CODE (OP) == SIGN_EXTEND \
> +      || GET_CODE (OP) == ASHIFTRT) \
> +   ? SIGN_EXTEND \
> +   : UNKNOWN)
I think this is too simplistic though. For example, AND with -7 is not
zero-extended (rlwinm rD,rA,0,31,28 sets the high 32 bits of rD to the low
32 bits of rA).
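[Editor's sketch, not part of the original message.] The AND-with--7 case can be illustrated with a small C model of rlwinm on a 64-bit implementation. It assumes the usual Power ISA description: the low word is rotated, replicated into both halves of the doubleword, and then masked from bit MB+32 through ME+32, wrapping around when MB > ME. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Ones from IBM bit mb through me of a 64-bit register (bit 0 = MSB),
   wrapping around through bit 63 to bit 0 when mb > me.
   Both arguments must be in 0..63.  */
static uint64_t ppc_mask64(unsigned mb, unsigned me)
{
    uint64_t from_mb = ~UINT64_C(0) >> mb;        /* bits mb..63 */
    uint64_t to_me = ~UINT64_C(0) << (63 - me);   /* bits 0..me  */
    return mb <= me ? (from_mb & to_me) : (from_mb | to_me);
}

/* Hypothetical model of "rlwinm rD,rS,sh,mb,me" in 64-bit mode,
   with sh, mb and me each in 0..31.  */
static uint64_t model_rlwinm(uint64_t rs, unsigned sh, unsigned mb, unsigned me)
{
    uint32_t lo = (uint32_t)rs;
    uint32_t rot = sh ? (lo << sh) | (lo >> (32 - sh)) : lo;
    uint64_t doubled = ((uint64_t)rot << 32) | rot;   /* word in both halves */
    return doubled & ppc_mask64(mb + 32, me + 32);
}

/* "rlwinm rD,rA,0,31,28" uses a wrapping mask equal to -7, so the
   high word of the result is a copy of the low word of rA rather
   than zero.  */
static void check_and_minus_7(void)
{
    assert(ppc_mask64(63, 60) == (uint64_t)(int64_t)-7);
    assert(model_rlwinm(UINT64_C(0x1234567F), 0, 31, 28)
           == UINT64_C(0x1234567F12345679));
}
```

By contrast, a non-wrapping mask such as "rlwinm rD,rA,0,24,31" (AND with 255) does leave the high 32 bits zero in this model, which is why the RTL code alone does not settle the question.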
In general, everything depends on what exact machine insn is used; basing
the decision on the RTL leads to duplication, is fragile, _will_ get out