Bug 85160 - GCC generates mvn/and instructions instead of bic on aarch64
Alias: None
Product: gcc
Classification: Unclassified
Component: rtl-optimization
Version: 8.0
Importance: P3 normal
Target Milestone: ---
Assignee: Segher Boessenkool
Keywords: missed-optimization
Depends on:
Blocks: spec
Reported: 2018-04-02 20:05 UTC by Steve Ellcey
Modified: 2018-08-12 19:57 UTC

See Also:
Target: aarch64
Known to work:
Known to fail:
Last reconfirmed: 2018-04-23 00:00:00


Description Steve Ellcey 2018-04-02 20:05:50 UTC
With this test case:

int foo(int a, int b, int *c, int i, int j)
{
	int x,y;
	x = ((a & (~c[i])) >> 7) |
	     ((a & (~c[j])) >> 9);
	y = ((b & (~c[i])) >> 9) |
	     ((b & (~c[j])) >> 7);
	return x | y;
}

GCC -O2 generates 2 'mvn' instructions and 4 'and' instructions.
LLVM -O2 generates 4 'bic' instructions instead.

GCC:
	ldr	w3, [x2, w3, sxtw 2]
	ldr	w2, [x2, w4, sxtw 2]
	mvn	w3, w3
	mvn	w2, w2
	and	w4, w3, w1
	and	w1, w2, w1
	and	w3, w3, w0
	and	w2, w2, w0
	asr	w4, w4, 9
	asr	w1, w1, 7
	orr	w3, w4, w3, asr 7
	orr	w2, w1, w2, asr 9
	orr	w0, w3, w2

LLVM:
	ldr	w8, [x2, w3, sxtw #2]
	ldr	w9, [x2, w4, sxtw #2]
	bic	w10, w0, w8
	bic	w8, w1, w8
	asr	w8, w8, #9
	bic	w11, w0, w9
	orr	w8, w8, w10, asr #7
	bic	w9, w1, w9
	orr	w8, w8, w11, asr #9
	orr	w0, w8, w9, asr #7

I am not sure whether this should be considered target specific.  The 'bic'
instruction is aarch64 specific, but GCC knows how to use it.  I think combine
didn't try to replace the mvn instructions because each of them is used by two
subsequent instructions, and that may be a generic combine issue.
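
To make the multi-use situation concrete, here is a rough C sketch of the two
shapes.  The helper bic() and the function names foo_mvn_and, foo_bic and
check_equivalence are made up for illustration (they are not GCC or LLVM
code); bic() only models the "wd = wn & ~wm" behaviour of the instruction.
The first function mirrors the mvn/and form, where each inverted value feeds
two 'and' operations; the second mirrors the bic form, where the inversion is
folded into each of the four uses.

```c
#include <assert.h>

/* Hypothetical helper modelling AArch64 "bic wd, wn, wm": wd = wn & ~wm. */
static inline int bic(int n, int m)
{
	return n & ~m;
}

/* mvn/and shape: each inverted value is computed once and used twice. */
int foo_mvn_and(int a, int b, const int *c, int i, int j)
{
	int ni = ~c[i];	/* first mvn */
	int nj = ~c[j];	/* second mvn */
	int x = ((a & ni) >> 7) | ((a & nj) >> 9);
	int y = ((b & ni) >> 9) | ((b & nj) >> 7);
	return x | y;
}

/* bic shape: the inversion is folded into each of the four uses. */
int foo_bic(int a, int b, const int *c, int i, int j)
{
	int x = (bic(a, c[i]) >> 7) | (bic(a, c[j]) >> 9);
	int y = (bic(b, c[i]) >> 9) | (bic(b, c[j]) >> 7);
	return x | y;
}

/* Small self-check; non-negative inputs keep the right shifts portable. */
void check_equivalence(void)
{
	int c[2] = { 0x0F0F, 0x00FF };
	assert(bic(0xFF, 0x0F) == 0xF0);
	assert(foo_mvn_and(0x7FFF, 0x1234, c, 0, 1) ==
	       foo_bic(0x7FFF, 0x1234, c, 0, 1));
}
```

Both functions compute the same value; the only difference is whether the
inversion lives in its own instruction with two users or is merged into each
user.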
Comment 1 Richard Biener 2018-04-03 09:23:18 UTC
Yes, combine only tries multi-uses in limited circumstances.
Comment 2 Andrew Pinski 2018-04-23 03:24:52 UTC
Comment 3 Segher Boessenkool 2018-04-24 16:11:57 UTC
I have some combine patches (for GCC 9) to do more 2->2 combinations.  They
still need more tuning (but they fix this testcase).
Comment 4 Segher Boessenkool 2018-07-30 13:18:49 UTC
Author: segher
Date: Mon Jul 30 13:18:17 2018
New Revision: 263067

URL: https://gcc.gnu.org/viewcvs?rev=263067&root=gcc&view=rev
combine: Allow combining two insns to two insns

This patch allows combine to combine two insns into two.  This helps
in many cases, by reducing instruction path length, and also allowing
further combinations to happen.  PR85160 is a typical example of code
that it can improve.

This patch does not allow such combinations if either of the original
instructions was a simple move instruction.  In those cases combining
the two instructions increases register pressure without improving the
code.  With this move test, register pressure no longer increases
noticeably as far as I can tell.

(At first I also didn't allow either of the resulting insns to be a
move instruction.  But that is actually a very good thing to have, as
should have been obvious).

	PR rtl-optimization/85160
	* combine.c (is_just_move): New function.
	(try_combine): Allow combining two instructions into two if neither of
	the original instructions was a move.
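
The rule described above can be sketched as a toy model in C.  This is not
the code from combine.c: the types and names (toy_kind, toy_insn,
toy_is_just_move, toy_allow_2_to_2, toy_demo) are hypothetical, and real
combine works on RTL patterns rather than a simple enum.  The sketch only
shows the shape of the check, assuming "simple move" can be reduced to a
single instruction kind.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy instruction kinds; these are NOT GCC's RTL codes. */
enum toy_kind { TOY_MOVE, TOY_NOT, TOY_AND, TOY_AND_NOT };

struct toy_insn {
	enum toy_kind kind;
};

/* Hypothetical stand-in for an "is this just a move?" test. */
static bool toy_is_just_move(const struct toy_insn *insn)
{
	return insn->kind == TOY_MOVE;
}

/* Allow a 2->2 combination only if neither original insn is a plain move. */
static bool toy_allow_2_to_2(const struct toy_insn *first,
			     const struct toy_insn *second)
{
	return !toy_is_just_move(first) && !toy_is_just_move(second);
}

/* Usage: mvn followed by and is a candidate; anything involving a move is not. */
void toy_demo(void)
{
	struct toy_insn mvn = { TOY_NOT };
	struct toy_insn and_insn = { TOY_AND };
	struct toy_insn mov = { TOY_MOVE };

	assert(toy_allow_2_to_2(&mvn, &and_insn));
	assert(!toy_allow_2_to_2(&mov, &and_insn));
	assert(!toy_allow_2_to_2(&mvn, &mov));
}
```

Note that the check is only on the original instructions; as the commit
message says, the resulting instructions are allowed to be moves.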

Comment 5 Segher Boessenkool 2018-07-30 16:12:16 UTC
Author: segher
Date: Mon Jul 30 16:11:44 2018
New Revision: 263072

URL: https://gcc.gnu.org/viewcvs?rev=263072&root=gcc&view=rev
testcase for 2-2 combine

	PR rtl-optimization/85160
	* gcc.target/powerpc/combine-2-2.c: New testcase.

Comment 6 Segher Boessenkool 2018-08-12 19:57:22 UTC
This is fixed on trunk now.