This is the mail archive of the gcc@gcc.gnu.org
mailing list for the GCC project.
Re: Reorder/combine insns on superscalar arch
- From: Jeff Law <law at redhat dot com>
- To: Igor Shevlyakov <igor dot shevlyakov at gmail dot com>
- Cc: gcc at gcc dot gnu dot org
- Date: Thu, 14 Jan 2016 23:05:42 -0700
- Subject: Re: Reorder/combine insns on superscalar arch
- Authentication-results: sourceware.org; auth=none
- References: <CAB=oy58R6YVSgFjXC12fS+zQ1MX1Xa9SE84hUOyEAxYEMoUjRA at mail dot gmail dot com> <56987BDB dot 1060105 at redhat dot com> <CAB=oy5--6qyYuz7DKLqjmSyAfLBe1g5uqAEuS-PEQc872FF_5g at mail dot gmail dot com>
On 01/14/2016 10:45 PM, Igor Shevlyakov wrote:
> I really hoped that I missed something and there was a better answer.
Nope, not really. Though thinking about it, you might want to look into
Bernd's work from 2012 in the Haifa scheduler -- it's got some
intelligence for dependency breaking. I don't recall offhand which
idioms it knows about, but it may be extensible enough to suit your needs.
> But does it do any harm if the combiner tries to check every piece of a
> PARALLEL like that and, if every component is matchable and the total
> cost is no worse, emits them separately?
> It would change nothing for single-issue machines, just some reordering,
> but it would help many multi-issue...
No real harm other than the pain of writing the patterns. The good news
is the combiner will present you with cases that are likely interesting
because the insns it tries to combine always have a true data dependency
between them - which are precisely the cases you care about because you
want to break the dependencies.
Just keep in mind that when presented with a PARALLEL, all inputs are
considered consumed at the same time, then all outputs are written. You
must provide all the outputs of the PARALLEL.
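To make the PARALLEL semantics above concrete, here is a rough toy model in
Python. It is not GCC code: the register names, the memory contents and the
helpers run_sequence/run_parallel are all made up for illustration. It only
sketches the "read every input first, then write every output" rule, and why a
dependent pair has to be rewritten in terms of the original inputs before it
can be expressed as a PARALLEL that still provides both outputs.

```python
# Toy model only: registers are a dict, an "insn" is (dest, function-of-regs).
# All names and values here are hypothetical, chosen for the illustration.

MEM = {0: -1, 104: 7}  # made-up memory contents


def run_sequence(regs, insns):
    """Execute insns in order; each one sees the results of earlier ones."""
    regs = dict(regs)
    for dest, source in insns:
        regs[dest] = source(regs)
    return regs


def run_parallel(regs, sets):
    """Model a PARALLEL: evaluate every source against the incoming state,
    and only then write every destination."""
    before = dict(regs)
    after = dict(regs)
    for dest, source in sets:
        after[dest] = source(before)
    return after


start = {"r0": 100, "r1": 0, "r2": 0}

# Dependent pair: the load uses r1, which the add has just produced.
dependent = [
    ("r1", lambda r: r["r0"] + 4),
    ("r2", lambda r: MEM[r["r1"]]),
]

# Same two sets placed in a PARALLEL unchanged: the load now reads the
# old r1, so the result differs from the sequence.
naive = dependent

# Rewritten so the load no longer depends on the add. Both outputs
# (r1 and r2) are still provided.
rewritten = [
    ("r1", lambda r: r["r0"] + 4),
    ("r2", lambda r: MEM[r["r0"] + 4]),
]

if __name__ == "__main__":
    print(run_sequence(start, dependent))
    print(run_parallel(start, naive))
    print(run_parallel(start, rewritten))
```

In this model the rewritten PARALLEL leaves the registers in the same state as
the original two-insn sequence, while the naive one does not, because its load
sees the value r1 had on entry.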
If you aren't provided with a PARALLEL, then the outputs from the earlier
insns die in the later insns in the sequence it's trying to optimize.
So you don't have a scratch register to aid in dependency breaking.
However, the combiner does know how to add scratch registers, so if you
need them, expose them in the pattern (which will make it a PARALLEL --
one item for the actual operation, another to clobber a scratch
register). This is documented in the GCC internals manual.
> What are the pitfalls of this approach?
Well, you have to write the pattern and a splitter. But these days
there's define_insn_and_split to help with that. Reusing Bernd's work
may ultimately be easier though.