This is the mail archive of the
mailing list for the GCC project.
Re: RFC: Add of type-demotion pass
- From: Jakub Jelinek <jakub at redhat dot com>
- To: Richard Biener <richard dot guenther at gmail dot com>
- Cc: Jeff Law <law at redhat dot com>, Kai Tietz <ktietz at redhat dot com>, GCC Patches <gcc-patches at gcc dot gnu dot org>
- Date: Tue, 22 Oct 2013 16:27:47 +0200
- Subject: Re: RFC: Add of type-demotion pass
- Authentication-results: sourceware.org; auth=none
- References: <155227895 dot 847667 dot 1373305519786 dot JavaMail dot root at redhat dot com> <525E2B2E dot 6010505 at redhat dot com> <CAFiYyc2czZAkSnuU6KU-z+bUbZXOQKg4BnBtc4d27OPzvmvk_A at mail dot gmail dot com> <525EBFD9 dot 4000509 at redhat dot com> <CAFiYyc1mrkLbOzu0t-bT_XwQ4y1R8XBHWJNkj3wkewwwOWBXHA at mail dot gmail dot com> <52603B61 dot 7020101 at redhat dot com> <CAFiYyc3Vp6XFG6upE7P=Qmt7ds4QXhhGdLEf10WBtTk48EGFMw at mail dot gmail dot com>
- Reply-to: Jakub Jelinek <jakub at redhat dot com>
On Fri, Oct 18, 2013 at 12:06:35PM +0200, Richard Biener wrote:
> You can't move type conversion "out of the way" in most cases as
> GIMPLE is strongly typed
> and data sources and sinks can obviously not be "promoted" (nor can
> function arguments).
> So you'll very likely not be able to remove the code from the
> optimizers, it will only maybe
> trigger less often.
My take on the type demotion and promotion is that we badly need it and the
question is just in which pass to do it.
The benefit of type demotion is code canonicalization and removing
unnecessary computation, e.g. computation that only affects the upper bits
that are going to be thrown away anyway. The disadvantage of type demotion
of signed operations is that we need to perform them in an unsigned type
instead, and thus we can't perform some loop optimizations based on
undefined behavior etc. in some testcases where type demotion can otherwise
improve generated code.
If types are demoted, upper bits of constants go away, SCCVN can find
equivalences between SSA_NAMEs that wouldn't be considered before, etc.
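To make the signed-demotion point concrete, here is a minimal C sketch, assuming a 64-bit addition whose result is only used in 32 bits. The function names are hypothetical and only for illustration; this is not what any GCC pass emits. The demoted form adds in an unsigned 32-bit type, because a plain 32-bit signed add could overflow where the original 64-bit add did not.

```c
#include <assert.h>
#include <stdint.h>

/* Reference: do the addition in 64 bits, then throw the upper 32 bits away.
   The add is done in uint64_t so the reference itself cannot overflow. */
uint32_t add_wide_then_truncate(int64_t a, int64_t b)
{
    return (uint32_t)((uint64_t)a + (uint64_t)b);
}

/* Demoted: truncate the operands first and add in 32-bit unsigned, which
   wraps modulo 2^32.  Adding as int32_t instead could overflow, which is
   undefined behavior in C, hence the unsigned type. */
uint32_t add_demoted(int64_t a, int64_t b)
{
    return (uint32_t)((uint32_t)a + (uint32_t)b);
}

/* Hypothetical self-check helper, not part of any GCC API. */
void check_add_demotion(void)
{
    assert(add_demoted(0x7fffffff, 1) == add_wide_then_truncate(0x7fffffff, 1));
    assert(add_demoted(0x123456789LL, 1) == 0x2345678Au);
}
```

The price is visible in the second function: once the add is unsigned, nothing downstream may assume it does not wrap.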
But given the issue with signed operation type demotion, I think before loop
optimizations we should only be doing type demotions that don't turn
operations with undefined overflow into defined ones. I guess passes like
forwprop, gimple-fold etc. could easily handle the easy cases, where there
is a tree of has_single_use SSA_NAMEs that can be demoted, but handling
a more complicated web would be harder. Say in:
unsigned int a, b, c, d, e, f; unsigned char h, i, j;
unsigned int k = a * 2 + b + 0x12340000;
unsigned int l = c * 4 + d + 0x23456700;
unsigned int m = e * 5 + f, n = k + l - m, o = k - l + m, p = -k + 1;
h = n; i = o; j = p;
k, l, m all have multiple imm uses, but still pretty much everything in this
function could be demoted to unsigned char, the two large constants could go
away as additions of zero, etc. Perhaps that can be seen as little benefit,
but what if the above is all
s/unsigned int/unsigned long long/;s/unsigned char/unsigned int/ on a 32-bit
target? The RTL subreg pass might help a little bit, but that is too late.
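As a sanity check of the claim above, the following sketch compares the snippet as written with a hand-demoted version in which every intermediate is truncated to 8 bits and the two large constants are dropped (their low byte is zero). The struct and function names are made up for this illustration, and the narrow version is written by hand, not produced by any pass.

```c
#include <assert.h>
#include <stdint.h>

struct result { uint8_t h, i, j; };

/* As in the mail: compute in unsigned int, truncate only at the end. */
struct result compute_wide(unsigned a, unsigned b, unsigned c,
                           unsigned d, unsigned e, unsigned f)
{
    unsigned k = a * 2 + b + 0x12340000;
    unsigned l = c * 4 + d + 0x23456700;
    unsigned m = e * 5 + f, n = k + l - m, o = k - l + m, p = -k + 1;
    struct result r = { (uint8_t)n, (uint8_t)o, (uint8_t)p };
    return r;
}

/* Hand-demoted: operands and every intermediate are 8 bits wide.  The
   constants vanish because 0x12340000 and 0x23456700 are 0 modulo 256.
   C promotes uint8_t operands to int, so each result is cast back. */
struct result compute_narrow(uint8_t a, uint8_t b, uint8_t c,
                             uint8_t d, uint8_t e, uint8_t f)
{
    uint8_t k = (uint8_t)(a * 2 + b);
    uint8_t l = (uint8_t)(c * 4 + d);
    uint8_t m = (uint8_t)(e * 5 + f);
    struct result r = { (uint8_t)(k + l - m), (uint8_t)(k - l + m),
                        (uint8_t)(-k + 1) };
    return r;
}

/* Returns 1 if both versions agree on h, i and j for the given inputs.
   Passing unsigned values to compute_narrow truncates them to 8 bits. */
int demotion_agrees(unsigned a, unsigned b, unsigned c,
                    unsigned d, unsigned e, unsigned f)
{
    struct result w = compute_wide(a, b, c, d, e, f);
    struct result n = compute_narrow((uint8_t)a, (uint8_t)b, (uint8_t)c,
                                     (uint8_t)d, (uint8_t)e, (uint8_t)f);
    assert(w.h == n.h && w.i == n.i && w.j == n.j);
    return 1;
}
```

The two versions agree because addition, subtraction, negation and multiplication by a constant all commute with reduction modulo 256; operations such as division or right shift would not allow this.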
For the demotion which changes undefined overflow operations to defined
ones, I wonder when is the last pass that usefully makes use of that
information, if e.g. we could do the full type demotion already before
vectorization somewhere in the loop optimization queue, or if that is still
Where type demotion and promotion is very important is IMHO vectorization,
the code we generate for mixed types vectorization is just huge and
terrible. If we can help it by not computing useless upper bits, or on the
other hand sometimes by not doing parts of computations in smaller types,
which leads to all the other computations on wider types being done with a
bigger vectorization factor, we could improve generated code quality.
I wonder if for vectorization we couldn't use the same thing I wrote
recently for if-conversion, for bbs potentially suitable for vectorization
(with the right loop form etc.), that is, if we don't do full type demotion
before vectorization, check if we'd demote anything and if so, work only on
the vectorization-only loop copy (or create it), and then try to do some
type promotion to minimize the number of type sizes in the loop,
see the http://gcc.gnu.org/bugzilla/show_bug.cgi?id=47477#c16 (admittedly
artificial) testcase for what I mean. After demotion, we could replace the
cast of short to char and back with just an AND (for zero extension) or a
shift left + signed shift right (for sign extension), etc.
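The cast replacement mentioned above can be sketched in scalar C as follows. This is only an illustration with hypothetical function names: it assumes GCC's documented behavior for the implementation-defined parts (out-of-range conversions to a signed type wrap, and right shift of a negative value is arithmetic). In the sign-extension variant the left shift comes first, done in an unsigned type to avoid undefined behavior, followed by the signed shift right.

```c
#include <assert.h>
#include <stdint.h>

/* Zero extension: short -> unsigned char -> short keeps the low 8 bits. */
uint16_t zext_via_cast(uint16_t x) { return (uint16_t)(uint8_t)x; }
uint16_t zext_via_and(uint16_t x)  { return (uint16_t)(x & 0xff); }

/* Sign extension: short -> signed char -> short.  The narrowing cast is
   implementation-defined for out-of-range values; GCC wraps. */
int16_t sext_via_cast(int16_t x) { return (int16_t)(int8_t)x; }

/* Same result without leaving the 16-bit type: move the low byte to the
   top, then shift it back down arithmetically (GCC behavior assumed). */
int16_t sext_via_shifts(int16_t x)
{
    uint16_t left = (uint16_t)((uint16_t)x << 8);
    return (int16_t)((int16_t)left >> 8);
}

/* Hypothetical self-check over all 16-bit values. */
void check_extensions(void)
{
    for (int32_t v = -32768; v <= 32767; v++) {
        assert(zext_via_cast((uint16_t)v) == zext_via_and((uint16_t)v));
        assert(sext_via_cast((int16_t)v) == sext_via_shifts((int16_t)v));
    }
}
```

In a vectorized loop the point is that both replacements stay in the 16-bit lane width, so no pack/unpack between char and short vectors is needed.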
And, finally, the question is if we generate good code if we just expand RTL
from the demoted types (we'd better, because the user could have written his
code in the narrower types from the beginning (well, C implicit promotions
make that harder, but fold-const already demotes some computations that
appear in a single statement)), or if there are advantages of promoting some
types, what algorithm to use for that, what cost model, what target hooks