This is the mail archive of the
mailing list for the GCC project.
Re: [PATCH]Enable elimination of IV use with unsigned type candidate
- From: "Bin.Cheng" <amker dot cheng at gmail dot com>
- To: Richard Biener <richard dot guenther at gmail dot com>
- Cc: Zdenek Dvorak <ook at ucw dot cz>, GCC Patches <gcc-patches at gcc dot gnu dot org>
- Date: Thu, 3 Jul 2014 14:58:29 +0100
- Subject: Re: [PATCH]Enable elimination of IV use with unsigned type candidate
- Authentication-results: sourceware.org; auth=none
- References: <000001cf8ec8$68c07340$3a4159c0$ at arm dot com> <CAFiYyc18W5R7-CkHBeRi4aiQuu5i=MgenBPL6r64kJ75fQDd0w at mail dot gmail dot com> <CAHFci2-ZJdXrEbV4yQG6UjqswiPp-B8trwrgAn2NAzm4f1fdRw at mail dot gmail dot com>
On Tue, Jul 1, 2014 at 10:32 AM, Bin.Cheng <email@example.com> wrote:
> Sorry for this late reply, I spent some time in understanding the problem.
> On Tue, Jun 24, 2014 at 12:36 PM, Richard Biener
> <firstname.lastname@example.org> wrote:
>> On Mon, Jun 23, 2014 at 11:49 AM, Bin Cheng <email@example.com> wrote:
>>> expressions. It's possible to have iv_elimination_compare_lt to do some
>>> undo transformation on may_be_zero, but I found it's difficult for cases
>>> involving signed/unsigned conversion like case loop-41.c. Since I think
>>> there is no obvious benefit to fold may_be_zero here (somehow because the
>>> two operands are already in folded forms), this patch just calls build2_loc
>> But it may fold to true/false, no?
> You are right, it can be folded to false in many cases. Thus I
> managed to check specific folded forms of may_be_zero, as in attached
> patch. So far it works for tests I added, but there are some other
> folded forms not handled.
>>> When GCC trying to eliminate use 0 with cand 0, the miscellaneous trees in
>>> iv_elimination_compare_lt are like below with i_1 of signed type:
>>> B: i_1 + 1
>>> A: 0
>>> niter->niter: (unsigned int)i_1
>>> Apparently, (B-A-1) is "i_1", which doesn't equal to "(unsigned int)i_1".
>>> Without this patch, it is considered equal to each other.
>> just looking at this part. Do you have a testcase that exhibits a difference
>> when just applying patch A? So I can have a look here?
>> From the code in iv_elimination_compare_lt I can't figure why we'd
>> end up with i_1 instead of (unsigned int)i_1 as we convert to a common
>> I suppose the issue may be that at tree_to_aff_combination time
>> we strip all nops with STRIP_NOPS but when operating on ->rest
>> via convert/scale or add we do not strip them again.
>> But then 'nit' should be i_1, not (unsigned int)i_1. So the analysis above
>> really doesn't look correct.
>> Just to make sure we don't paper over an issue in tree-affine.c.
>> Thus - testcase? On x86 we don't run into this place in
>> iv_elimination_compare_lt (on an unpatched tree).
>> CCing Zdenek for the meat of patch B.
For the record, here is my understanding about the problem.
Considering below simple loop, and we don't restrict P to any specific
type as GCC now does:
some_type p, p_0;
int a, b, i;
i = a;
p = p_0
p += step;
while (i < b);
We want to optimization it into below form, given there is no
overflow/wrap behavior introduced, if NITER is the number of the loop.
p = p_0;
p += step;
while (p < p_0 + (NITER + 1) * step);
For convenient, we assume positive step for now.
I think it's safe (in other words, no new overflow or wrap), if below
two conditions are satisfied:
1) When the original latch executes for N(>0) times, the new latch
executes for same times.
2) When the original latch executes ZERO time, the new latch executes
for ZERO time.
For 1), expression "p_0 + (NITER + 1) * step" should not overflow/wrap upward.
This is checked now by code snippet like:
/* We need to know that the candidate induction variable does not overflow.
While more complex analysis may be used to prove this, for now just
check that the variable appears in the original program and that it
is computed in a type that guarantees no overflows. */
cand_type = TREE_TYPE (cand->iv->base);
if (cand->pos != IP_ORIGINAL || !nowrap_type_p (cand_type))
GCC only checks if the variable appears originally and is of type not
overflow/wrap. Just as the comment states, it can be improved
(something like introduced in my patch).
For 2), expression "p_0 + (NITER + 1) * step" should not overflow/wrap
downward, in other words, "p >= p_0 + (NITER + 1) * step" should hold
when may_be_zero (i.e, "a + 1 > b") holds. Since NITER is computed as
(unsigned_type)B - (unsigned_type)A - 1, the new bound is effective in
form of "p_0 + (unsigned_type)B * step - (unsigned_type)A * step".
Due to the folded form of may_be_zero, GCC only handles cases in which
B/A are of unsigned type, we only need to make sure that "p_0 -
(unsigned_type)A * step" holds, given "(unsigned_type)B <=
(unsigned_type)A" already holds.
When comes to cases in which B is of signed type, we need to make sure
that "(unsigned_type)B" doesn't overflow/wrap.
So the conclusions are:
1) With the original patch B, patch A is needed because we relax GCC
for cases in which B is of signed type.
2) With the updated patch B, patch A won't be needed.