[PATCH] Add pattern for pointer-diff on addresses with same base/offset (PR 94234)

Tue Jun 16 06:08:48 GMT 2020

Here is an question about pointer operation: 

Pointer is treated as unsigned in comparison operation, while distance between
pointers is signed. Then we can not assume the below conclusion is true?

 (ptr_a > ptr_b)     =>     (ptr_a - ptr_b) >= 0

Thanks,
Feng

________________________________________
From: Marc Glisse <marc.glisse@inria.fr>
Sent: Wednesday, June 3, 2020 10:32 PM
To: Feng Xue OS
Cc: gcc-patches@gcc.gnu.org
Subject: Re: [PATCH] Add pattern for pointer-diff on addresses with same base/offset (PR 94234)

On Wed, 3 Jun 2020, Feng Xue OS via Gcc-patches wrote:

>> Ah, looking at the PR, you decided to perform the operation as unsigned
>> because that has fewer NOP conversions, which, in that particular testcase
>> where the offsets are originally unsigned, means we simplify better. But I
>> would expect it to regress other testcases (in particular if the offsets
>> were originally signed). Also, changing the second argument of
>> pointer_plus to be signed, as is supposed to eventually happen, would
>> break your testcase again.
> The old rule might produce overflow result (offset_a = (signed_int_max)UL,
> offset_b = 1UL).

signed_int_max-1 does not overflow. But the point is that pointer_plus /
pointer_diff are defined in a way that if that subtraction would overflow,
then one of the pointer_plus or pointed_diff would have been undefined
already. In particular, you cannot have objects larger than half the
address space, and pointer_plus/pointer_diff have to remain inside an
object. Doing the subtraction in a signed type keeps (part of) that
information.

> Additionally, (stype)(offset_a - offset_b) is more compact,

Not if offset_a comes from (utype)a and offset_b from (utype)b with a and
b signed. Using size_t indices as in the bugzilla testcase is not
recommended practice. Change it to ssize_t, and we do optimize the
testcase in CCP1 already.

> there might be
> further simplification opportunities on offset_a - offset_b, even it is not
> in form of (A * C - B * C), for example (~A - 1 -> -A). But for old rule, we have
> to introduce another rule as (T)A - (T)(B) -> (T)(A - B), which seems to
> be too generic to benefit performance in all situations.

Sadly, conversions complicate optimizations and are all over the place, we
need to handle them in more places. I sometimes dream of getting rid of
NOP conversions, and having a single PLUS_EXPR with some kind of flag
saying if it can wrap/saturate/trap when seen as a signed/unsigned
operation, i.e. push the information on the operations instead of objects.

> If the 2nd argument is signed, we can add a specific rule as your suggestion
> (T)(A * C) - (T)(B * C) -> (T) (A - B) * C.
>
>> At the very least we want to keep a comment next to the transformation
>> explaining the situation.
>
>> If there are platforms where the second argument of pointer_plus is a
>> smaller type than the result of pointer_diff (can this happen? I keep
>> forgetting all the weird things some platforms do), this version may do an
>> unsafe zero-extension.
> If the 2nd argument is a smaller type, this might bring confuse semantic to
> pointer_plus operator. Suppose the type is a (unsigned) char, the expression
> "ptr + ((char) -1)" represents ptr + 255 or ptr - 1?

(pointer_plus ptr 255) would mean ptr - 1 on a platform where the second
argument of pointer_plus has size 1 byte.

Do note that I am not a reviewer, what I say isn't final.

--
Marc Glisse