Bug 49330 - Integer arithmetic on addresses optimised with pointer arithmetic rules
Summary: Integer arithmetic on addresses optimised with pointer arithmetic rules
Status: NEW
Alias: None
Product: gcc
Classification: Unclassified
Component: rtl-optimization (show other bugs)
Version: 4.6.1
: P3 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords: alias, wrong-code
: 52517 86026 (view as bug list)
Depends on:
Blocks: 65752 88433 49140 50063
  Show dependency treegraph
 
Reported: 2011-06-08 20:14 UTC by Harald van Dijk
Modified: 2019-06-14 18:53 UTC (History)
12 users (show)

See Also:
Host:
Target:
Build:
Known to work:
Known to fail: 4.3.5, 4.4.5, 4.5.3, 4.6.0, 4.7.0, 8.1.0, 9.0
Last reconfirmed: 2019-01-08 00:00:00


Attachments
statistic patch (987 bytes, text/plain)
2019-01-09 11:05 UTC, Richard Biener
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Harald van Dijk 2011-06-08 20:14:28 UTC
#include <stdint.h>
int x, y;
int main(void) {
  uintptr_t px = (uintptr_t) &x;
  uintptr_t py = (uintptr_t) &y;
  volatile uintptr_t d = px - py;
  uintptr_t p = py + d;
  x = 1;
  *(int *) p = 2;
  return x;
}

gcc 4.6(20110603) returns 1 at -O1 or higher. configure options:

--build=x86_64-pc-linux-gnu --host=x86_64-pc-linux-gnu --prefix=/usr --sysconfdir=/etc --program-suffix=-4.6 --enable-languages=c,c++ --enable-checking --enable-build-with-cxx

As far as I can see, this program is perfectly valid and is required to return 2. gcc seems to be optimising on the assumption that an addition to &y will not result in a pointer to a distinct object (and so stores 2 in y), but that assumption is only correct for a pointer addition, which the above is not.
Comment 1 Harald van Dijk 2011-06-08 20:42:04 UTC
A similar example, but one which does not convert the integer back to a pointer:

#include <stdio.h>
#include <stdlib.h>
int a;
int main() {
  unsigned long b = (unsigned long) &a - 134518548;
  volatile unsigned long c = b;
  if (c == 0) {
    if (b != 0) abort ();
  }
  printf("%lu\n", c);
  return a;
}

This should never abort. It aborts on my system (with -m32).
Comment 2 Richard Biener 2011-06-09 09:36:46 UTC
I can confirm the original testcase with r173419 (not the one from comment#1).
The tree level optimizers handle this ok and we expand from

<bb 2>:
  px_1 = (uintptr_t) &x;
  py_2 = (uintptr_t) &y;
  d.0_3 = px_1 - py_2;
  d ={v} d.0_3;
  d.1_4 ={v} d;
  p_5 = d.1_4 + py_2;
  x = 1;
  # PT = { x y } (glob)
  p.2_6 = (int *) p_5;
  *p.2_6 = 2;
  D.2722_7 = x;
  return D.2722_7;

where you can also see that we correctly assume that p.2_6 may point either
x or y.

CSE1 optimizes the load from x to 1.
Comment 3 Jakub Jelinek 2011-06-09 10:04:28 UTC
Ugh.
base_alias_check for
(symbol_ref:DI ("x") <var_decl 0x7ffff1a32000 x>)
and
(plus:DI (reg:DI 62 [ d.1 ])
    (symbol_ref:DI ("y") <var_decl 0x7ffff1a320a0 y>))
returns 0.  I'm afraid dropping the base_alias_check call would significantly penalize generated code, after all we still sometimes have MEM accesses without MEM_EXPR.  Perhaps we could rely on REG_POINTER bits, extend them to SYMBOL_REF too and only do return NULL from find_base_term if SYMBOL_REF doesn't have SYMBOL_REF_POINTER bit set.  During CSE etc., when replacing a pseudo with its definition REG_POINTER/SYMBOL_REF_POINTER would be kept only if it is set on the pseudo being optimized away as well.  Thus, any casts from pointers to correspondingly sized integers and back would be seen in the RTL.
Comment 4 rguenther@suse.de 2011-06-09 10:08:55 UTC
On Thu, 9 Jun 2011, jakub at gcc dot gnu.org wrote:

> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49330
> 
> Jakub Jelinek <jakub at gcc dot gnu.org> changed:
> 
>            What    |Removed                     |Added
> ----------------------------------------------------------------------------
>                  CC|                            |jakub at gcc dot gnu.org
> 
> --- Comment #3 from Jakub Jelinek <jakub at gcc dot gnu.org> 2011-06-09 10:04:28 UTC ---
> Ugh.
> base_alias_check for
> (symbol_ref:DI ("x") <var_decl 0x7ffff1a32000 x>)
> and
> (plus:DI (reg:DI 62 [ d.1 ])
>     (symbol_ref:DI ("y") <var_decl 0x7ffff1a320a0 y>))
> returns 0.  I'm afraid dropping the base_alias_check call would significantly
> penalize generated code, after all we still sometimes have MEM accesses without
> MEM_EXPR.  Perhaps we could rely on REG_POINTER bits, extend them to SYMBOL_REF
> too and only do return NULL from find_base_term if SYMBOL_REF doesn't have
> SYMBOL_REF_POINTER bit set.  During CSE etc., when replacing a pseudo with its
> definition REG_POINTER/SYMBOL_REF_POINTER would be kept only if it is set on
> the pseudo being optimized away as well.  Thus, any casts from pointers to
> correspondingly sized integers and back would be seen in the RTL.

Maybe restrict the base_alias_check to var-decls that do not have
their address taken?  Points-to analysis should conver them.

Richard.
Comment 5 Richard Biener 2011-06-09 10:22:57 UTC
The testcase also fails similarly without the volatile qualification of d
when compiling with -O -fno-tree-fre -fno-tree-forwprop -fno-tree-reassoc
Comment 6 Jakub Jelinek 2011-06-09 10:38:37 UTC
find_base_value/find_base_term seem to be overly optimistic, prefer to return something over being conservative.  E.g. if both operands of PLUS or MINUS return non-NULL find_base_term, we just return the first one, etc.
It doesn't differentiate between "certainly not based on something" - e.g.
CONST_INT, regs only initialized by them and only incremented by constant etc. would be such and unknown.  While the overly optimistic implementation might be useful in some cases, at least for base_alias_check it is inappropriate.
Comment 7 Eric Botcazou 2011-06-10 09:50:28 UTC
The alias.c machinery is clearly based on the fundamental assumption of pointer arithmetics, i.e. that you aren't allowed to compute a difference unless both pointers point to the same enclosing object.  The testcase is a nice attempt at smuggling the violation of this rule behind a uintptr_t based manipulation.

Not clear what to do IMO.  Richard is proposing to leverage the escape sites:

  px_1 = (uintptr_t) &x;
  py_2 = (uintptr_t) &y;

but this will pessimize.  Ideally we should leverage the one problematic line:

  p.2_6 = (int *) p_5;

which creates the pointer out of the integer, but it has already disappeared in the very first RTL representation.
Comment 8 Richard Biener 2011-06-10 09:59:06 UTC
(In reply to comment #7)
> The alias.c machinery is clearly based on the fundamental assumption of pointer
> arithmetics, i.e. that you aren't allowed to compute a difference unless both
> pointers point to the same enclosing object.  The testcase is a nice attempt at
> smuggling the violation of this rule behind a uintptr_t based manipulation.
> 
> Not clear what to do IMO.  Richard is proposing to leverage the escape sites:
> 
>   px_1 = (uintptr_t) &x;
>   py_2 = (uintptr_t) &y;
> 
> but this will pessimize.  Ideally we should leverage the one problematic line:
> 
>   p.2_6 = (int *) p_5;
> 
> which creates the pointer out of the integer, but it has already disappeared in
> the very first RTL representation.

Creating that pointer is perfectly valid - you are allowed to cast a
pointer to an uintptr_t and back, which is what the code does (in some
obfuscated way of course).

p.2_6 = (int *) ((uintptr_t) &y + ((uintptr_t) &x - (uintptr_t) &y)))

which is, as is trivially obvious, (int *) (uintptr_t) &x.

Consider

int foo(uintptr_t a)
{
  uintptr_t px = (uintptr_t) &x;
  uintptr_t py = a;
  volatile uintptr_t d = px - py;
  uintptr_t p = py + d;
  x = 1;
  *(int *) p = 2;
  return x;
}

int main() { foo(&y); }

which is equivalent.
Comment 9 Eric Botcazou 2011-06-10 10:55:53 UTC
> Creating that pointer is perfectly valid - you are allowed to cast a
> pointer to an uintptr_t and back, which is what the code does (in some
> obfuscated way of course).

No disagreement, "problematic" only in the sense that it breaks the assumption of the aliasing machinery, as you're creating a pointer out of nothing.
Comment 10 Richard Biener 2011-06-10 11:13:53 UTC
(In reply to comment #9)
> > Creating that pointer is perfectly valid - you are allowed to cast a
> > pointer to an uintptr_t and back, which is what the code does (in some
> > obfuscated way of course).
> 
> No disagreement, "problematic" only in the sense that it breaks the assumption
> of the aliasing machinery, as you're creating a pointer out of nothing.

Indeed.  I fixed similar bugs in tree points-to analysis, like

int x, y;
int foo (int *p)
{
  int *q = &x;
  int *p = &y;
  int i;
  for (i=0; i<sizeof(int *); ++i)
    ((char *)&q)[i] = ((char *)&p)[i];
  *p = 1;
  *q = 2;
  return *p;
}

which I suspect might work on RTL only by accident (at -O3 with the
loop unrolled).
Comment 11 Andrew Pinski 2012-03-06 23:49:08 UTC
*** Bug 52517 has been marked as a duplicate of this bug. ***
Comment 12 Richard Biener 2016-10-19 13:42:11 UTC
Re-confirmed on trunk.
Comment 13 Richard Biener 2018-06-04 06:59:59 UTC
*** Bug 86026 has been marked as a duplicate of this bug. ***
Comment 14 Richard Biener 2018-06-04 07:02:05 UTC
Reconfirmed.
Comment 15 Richard Biener 2018-06-04 07:07:26 UTC
IMHO as RTL drops the difference between pointers and integers (of the same mode)
it has to drop the assumption that pointer arithmetic has to stay inside a pointed-to object similar to how it has to drop reliance on undefined overflow
for optimization since it drops the notion of signedness.

The other choice may seem to take REG_POINTER setting conservative and only
have the assumption on REG_POINTER regs.  (and thus make sure to set REG_POINTER and MEM_POINTER conservatively, which may be a difficult task on its own)
Comment 16 Alexander Monakov 2018-06-04 08:37:24 UTC
What do you think about the suggestion made in the most recent duplicate, namely expanding GIMPLE pointer-to-integer casts to non-transparent RTL assignments, i.e. going from

  val = (intptr_t) ptr;

to

  asm ("" : "=g" (rval) : "0" (rptr));

Wouldn't this plug the hole in one shot instead of chasing down missing REG_POINTERs in multiple RTL passes?
Comment 17 rguenther@suse.de 2018-06-04 10:14:17 UTC
On Mon, 4 Jun 2018, amonakov at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=49330
> 
> Alexander Monakov <amonakov at gcc dot gnu.org> changed:
> 
>            What    |Removed                     |Added
> ----------------------------------------------------------------------------
>                  CC|                            |amonakov at gcc dot gnu.org
> 
> --- Comment #16 from Alexander Monakov <amonakov at gcc dot gnu.org> ---
> What do you think about the suggestion made in the most recent duplicate,
> namely expanding GIMPLE pointer-to-integer casts to non-transparent RTL
> assignments, i.e. going from
> 
>   val = (intptr_t) ptr;
> 
> to
> 
>   asm ("" : "=g" (rval) : "0" (rptr));
> 
> Wouldn't this plug the hole in one shot instead of chasing down missing
> REG_POINTERs in multiple RTL passes?

I suspect such assignments are simply too common and doing this would
be worse than not assuming pointer arithmetic cannot cross different
objects at the RTL level.  How do you for example treat an aggregate
memcpy of a structure containing pointers RTL expanded to say
word-wise (integer-mode!) copies and then RTL afterwards figuring
out reg-reg dependence chains from that.  You have to realize that
you'd have to introduce those "barriers" during RTL optimization
itself...  in which case you're back at sth like REG_POINTER.

I think it would be nice to isolate this assumption on RTL and
put it behind a user-crontrollable flag.  That would allow
benchmarking this.
Comment 18 Richard Biener 2019-01-08 15:24:25 UTC
So for find_base_term to compute sth conservative we'd need to track
RTX_SURELY_NON_POINTER (what RTX is surely _not_ based on a pointer
and thus can be ignored).  And when find_base_term ever figures
two bases in say a PLUS it has to conservatively return 0.

I fear the existing REG_POINTER does not help at all.  For the testcase
we have

(plus:DI (reg:DI 83 [ d.0_2 ])
    (symbol_ref:DI ("y") [flags 0x2]  <var_decl 0x7ffff7fefb40 y>))

where reg:DI 83 is not marked with REG_POINTER and find_base_term
doesn't find it to be an alternate base.  For the testcase the
offending MEM has a MEM_EXPR and we have proper points-to info.

IMHO the proper solution is to kill base_alias_check or all problematic
cases in find_base_term (binary ops with more than one non-CONST_INT
operand).

And eventually make sure to more properly preserve MEM_EXPRs.

Maybe sth as "simple" as the following which of course fixes the
testcase but will make find_base_term fail on any variable-indexed
thing.

diff --git a/gcc/alias.c b/gcc/alias.c
index 93f53543d12..3a66e10b431 100644
--- a/gcc/alias.c
+++ b/gcc/alias.c
@@ -2009,12 +2009,14 @@ find_base_term (rtx x, vec<std::pair<cselib_val *,
        rtx base = find_base_term (tmp1, visited_vals);
        if (base != NULL_RTX
            && ((REG_P (tmp1) && REG_POINTER (tmp1))
-                || known_base_value_p (base)))
+                || known_base_value_p (base))
+           && CONST_INT_P (tmp2))
          return base;
        base = find_base_term (tmp2, visited_vals);
        if (base != NULL_RTX
            && ((REG_P (tmp2) && REG_POINTER (tmp2))
-                || known_base_value_p (base)))
+                || known_base_value_p (base))
+           && CONST_INT_P (tmp1))
          return base;
 
        /* We could not determine which of the two operands was the
Comment 19 Richard Biener 2019-01-09 08:48:31 UTC
(In reply to Richard Biener from comment #18)
> So for find_base_term to compute sth conservative we'd need to track
> RTX_SURELY_NON_POINTER (what RTX is surely _not_ based on a pointer
> and thus can be ignored).  And when find_base_term ever figures
> two bases in say a PLUS it has to conservatively return 0.
> 
> I fear the existing REG_POINTER does not help at all.  For the testcase
> we have
> 
> (plus:DI (reg:DI 83 [ d.0_2 ])
>     (symbol_ref:DI ("y") [flags 0x2]  <var_decl 0x7ffff7fefb40 y>))
> 
> where reg:DI 83 is not marked with REG_POINTER and find_base_term
> doesn't find it to be an alternate base.  For the testcase the
> offending MEM has a MEM_EXPR and we have proper points-to info.
> 
> IMHO the proper solution is to kill base_alias_check or all problematic
> cases in find_base_term (binary ops with more than one non-CONST_INT
> operand).
> 
> And eventually make sure to more properly preserve MEM_EXPRs.
> 
> Maybe sth as "simple" as the following which of course fixes the
> testcase but will make find_base_term fail on any variable-indexed
> thing.
> 
> diff --git a/gcc/alias.c b/gcc/alias.c
> index 93f53543d12..3a66e10b431 100644
> --- a/gcc/alias.c
> +++ b/gcc/alias.c
> @@ -2009,12 +2009,14 @@ find_base_term (rtx x, vec<std::pair<cselib_val *,
>         rtx base = find_base_term (tmp1, visited_vals);
>         if (base != NULL_RTX
>             && ((REG_P (tmp1) && REG_POINTER (tmp1))
> -                || known_base_value_p (base)))
> +                || known_base_value_p (base))
> +           && CONST_INT_P (tmp2))
>           return base;
>         base = find_base_term (tmp2, visited_vals);
>         if (base != NULL_RTX
>             && ((REG_P (tmp2) && REG_POINTER (tmp2))
> -                || known_base_value_p (base)))
> +                || known_base_value_p (base))
> +           && CONST_INT_P (tmp1))
>           return base;
>  
>         /* We could not determine which of the two operands was the

"benchmarking" this by comparing cc1 with/without shows a difference mostly
in scheduling (but the number of differences is comparatively small!).  Also
overall text size shrinks with the patch (whatever that means).

On GIMPLE we try hard to not construct addresses "based" on the wrong
object, in fact IVOPTs has code to avoid building IVs based on
things like &a - &b and propagation avoids turning unintptr_t arithmetic
into pointer arithmetic even if it can see the converted from addresses.

All those things cannot be done on RTL since we lost the distinction between
pointers and integers and there's only PLUS.

So I have a _very_ hard time seeing how RTL can ever be fixed to discover
bases for alias analysis purposes without just resorting to MEM_EXPRs.

That is, unless we want to live with this kind of wrong-code bugs.

Similarly fishy is may_be_sp_based_p.
Comment 20 Richard Biener 2019-01-09 11:04:56 UTC
For stage3/gcc/*.o statistics show we perform 21051052 base_alias_check calls
and in the end 706852 times it is the one that would have disambiguated
things compared to if we remove it (thus as if we do base_alias_check last).

Note there's also

  base = find_base_term (x_addr);
  if (base && (GET_CODE (base) == LABEL_REF
               || (GET_CODE (base) == SYMBOL_REF
                   && CONSTANT_POOL_ADDRESS_P (base))))
    return 0;

which is suspicious but I guess harder to hit in practice so things go wrong.

base_alias_check is not exactly the first thing we check (but nearly) so
we'd roughly lose 3% disambiguations from RTL alias analysis if we scrap
base_alias_check completely.

That's probably too much.

Note the CONSTANT_POOL_ADDRESS_P thing isn't necessary and subsumed by
following checks so we could remove that without losing anything
(it hits only 84 times at all in the above set and later checks subsume it).
Comment 21 Richard Biener 2019-01-09 11:05:39 UTC
Created attachment 45389 [details]
statistic patch

patch I added to record statistics
Comment 22 Richard Biener 2019-01-09 12:24:26 UTC
Things we fail to disambiguate are

(mem:TF (pre_dec:SI (reg/f:SI 7 sp)) [0  S16 A8])
 vs.
(mem/c:TF (plus:SI (reg/f:SI 19 frame)
  (const_int -16 [0xfffffffffffffff0])) [1  S16 A128])

or

(mem:SI (pre_dec:SI (reg/f:SI 7 sp)) [3  S4 A32])
 vs.
(mem/f/c:SI (symbol_ref:SI ("argv") [flags 0x2] <var_decl 0x7f782d1e6360 argv>) [2 argv+0 S4 A32])

where I don't find anything besides CSELIB cselib_sp_based_value_p handling
in find_base_term that could be the one handling it?

I guess we should be able to somehow handle both sp and frame based
accesses in a more conservative way?
Comment 23 Richard Biener 2019-01-09 14:33:42 UTC
(In reply to Richard Biener from comment #22)
> Things we fail to disambiguate are
> 
> (mem:TF (pre_dec:SI (reg/f:SI 7 sp)) [0  S16 A8])
>  vs.
> (mem/c:TF (plus:SI (reg/f:SI 19 frame)
>   (const_int -16 [0xfffffffffffffff0])) [1  S16 A128])
> 
> or
> 
> (mem:SI (pre_dec:SI (reg/f:SI 7 sp)) [3  S4 A32])
>  vs.
> (mem/f/c:SI (symbol_ref:SI ("argv") [flags 0x2] <var_decl 0x7f782d1e6360
> argv>) [2 argv+0 S4 A32])
> 
> where I don't find anything besides CSELIB cselib_sp_based_value_p handling
> in find_base_term that could be the one handling it?
> 
> I guess we should be able to somehow handle both sp and frame based
> accesses in a more conservative way?

it's really 99% like this which is why eventually that CONST_INT restriction
worked so "well".

Can we easily identify spill slot accesses somehow?  The parameter accesses
(frame references?) should simply get appropriate MEM_EXPRs IMHO.
Comment 24 Richard Biener 2019-01-14 13:34:25 UTC
On GCC testcases one large group of MEMs only disambiguated through base_alias_check is disambiguations agains DSEs group_info->base_mem
which is BLKmode mems based on some "base" pointer.  This base_mem
lacks a MEM_EXPR but I think it shouldn't be difficult to add one,
like with (completely lacking sanity testing):

diff --git a/gcc/dse.c b/gcc/dse.c
index 389c52d4284..098c77165de 100644
--- a/gcc/dse.c
+++ b/gcc/dse.c
@@ -1097,6 +1097,7 @@ canon_address (rtx mem,
 {
   machine_mode address_mode = get_address_mode (mem);
   rtx mem_address = XEXP (mem, 0);
+  tree mem_expr = MEM_EXPR (mem);
   rtx expanded_address, address;
   int expanded;
 
@@ -1165,6 +1166,9 @@ canon_address (rtx mem,
          && const_or_frame_p (address))
        {
          group_info *group = get_group_info (address);
+         if (!MEM_EXPR (group->base_mem)
+             && mem_expr)
+           set_mem_expr (group->base_mem, get_base_address (mem_expr));
 
          if (dump_file && (dump_flags & TDF_DETAILS))
            {


btw, the disambiguations like

(mem/c:SI (symbol_ref:DI ("g") [flags 0x2] <var_decl 0x7fe01f78a510 g>) [1 g+0 S4 A32])
 vs. (mem:DI (pre_dec:DI (reg/f:DI 7 sp)) [0  S8 A8])

are handled through REG_BASE_VALUE which assigns 'sp' (address:DI -1).

I believe we should be working towards adding proper MEM_EXPRs to more
places and simply make find_base_term more conservative which means
simplifying the PLUS/MINUS cases.
Comment 25 Richard Biener 2019-01-16 14:05:57 UTC
When considering the patch from comment#18 additional data is that only
95802 out of 636160 disambiguations that ultimately require base_alias_check
involve non-CONST_INT_P "other" operand.  That is out of 9531871 total
cases that would run into base_alias_check, or 1%.

This is w/o "fixing" DSE (the simple patch of course miscompiles things).
Comment 26 Richard Biener 2019-01-17 12:37:57 UTC
(In reply to Richard Biener from comment #25)
> When considering the patch from comment#18 additional data is that only
> 95802 out of 636160 disambiguations that ultimately require base_alias_check
> involve non-CONST_INT_P "other" operand.  That is out of 9531871 total
> cases that would run into base_alias_check, or 1%.
> 
> This is w/o "fixing" DSE (the simple patch of course miscompiles things).

Sorting by invoking pass this is

    236 combine
   8138 cse1
   8404 cse2
   1423 cse_local
 144964 dse1
  83650 dse2
    350 ira
  32000 postreload
  32758 sched2
  75866 vartrack

so the majority comes from DSE (note the above numbers are for the full
boostrap so not comparable to the numbers quoted which are just for
stage3-gcc/*.o)

From CSE I also see the weird

(mem:TF (plus:SI (reg/f:SI 7 sp)
        (scratch:SI)) [0  S16 A8])
 vs. (mem/c:TF (plus:SI (reg/f:SI 19 frame)
        (const_int -16 [0xfffffffffffffff0])) [1  S16 A128])

what's this (scratch:SI)!?  Anyway, I wonder how with unknown scratch
value base_alias_check can figure out the above do not alias.