This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.



Re: Aliasing rules for unannotated SYMBOL_REFs


Thanks for the answer, and sorry for the slow follow-up.  Got distracted
by other things...

Jeff Law <law@redhat.com> writes:
> On Sat, 2020-01-25 at 09:31 +0000, Richard Sandiford wrote:
>> TL;DR: if we have two bare SYMBOL_REFs X and Y, neither of which have an
>> associated source-level decl and neither of which are in an anchor block:
>> 
>> (Q1) can a valid byte access at X+C alias a valid byte access at Y+C?
>> 
>> (Q2) can a valid byte access at X+C1 alias a valid byte access at Y+C2,
>>      C1 != C2?
>> 
>> Also:
>> 
>> (Q3) If X has a source-level decl and Y doesn't, and neither of them are
>>      in an anchor block, can valid accesses based on X alias valid accesses
>>      based on Y?
> So what are the cases where Y won't have a source level decl but we
> have a decl in RTL?  anchors, other cases?

Not really sure why I wrote "source-level" TBH.  I was really talking
about any symbol that has a SYMBOL_REF_DECL.

I think there are three "interesting" cases:

- symbols with a SYMBOL_REF_DECL
- anchor symbols
- bare symbols (i.e. everything else)

Bare symbols are hopefully rare these days.
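
As a rough illustration of the three cases (a sketch only, not GCC code):
the struct below is a hypothetical stand-in for a SYMBOL_REF, with
has_decl modelling "SYMBOL_REF_DECL is nonnull" and is_anchor modelling
"the symbol is a section anchor".  The names sym_info, sym_kind and
classify_symbol are made up for this example.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the properties of a SYMBOL_REF that matter
   here; this is not GCC's rtx representation.  */
struct sym_info
{
  bool has_decl;   /* Models "SYMBOL_REF_DECL is nonnull".  */
  bool is_anchor;  /* Models "the symbol is a section anchor".  */
};

enum sym_kind { SYM_DECL, SYM_ANCHOR, SYM_BARE };

/* Sort a symbol into one of the three "interesting" cases.  */
static enum sym_kind
classify_symbol (const struct sym_info *s)
{
  if (s->has_decl)
    return SYM_DECL;
  if (s->is_anchor)
    return SYM_ANCHOR;
  /* Everything else: no decl and not an anchor.  */
  return SYM_BARE;
}
```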

>> (well, OK, that wasn't too short either...)
> I would have thought the answer would be "no" across the board.  But
> the code clearly indicates otherwise.
>
> Interposition clearly complicates things as do explicit aliases though.
>
>> This part seems obvious enough.  But then, apart from the special case of
>> forced address alignment, we use an offset-based check even for cmp==-1:
>> 
>>       /* Assume a potential overlap for symbolic addresses that went
>> 	 through alignment adjustments (i.e., that have negative
>> 	 sizes), because we can't know how far they are from each
>> 	 other.  */
>>       if (maybe_lt (xsize, 0) || maybe_lt (ysize, 0))
>> 	return -1;
>>       /* If decls are different or we know by offsets that there is no overlap,
>> 	 we win.  */
>>       if (!cmp || !offset_overlap_p (c, xsize, ysize))
>> 	return 0;
>> 
>> So we seem to be taking cmp==-1 to mean that although we don't know
>> the relationship between the symbols, it must be the case that either
>> (a) the symbols are equal (e.g. via aliasing) or (b) the accesses are
>> to non-overlapping objects.  In other words, one of the situations
>> described by cmp==1 or cmp==0 must be true, but we don't know which
>> at compile time.
> Right.  That was the conclusion I came to.  If a  SYMBOL_REF has an
> alias, the alias must have the same value as the SYMBOL_REF.  So they're
> either equal or there's no valid case for overlap.
>
>> 
>> This means that in practice, the answer to (Q1) appears to be "yes"
>> but the answer to (Q2) appears to be "no".
> That would be my understanding once aliases/interpositioning come into
> play.
>
>> 
>> This somewhat contradicts:
>> 
>>   /* In general we assume that memory locations pointed to by different labels
>>      may overlap in undefined ways.  */
>>   return -1;
>> 
>> at the end of compare_base_symbol_refs, which seems to be saying
>> that the answer to (Q2) ought to be "yes" instead.  Which is right?
> I'm not sure how we could get to yes in that case.  A symbol alias or
> interposition ultimately still results in two symbols having the same
> final address.  Thus for a byte access if C1 != C2, then we can't have
> an overlap.

I think that comment is about cases in which one symbol is a bare symbol
(has no decl and isn't an anchor).  I assumed the idea was that we could
have a decl-less SYMBOL_REF for the start of a particular section, or
things like that.

>> In PR92294 we have a symbol X at ANCHOR+OFFSET that's preemptible.
>> Under the (Q1)==yes/(Q2)==no assumption, cmp==-1 means that either
>> (a) X = ANCHOR+OFFSET or (b) X and ANCHOR reference non-overlapping
>> objects.  So we should take the offset into account when doing:
>> 
>>       if (!cmp || !offset_overlap_p (c, xsize, ysize))
>> 	return 0;
>> 
>> Let's call this FIX1.
> So this is a really interesting wrinkle.  Doesn't this change Q2 to a
> yes?  In particular it changes the "invariant" that the symbols have
> the same address in the event of a symbol alias or interposition.  Of
> course one could ask the question of whether or not we should handle
> cases with anchors specially.

This wouldn't come under Q2, since that was about symbols that aren't in
an anchor block.  I think it just means we need to generalise the three
cases that don't involve bare symbols from:

  - known equal
  - independent
  - equal or independent

to:

  - known distance apart
  - independent
  - known distance apart or independent

It's fortunate that anchors themselves can't be interposed. :-)
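
To make the generalisation concrete, here is a minimal sketch (again not
GCC code) of how a conflict check could use a known distance.  All names
are hypothetical, offsets and sizes are plain longs rather than poly_ints,
and the negative-size "alignment adjustment" case is not modelled.
"distance" is assumed to mean "address of Y's base minus address of X's
base, if the two bases are related at all".

```c
#include <stdbool.h>

enum base_rel
{
  REL_KNOWN_DISTANCE,           /* Bases are exactly "distance" apart.  */
  REL_INDEPENDENT,              /* Accesses cannot overlap.  */
  REL_DISTANCE_OR_INDEPENDENT   /* One of the above; unknown which.  */
};

/* True if [xoff, xoff + xsize) and [yoff, yoff + ysize) overlap.  */
static bool
ranges_overlap (long xoff, long xsize, long yoff, long ysize)
{
  return xoff < yoff + ysize && yoff < xoff + xsize;
}

/* Return true if an access of XSIZE bytes at X + XOFF may conflict
   with an access of YSIZE bytes at Y + YOFF.  */
static bool
may_conflict (enum base_rel rel, long distance,
              long xoff, long xsize, long yoff, long ysize)
{
  if (rel == REL_INDEPENDENT)
    return false;
  /* In both remaining states the only way to overlap is for the bases
     to be DISTANCE apart, so compare the ranges at that distance.  */
  return ranges_overlap (xoff, xsize, yoff + distance, ysize);
}
```

For a PR92294-like case with X at ANCHOR+16, taking X as the first base
and ANCHOR as the second gives distance == -16, so a 4-byte access at X+0
is reported as possibly conflicting with one at ANCHOR+16 but not with
one at ANCHOR+0.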

>> But that then brings us to: why does memrefs_conflict_p return -1
>> when one symbol X has a decl and the other symbol Y doesn't, and neither
>> of them are block symbols?  Is the answer to (Q3) that we allow equality
>> but not overlap here too?  E.g. a linker script could define Y to be X but
>> not to a region that contains X at a nonzero offset?
> Does digging into the history provide any insights here?

Not that I could see.  The code in question was part of a single patch.

> I'm not sure given the issues you've introduced if I could actually
> fill out the matrix of answers without more underlying information. 
> ie, when can we get symbols without source level decls, 
> anchors+interposition issues, etc.

OK.  In that case, I wonder whether it would be safer to have a
fourth state on top of the three above:

  - known distance apart
  - independent
  - known distance apart or independent
  - don't know

with "don't know" being anything that involves bare symbols?
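
A sketch of what that could look like, purely as an assumption about the
shape rather than a proposed patch (all names are hypothetical):
whatever relation the existing analysis computes is overridden by
"don't know" as soon as either symbol is bare, and offsets are only used
to disambiguate when the state is not "don't know".

```c
#include <stdbool.h>

enum sym_kind { SYM_DECL, SYM_ANCHOR, SYM_BARE };

enum base_rel
{
  REL_KNOWN_DISTANCE,
  REL_INDEPENDENT,
  REL_DISTANCE_OR_INDEPENDENT,
  REL_UNKNOWN                   /* "don't know" */
};

/* ANALYSED is the relation computed for the two symbols by some other
   (unmodelled) analysis.  Anything involving a bare symbol is demoted
   to REL_UNKNOWN.  */
static enum base_rel
refine_relation (enum sym_kind x, enum sym_kind y, enum base_rel analysed)
{
  if (x == SYM_BARE || y == SYM_BARE)
    return REL_UNKNOWN;
  return analysed;
}

/* Offsets can only rule out a conflict when the relation is known.  */
static bool
offsets_usable_p (enum base_rel rel)
{
  return rel != REL_UNKNOWN;
}
```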

Richard

