This is the mail archive of the
mailing list for the GCC project.
Re: Some aliasing questions
- From: Bill Schmidt <wschmidt at linux dot vnet dot ibm dot com>
- To: Alan Modra <amodra at gmail dot com>
- Cc: gcc at gcc dot gnu dot org, Richard Henderson <rth at redhat dot com>, dje dot gcc at gmail dot com, rguenther at suse dot de
- Date: Tue, 12 Apr 2016 08:03:37 -0500
- Subject: Re: Some aliasing questions
- References: <1460139016 dot 18355 dot 36 dot camel at oc8801110288 dot ibm dot com> <57081761 dot 8040502 at redhat dot com> <20160412003006 dot GT18129 at bubble dot grove dot modra dot org>

On Tue, 2016-04-12 at 10:00 +0930, Alan Modra wrote:
> On Fri, Apr 08, 2016 at 01:41:05PM -0700, Richard Henderson wrote:
> > On 04/08/2016 11:10 AM, Bill Schmidt wrote:
> > > The first is an issue with TOC-relative addresses on PowerPC. These are
> > > symbolic addresses that are to be loaded from a fixed slot in the table
> > > of contents, as addressed by the TOC pointer (r2). In the RTL phases
> > > prior to register allocation, these are described in an UNSPEC that
> > > looks like this for an example store:
> > >
> > > (set (mem/c:DI (unspec:DI [
> > >             (symbol_ref:DI ("*.LANCHOR0") [flags 0x182])
> > >             (reg:DI 2 2)
> > >         ] UNSPEC_TOCREL) [1 svul+0 S8 A128])
> > >     (reg:DI 178))
> > >
> > > The UNSPEC helps keep track of the r2 reference until this is split into
> > > two or more insns depending on the memory model.
> > That's why Alpha uses LO_SUM for pre-reload tracking of such things.
> > Even though that's a bit of a liberty, since there's no HIGH to go along with
> > the LO_SUM. But at least it allows the middle-end to continue to find the symbol.
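
As a rough illustration of the point above: if RTL addresses are modelled as nested tuples, a walker that looks for the symbol an address is based on can see through a lo_sum but has to give up on an unspec. This is a toy sketch, not GCC's alias code. The tuple encoding and the name `find_symbol_base` are assumptions invented for this example.

```python
# Toy model: an RTL expression is a tuple (code, operand, ...).
# Illustration only; this is not how GCC represents or walks RTL.

def find_symbol_base(expr):
    """Return the symbol name an address is based on, or None for "unknown"."""
    code = expr[0]
    if code == "symbol_ref":
        return expr[1]
    if code == "const":
        return find_symbol_base(expr[1])
    if code == "lo_sum":
        # The symbolic part of a lo_sum is its second operand.
        return find_symbol_base(expr[2])
    if code == "plus":
        # Try each operand; a bare register carries no symbol in this model.
        for operand in expr[1:]:
            base = find_symbol_base(operand)
            if base is not None:
                return base
        return None
    # reg, unspec, and every other code are opaque to this walker.
    return None


r2 = ("reg", 2)
anchor = ("symbol_ref", "*.LANCHOR0")

# The same address, expressed the two ways discussed in the thread.
unspec_addr = ("unspec", "UNSPEC_TOCREL", anchor, r2)
lo_sum_addr = ("lo_sum", r2, anchor)
```

In this model `find_symbol_base(unspec_addr)` yields `None`, the "who knows" answer, while `find_symbol_base(lo_sum_addr)` recovers the anchor symbol.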
> I wish I'd been made aware of the problem with alias analysis when I
> invented this scheme for -mcmodel=medium code..

It's certainly subtle. I had to be pretty lucky to discover it, as the
only effect is to rather harmlessly say "who knows" rather than giving a

> Back in gcc-4.3 days, when small-model code was the only option, we
> used to generate
>   mem (plus ((reg 2) (const (minus ((symbol_ref)
>                                     (symbol_ref toc_base))))))
> for a toc mem reference, which accurately reflects the addressing.
> The problem is that when splitting this to a high/lo_sum you lose the
> r2 reference in the lo_sum, and that allows r2 to die prematurely,
> breaking an important linker code editing optimisation.
> Hmm. Maybe if we rewrote the mem to
>   mem (plus ((symbol_ref toc_base) (const (minus ((symbol_ref)
>                                                   (reg 2))))))
> It might look odd, but is no lie. r2 is equal to toc_base. Or
> perhaps we could lie a little and simply omit the plus and toc_base
> Either way, when we split to
>   set (reg tmp) (high (const (minus ((symbol_ref) (reg 2)))))
>   .. mem (lo_sum (reg tmp) (const (minus ((symbol_ref) (reg 2)))))
> both high and lo_sum reference r2 and the linker could happily replace
> rtmp in the lo_sum insn with r2 when the high address is known to be

Yes, this sounds promising. And it really helps to know the history
here -- you saved me a lot of digging through the archives, since I
didn't want to rediscover the issue behind the present design.

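
To make the r2-liveness argument concrete, here is a small sketch that collects the registers mentioned in each half of a high/lo_sum split. It is a toy model on nested tuples, not GCC code. The helper `regs_used` is hypothetical, and the shape of `old_lo_sum` is an assumption: the thread only says that the r2 reference is lost from the lo_sum in the older scheme. The `new_high` and `new_lo_sum` shapes follow the split quoted above.

```python
# Toy model: an RTL expression is a tuple (code, operand, ...).

def regs_used(expr):
    """Collect the register ids mentioned anywhere in an expression."""
    if not isinstance(expr, tuple):
        return set()
    if expr[0] == "reg":
        return {expr[1]}
    used = set()
    for operand in expr[1:]:
        used |= regs_used(operand)
    return used


r2 = ("reg", 2)
tmp = ("reg", "tmp")
sym = ("symbol_ref", "sym")
toc_base = ("symbol_ref", "toc_base")

# Assumed shape of the lo_sum after splitting the gcc-4.3 style address:
# the offset is symbol minus toc_base, so r2 no longer appears.
old_lo_sum = ("lo_sum", tmp, ("const", ("minus", sym, toc_base)))

# Proposed split: the offset is written as symbol minus (reg 2).
new_high = ("high", ("const", ("minus", sym, r2)))
new_lo_sum = ("lo_sum", tmp, ("const", ("minus", sym, r2)))
```

Here `regs_used(old_lo_sum)` contains only the temporary, whereas both `regs_used(new_high)` and `regs_used(new_lo_sum)` include register 2, which is the property the proposal relies on to keep r2 live.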
> Bill, do you have test cases for the alias problem? Is this something
> that needs fixing for gcc-6?

Last question first ... no, I don't think it does. It's generally fine
for the structural aliasing to report "I don't know" and let other
checks decide whether aliasing can exist; it just isn't optimal. I only
spotted this because getting past this check allowed me to run into a
problem in my code that was exposed in the TBAA checks afterwards.
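
The "I don't know, let other checks decide" behaviour can be sketched as a layered query. This is a heavily simplified toy, not GCC's implementation: the names `structural_check`, `type_check`, and `may_alias`, the ordering of the checks, and the rule used for alias sets (set 0 conflicts with everything, otherwise only equal sets conflict) are all assumptions chosen for illustration.

```python
from enum import Enum


class Answer(Enum):
    NO_ALIAS = 0
    MAY_ALIAS = 1
    UNKNOWN = 2


def structural_check(base_a, base_b):
    """Compare symbolic bases; None means the base could not be found."""
    if base_a is None or base_b is None:
        return Answer.UNKNOWN
    if base_a != base_b:
        return Answer.NO_ALIAS
    return Answer.MAY_ALIAS


def type_check(set_a, set_b):
    """Simplified stand-in for a type-based check on alias-set numbers."""
    if set_a == 0 or set_b == 0 or set_a == set_b:
        return Answer.MAY_ALIAS
    return Answer.NO_ALIAS


def may_alias(ref_a, ref_b):
    """Each ref is a pair (symbolic_base_or_None, alias_set)."""
    if structural_check(ref_a[0], ref_b[0]) is Answer.NO_ALIAS:
        return False
    # UNKNOWN or MAY_ALIAS: a later check still gets a chance to disambiguate.
    return type_check(ref_a[1], ref_b[1]) is Answer.MAY_ALIAS
```

With an unknown base on either side, the structural check never produces a wrong answer in this model; it just hands the question to the next check, so the result is correct but may be less precise than it could be.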

I ran into this with an experimental patch for GCC 7. I can send you a
copy of the patch, and point you to the test in the test suite that
exhibits the problem when that patch is applied. I'll do that offline.