This is the mail archive of the
mailing list for the GCC project.
Re: should MEM tracking be able to optimize this?
- From: Zack Weinberg <zack at codesourcery dot com>
- To: Dan Nicolaescu <dann at godzilla dot ICS dot UCI dot EDU>
- Cc: gcc at gcc dot gnu dot org, Richard Kenner <kenner at vlsi1 dot ultra dot nyu dot edu>
- Date: Sat, 17 Nov 2001 14:16:03 -0800
- Subject: Re: should MEM tracking be able to optimize this?
- References: <email@example.com>
On Fri, Nov 16, 2001 at 12:47:30PM -0800, Dan Nicolaescu wrote:
> The following 2 functions should generate very similar assembly, right?
This is purely a sticking-my-nose-in comment, but I looked into it
briefly and it does appear to be a genuinely hard problem with our
At the RTL level, loads and stores in calc1 look like
    (insn 45 43 47 (set (reg:SF 123)
            (mem/s:SF (plus:SI (reg/f:SI 111)
                    (reg:SI 115)) [4 A S4 A32])) -1 (nil)
where in calc2 they look like
    (insn 85 83 86 (set (reg:SF 150)
            (mem/s:SF (plus:SI (reg/f:SI 145)
                    (reg:SI 148)) [4 p S4 A32])) -1 (nil)
alias.c uses the information in square brackets to make decisions
about whether memory refs can conflict. [4 A S4 A32] means alias set
4, variable 'A', unit size 4 (bytes), alignment 32 (bits).
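To make the bracket notation concrete, here is a minimal sketch that splits such a string into its four fields. The struct and function names are hypothetical reading aids; this is not how alias.c or the RTL dumper handle these attributes.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical holder for the four fields printed in the dump. */
struct mem_attrs_text {
    int  alias_set;    /* leading number, e.g. 4 */
    char decl[64];     /* variable name, e.g. "A" */
    int  size_bytes;   /* number after 'S' */
    int  align_bits;   /* number after 'A' */
};

/* Parse text such as "[4 A S4 A32]".
   Returns 1 if all four fields were read, 0 otherwise. */
int parse_mem_attrs(const char *text, struct mem_attrs_text *out)
{
    int n = sscanf(text, "[%d %63s S%d A%d]",
                   &out->alias_set, out->decl,
                   &out->size_bytes, &out->align_bits);
    return n == 4;
}
```

Fed the two strings from the dumps above, the only field that differs is the variable name.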
In theory, if we had a thing we could stick into that structure that
meant "A.p" instead of "A", that would be enough to get identical
assembly for both loops. The trouble is that we don't.
At the tree level there is enough information to be clear what is
going on: the C expression "A.f[i]" becomes this tree:
    <array_ref
        arg 0 <component_ref
            arg 0 <var_decl A>
            arg 1 <field_decl f>>
        arg 1 <var_decl i>>
(This is a brutally trimmed down version of what you'd get if you
called debug_tree() on the expression node.)
The tree expansion routines eventually call get_inner_reference()
which transforms that into a (decl, offset) pair:
    <var_decl A>
    <mult_expr
        arg 0 <var_decl i>
        arg 1 <integer_cst 4>>
If we'd asked it for A.p[i] instead, the offset would be something
like <plus <mult <var i> <constant 4>> <constant 8192>> instead.
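The (decl, offset) decomposition can be sketched in plain C. The struct layout here is an assumption made for illustration (two arrays of 2048 floats, which puts p at byte 8192 when a float is 4 bytes); the original test case is not quoted in this message, and the helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed layout, not taken from the original test case. */
struct S { float f[2048]; float p[2048]; };
struct S A;

/* Offset part of the pair for A.f[i]: i * sizeof(float). */
size_t offset_of_f(size_t i)
{
    return offsetof(struct S, f) + i * sizeof(float);
}

/* Offset part of the pair for A.p[i]: the same multiply plus the
   constant offset of field p within the struct. */
size_t offset_of_p(size_t i)
{
    return offsetof(struct S, p) + i * sizeof(float);
}

/* Rebuild an address from a (decl, offset) pair. */
float *address_from_pair(struct S *decl, size_t offset)
{
    return (float *)((char *)decl + offset);
}
```

Both references share the same decl, A. Only the offset tells them apart, and the offset is not what ends up in the "variable" slot.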
The "variable" slot of [4 A S4 A32] is, in fact, <var_decl A> as
returned by get_inner_reference. Here's the rub: you might think that
it would work to put <field_decl f> there instead, but FIELD_DECLs are
unique to the type; all instances of a structure of that type use the
same <field_decl f>.
It might conceivably work to use <component_ref <var A> <field f>> in
that slot. However, I do not believe that COMPONENT_REFs are unique;
if you refer to A.f twice you'll probably get two trees in memory, and
that would defeat the simple address comparison currently being done
by alias.c. They could be _made_ unique, or alias.c could look more
deeply, but that would be More Work Than I Have Time For (tm).
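The uniqueness problem can be sketched with a toy node type. Everything here is a hypothetical stand-in (GCC's real tree nodes are far richer); it only shows why pointer comparison fails on two separately built references to A.f, why a shared field node cannot tell A.f from B.f, and what "looking more deeply" would mean.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a tree node. */
enum kind { VAR, FIELD, COMPONENT };
struct node {
    enum kind kind;
    const char *name;            /* for VAR and FIELD nodes */
    const struct node *op0;      /* COMPONENT: the object */
    const struct node *op1;      /* COMPONENT: the field */
};

/* Cheap check: are these the very same node in memory? */
bool same_by_address(const struct node *a, const struct node *b)
{
    return a == b;
}

/* Deeper check: leaves must be the same node, component refs are
   compared operand by operand. */
bool same_by_structure(const struct node *a, const struct node *b)
{
    if (a == b) return true;
    if (a == NULL || b == NULL) return false;
    if (a->kind != COMPONENT || b->kind != COMPONENT) return false;
    return same_by_structure(a->op0, b->op0)
        && same_by_structure(a->op1, b->op1);
}

const struct node var_A   = { VAR,   "A", NULL, NULL };
const struct node var_B   = { VAR,   "B", NULL, NULL };
const struct node field_f = { FIELD, "f", NULL, NULL };  /* one per type */
const struct node field_p = { FIELD, "p", NULL, NULL };

/* Two separately built nodes for the same expression A.f. */
const struct node ref_A_f_1 = { COMPONENT, NULL, &var_A, &field_f };
const struct node ref_A_f_2 = { COMPONENT, NULL, &var_A, &field_f };
const struct node ref_A_p   = { COMPONENT, NULL, &var_A, &field_p };
const struct node ref_B_f   = { COMPONENT, NULL, &var_B, &field_f };
```

With these nodes, the two A.f references differ by address but match structurally, while A.f and B.f share field_f yet are correctly told apart once the object operand is compared too.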
I do think that we should be able to do this optimization; perhaps
I'll stick these notes onto the projects page for someone who wants to
pick up the ball.