This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.



Re: should MEM tracking be able to optimize this?


On Fri, Nov 16, 2001 at 12:47:30PM -0800, Dan Nicolaescu wrote:
> 
> The following 2 functions should generate very similar assembly, right?
...

This is purely a sticking-my-nose-in comment, but I looked into it
briefly and it does appear to be a genuinely hard problem with our
intermediate representation.

At the RTL level, loads and stores in calc1 look like

(insn 45 43 47 (set (reg:SF 123)
        (mem/s:SF (plus:SI (reg/f:SI 111)
                (reg:SI 115)) [4 A S4 A32])) -1 (nil)
    (nil))

where in calc2 they look like

(insn 85 83 86 (set (reg:SF 150)
        (mem/s:SF (plus:SI (reg/f:SI 145)
                (reg:SI 148)) [4 p S4 A32])) -1 (nil)
    (nil))

alias.c uses the information in square brackets to make decisions
about whether memory refs can conflict.  [4 A S4 A32] means alias set
4, variable 'A', unit size 4 (bytes), alignment 32 (bits).

In theory, if we had a thing we could stick into that structure that
meant "A.p" instead of "A", that would be enough to get identical
assembly for both loops.  The trouble is that we don't.
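
To make the limitation concrete, here is a rough sketch in C of a
record shaped like the bracketed annotation and a check built on it.
Everything named toy_* is invented for illustration; this is not the
real data structure or the real logic in alias.c.  It assumes that
alias set 0 means "may alias anything" and that two different known
variables never overlap.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for an annotation such as [4 A S4 A32]. */
struct toy_mem_attrs {
    int alias_set;        /* 4 in the example; 0 assumed to alias anything */
    const char *decl;     /* identity of the variable, compared by address */
    unsigned size_bytes;  /* S4 */
    unsigned align_bits;  /* A32 */
};

/* Hypothetical variables; only their addresses matter. */
const char toy_A[] = "A";
const char toy_B[] = "B";

/* Two refs into A (think A.f[i] and A.p[i]) carry identical annotations. */
const struct toy_mem_attrs toy_ref_A_f = { 4, toy_A, 4, 32 };
const struct toy_mem_attrs toy_ref_A_p = { 4, toy_A, 4, 32 };
const struct toy_mem_attrs toy_ref_B   = { 4, toy_B, 4, 32 };

/* Returns false only when the annotations prove the refs are disjoint. */
bool toy_may_conflict(const struct toy_mem_attrs *x,
                      const struct toy_mem_attrs *y)
{
    assert(x != NULL && y != NULL);
    if (x->alias_set != 0 && y->alias_set != 0
        && x->alias_set != y->alias_set)
        return false;
    if (x->decl != NULL && y->decl != NULL && x->decl != y->decl)
        return false;
    return true;   /* same variable, or unknown: assume a conflict */
}
```

With only the variable recorded, toy_ref_A_f and toy_ref_A_p look the
same to the check, so it has to assume they conflict, while a ref into
a different variable is provably disjoint.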

At the tree level there is enough information to be clear what is
going on: the C expression "A.f[i]" becomes this tree:

 <array_ref
    arg 0 <component_ref
        arg 0 <var_decl A>
        arg 1 <field_decl f>
    arg 1 <var_decl i>>

(This is a brutally trimmed down version of what you'd get if you
called debug_tree() on the expression node.)

The tree expansion routines eventually call get_inner_reference()
which transforms that into a (decl, offset) pair:

 <var_decl A>

 <mult_expr
    arg 0 <var_decl i>
    arg 1 <integer_cst 4>>

If we'd asked it for A.p[i] instead, the offset would be something
like <plus <mult <var i> <constant 4>> <constant 8192>> instead.
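
The same (decl, offset) arithmetic can be written out in plain C.  The
struct layout below is an assumption, since the original declaration
is not quoted in this message; two arrays of 2048 floats are used only
because they reproduce the 8192 offset above when float is 4 bytes.
The toy_* names are made up for this sketch.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_N 2048   /* assumed element count, not taken from the original code */

/* Assumed layout of the variable A. */
struct toy_A {
    float f[TOY_N];
    float p[TOY_N];
};

/* Byte offset of A.f[i] from the start of A: i * sizeof(float). */
size_t toy_offset_f(size_t i)
{
    assert(i < TOY_N);
    return offsetof(struct toy_A, f) + i * sizeof(float);
}

/* Byte offset of A.p[i]: the same product plus the constant offset of p. */
size_t toy_offset_p(size_t i)
{
    assert(i < TOY_N);
    return offsetof(struct toy_A, p) + i * sizeof(float);
}
```

In both cases the base is the whole variable A; the field being
accessed survives only as a constant folded into the offset.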

The "variable" slot of [4 A S4 A32] is, in fact, <var_decl A> as
returned by get_inner_reference.  Here's the rub: you might think that
it would work to put <field_decl f> there instead, but FIELD_DECLs are
unique to the type, not to the object; all instances of a structure of
that type use the same <field_decl f>, so it cannot tell A.f from the
f member of any other object of that type.

It might conceivably work to use <component_ref <var A> <field f>> in
that slot.  However, I do not believe that COMPONENT_REFs are unique;
if you refer to A.f twice you'll probably get two trees in memory, and
that would defeat the simple address comparison currently being done
by alias.c.  They could be _made_ unique, or alias.c could look more
deeply, but that would be More Work Than I Have Time For (tm).
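
As a rough illustration of what "look more deeply" could mean, here is
a toy tree in C with a structural comparison.  None of this is GCC's
real tree representation; the toy_* types and functions are invented,
and the sketch assumes decl nodes are unique (so pointer equality is
enough for them) while COMPONENT_REF nodes are not.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum toy_code { TOY_VAR_DECL, TOY_FIELD_DECL, TOY_COMPONENT_REF };

struct toy_tree {
    enum toy_code code;
    const char *name;              /* decls only */
    const struct toy_tree *arg0;   /* COMPONENT_REF: the object */
    const struct toy_tree *arg1;   /* COMPONENT_REF: the field */
};

const struct toy_tree toy_var_A   = { TOY_VAR_DECL,   "A", NULL, NULL };
const struct toy_tree toy_field_f = { TOY_FIELD_DECL, "f", NULL, NULL };
const struct toy_tree toy_field_p = { TOY_FIELD_DECL, "p", NULL, NULL };

/* Two separate nodes that both mean A.f, and one that means A.p. */
const struct toy_tree toy_ref_f1 =
    { TOY_COMPONENT_REF, NULL, &toy_var_A, &toy_field_f };
const struct toy_tree toy_ref_f2 =
    { TOY_COMPONENT_REF, NULL, &toy_var_A, &toy_field_f };
const struct toy_tree toy_ref_p =
    { TOY_COMPONENT_REF, NULL, &toy_var_A, &toy_field_p };

/* Structural equality: pointer identity for decls, recursion for refs. */
bool toy_same_ref(const struct toy_tree *a, const struct toy_tree *b)
{
    assert(a != NULL && b != NULL);
    if (a == b)
        return true;
    if (a->code != TOY_COMPONENT_REF || b->code != TOY_COMPONENT_REF)
        return false;   /* distinct decl nodes are distinct things */
    return toy_same_ref(a->arg0, b->arg0)
        && toy_same_ref(a->arg1, b->arg1);
}
```

A plain address comparison treats toy_ref_f1 and toy_ref_f2 as
different even though both denote A.f; the recursive comparison
matches them and still separates A.f from A.p.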

I do think that we should be able to do this optimization; perhaps
I'll stick these notes onto the projects page for someone who wants to
pick up the ball.

zw

