This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.
Re: Basic infrastructure for substitution tracking
On Sun, 20 Sep 2009, Jan Hubicka wrote:
> >
> > This seems to duplicate what DEBUG_STMTs do. Can you convince me
> > otherwise by providing an actual example and explaining what your code
> > will do?
>
> Yes, the tracking is a sort of subset of what DEBUG_STMT does. The main
> difference is that tracking declares that "everywhere the variable was
> defined, it is replaced by this expression" function-wide. One doesn't
> need a duplicated declaration or the statement, so it is a lot cheaper
> and can handle the common cases (plus tree-sra, which seems to be a bit
> difficult to fit into the debug-stmt code).
>
> It is also a bit more robust, since debug_stmts inserted at arbitrary
> places in the program (entry points of inlined functions) might drift away.
>
> Let's take the following example "C++":
>
> struct a{
>   struct b {int a;} b;
>   struct c{ int a;} c;
> };
>
> static void
> t1 (struct b *a, int b)
> {
>   printf ("%i %i\n",a, b);
> }
> static void
> t2 (struct c *a, char *msg)
> {
>   printf ("%i %s\n",a,msg);
> }
> static void
> t3 (struct a *a)
> {
>   t1(&a->b, 1);
>   t2(&a->c, "test");
> }
> struct a a={{0},{1}};
> main()
> {
>   t3(&a);
> }
>
> Now what happens is that the early inliner inlines t1 and t2 into t3, and
> later we inline t3 into main. With inline substitution we can track these
> changes as follows:
>
> ;; Function t3 (t3)
>
> Scope blocks after cleanups:
>
> { Scope block #0
>
> { Scope block #2 t.c:20 Originating from : static void t2 (struct c *, char *);
> struct c * a = struct c *; [value-expr: &a->c ] (nonlocalized)
> char * msg = char *; [value-expr: &"test"[0] ] (nonlocalized)
>
> { Scope block #3 Originating from :#0
> extern int printf (void); (nonlocalized)
>
> }
>
> }
>
> { Scope block #4 t.c:19 Originating from : static void t1 (struct b *, int);
> struct b * a = struct b *; [value-expr: &a->b ] (nonlocalized)
> int b = int; [value-expr: 1 ] (nonlocalized)
>
> { Scope block #5 Originating from :#0
> extern int printf (void); (nonlocalized)
>
> }
>
> }
>
> }
> t3 (struct a * a)
> {
> struct c * D.2721;
> struct b * D.2720;
>
> <bb 2>:
> D.2720_2 = &a_1(D)->b;
> printf (&"%i %i\n"[0], D.2720_2, 1);
> D.2721_3 = &a_1(D)->c;
> printf (&"%i %s\n"[0], D.2721_3, &"test"[0]);
> return;
>
> }
>
> And later we get into main:
>
> Scope blocks after cleanups:
>
> { Scope block #0
>
> { Scope block #6 t.c:25 Originating from : static void t3 (struct a *);
> struct a * a = struct a *; [value-expr: &a ] (nonlocalized)
>
> { Scope block #7 Originating from :#0
>
> { Scope block #8 t.c:20 Originating from : static void t2 (struct c *, char *);
> struct c * a = struct c *; [value-expr: &a.c ] (nonlocalized)
> char * msg = char *; [value-expr: &"test"[0] ] (nonlocalized)
>
> { Scope block #9 Originating from :#0
> extern int printf (void); (nonlocalized)
>
> }
>
> }
>
> { Scope block #10 t.c:19 Originating from : static void t1 (struct b *, int);
> struct b * a = struct b *; [value-expr: &a.b ] (nonlocalized)
> int b = int; [value-expr: 1 ] (nonlocalized)
>
> { Scope block #11 Originating from :#0
> extern int printf (void); (nonlocalized)
>
> }
>
> }
>
> }
>
> }
>
> }
Ok, so this is equivalent to inserting DEBUG_STMTs for all param decls
in the caller at the start of the callee copy. But it is cheaper
because when that function is inlined we don't copy those DEBUG_STMTs
but instead via NONLOCALIZED_VARS we know they were the same.

Now I see you actually adjust the DECL_VALUE_EXPRs and unshare the
expression at some point. But this will adjust all inlined-to
copies which is not correct?

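To make the sharing concern concrete, here is a minimal toy sketch. It
does not use GCC's tree or decl types; struct value_expr, struct
param_decl, unshare_value and rewrite_value are hypothetical names
invented for illustration. It only shows why rewriting an expression
that several inlined copies point to changes all of them, and how
copying it first confines the change to one copy.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy stand-in for a substitution expression; NOT GCC's tree type.  */
struct value_expr
{
  char text[32];
};

/* Toy stand-in for the parameter decl of one inlined copy.  */
struct param_decl
{
  struct value_expr *value;
};

/* Give DECL a private copy of its expression so that later rewrites
   affect only this inlined copy.  Returns 0 on success, -1 if the
   allocation fails.  The caller owns the new copy.  */
static int
unshare_value (struct param_decl *decl)
{
  struct value_expr *copy;

  assert (decl != NULL && decl->value != NULL);
  copy = malloc (sizeof *copy);
  if (copy == NULL)
    return -1;
  *copy = *decl->value;
  decl->value = copy;
  return 0;
}

/* Rewrite DECL's expression in place, as a remapping step would.  */
static void
rewrite_value (struct param_decl *decl, const char *text)
{
  assert (strlen (text) < sizeof decl->value->text);
  strcpy (decl->value->text, text);
}
```

If two param_decl objects point at the same value_expr holding
"&a->c", calling rewrite_value on one with "&a.c" changes what the
other sees as well. Calling unshare_value on the first beforehand
leaves the second at "&a->c".
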
I am btw confused by your re-using of DECL_VALUE_EXPR -- that might
be already in use, and in this patch I see you only ever set it if
it was already set which obviously cannot be everything required?

I am concerned by all the duplicating of what DEBUG_STMTs can do
(the instantiation and the substitution) - it adds a lot of code.

Doesn't the substitution code in tree-ssa-live.c have the same
problem as DEBUG_STMTs, exponential growth of expression size?
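
As a rough illustration of the growth in question, consider a chain of
definitions x_k = x_(k-1) + x_(k-1). The sketch below is a toy model,
not the code in tree-ssa-live.c; struct expr, build_chain and
substituted_size are hypothetical names. It compares the number of
shared nodes with the size of the expression after every operand has
been substituted by its own copy.

```c
#include <assert.h>
#include <stddef.h>

/* Toy expression node, NOT a GCC tree: a leaf has both operands NULL.  */
struct expr
{
  const struct expr *lhs;
  const struct expr *rhs;
};

/* Fill NODES[0..DEPTH] with the chain x_0 = leaf and
   x_k = x_(k-1) + x_(k-1), sharing each operand, and return the
   node for x_DEPTH.  NODES must have room for DEPTH + 1 elements.  */
static const struct expr *
build_chain (struct expr *nodes, unsigned depth)
{
  unsigned i;

  assert (nodes != NULL);
  nodes[0].lhs = nodes[0].rhs = NULL;
  for (i = 1; i <= depth; i++)
    nodes[i].lhs = nodes[i].rhs = &nodes[i - 1];
  return &nodes[depth];
}

/* Number of nodes once every operand is expanded into its own copy,
   i.e. the size of the fully substituted expression.  */
static unsigned long
substituted_size (const struct expr *e)
{
  if (e->lhs == NULL)
    return 1;
  return 1 + substituted_size (e->lhs) + substituted_size (e->rhs);
}
```

For a chain of depth 10 the shared form needs 11 nodes, while
substituted_size reports 2047, since the size obeys s(k) = 2*s(k-1) + 1.
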
Richard.
> Now dwarf2out can annotate all the user vars:
>
> .uleb128 0xf # (DIE (0x166) DW_TAG_formal_parameter)
> .long 0x11f # DW_AT_abstract_origin
> .quad a # DW_AT_const_value
> .uleb128 0x10 # (DIE (0x173) DW_TAG_inlined_subroutine)
> .long 0xd1 # DW_AT_abstract_origin
> .quad .LBB13 # DW_AT_entry_pc
> .long .Ldebug_ranges0+0x30 # DW_AT_ranges
> .byte 0x1 # DW_AT_call_file (t.c)
> .byte 0x14 # DW_AT_call_line
> .long 0x1b6 # DW_AT_sibling
> .uleb128 0x11 # (DIE (0x18a) DW_TAG_formal_parameter)
> .long 0xdd # DW_AT_abstract_origin
> .byte 0xe # DW_AT_location
> .byte 0x3 # DW_OP_addr
> .quad a
> .byte 0x23 # DW_OP_plus_uconst
> .uleb128 0x4
> .byte 0x9f # DW_OP_stack_value
> .byte 0x93 # DW_OP_piece
> .uleb128 0x8
> .uleb128 0x11 # (DIE (0x19e) DW_TAG_formal_parameter)
> .long 0xe6 # DW_AT_abstract_origin
> .byte 0xc # DW_AT_location
> .byte 0x3 # DW_OP_addr
> .quad .LC1
> .byte 0x9f # DW_OP_stack_value
> .byte 0x93 # DW_OP_piece
> .uleb128 0x8
> .uleb128 0x12 # (DIE (0x1b0) DW_TAG_lexical_block)
> .long .Ldebug_ranges0+0x60 # DW_AT_ranges
> .byte 0x0 # end of children of DIE 0x173
> .uleb128 0x13 # (DIE (0x1b6) DW_TAG_inlined_subroutine)
> .long 0x85 # DW_AT_abstract_origin
> .quad .LBB16 # DW_AT_low_pc
> .quad .LBE16 # DW_AT_high_pc
> .byte 0x1 # DW_AT_call_file (t.c)
> .byte 0x13 # DW_AT_call_line
> .uleb128 0x11 # (DIE (0x1cd) DW_TAG_formal_parameter)
> .long 0x91 # DW_AT_abstract_origin
> .byte 0xc # DW_AT_location
> .byte 0x3 # DW_OP_addr
> .quad a
> .byte 0x9f # DW_OP_stack_value
> .byte 0x93 # DW_OP_piece
> .uleb128 0x8
> .uleb128 0x14 # (DIE (0x1df) DW_TAG_formal_parameter)
> .long 0x9a # DW_AT_abstract_origin
> .byte 0x1 # DW_AT_const_value
> .uleb128 0x15 # (DIE (0x1e5) DW_TAG_lexical_block)
> .quad .LBB17 # DW_AT_low_pc
> .quad .LBE17 # DW_AT_high_pc
> .byte 0x0 # end of children of DIE 0x1b6
> .byte 0x0 # end of children of DIE 0x153
> .byte 0x0 # end of children of DIE 0x12f
>
> And this happens with significantly less overhead than with debug
> statements, plus we can avoid building the statements when not
> optimizing. With -O2 -g I get 450MB->320MB of memory use for
> integration and variable tracking goes from 40MB to 30MB (while
> producing 10 times more locations in debug info than we do on mainline).
>
> Memory usage is 460->440 for generate-3.4.ii and 20K savings for
> combine.c at -O3 (so the code really helps save many debug statements
> only with C++, but that is expected).
>
> With GDB with inline tracking and DW_OP_stack_value support (I don't
> have this one, but by combining current mainline and gdb with inline
> tracking I can pretty much approximate this) I can single-step through
> the whole sequence and get proper values, except that backtraces
> are missing arguments since ipa-cp is not yet updated to record the
> removed arguments for debug info. With -fno-ipa-cp -O2 -g I get pretty
> nice results and I plan to update ipa-cp shortly (it can even be
> declared a regression and is not too difficult to do: we just need to keep
> track of arguments being removed and teach dwarf2out to rebuild the
> original sequence when producing the formal parameter list).
>
> On current mainline we work hard enough to get all the DEBUG_INSNs in
> place, but for some reason we fail to produce useful debug info. I
> am looking into it now.
>
> Honza
>
>
--
Richard Guenther <rguenther@suse.de>
Novell / SUSE Labs
SUSE LINUX Products GmbH - Nuernberg - AG Nuernberg - HRB 16746 - GF: Markus Rex