This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.



tips on debugging a GCC 3.4.3 MIPS RTL optim problem?


Hello, using the 3.4.3 baseline on SGI MIPS3 Irix6.5,
I'm running into a problem where bad code is generated on a relatively
trivial program when both -funit-at-a-time and -foptimize-sibling-calls
are asserted.  The nature of the failure is that the RTL optimizer
seems to get confused about what value should be targeted to an
argument register; it seems to coalesce two separate temporaries
into one.  Note that the original RTL being generated originates
in some new code that I've added to support an experimental dialect
of C (called UPC), so it isn't out of the question that there is some
aliasing or other issue that I've introduced.  However, most tests
are passing, and just a few show the failure mode illustrated below.
All the tests pass on i386 and IA64, fyi -- they don't demonstrate
this failure.

First question: are there known problems in 3.4.3 with -funit-at-a-time
and/or -foptimize-sibling-calls? (I ran a few queries of the Bugzilla
database but didn't find anything).

I confirmed the problematic optimizations by compiling the program with
 -O0 -funit-at-a-time -foptimize-sibling-calls
and noticed that correct code is generated if either
or both optimization switches are removed from the command line.
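The four-way check described above can be written down as a small script. This is only a sketch: the two switches and -O0 come from this message, but the driver name `gcc` and the source name `t.upc` are placeholders (assumptions), and -S is added just so each run stops after emitting assembly.

```python
from itertools import product

# Both switches are taken from the message; the driver name and source
# file used as defaults below are placeholders, not the real invocation.
SWITCHES = ["-funit-at-a-time", "-foptimize-sibling-calls"]


def flag_matrix(driver="gcc", source="t.upc"):
    """Return one command line per on/off combination of SWITCHES."""
    commands = []
    for enabled in product([False, True], repeat=len(SWITCHES)):
        flags = [s for s, on in zip(SWITCHES, enabled) if on]
        commands.append([driver, "-O0", *flags, "-S", source])
    return commands


if __name__ == "__main__":
    # Print the commands rather than running them, so the script is
    # harmless on a machine without the experimental compiler.
    for cmd in flag_matrix():
        print(" ".join(cmd))
```

Comparing the assembly from the four runs should show whether only the combination of both switches produces the bad argument setup.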

I tried debugging the problem by compiling with -da and
looking at the various RTL dump files:

t.upc.00.cgraph   t.upc.07.addressof  t.upc.25.greg        t.upc.35.mach
t.upc.01.rtl      t.upc.11.cfg        t.upc.26.postreload
t.upc.02.sibling  t.upc.19.life       t.upc.27.flow2
t.upc.04.jump     t.upc.24.lreg       t.upc.29.ce3

The bad code shows up in t.upc.02.sibling, so probably -dr -di would
have sufficed.

The problem that I'm seeing is illustrated in the following RTL:


(insn 66 65 77 0 (set (reg:SI 225 [ <anonymous> ])
        (reg/f:SI 177 virtual-stack-vars)) -1 (nil)
    (nil))
 
(insn 77 66 78 0 (set (reg:DI 228)
        (const_int 0 [0x0])) -1 (nil)
    (nil))
 
(insn 78 77 79 0 (set (reg:DI 228)
        (mem/s:DI (reg/f:SI 177 virtual-stack-vars) [0 S8 A128])) -1 (nil)
    (nil))
 
(insn 79 78 80 0 (set (reg:DI 4 $4)
        (reg:DI 228)) -1 (nil)
    (nil))
 
(insn 80 79 81 0 (set (reg:SI 5 $5)
        (reg:SI 225 [ <anonymous> ])) -1 (nil)
    (nil))
 
(insn 81 80 82 0 (set (reg:SI 6 $6)
        (reg:SI 224 [ <anonymous> ])) -1 (nil)
    (nil))
 
(insn 82 81 83 0 (set (reg:SI 229)
        (unspec:SI [
                (reg:SI 28 $28)
                (const:SI (unspec:SI [
                            (symbol_ref:SI ("__putblk3") [flags 0x41] <function_decl 40ced00 __putblk3>)
                        ] 107))
                (reg:SI 79 $fakec)
            ] 27)) -1 (nil)
    (nil))
 
(call_insn 83 82 115 0 (parallel [
            (call (mem:SI (reg:SI 229) [0 S4 A32])
                (const_int 0 [0x0]))
            (clobber (reg:SI 31 $31))
        ]) -1 (nil)
    (nil)
    (expr_list (use (reg:SI 28 $28))
        (expr_list (use (reg:SI 6 $6))
            (expr_list (use (reg:SI 5 $5))
                (expr_list (use (reg:DI 4 $4))
                    (nil))))))
 
(insn 115 83 116 0 (clobber (mem/s:BLK (reg/f:SI 177 virtual-stack-vars) [0 A128])) -1 (nil)

Above, the second argument (reg:SI $5) is set to (reg:SI 225), which
in turn is set to (reg/f:SI 177 virtual-stack-vars), which is simply
the frame pointer.  Note that the first argument (reg:DI $4) will
end up being set to the contents of the location that the frame
pointer points to.  This is incorrect: it should be set to the
contents of 16($fp), or at least some location other than the
double word beginning at $fp.

It looks as if the optimizer somehow aliased the two locations,
or it decided somehow that they weren't both live at the same time.

If we maintain the -foptimize-sibling-calls switch but do not
assert -funit-at-a-time, the following correct RTL is generated:

(insn 39 38 40 0 (set (reg:SI 205)
        (const_int 8 [0x8])) -1 (nil)
    (nil))
 
(insn 40 39 41 0 (set (reg:SI 206)
        (reg/f:SI 177 virtual-stack-vars)) -1 (nil)
    (nil))
 
(insn 41 40 42 0 (set (reg:DI 207)
        (const_int 0 [0x0])) -1 (nil)
    (nil))

 
(insn 42 41 43 0 (set (reg:DI 207)
        (mem/s:DI (plus:SI (reg/f:SI 177 virtual-stack-vars)
                (const_int 16 [0x10])) [0 S8 A128])) -1 (nil)
    (nil))
 
(insn 43 42 44 0 (set (reg:DI 4 $4)
        (reg:DI 207)) -1 (nil)
    (nil))
 
(insn 44 43 45 0 (set (reg:SI 5 $5)
        (reg:SI 206)) -1 (nil)
    (nil))
 
(insn 45 44 46 0 (set (reg:SI 6 $6)
        (reg:SI 205)) -1 (nil)
    (nil))
 
(call_insn 46 45 48 0 (parallel [
            (call (mem:SI (symbol_ref:SI ("__putblk3") [flags 0x41] <function_decl 40ced00 __putblk3>) [0 S4 A32])
                (const_int 0 [0x0]))
            (clobber (reg:SI 31 $31))
        ]) -1 (nil)
    (nil)
    (expr_list (use (reg:SI 28 $28))
        (expr_list (use (reg:SI 6 $6))
            (expr_list (use (reg:SI 5 $5))
                (expr_list (use (reg:DI 4 $4))
                    (nil))))))
 
(insn 48 46 49 0 (clobber (mem/s:BLK (plus:SI (reg/f:SI 177 virtual-stack-vars)
                (const_int 16 [0x10])) [0 A128])) -1 (nil)
    (nil))

Here it is a little different, because the first arg. ($4) is
set to the contents of 16($fp), and the second arg. ($5) is set to $fp.
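The difference between the two dumps comes down to which offset from virtual-stack-vars each mem uses, and that can be pulled out mechanically. The sketch below is a rough text filter, not an RTL parser: the regular expression and the helper name `stack_offsets` are my own, and it only recognizes the two address shapes that appear in the dumps above.

```python
import re

# Matches a mem whose address is virtual-stack-vars, either bare or
# wrapped in (plus ... (const_int N)).  Group 1 is N for the plus form.
STACK_MEM = re.compile(
    r"\(mem\S*\s+"
    r"(?:\(plus:\w+\s+\(reg/f:\w+\s+\d+\s+virtual-stack-vars\)\s+"
    r"\(const_int\s+(-?\d+)"
    r"|\(reg/f:\w+\s+\d+\s+virtual-stack-vars\))"
)


def stack_offsets(rtl_text):
    """Return the frame offset of every stack-relative mem, in order."""
    return [int(m.group(1) or 0) for m in STACK_MEM.finditer(rtl_text)]


# The load feeding $4 in the failing dump (insn 78) ...
BAD = """(insn 78 77 79 0 (set (reg:DI 228)
        (mem/s:DI (reg/f:SI 177 virtual-stack-vars) [0 S8 A128])) -1 (nil)
    (nil))"""

# ... and in the working dump (insn 42).
GOOD = """(insn 42 41 43 0 (set (reg:DI 207)
        (mem/s:DI (plus:SI (reg/f:SI 177 virtual-stack-vars)
                (const_int 16 [0x10])) [0 S8 A128])) -1 (nil)
    (nil))"""

if __name__ == "__main__":
    print("bad: ", stack_offsets(BAD))
    print("good:", stack_offsets(GOOD))
```

Run over the full .sibling dumps, this reports offset 0 for the load in the failing case and offset 16 in the working one, which makes it easy to spot other functions with the same symptom.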

When -funit-at-a-time is asserted, I tried looking at t.upc.01.rtl to
get a picture of the RTL before it is optimized.  However, the RTL
is not very comprehensible and seems abbreviated. For example, this
is the call to __putblk3:

(call_insn 84 66 141 (call_placeholder 77 67 0 0 (call_insn 83 82 0 (parallel [
                (call (mem:SI (reg:SI 229) [0 S4 A32])
                    (const_int 0 [0x0]))
                (clobber (reg:SI 31 $31))
            ]) -1 (nil)
        (nil)
        (expr_list (use (reg:SI 28 $28))
            (expr_list (use (reg:SI 6 $6))
                (expr_list (use (reg:SI 5 $5))
                    (expr_list (use (reg:DI 4 $4))
                        (nil))))))) -1 (nil)
    (nil)
    (nil))

Note that there is no mention of __putblk3 at all, presumably because
it is buried inside the call placeholder somewhere.

Stranger still, apart from the (use ...) list, there is no mention
of the argument registers at all.

Am I wrong in assuming that the two digits in the names of the RTL
dump files indicate the sequence of the RTL passes? Which dump file
has the unoptimized RTL?   Which passes run before the sibling
call optimization?
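If the two-digit field does reflect pass order (that is the assumption being asked about, and this sketch does not settle it), the dump files listed earlier can be put in that order with a few lines. The helper names are my own.

```python
# Dump file names copied from the listing earlier in this message.
DUMPS = [
    "t.upc.00.cgraph", "t.upc.07.addressof", "t.upc.25.greg",
    "t.upc.35.mach", "t.upc.01.rtl", "t.upc.11.cfg",
    "t.upc.26.postreload", "t.upc.02.sibling", "t.upc.19.life",
    "t.upc.27.flow2", "t.upc.04.jump", "t.upc.24.lreg", "t.upc.29.ce3",
]


def pass_index(name):
    """Return the numeric field that precedes the pass name."""
    return int(name.rsplit(".", 2)[1])


def in_pass_order(names):
    """Sort dump file names by their numeric field."""
    return sorted(names, key=pass_index)


if __name__ == "__main__":
    for name in in_pass_order(DUMPS):
        print(name)
```

Under that assumption, only the cgraph and rtl dumps precede the sibling dump in this listing.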

It may be worth adding here that the double word locations involved
are records whose values are set by moving a constructor to the relevant
location.  Although Ada probably uses constructors a lot, it
wouldn't surprise me if this area isn't heavily tested.

Any tips on debugging this codegen issue would be appreciated.

thanks - Gary

