This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]

Re: Solaris bootstrap failure


I've been spending some time tracking down the cause of the bootstrap
comparison failure much more thoroughly, and I have some interesting
results to report.

I have still been using my Mar 6 13:33 PST to do this investigation,
since I wished to be sure that I could reproduce the failure reliably.

The problem is definitely in cselib_lookup() and the way it uses its
hash table.  The hash table itself appears to be working correctly.
However, cselib uses pointers to structures as part of the hash values
for some expressions.  Because the stage1 and stage2 compilers have
different memory maps, these pointer values differ, and thus the hash
values for these expressions differ.  This leads to a different
arrangement of items in the hash table between the two compilers,
since some items are put into different slots.

Because of the way cselib_lookup uses the hash table, its results may
sometimes depend on the arrangement of items in the table.  This means
that in certain circumstances, the results of optimizations which use
cselib_lookup may depend on the compiler's memory map.  So two stages
of a bootstrap may produce slightly different code, or a cross
compiler for a particular target on one host type may generate
slightly different code from a cross compiler for the same target on a
different host type.

Here are the details of the specific failure I'm seeing.  This
explanation is very long and potentially confusing.  I apologize for
this; the failure itself is subtle and tricky to describe or explain.

****************************************************************

The function being compiled is no_linkage_helper() in cp/tree.c.  The
problem occurs within reload_cse_regs(), while reload_cse_simplify_set()
is processing insn 37 of this function.

Here is the RTL of no_linkage_helper up to insn 37 just after global
register allocation and reload have completed.  (Both compilers
generate the same RTL up to this point.)  For clarity, I have omitted
all comments and NOTE insns, as well as machine description pattern
numbers, pattern names, insn_lists, and expr_lists.  If you wish to
see these, let me know.


(insn 4 24 13 (set (reg/v/f:SI 24 %i0 [106])
        (reg:SI 24 %i0)) ... )

(insn 15 13 16 (set (reg/v/f:SI 24 %i0 [109])
        (mem:SI (reg/v/f:SI 24 %i0 [106]) 49)) ... )

(insn 21 17 168 (set (reg:SI 10 %o2 [113])
        (zero_extend:SI (mem/s:QI (plus:SI (reg/v/f:SI 24 %i0 [109])
                    (const_int 8 [0x8])) 100))) ... )

(insn 168 21 18 (set (reg/f:SI 8 %o0 [111])
        (high:SI (symbol_ref:SI ("tree_code_type")))) ... )

(insn 18 168 25 (set (reg/f:SI 8 %o0 [110])
        (lo_sum:SI (reg/f:SI 8 %o0 [111])
            (symbol_ref:SI ("tree_code_type")))) ... )

(insn 25 18 26 (set (reg:SI 9 %o1 [115])
        (sign_extend:SI (mem/s:QI (plus:SI (reg/f:SI 8 %o0 [110])
                    (reg:SI 10 %o2 [113])) 0))) ... )

(insn 26 25 27 (set (reg:CC 100 %icc)
        (compare:CC (reg:SI 9 %o1 [115])
            (const_int 116 [0x74]))) ... )

(jump_insn 27 26 141 (set (pc)
        (if_then_else (ne (reg:CC 100 %icc)
                (const_int 0 [0x0]))
            (label_ref 126)
            (pc))) ... )

(insn 29 141 31 (set (reg:QI 9 %o1 [117])
        (mem/s:QI (plus:SI (reg/v/f:SI 24 %i0 [109])
                (const_int 8 [0x8])) 0)) ... )

(insn 31 29 32 (set (reg:SI 8 %o0 [118])
        (plus:SI (reg:SI 9 %o1 [117])
            (const_int -21 [0xffffffeb]))) ... )

(insn 32 31 33 (set (reg:SI 8 %o0 [119])
        (zero_extend:SI (reg:QI 8 %o0 [118]))) ... )

(insn 33 32 34 (set (reg:CC 100 %icc)
        (compare:CC (reg:SI 8 %o0 [119])
            (const_int 1 [0x1]))) ... )

(jump_insn 34 33 142 (set (pc)
        (if_then_else (gtu (reg:CC 100 %icc)
                (const_int 0 [0x0]))
            (label_ref 165)
            (pc))) ... )

(insn 37 142 38 (set (reg:SI 9 %o1 [150])
        (zero_extend:SI (reg:QI 9 %o1 [117]))) ... )


As I've said before, while the stage1 compiler is processing insn 37,
it replaces the expression:

(zero_extend:SI (reg:QI 9 %o1 [117]))

with:

(reg:SI 10 %o2 [113])

This replacement is valid, since these are in fact the same value.
Here's why.

Since there are no CODE_LABEL's in the RTL up to insn 37, we don't
have to worry about a jump from somewhere else, and we can do a simple
straight-line analysis.

The last insn to set register 9 (%o1) is insn 29, which sets it to the
value:

(mem/s:QI (plus:SI (reg/v/f:SI 24 %i0 [109])
		   (const_int 8 [0x8])) 0)

The last insn to set register 10 (%o2) is insn 21, which precedes insn
29 and sets register 10 to the value:

(zero_extend:SI (mem/s:QI (plus:SI (reg/v/f:SI 24 %i0 [109])
				   (const_int 8 [0x8])) 100))

Register 24 (%i0) is unchanged between insns 21 and 29, since no
intervening insn sets it.  The memory byte at %i0+8 is also unchanged
between insns 21 and 29, since no intervening insn stores to memory at
all.  There are also no function calls or jump targets between insns
21 and 29.

Therefore the expressions:

(zero_extend:SI (reg:QI 9 %o1 [117]))

and

(reg:SI 10 %o2 [113])

must have the same value at insn 37.

Why does the stage1 compiler decide to do the replacement while the
stage2 compiler doesn't?  Let's look at what happens while the two
compilers process insn 37 within reload_cse_simplify_set().

cselib's hash table uses "open addressing" to resolve collisions; if
the first slot corresponding to the hash value is occupied, the hash
value is used to generate a slot interval, and succeeding slots
separated by this interval (wrapping around) are checked in turn until
the desired entry or an unoccupied slot is found.

Here is insn 37 when reload_cse_simplify_set() is called on it:

(insn 37 142 38 (set (reg:SI 9 %o1 [150])
        (zero_extend:SI (reg:QI 9 %o1 [117]))) 125 {*zero_extendqisi2_insn} (nil)
    (nil))

reload_cse_simplify_set() calls cselib_lookup() on the source
expression:

(zero_extend:SI (reg:QI 9 %o1 [117]))

cselib_lookup() calls hash_rtx() to compute the hash value of this
expression, which is 183 in both stage1 and stage2 compilers.

cselib_lookup() then calls htab_find_slot_with_hash() to look up the
source expression in the hash table.

Hash value 183 corresponds to hash table entry 28, so this entry is
checked first.  To check the entry, entry_and_rtx_equal_p() is called
to compare the entry's expression with the source expression.

In the stage1 compiler, table entry 28 contains the expression:

(compare:CC (reg:SI 9 %o1 [115])
    (const_int 116 [0x74]))

Since the modes of the table entry expression and the source
expression are different, the comparison fails.

In the stage2 compiler, table entry 28 contains the expression:

(lo_sum:SI (reg/f:SI 8 %o0 [111])
    (symbol_ref:SI ("tree_code_type")))

The modes of the table entry expression and the source expression are
the same, so entry_and_rtx_equal_p() calls rtx_equal_for_cselib_p().
But since the RTX codes of the expressions differ,
rtx_equal_for_cselib_p() returns false, so the comparison fails.

The comparison failed for both compilers.  Since the hash table uses
open addressing, the next entry to check depends on the hash value.
In this case, both compilers now check table entry 7.

In the stage2 compiler, table entry 7 is empty, so the check fails.
Therefore the hash table lookup has failed for the stage2 compiler,
and cselib_lookup() also fails.  The stage2 compiler has not found a
suitable expression that can be substituted for the source expression
of insn 37.

However, in the stage1 compiler, table entry 7 contains the
expression:

(plus:SI (reg:SI 9 %o1 [117])
    (const_int -21 [0xffffffeb]))

Again entry_and_rtx_equal_p() is called to compare the table entry
expression with the source expression.  The modes are the same, so
entry_and_rtx_equal_p() calls rtx_equal_for_cselib_p().  But since the
RTX codes of the expressions differ, rtx_equal_for_cselib_p() returns
false, so the comparison fails.

Now the stage1 compiler checks the next hash table entry to use
according to its open addressing scheme.  This is entry 17, which
contains a list of two expressions.  In order, these are:

(reg:SI 10 %o2 [113])

(zero_extend:SI (value:QI))

The VALUE rtx here refers to table entry 3, which also has two common
subexpressions; in order, these are:

(reg:QI 9 %o1 [117])

(mem/s:QI (plus:SI (reg/v/f:SI 24 %i0 [109])
		   (const_int 8 [0x8])) 100)

(Table entry 17 is actually the same in both the stage1 and stage2
compilers, as is table entry 3.)

The mode of the expressions in table entry 17 is the same as the mode
of the source expression, so entry_and_rtx_equal_p() calls
rtx_equal_for_cselib_p() on the first expression in entry 17 and the
source expression.  Because entry 17's first expression is a REG,
cselib_lookup() is called to find the VALUE rtx which represents it.
rtx_equal_for_cselib_p() then compares each expression in this VALUE's
list of common expressions with the source expression; it does so by
calling itself recursively.  (We skip the first expression in the list,
since it is the original REG expression.)

So rtx_equal_for_cselib_p() now compares:

(zero_extend:SI (value:QI))

with the source expression:

(zero_extend:SI (reg:QI 9 %o1 [117]))

The codes match, so it calls itself recursively to compare the
subexpressions:

(value:QI)

and

(reg:QI 9 %o1 [117])

Because one of the expressions is a REG, cselib_lookup() is called to
find the VALUE rtx which represents it.  This is the VALUE for table
entry 3, which I mentioned above.  Since both VALUE rtx's are the same
VALUE, the comparison succeeds, and this call of
rtx_equal_for_cselib_p() returns true.  Therefore all the recursive
calls of rtx_equal_for_cselib_p() will return true, reflecting the
fact that the expressions:

(zero_extend:SI (reg:QI 9 %o1 [117]))

(reg:SI 10 %o2 [113])

are in fact equal.

entry_and_rtx_equal_p() also returns true, and the hash table lookup
returns entry 17.  cselib_lookup() returns that entry.

In the stage1 compiler, reload_cse_simplify_set() has now found a
register which is the same as the source expression, so it changes
insn 37 to use the register.

****************************************************************

What exactly went wrong here?  To me, it looks like the root cause of
the problem is that cselib_lookup() hashes the expression based on its
structure, not on its value.  Therefore when it checks its hash table,
it does not check the table entry which actually contains the value of
the expression, entry 17.  (The hash value for this entry when it was
created was 188, while the hash value for the expression being checked
is 183; therefore the expression being checked doesn't hash to the
right place!)

It was only through luck that the stage1 compiler found entry 17,
since the two preceding slots it checked both happened to be full.
And that only happened because the layout of memory, and thus the
values of pointers used for hash values, turned out to be favorable.

Since I'm not initimately familiar with this code, I'm not sure of the
best way to fix the problem.

One fix would be to make sure all the hashing is done on the contents
of expressions, rather than on pointer values, so that every instance
of the compiler, regardless of its memory state, would get the same
table arrangement.

A better fix would be to come up with a hashing scheme which somehow
uses expression value numbers rather than expression structure to do
the hashing.  I think this would solve the root of this problem, which
is that a cselib_lookup() isn't guaranteed to find all other
expressions which we can prove have the same value.

-- 
Colin Douglas Howell            E-mail:  chowell@redhat.com
Support Engineer
Red Hat, Inc.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]