Hashing algorithm for cselib causes bootstrap differences
Richard Earnshaw
rearnsha@arm.com
Thu Nov 23 13:30:00 GMT 2000
This appears to be a latent bug that has been in there since the cselib
hashing algorithm was introduced in March. It causes a bootstrap compare
difference for arm-netbsd. Other than this, I think the problem is benign.
The sequence of rtl we have is:
(insn 147 145 139 (set (reg:SI 10 sl)
(symbol_ref:SI ("keep_next_if_subblocks")))
(expr_list:REG_EQUIV (symbol_ref:SI ("keep_next_if_subblocks"))
(nil)))
...
(insn 143 139 86 (set (reg:SI 8 r8)
(const_int 0 [0x0]))
(expr_list:REG_EQUIV (const_int 0 [0x0])
(nil)))
...
(insn 89 90 184 (set (reg:QI 5 r5)
(const_int 0 [0x0]))
(nil))
Note that insn 143 sets r8 as an SImode register, while insn 89 sets it as
a QImode register. Now the hashing algorithm used for this will often
produce hash values that differ by exactly 2, since the only difference in
the hash calculations is the integer replacement for the mode that is
included.
Now, the key to the problem is when a hash table conflict occurs while
trying to look up equivalents for insn 89. When one does, we increment
the hash bucket number by hash2, but in this case, hash2 is 2, so we hit
the bucket for the previous set. Since constants have no mode,
rtx_equal_for_cselib_p declares the two entries to be equivalent.
Why does this cause a bootstrap comparison failure? Well, the hashing
algorithm for a symbol_ref is to use the pointer to the symbol. But the
address for this differs between the stage1 compiler and the stage2
compiler. It just so happens that during stage2 the hash bucket assigned
is exactly the same as for the const 0 in insn 89.
I think it is a really bad idea that we use a pointer for the hash entry
of a symbol_ref, since it means that only small changes elsewhere in the
compiler (for example, even turning on or off some other independent
feature) can change the output in an unexpected way. A simple fix would
perhaps be to make the hash of a symbol_ref be the length of the string
plus the value of the first char, but this might cause a few conflicts
elsewhere. I guess there should be better algorithms than this.
There is also the issue of whether rtx_equal_for_cselib_p should have
allowed these two chains to match. Since we include the mode when hashing
a constant: surely we should include the mode when comparing them for a
match?
Comments?
Richard.
More information about the Gcc-bugs
mailing list