Hashing algorithm for cselib causes bootstrap differences

Thu Nov 23 13:30:00 GMT 2000

This appears to be a latent bug that has been in there since the cselib 
hashing algorithm was introduced in March.  It causes a bootstrap compare 
difference for arm-netbsd.  Other than this, I think the problem is benign.

The sequence of rtl we have is:

(insn 147 145 139 (set (reg:SI 10 sl)
        (symbol_ref:SI ("keep_next_if_subblocks"))) 
    (expr_list:REG_EQUIV (symbol_ref:SI ("keep_next_if_subblocks"))
        (nil)))

...

(insn 143 139 86 (set (reg:SI 8 r8)
        (const_int 0 [0x0]))
    (expr_list:REG_EQUIV (const_int 0 [0x0])
        (nil)))

...

(insn 89 90 184 (set (reg:QI 5 r5)
        (const_int 0 [0x0]))
    (nil))

Note that insn 143 sets r8 as an SImode register, while insn 89 sets it as 
a QImode register.  Now the hashing algorithm used for this will often 
produce hash values that differ by exactly 2, since the only difference in 
the hash calculations is the integer replacement for the mode that is 
included.

Now, the key to the problem is when a hash table conflict occurs while 
trying to look up equivalents for insn 89.  When one does, we increment 
the hash bucket number by hash2, but in this case, hash2 is 2, so we hit 
the bucket for the previous set.  Since constants have no mode, 
rtx_equal_for_cselib_p declares the two entries to be equivalent.

Why does this cause a bootstrap comparison failure?  Well, the hashing 
algorithm for a symbol_ref is to use the pointer to the symbol.  But the 
address for this differs between the stage1 compiler and the stage2 
compiler.  It just so happens that during stage2 the hash bucket assigned 
is exactly the same as for the const 0 in insn 89.

I think it is a really bad idea that we use a pointer for the hash entry 
of a symbol_ref, since it means that only small changes elsewhere in the 
compiler (for example, even turning on or off some other independent 
feature) can change the output in an unexpected way.  A simple fix would 
perhaps be to make the hash of a symbol_ref be the length of the string 
plus the value of the first char, but this might cause a few conflicts 
elsewhere.  I guess there should be better algorithms than this.

There is also the issue of whether rtx_equal_for_cselib_p should have 
allowed these two chains to match.  Since we include the mode when hashing 
a constant: surely we should include the mode when comparing them for a 
match?

Comments?

Richard.