[Bug target/33755] Gcc 4.2.2 broken for mips linux kernel builds

Sun Oct 14 09:41:00 GMT 2007

------- Comment #11 from rsandifo at nildram dot co dot uk  2007-10-14 09:41 -------
Subject: Re:  Gcc 4.2.2 broken for mips linux kernel builds

"ddaney at avtrex dot com" <gcc-bugzilla@gcc.gnu.org> writes:
> rsandifo at gcc dot gnu dot org wrote:
>> ------- Comment #9 from rsandifo at gcc dot gnu dot org  2007-10-13 10:47 -------
>> The problem comes from dbr_schedule, although it's not really a bug there.
>> We have:
>>
>>         bne     $5,$0,L1        # A
>>         ...stuff...
>> L1:
>>         bne     $5,$0,L2        # B
>>         ...printk call...
>> L2:
>>
>> and nothing before dbr_schedule has managed to thread A to L2.
>> dbr_schedule first fills B's delay slot with an lui from the printk
>> block, then steal_delay_list_from_target realises that A can steal B's
>> delay slot and branch directly to L2.  There is no other path to L1,
>> so the rest of the printk call is now dead.
>>
>> For most targets, this is at worst a missed optimisation; we should have
>> threaded A to L2 much earlier than dbr_schedule, and deleted the whole
>> printk block as dead.  I don't think the MIPS port can rely on that
>> happening for correctness.  So (alas!) I think the upshot is simply
>> that we need to add some special code to mips_reorg to delete high-part
>> relocations that have no matching lows.
>>
>> I'll have a poke.
>>
>>
> That makes sense, however it is a bit strange because... IIRC when I
> compiled the .i file with a fairly recent 4.3 build, the printk in
> question was not optimized away.  So if 4.2.2 can validly optimize away
> the printk, then we have an optimization regression in 4.3.

Well, in one sense, yeah.  But it's not a regression that indicates
a new bug.  The point is that dbr_schedule isn't how we're meant to
optimise this case anyway; the optimisation in dbr_schedule is really
there for other situations.  Both 4.2 and 4.3 are missing optimisations
further up the chain, and 4.3 isn't lacking them any more than 4.2 is.
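
To picture the threading that ought to happen before dbr_schedule,
here is a toy model in C.  It is only a sketch and not GCC code: the
struct, its fields and thread_branch are hypothetical names, delay
slots are ignored, and a "branch" is just "bne reg,$0,target" with
the target given as an array index.

```c
#include <stddef.h>

enum { NO_TARGET = -1 };

/* A toy instruction.  When is_bne is nonzero it stands for
   "bne reg,$0,target"; otherwise it is some non-branch instruction. */
struct insn {
    int is_bne;
    int reg;
    int target;   /* index of the branch target, or NO_TARGET */
};

/* If code[i] is a bne whose target is itself a bne on the same register,
   the second branch is certain to be taken whenever the first one is,
   so the first can jump straight to the second's target.
   Returns 1 if code[i] was retargeted, 0 otherwise. */
static int thread_branch(struct insn *code, size_t n, size_t i)
{
    if (i >= n || !code[i].is_bne)
        return 0;

    int t = code[i].target;
    if (t < 0 || (size_t)t >= n)
        return 0;

    /* The target must be a branch on the same condition, and must not
       simply branch to itself. */
    if (!code[t].is_bne || code[t].reg != code[i].reg || code[t].target == t)
        return 0;

    code[i].target = code[t].target;
    return 1;
}
```

With A at index 0 targeting B at index 2, and B targeting index 4,
thread_branch retargets A to index 4.  If nothing else reaches B, the
block between B and its target is then dead, which is the situation
described in the quoted comment.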

Both versions expand the switch statement to the following rtl:

   tmp = val & 1
   if (tmp == 0) goto ...
   tmp2 = 1
   if (tmp == tmp2) goto ...
   ...printk() code...

4.2 keeps the block in essentially this form up until combine,
which uses nonzero_bits checks to optimise the second branch into
"if (tmp != 0) goto ...".  4.3's loop-invariant motion can hoist
the setting of tmp2, thus stopping combine from doing the optimisation.
But what 4.3 is doing is perfectly reasonable if we need to keep the
branch, and if we don't need to keep the branch, we should have
optimised it away before lim.
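
As a sanity check on that rewrite, here is a small C sketch of the
two forms.  The kernel source itself isn't quoted in this thread, so
the function shapes are an assumption based on the rtl outline;
fallback() is a hypothetical stand-in for the printk call and only
counts how often it is reached.

```c
#include <stdint.h>

/* Hypothetical stand-in for the printk block; counts how often it runs. */
static int fallback_calls;
static void fallback(void) { fallback_calls++; }

/* The shape of the expanded code: two compares, then the fall-through. */
static int expanded(uint32_t val)
{
    uint32_t tmp = val & 1;
    if (tmp == 0)
        return 0;
    uint32_t tmp2 = 1;
    if (tmp == tmp2)
        return 1;
    fallback();               /* "...printk() code..." */
    return -1;
}

/* The same code after the second test is rewritten to "tmp != 0". */
static int rewritten(uint32_t val)
{
    uint32_t tmp = val & 1;
    if (tmp == 0)
        return 0;
    if (tmp != 0)
        return 1;
    fallback();               /* plainly unreachable in this form */
    return -1;
}

/* Counts inputs in [0, limit) where the two forms disagree, plus the
   number of times either form reached the fallback block. */
static unsigned count_problems(uint32_t limit)
{
    unsigned problems = 0;
    fallback_calls = 0;
    for (uint32_t v = 0; v < limit; v++) {
        if (expanded(v) != rewritten(v))
            problems++;
    }
    return problems + (unsigned)fallback_calls;
}
```

Because tmp can only be 0 or 1, the two forms agree on every input
and neither ever reaches fallback(), so count_problems returns 0 for
any limit.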

We probably need some "nonzero bits" optimisations in tree vrp, if we
don't already.  (And if we don't already, I doubt I'll have been the
first person to say that.)
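
For what it's worth, the kind of rule I mean can be sketched in a few
lines of C.  This is a simplified model under my own assumptions, not
the actual combine or vrp code: nz_mask and the helper names are
hypothetical, and the only operation modelled is AND with a constant.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bits that might be 1 in a value; every other bit is known to be 0. */
typedef uint32_t nz_mask;

/* Nothing is known about the value: any bit might be set. */
static nz_mask nz_unknown(void) { return UINT32_MAX; }

/* After "x & c", only bits set in both x's mask and c can be nonzero. */
static nz_mask nz_and_const(nz_mask x, uint32_t c) { return x & c; }

static bool is_single_bit(uint32_t c) { return c != 0 && (c & (c - 1)) == 0; }

/* True when "x == c" may be rewritten as "x != 0": c is a single bit
   and it is the only bit of x that can be nonzero, so x is 0 or c. */
static bool eq_const_is_ne_zero(nz_mask x, uint32_t c)
{
    return is_single_bit(c) && x == c;
}
```

For tmp = val & 1 the mask is 1, so "tmp == 1" qualifies for the
rewrite; for a mask of 3 it does not, since the value could also be
2 or 3.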

Richard

-- 

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33755