[Bug rtl-optimization/78041] Wrong code on ARMv7 with -mthumb -mfpu=neon-fp16 -O0

wdijkstr at arm dot com gcc-bugzilla@gcc.gnu.org
Thu Oct 20 15:32:00 GMT 2016


https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78041

--- Comment #8 from Wilco <wdijkstr at arm dot com> ---
(In reply to Bernd Edlinger from comment #7)
> (In reply to Richard Earnshaw from comment #6)
> > (In reply to Bernd Edlinger from comment #5)
> > > (In reply to Wilco from comment #4)
> > > > However dealing with partial overlaps is complex so maybe the best option
> > > > would be to add alternatives to <shift>di3_neon to either allow full overlap
> > > > "r 0 X X X" or no overlap "&r r X  X X". The shift code works with full
> > > > overlap.
> > > 
> > > That sounds like a good idea.
> > > 
> > > Then this condition in <shift>di3_neon could go away too:
> > > 
> > >             && (!reg_overlap_mentioned_p (operands[0], operands[1])
> > >                 || REGNO (operands[0]) == REGNO (operands[1])))
> > 
> > Note that we don't want to restrict complete overlaps, only partial
> > overlaps.  Restricting complete overlaps leads to significant increase in
> > register pressure and a lot of redundant copying.
> 
> Yes.
> 
> That is Wilco's idea: instead of =r 0r X X X
> use =r 0 X X X and =&r r X X X, that should ensure that
> no partial overlap happens, just full overlap or nothing.
> 
> That's what arm_emit_coreregs_64bit_shift
> and arm_ashldi3_1bit can handle.
> 
> Who will do it?

I've got a patch that fixes it, it's being tested.

While looking at how DI mode operations get expanded, I noticed a code-quality
(CQ) issue with your shift change: shifts that are expanded early now use extra
registers because of the DI mode write of zero. Given that all other DI mode
operations are expanded after reload, it may be better to do the same for
shifts.


More information about the Gcc-bugs mailing list