This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.
[Bug rtl-optimization/67609] [5/6 Regression] Generates wrong code for SSE2 _mm_load_pd
- From: "vmakarov at gcc dot gnu.org" <gcc-bugzilla at gcc dot gnu dot org>
- To: gcc-bugs at gcc dot gnu dot org
- Date: Fri, 16 Oct 2015 12:44:13 +0000
- Subject: [Bug rtl-optimization/67609] [5/6 Regression] Generates wrong code for SSE2 _mm_load_pd
- Auto-submitted: auto-generated
- References: <bug-67609-4 at http dot gcc dot gnu dot org/bugzilla/>
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67609
--- Comment #8 from Vladimir Makarov <vmakarov at gcc dot gnu.org> ---
(In reply to Uroš Bizjak from comment #7)
> (In reply to Richard Biener from comment #4)
> > (In reply to Uroš Bizjak from comment #3)
> > > The doc says:
> > >
> > > When used as an lvalue, 'subreg' is a word-based accessor.
> > > Storing to a 'subreg' modifies all the words of REG that
> > > overlap the 'subreg', but it leaves the other words of REG
> > > alone.
> >
> > But UNITS_PER_WORD is 8 so (subreg:DF (TI)) should leave the upper half
> > of the TImode register unchanged.
>
> Indeed, and -m32 creates correct code. So, it is register allocator that
> fails.
>
> Reconfirmed as rtl-optimization problem.
This is quite an interesting PR; it reveals a long-standing latent bug in GCC.
Basically, before LRA we have:
2: r90:DF=xmm0:DF
REG_DEAD xmm0:DF
3: NOTE_INSN_FUNCTION_BEG
6: r89:TI=[`reg']
7: r89:TI#0=r90:DF
REG_DEAD r90:DF
8: [`reg']=r89:TI#0
The LRA and reload passes produce:
6: xmm1:TI=[`reg']
7: xmm1:DF=xmm0:DF
8: [`reg']=xmm1:V2DF
They do not perform any transformation except replacing the subreg of the hard
register in insn #7. After that, insn #6 is removed as dead by subsequent
optimizations, since insn #7 now looks like a full definition of xmm1. To avoid
removing insn #6, we need to keep the subreg until the final pass:
7: xmm1:TI#0=xmm0:DF
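To make the intended semantics concrete, here is a minimal C sketch of insns #6 to #8. It is only a model, not GCC code: the type `ti_reg` and the functions `store_low_subreg`, `run_expected` and `run_without_load` are hypothetical names introduced for illustration, and the model assumes 8-byte words (UNITS_PER_WORD is 8) and an 8-byte `double`. It shows why dropping insn #6 changes the upper word of the value stored back by insn #8.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical model of a 128-bit TImode register as two 64-bit words. */
typedef struct { uint64_t word[2]; } ti_reg;

/* Model of a store to the low DFmode subreg of a TImode register:
   word 0 is overwritten, word 1 is left alone. */
static void store_low_subreg(ti_reg *reg, double value)
{
    memcpy(&reg->word[0], &value, sizeof value);
}

/* Insns #6, #7, #8 as written before LRA. */
static ti_reg run_expected(const ti_reg *mem, double arg)
{
    ti_reg r89 = *mem;            /* insn #6: r89:TI = [mem]       */
    store_low_subreg(&r89, arg);  /* insn #7: r89:TI#0 = r90:DF    */
    return r89;                   /* insn #8: [mem] = r89          */
}

/* Same sequence with insn #6 deleted: the upper word is whatever the
   register happened to contain before (passed in here as `stale`). */
static ti_reg run_without_load(ti_reg stale, double arg)
{
    store_low_subreg(&stale, arg);  /* insn #7 only */
    return stale;                   /* insn #8      */
}
```

In this model, `run_expected` always returns the original upper word of memory, while `run_without_load` returns whatever was in the register beforehand, which is the kind of wrong code described in this PR.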
Why do LRA and reload remove subregs of hard registers? Because some
subsequent optimizations cannot handle them.
For the last two days I have been trying to find a solution that involves only
LRA (removing subregs of hard registers only partially), but so far without success.
In any case, even if I find such a solution in LRA, it will need extensive testing
on other targets, and it will probably be ready next week at the earliest.