This is the mail archive of the
mailing list for the GCC project.
Re: [i386] Scalar DImode instructions on XMM registers
- From: Jan Hubicka <hubicka at ucw dot cz>
- To: Ilya Enkovich <enkovich dot gnu at gmail dot com>
- Cc: GCC Development <gcc at gcc dot gnu dot org>, Uros Bizjak <ubizjak at gmail dot com>, rth at redhat dot com, vmakarov at redhat dot com
- Date: Sat, 25 Apr 2015 03:32:39 +0200
- Subject: Re: [i386] Scalar DImode instructions on XMM registers
- Authentication-results: sourceware.org; auth=none
- References: <CAMbmDYYT6zE86-xAYs08VV2nWDK6Np+qEYoj+6oGM276MtBuPQ at mail dot gmail dot com> <CAFULd4YVruAT=RHgENhBcuKZgE6FvRa=8aR6WygKm9F4GjnJyg at mail dot gmail dot com> <CAFULd4aycTg3bYKx7c9GXpgiY4WeqmLh1f5HFYL6K+K35QmTWA at mail dot gmail dot com> <CAMbmDYaDrCnDCnQfP0toV87pi_mE_pbPCP6M-FEkGNDAtWKFUA at mail dot gmail dot com> <CAFULd4amXWDT45oUNqi2cLL2Tec-kMJm7Kz301myZSWZw-3H7Q at mail dot gmail dot com> <alpine dot DEB dot 2 dot 11 dot 1504241222020 dot 1687 at laptop-mg dot saclay dot inria dot fr> <CAMbmDYYfq-RVYa0MwrGH_DpnV7psPHKZpxaouMuq_nsOPeO_ug at mail dot gmail dot com>
I am adding Vladimir and Richard into CC. I tried to solve a similar problem
with FP math years ago by having -mfpmath=sse,387. The idea was to allow
use of i387 registers when SSE ones run out and possibly also model the fact
that Pentium4 had faster i387 additions than SSE additions. I also had some
plans to extend this to mixed SSE/MMX/GPR integer arithmetic, but never
got to that.
This did not really fly because the regalloc was not really able to
understand it (I made a patch to regclass to propagate the classes and figure out
which operations need to stay in i387 and which in SSE to avoid reloading, but
that never got in).
I believe Vladimir did some work on this with IRA (he is able to spill GPR
regs into SSE and do a bit of other tricks).
Also I believe it was kind of Richard's design decision to avoid use of
(paradoxical) subregs for vector conversions because these have funny
The code for handling upper parts of paradoxical subregs is controlled by
macros around SUBREG_PROMOTED_VAR_P but I do not think it will handle
V1DI->V2DI conversions fluently without some middle-end hacking. (it will
probably try to produce zero extensions)
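
To make the upper-half question concrete outside of RTL, here is a minimal
sketch in C with SSE2 intrinsics. It is only an illustration, not what GCC
emits for DImode: the function names add64_in_xmm and high_half_after_movq are
made up for this example, and the non-SSE2 fallbacks exist only so the file
compiles elsewhere. It does a 64-bit add in the low lane of an XMM register and
shows that a movq-style load zeroes the upper lane, which is the explicit zero
extension a paradoxical subreg would not promise.

```c
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Hypothetical helper: 64-bit add carried out in the low lane of an XMM
   register (movq load, paddq, movq store) when SSE2 is available. */
static uint64_t add64_in_xmm(uint64_t a, uint64_t b)
{
#if defined(__SSE2__)
    __m128i va = _mm_loadl_epi64((const __m128i *)&a); /* upper 64 bits = 0 */
    __m128i vb = _mm_loadl_epi64((const __m128i *)&b);
    __m128i sum = _mm_add_epi64(va, vb);               /* paddq */
    uint64_t r;
    _mm_storel_epi64((__m128i *)&r, sum);              /* keep the low lane */
    return r;
#else
    return a + b;                                      /* plain fallback */
#endif
}

/* Hypothetical helper: returns the upper 64 bits of the XMM register after a
   movq-style load of x. With SSE2 this is 0 because the load zero-extends. */
static uint64_t high_half_after_movq(uint64_t x)
{
#if defined(__SSE2__)
    __m128i v = _mm_loadl_epi64((const __m128i *)&x);
    __m128i hi = _mm_unpackhi_epi64(v, v);             /* move high lane down */
    uint64_t r;
    _mm_storel_epi64((__m128i *)&r, hi);
    return r;
#else
    (void)x;
    return 0;                                          /* nothing to inspect */
#endif
}
```

On 32-bit x86 this needs -msse2 (or a -march that implies it) to take the
vector path; on x86-64 SSE2 is part of the baseline.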
When we are on SSE instructions, it would be great to finally teach
copy_by_pieces/store_by_pieces to use vector instructions (these are more
compact and either equally fast or faster on some CPUs). I hope to get into
this, but it would be great if someone beat me to it.
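
As a rough picture of what the by-pieces idea would buy, the sketch below
copies a 16-byte block two ways: as four 32-bit pieces (about what a GPR-only
expansion has to do on 32-bit x86) and as one unaligned 128-bit load/store.
This is hand-written C with SSE2 intrinsics, not the middle-end code itself;
copy16_words and copy16_vector are names invented for the example, and the
memcpy fallback is there only so it builds without SSE2.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Hypothetical helper: copy 16 bytes as four 32-bit pieces. */
static void copy16_words(void *dst, const void *src)
{
    assert(dst != NULL && src != NULL);
    for (size_t i = 0; i < 4; i++) {
        uint32_t w;
        memcpy(&w, (const unsigned char *)src + 4 * i, sizeof w);
        memcpy((unsigned char *)dst + 4 * i, &w, sizeof w);
    }
}

/* Hypothetical helper: copy 16 bytes with one unaligned vector load and one
   unaligned vector store (movdqu) when SSE2 is available. */
static void copy16_vector(void *dst, const void *src)
{
    assert(dst != NULL && src != NULL);
#if defined(__SSE2__)
    __m128i v = _mm_loadu_si128((const __m128i *)src);
    _mm_storeu_si128((__m128i *)dst, v);
#else
    memcpy(dst, src, 16);   /* fallback so the sketch builds everywhere */
#endif
}
```

Both functions produce the same bytes; the vector version is two instructions
instead of eight moves, which is the code-size argument made above.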
> 2015-04-24 13:27 GMT+03:00 Marc Glisse <email@example.com>:
> > On Fri, 24 Apr 2015, Uros Bizjak wrote:
> >> Please try to generate paradoxical subreg (V2DImode subreg of V1DImode
> >> pseudo). IIRC, there is some functionality in the compiler that is
> >> able to tell if the highpart of the paradoxical register is zeroed.
> > Those are not currently legal (I tried to change that)
> > https://gcc.gnu.org/ml/gcc-patches/2013-03/msg00745.html
> > https://gcc.gnu.org/ml/gcc-patches/2014-06/msg00769.html
> > In this case, a subreg:V2DI of DImode should work.
> > --
> > Marc Glisse
> Thank you for your tips! It seems to work, will try and see what it
> gives us for i386.