This is the mail archive of the
mailing list for the GCC project.
Re: [i386] Scalar DImode instructions on XMM registers
- From: Richard Henderson <rth at redhat dot com>
- To: Jan Hubicka <hubicka at ucw dot cz>, Ilya Enkovich <enkovich dot gnu at gmail dot com>
- Cc: GCC Development <gcc at gcc dot gnu dot org>, Uros Bizjak <ubizjak at gmail dot com>, vmakarov at redhat dot com
- Date: Thu, 07 May 2015 09:24:08 -0700
- Subject: Re: [i386] Scalar DImode instructions on XMM registers
- References: <CAMbmDYYT6zE86-xAYs08VV2nWDK6Np+qEYoj+6oGM276MtBuPQ at mail dot gmail dot com> <CAFULd4YVruAT=RHgENhBcuKZgE6FvRa=8aR6WygKm9F4GjnJyg at mail dot gmail dot com> <CAFULd4aycTg3bYKx7c9GXpgiY4WeqmLh1f5HFYL6K+K35QmTWA at mail dot gmail dot com> <CAMbmDYaDrCnDCnQfP0toV87pi_mE_pbPCP6M-FEkGNDAtWKFUA at mail dot gmail dot com> <CAFULd4amXWDT45oUNqi2cLL2Tec-kMJm7Kz301myZSWZw-3H7Q at mail dot gmail dot com> <alpine dot DEB dot 2 dot 11 dot 1504241222020 dot 1687 at laptop-mg dot saclay dot inria dot fr> <CAMbmDYYfq-RVYa0MwrGH_DpnV7psPHKZpxaouMuq_nsOPeO_ug at mail dot gmail dot com> <20150425013239 dot GB719 at atrey dot karlin dot mff dot cuni dot cz>
On 04/24/2015 06:32 PM, Jan Hubicka wrote:
> Also I believe it was kind of Richard's design decision to avoid use of
> (paradoxical) subregs for vector conversions because these have funny
> The code for handling upper parts of paradoxical subregs is controlled by
> macros around SUBREG_PROMOTED_VAR_P but I do not think it will handle
> V1DI->V2DI conversions fluently without some middle-end hacking. (it will
> probably try to produce zero extensions)
> When we are on SSE instructions, it would be great to finally teach
> copy_by_pieces/store_by_pieces to use vector instructions (these are more
> compact and either equally fast or faster on some CPUs). I hope to get into
> this, but it would be great if someone beat me.
Well, I think it would be worthwhile to teach the i386 backend how to do 64-bit
vectors in SSE registers. First, this would aid portability with other targets
that may have GCC generic vectors written only for 8-byte quantities. Since we
do have zero-extending 8-byte load/store insns for SSE, we don't actually need
paradoxical subregs, just additional macro-ization of the existing patterns.
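
For illustration, here is a minimal sketch of the kind of 8-byte generic vector
code meant above, written with the GCC vector_size attribute. The type and
function names (v2si, v8qi, add_v2si, xor_v8qi) are made up for this example,
and nothing here says which instructions the i386 backend would or should pick.

```c
#include <stdint.h>

/* 8-byte generic vectors: two 32-bit lanes, or eight 8-bit lanes.
   The type names are illustrative only.  */
typedef int32_t v2si __attribute__ ((vector_size (8)));
typedef uint8_t v8qi __attribute__ ((vector_size (8)));

/* Lane-wise addition; instruction selection is left to the compiler.  */
static inline v2si
add_v2si (v2si a, v2si b)
{
  return a + b;
}

/* Lane-wise exclusive or on eight byte lanes.  */
static inline v8qi
xor_v8qi (v8qi a, v8qi b)
{
  return a ^ b;
}
```

Source like this is target independent; the question in this thread is only
which register file ends up holding the 8-byte values on i386.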
This almost certainly would conflict with the MMX code generation. But given
the problems we've always had with that, perhaps it's time to kill that off.
To a large extent we can preserve source compatibility with MMX builtins once
we have 8-byte vectors implemented in SSE.
As for the subject, we'd want to delay expansion of DImode arithmetic until
after RA. That bypasses all of the good work done in lower-subreg.c, so we
need some sort of replacement.
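
To make concrete what is being given up, here is a rough model in plain C of a
DImode operation split into word-sized pieces on a 32-bit target. It is only a
sketch of the idea, not code from lower-subreg.c, and every name in it
(di_words, di_split, di_join, di_and, di_add) is hypothetical.

```c
#include <stdint.h>

/* A 64-bit (DImode) value held as two independent 32-bit (SImode) words.  */
struct di_words
{
  uint32_t lo;
  uint32_t hi;
};

static inline struct di_words
di_split (uint64_t x)
{
  struct di_words w = { (uint32_t) x, (uint32_t) (x >> 32) };
  return w;
}

static inline uint64_t
di_join (struct di_words w)
{
  return ((uint64_t) w.hi << 32) | w.lo;
}

/* Bitwise operations decompose into two unrelated word operations, so
   each half can be allocated and optimized on its own.  */
static inline struct di_words
di_and (struct di_words a, struct di_words b)
{
  struct di_words r = { a.lo & b.lo, a.hi & b.hi };
  return r;
}

/* Addition links the halves through the carry out of the low word
   (the add/adc pair on i386).  */
static inline struct di_words
di_add (struct di_words a, struct di_words b)
{
  struct di_words r;
  r.lo = a.lo + b.lo;
  r.hi = a.hi + b.hi + (r.lo < a.lo);
  return r;
}
```

If DImode arithmetic stays unexpanded until after RA, the halves are no longer
visible as separate words to the earlier passes, which is the loss referred to
above.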
I was wondering this morning about the possibility of a kind of constraint that
would allow RA to generate pairs of registers via CONCAT. That is, the two
hard registers within the CONCAT are collectively the double-word allocation,
but need not be sequential like current multi-word allocations. A target using
such a constraint is promising to handle the CONCAT either by splitting (and
gen_lowpart et al), or print_operand letters (e.g. the m68k %R, for outputting
the low part of a pair).
With that, we get the best of both -- lower-subreg effectively happening in RA,
and DImode arithmetic in SSE, with no subregs required.
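
As a toy model of the CONCAT idea, the sketch below treats a double-word value
as a pair of arbitrary, not necessarily adjacent, 32-bit registers. This is
ordinary C, not RTL and not an existing GCC interface; regfile, reg_concat and
the concat_* helpers are invented names standing in for what a target would do
with gen_lowpart and friends.

```c
#include <stdint.h>

#define NUM_HARD_REGS 8

/* Toy register file of 32-bit hard registers.  */
struct regfile
{
  uint32_t r[NUM_HARD_REGS];
};

/* Hypothetical stand-in for a CONCAT of two hard registers: the pair is
   one double-word allocation, but lo and hi need not be sequential.  */
struct reg_concat
{
  int lo_regno;
  int hi_regno;
};

/* Analogue of taking the low part of the pair.  */
static inline uint32_t
concat_lowpart (const struct regfile *rf, struct reg_concat c)
{
  return rf->r[c.lo_regno];
}

/* Analogue of taking the high part of the pair.  */
static inline uint32_t
concat_highpart (const struct regfile *rf, struct reg_concat c)
{
  return rf->r[c.hi_regno];
}

/* Read the pair as a single 64-bit value.  */
static inline uint64_t
concat_read (const struct regfile *rf, struct reg_concat c)
{
  return ((uint64_t) concat_highpart (rf, c) << 32) | concat_lowpart (rf, c);
}

/* Write a 64-bit value into the two registers of the pair.  */
static inline void
concat_write (struct regfile *rf, struct reg_concat c, uint64_t x)
{
  rf->r[c.lo_regno] = (uint32_t) x;
  rf->r[c.hi_regno] = (uint32_t) (x >> 32);
}
```

With a pair such as { 5, 2 } the low word sits in register 5 and the high word
in register 2, a placement that a sequential multi-word allocation could not
express.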