This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.
Re: How is generic SIMD support supposed to work?
- From: Jan Hubicka <jh at suse dot cz>
- To: Richard Henderson <rth at redhat dot com>, Jan Hubicka <jh at suse dot cz>, gcc at gcc dot gnu dot org, shebs at apple dot com, aldyh at redhat dot com, aj at suse dot de, rakdver at atrey dot karlin dot mff dot cuni dot cz
- Date: Mon, 9 Dec 2002 01:47:41 +0100
- Subject: Re: How is generic SIMD support supposed to work?
- References: <20021020124628.GD25286@kam.mff.cuni.cz> <20021020184219.GB6120@redhat.com>
> On Sun, Oct 20, 2002 at 02:46:28PM +0200, Jan Hubicka wrote:
> > __v4si val = {1,2,3,4};
> > return val;
> > }
> > and I hope this gets compiled into a static initializer loaded at once.
> > Unfortunately this does not happen, and even worse, the compiler dies:
>
> What happens on other architectures is that this value gets
> loaded into 4 integer registers, and the subregging works as
> expected. That's going to be prohibitive on x86, so we either
> need to arrange for such pseudos to get allocated to the stack
> (via CLASS_CANNOT_CHANGE_MODE_P), and/or come up with another
> mechanism (via named patterns, I assume) for the code generator
> to ask to read or set a vector element.
Hi,
I've hit this problem in a different context. We currently miscompile
gcc on P4 and K8 since we generate:
(insn:HI 178 177 179 3 0x2a95ce16c0 (set (subreg:DF (reg/v:DI 94) 0)
        (reg/v:DF 58)) 93 {*movdf_nointeger} (nil)
    (expr_list:REG_DEAD (reg/v:DF 58)
        (nil)))

(insn:HI 179 178 180 3 0x2a95ce16c0 (set (mem/f:SI (plus:SI (reg/f:SI 7 esp)
                (const_int 16 [0x10])) [0 S4 A32])
        (subreg:SI (reg/v:DI 94) 0)) 44 {*movsi_1} (insn_list 178 (nil))
    (nil))

(insn:HI 180 179 181 3 0x2a95ce16c0 (set (mem/f:SI (plus:SI (reg/f:SI 7 esp)
                (const_int 12 [0xc])) [0 S4 A32])
        (subreg:SI (reg/v:DI 94) 4)) 44 {*movsi_1} (nil)
    (expr_list:REG_DEAD (reg/v:DI 94)
        (nil)))
The register allocator decides to put 94 into an XMM register, which is
a fairly sane decision. The sequence then comes out as two movqs with
xmm0 as the operand.
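
For readers less used to RTL, the data movement that insns 178 to 180
ask for can be written as plain C. This is only an illustration of the
intended semantics, not the source that triggered the bug; the helper
names are hypothetical, and it assumes a 64-bit IEEE 754 double on a
little-endian target such as x86.

```c
#include <stdint.h>
#include <string.h>

/* Mirrors insn 178: the bits of a DFmode value are viewed as DImode.
   (Hypothetical helper name, for illustration only.)  */
static uint64_t df_bits_as_di(double d)
{
    uint64_t bits;
    memcpy(&bits, &d, sizeof bits);   /* bit copy, no numeric conversion */
    return bits;
}

/* Mirrors insn 179: (subreg:SI (reg:DI) 0), the low 32 bits on a
   little-endian target.  */
static uint32_t di_subreg_si_0(uint64_t di)
{
    return (uint32_t)di;
}

/* Mirrors insn 180: (subreg:SI (reg:DI) 4), the high 32 bits.  */
static uint32_t di_subreg_si_4(uint64_t di)
{
    return (uint32_t)(di >> 32);
}
```

The two SImode stores need two different halves of pseudo 94, which is
why addressing both of them through the same xmm0 operand cannot be
right.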
I tried to use CLASS_CANNOT_CHANGE_MODE_P like this:
#define CANNOT_CHANGE_MODE_CLASS(FROM, TO) \
((FROM) != (TO) ? SSE_REGS : NO_REGS)
But this makes us refuse subregs like (subreg:V2DF (reg:DF) 0),
which is valid and which we use to represent some scalar fp operations
in later optimization passes.
The problem is that I don't see how to distinguish it from
(subreg:V2DF (reg:DF) 8), which is invalid.
What would you think about adding a SUBREG_BYTE argument to the macro
and updating everything? This is the only way out I see, and I think
this is an important regression we should fix in 3.3.
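
A minimal sketch of what a byte-aware macro could look like, to make the
proposal concrete. Everything here is an assumption for illustration:
the name CANNOT_CHANGE_MODE_CLASS_AT, its third argument, and the rule
"refuse only at a nonzero byte offset" are hypothetical, and the enums
are small stand-ins rather than GCC's real definitions.

```c
/* Hypothetical stand-ins for GCC's enums; only what the sketch needs.  */
enum machine_mode { SImode, DImode, DFmode, V2DFmode };
enum reg_class { NO_REGS, SSE_REGS };

/* The two-argument form quoted in this mail: every mode change is
   refused for SSE_REGS, including the valid offset-0 case.  */
#define CANNOT_CHANGE_MODE_CLASS(FROM, TO) \
  ((FROM) != (TO) ? SSE_REGS : NO_REGS)

/* Hypothetical three-argument variant: a mode change is refused for
   SSE_REGS only when the subreg starts at a nonzero byte offset.  */
#define CANNOT_CHANGE_MODE_CLASS_AT(FROM, TO, BYTE) \
  (((FROM) != (TO) && (BYTE) != 0) ? SSE_REGS : NO_REGS)
```

Under this rule (subreg:V2DF (reg:DF) 0) stays allowed, while
(subreg:V2DF (reg:DF) 8) and (subreg:SI (reg:DI) 4) are refused for
SSE_REGS. Whether a nonzero-offset test is the right condition for every
mode pair is a separate question that the sketch does not settle.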
Honza
>
>
> r~