This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



Re: [PATCH] fix arm neon ICE by widening tree_type's precision field


On Mon, Jun 08, 2009 at 07:40:45PM +0000, Joseph S. Myers wrote:
> On Mon, 8 Jun 2009, Nathan Froyd wrote:
> 
> > ARM's NEON support in GCC includes some rather wide types: XImode, the
> > biggest, is a 64-byte integer type.  Compiling the simple testcase in
> > the patch below, which uses XImode, results in an ICE.
> 
> To be clear for those not already familiar with the NEON support: this is 
> not an integer type in the sense of being able to carry out any arithmetic 
> on it, it's a mode needed for the insn patterns for intrinsics for certain 
> NEON instructions that load or store 64 bytes in consecutive NEON 
> registers.

Which, by the way, I'd love to be rid of.  I think we'd get better
NEON code if we could tell the compiler what we were actually doing.
Last time I looked at this I ended up way over my head, though.

NEON doesn't have any obvious instructions for vector interleaving
in the order the vectorizer wants.  It's probably possible with the
existing instructions, but it would be inefficient and un-NEON-like;
instead, there are special load instructions VLD2, VLD3, and VLD4.
These support loading interleaved vectors from memory directly to
consecutive vector registers.  You can use VLD3 to load 24 bytes into
either three consecutive registers (e.g. d0, d1, d2) or into three
registers with one-register gaps to accommodate 16-byte vectors
(e.g. vld3 into d0, d2, d4 with post-increment; then vld3 into
d1, d3, d5).
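
To make the data movement concrete, here is a minimal scalar sketch in
portable C of what a de-interleaving VLD3 on 8-bit lanes does, and of how
two such loads fill three 16-byte vectors.  It only models the semantics;
it is not the NEON intrinsic API and not how GCC implements it.  The names
u8x8x3_model, u8x16x3_model, model_vld3_u8 and model_vld3q_u8 are
hypothetical, chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical type: three 8-byte "D registers" written by one load. */
typedef struct {
    uint8_t val[3][8];
} u8x8x3_model;

/* Hypothetical type: three 16-byte "Q registers". */
typedef struct {
    uint8_t val[3][16];
} u8x16x3_model;

/* Scalar model of a 3-way de-interleaving load on 8-bit lanes:
   reads 24 consecutive bytes and sends byte (3 * lane + reg) to
   lane `lane` of register `reg`. */
static u8x8x3_model model_vld3_u8(const uint8_t *src)
{
    u8x8x3_model out;
    assert(src != NULL);
    for (size_t lane = 0; lane < 8; lane++)
        for (size_t reg = 0; reg < 3; reg++)
            out.val[reg][lane] = src[3 * lane + reg];
    return out;
}

/* Model of the two-step sequence described above: the first load fills
   the low halves (d0, d2, d4), the second fills the high halves
   (d1, d3, d5), so that each pair forms one 16-byte vector. */
static u8x16x3_model model_vld3q_u8(const uint8_t *src)
{
    u8x8x3_model lo = model_vld3_u8(src);       /* d0, d2, d4 */
    u8x8x3_model hi = model_vld3_u8(src + 24);  /* d1, d3, d5 */
    u8x16x3_model out;
    for (size_t reg = 0; reg < 3; reg++) {
        memcpy(out.val[reg], lo.val[reg], 8);
        memcpy(out.val[reg] + 8, hi.val[reg], 8);
    }
    return out;
}
```

The point of the model is that a single load defines three result vectors
at once, and the 16-byte case additionally needs the halves of each result
to land in specific, paired registers.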

In order to get good code out of these, I think we'd need to represent
early on that a single GIMPLE operation sets three different vectors
(SSA?  What SSA?).  We'd also need to somehow do sensible register
allocation for these constraints.

Instead, we do not support these in the vectorizer.  We use unions for
the intrinsics (which do not get scalarized, so perhaps the new SRA
will help here), and fake it with these huge partial modes during RTL
expansion.  See the XImode patterns in neon.md for examples.

Any ideas? :-)

-- 
Daniel Jacobowitz
CodeSourcery

