This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]
Other format:	[Raw text]

Re: [i386] scalar ops that preserve the high part of a vector

From: Marc Glisse <marc dot glisse at inria dot fr>
To: Uros Bizjak <ubizjak at gmail dot com>
Cc: gcc-patches at gcc dot gnu dot org
Date: Fri, 30 Nov 2012 13:34:49 +0100 (CET)
Subject: Re: [i386] scalar ops that preserve the high part of a vector
References: <alpine.DEB.2.02.1210131032460.9651@stedding.saclay.inria.fr> <CAFULd4YHdLF1ZyxrMG8MhRjo40f-EfAJZnDOEBc80pOGa4WNGQ@mail.gmail.com> <alpine.DEB.2.02.1210141057010.3752@laptop-mg.saclay.inria.fr>

On Sun, 14 Oct 2012, Marc Glisse wrote:

On Sun, 14 Oct 2012, Uros Bizjak wrote:
On Sat, Oct 13, 2012 at 10:52 AM, Marc Glisse <marc.glisse@inria.fr> wrote:
Hello,
this patch provides an alternate pattern to let combine recognize scalar
operations that preserve the high part of a vector. If the strategy is all
right, I could do the same for more operations (mul, div, ...). Something
similar is also possible for V4SF (different pattern though), but probably
not as useful.
But, we _do_ have vec_merge pattern that describes the operation.
Adding another one to each operation just to satisfy combine is IMO
not correct approach.
At some point I wondered about _replacing_ the existing pattern, so there would only be one ;-)

The vec_merge pattern takes as argument 2 vectors instead of a vector and a scalar, and describes the operation as a vector operation where we drop half of the result, instead of a scalar operation where we re-add the top half of the vector. I don't know if that's the most convenient choice. Adding code in simplify-rtx to replace vec_merge with vec_concat / vec_select might be easier than the other way around.
If the middle-end somehow gave us:
(plus X (vec_concat Y 0))
it would seem a bit strange to add an optimization that turns it into:
(vec_merge (plus X (subreg:V2DF Y)) X 1)
but then producing:
(vec_concat (plus (vec_select X 0) Y) (vec_select X 1))
would be strange as well.
(ignoring the signed zero issues here)
I'd rather see generic RTX simplification that
simplifies your proposed pattern to vec_merge pattern.
Ok, I'll see what I can do.

Also, as you mention in PR54855, Comment #5, the approach is too fragile...
I am not sure I can make the RTX simplification much less fragile... Whenever I see (vec_concat X (vec_select Y 1)), I would have to check whether X is some (possibly large) tree of scalar computations involving Y[0], move it all to vec_merge computations, and fix other users of some of those scalars to now use S[0]. Seems too hard, I would stop at single-operation X that is used only once. Besides, the gain is larger in proportion when there is a single operation :-)

Thank you for your comments,

Hello,

I experimented with the simplify-rtx transformation you suggested, see:
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54855

It works when the argument is a register, but not for memory (which is where the constant is in the testcase). And the description of the operation in sse.md does seem problematic. It says the second argument is:

(match_operand:VF_128 2 "nonimmediate_operand" "xm,xm"))

but Intel's documentation says "The source operand can be an XMM register or a 64-bit memory location", not quite the same.

Do you think the .md description should really stay this way, or could we change it to something that better reflects "64-bit memory location"?

--
Marc Glisse

Follow-Ups:
- Re: [i386] scalar ops that preserve the high part of a vector
  - From: Uros Bizjak

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]