This is the mail archive of the
mailing list for the GCC project.
RE: [RFC][mid-end] Support vectorization of complex numbers using machine instructions.
Thanks for all the help so far,
> > so I'd need 5 parameters and then I'm guessing the other expressions
> > would be removed by DCE at some point?
> Are you planning to make the FCMLA behaviour directly available as an
> internal function or provide a higher-level one that does a full complex
> multiply, with the target lowering that into individual instructions where
I was planning on doing it as one internal function and leaving it up to the target
to expand it however it needs to.
> Either way, each individual FCMLA should only need three scalar inputs.
> Like with FCADD, it doesn't matter whether the operands to the individual
> scalar FCMLAs are the ones (or the only ones) that determine the associated
> FCMLA scalar result. All the node needs to do is describe something that
> would work when vectorised.
Ah yes that makes sense. I see what you mean.
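
For reference, here is a minimal scalar sketch of the per-lane behaviour being discussed. It assumes the rotation semantics as I understand them from the Arm description of FCMLA, with vectors holding interleaved (real, imaginary) pairs. The names `rot_t`, `fcmla_pair` and `cmla` are hypothetical and are not GCC internals. It shows that each output lane reads only three scalars (its own accumulator lane, one lane of `a` and one lane of `b`), and that two chained steps give a full complex multiply-accumulate.

```c
#include <stddef.h>

/* Hypothetical scalar model of one FCMLA step on a single complex pair.
   Index 0 is the real lane, index 1 is the imaginary lane.  */
typedef enum { ROT0, ROT90, ROT180, ROT270 } rot_t;

/* Each output lane reads exactly three scalars: its own accumulator
   lane, one lane of a and one lane of b.  */
void fcmla_pair(float acc[2], const float a[2], const float b[2], rot_t rot)
{
  switch (rot) {
  case ROT0:   acc[0] += a[0] * b[0]; acc[1] += a[0] * b[1]; break;
  case ROT90:  acc[0] -= a[1] * b[1]; acc[1] += a[1] * b[0]; break;
  case ROT180: acc[0] -= a[0] * b[0]; acc[1] -= a[0] * b[1]; break;
  case ROT270: acc[0] += a[1] * b[1]; acc[1] -= a[1] * b[0]; break;
  }
}

/* Full complex multiply-accumulate, acc += a * b, as two chained steps.
   The intermediate value after ROT0 is not a meaningful complex result
   on its own; it only feeds the second step.  */
void cmla(float acc[2], const float a[2], const float b[2])
{
  fcmla_pair(acc, a, b, ROT0);
  fcmla_pair(acc, a, b, ROT90);
}
```

With `a = 1 + 2i`, `b = 3 + 4i` and a zero accumulator, `cmla` leaves `-5 + 10i` in `acc`, which matches the usual complex product.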
> What to do with the intermediate results you don't need is an interesting
> question :-). Like you say, I was hoping DCE would get rid of them later.
> Does that not work?
I haven't tried it yet 😊 But I assume it'll work too.
I have complex add almost working: it generates the right code for the vectorized
loop. The loads are also corrected, the permute is gone, and I update all the data references
for the two statements I replaced.
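
As an illustration of the kind of loop involved, here is a minimal sketch. It assumes the add in question is the rotated form `c = a + i*b` (the one FCADD with a 90 degree rotation would cover) on interleaved float data. The function name `cadd_rot90` and the array layout are assumptions for the example, not taken from the patch.

```c
#include <stddef.h>

/* Scalar form of c = a + i*b on interleaved (real, imag) data.
   n is the number of complex elements, so each array holds 2*n floats.  */
void cadd_rot90(float *c, const float *a, const float *b, size_t n)
{
  for (size_t i = 0; i < n; i++) {
    /* The two statements of the pair.  Note that b is read lane-swapped
       relative to a and c, which is what would otherwise need a permute
       when vectorizing.  */
    c[2 * i]     = a[2 * i]     - b[2 * i + 1];  /* real: a.re - b.im */
    c[2 * i + 1] = a[2 * i + 1] + b[2 * i];      /* imag: a.im + b.re */
  }
}
```

If the pair of statements is replaced by a single rotated-add operation, `b` can be loaded contiguously like `a`, since the lane swap becomes part of the operation itself.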
However, for the scalar tail loop I have a problem, since I only have vector
versions of the instructions and the scalar loop is created from the same SLP tree.
So I end up with the builtins in the tail loop, with nothing to expand them to and with
no way to differentiate between the two calls to the internal fn.
I would need to somehow undo this for the scalar part.
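
To make the desired end state concrete, here is a conceptual sketch of it. It is not how the vectorizer actually builds the epilogue. It assumes the same rotated add `c = a + i*b` as an example operation; `VF`, `vec_cadd_rot90` and `cadd_rot90_split` are hypothetical names, and `vec_cadd_rot90` is only a plain-C stand-in for the vector-only operation.

```c
#include <stddef.h>

#define VF 2  /* hypothetical: complex elements handled per vector iteration */

/* Stand-in for the vector-only operation: one rotated add over VF
   interleaved (real, imag) pairs.  */
void vec_cadd_rot90(float *c, const float *a, const float *b)
{
  for (size_t k = 0; k < VF; k++) {
    c[2 * k]     = a[2 * k]     - b[2 * k + 1];
    c[2 * k + 1] = a[2 * k + 1] + b[2 * k];
  }
}

void cadd_rot90_split(float *c, const float *a, const float *b, size_t n)
{
  size_t i = 0;

  /* Main loop: whole vectors only, using the vector-only operation.  */
  for (; i + VF <= n; i += VF)
    vec_cadd_rot90(c + 2 * i, a + 2 * i, b + 2 * i);

  /* Tail: fewer than VF elements remain, so the original two scalar
     statements are kept instead of the vector-only operation.  */
  for (; i < n; i++) {
    c[2 * i]     = a[2 * i]     - b[2 * i + 1];
    c[2 * i + 1] = a[2 * i + 1] + b[2 * i];
  }
}
```

The point of the sketch is only that the tail keeps the two original scalar statements, which is the form the replaced statements would need to be restored to.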