[i386] Replace builtins with vector extensions

Thu Oct 9 13:25:00 GMT 2014

On Thu, Oct 9, 2014 at 2:28 PM, Marc Glisse <marc.glisse@inria.fr> wrote:
> On Thu, 9 Oct 2014, Uros Bizjak wrote:
>
>> On Thu, Oct 9, 2014 at 12:33 PM, Marc Glisse <marc.glisse@inria.fr> wrote:
>>>
>>> Ping https://gcc.gnu.org/ml/gcc-patches/2014-07/msg01812.html
>>>
>>> (another part of the discussion is around
>>> https://gcc.gnu.org/ml/gcc-patches/2014-06/msg02288.html )
>>>
>>> Most people who commented seem cautiously in favor. The least favorable
>>> was Ulrich who suggested to go with it but keep the old behavior
>>> accessible if the user defines some macro (which imho would lose a large
>>> part of the simplification benefits of the patch)
>>> https://gcc.gnu.org/ml/gcc-patches/2014-06/msg02328.html
>>>
>>> If this is accepted, I will gladly prepare patches removing the unused
>>> builtins and extending this to a few more operations (integer vectors in
>>> particular). If this is not the direction we want to go, I'd like to hear
>>> it clearly so I can move on...
>>
>>
>> Well, I'm undecided.
>
>
> First, thanks for answering, it helps me a lot to know what others think.
>
>> The current approach is proven to work OK, there are no bugs reported
>> in this area and the performance is apparently OK. There should be
>> clear benefits in order to change something that "ain't broken", and
>> at least some proof that we won't regress in this area with the new
>> approach.
>
>
> There are quite a few enhancement PRs asking for more performance, but
> indeed no (or very few) complaints about correctness or about gcc turning
> their code into something worse than what they wrote, which I completely
> agree weighs more.
>
>> On the other hand, if the new approach opens new optimization
>> opportunities (without regression!), I'm in favor of it, including the
>> fact that new code won't produce equivalent assembly - as long as
>> functionality of the optimized asm stays the same (obviously, I'd
>> say).
>>
>> Please also note that this is quite a big project. There are plenty of
>> intrinsics and I for one don't want another partial transition ...
>
>
> That might be an issue: this transition is partial by nature. Many
> intrinsics cannot (easily) be expressed in GIMPLE, and among those that can
> be represented, we only want to change those for which we are confident that
> we will not regress the quality of the code. From the reactions, I would
> assume that we want to be quite conservative at the beginning, and maybe we
> can reconsider some other intrinsics later.
>
> The best I can offer is consistency: if addition of v2df is changed,
> addition of v4df is changed as well (and say any +-*/ of float/double
> vectors of any supported size). Another block would be +-*/% for integer
> vectors. And construction / access (most construction is already
> builtin-free). And remove the unused builtins in the same patch that makes
> them unused. If you don't like those blocks, I can write one mega-patch that
> does all these, if we roughly agree on the list beforehand, so it goes in
> all at once.
>
> Would that be good enough?

OK, let's go the proposed way; in more detail:

- we begin with +-*/ of float/double vectors. IMO, this would result
in a relatively small and easily reviewable patch to iron out the
details of the approach. Alternatively, we can begin with floats only.
- commit the patch and wait for the sky to fall down.
- we play a bit with the compiler to check generated code and corner
cases (some kind of QA) and wait to see if someone finds a problem (say,
a couple of weeks).
- if there are no problems, continue with integer builtins following
the established approach, otherwise we revert everything and go back
to the drawing board.
- repeat the procedure for other builtins.
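
For reference, a minimal sketch of what the first step amounts to: +-*/ on
float/double vectors written with GCC's generic vector extensions instead
of target builtins. This is not taken from the patch. The typedef and
function names below are hypothetical, chosen only for illustration; the
real headers use their own internal types.

```c
/* Hypothetical typedefs for illustration: 16-byte vectors of two doubles
   and four floats, declared with GCC's vector_size attribute. */
typedef double v2df_t __attribute__((vector_size(16)));
typedef float  v4sf_t __attribute__((vector_size(16)));

/* Element-wise arithmetic written with plain C operators. The compiler
   sees ordinary vector expressions, so its generic optimizers can work
   on them, and no target-specific builtin is needed. */
static inline v2df_t add_v2df(v2df_t a, v2df_t b) { return a + b; }
static inline v2df_t sub_v2df(v2df_t a, v2df_t b) { return a - b; }
static inline v2df_t mul_v2df(v2df_t a, v2df_t b) { return a * b; }
static inline v2df_t div_v2df(v2df_t a, v2df_t b) { return a / b; }

/* The same pattern for single-precision vectors. */
static inline v4sf_t add_v4sf(v4sf_t a, v4sf_t b) { return a + b; }
static inline v4sf_t mul_v4sf(v4sf_t a, v4sf_t b) { return a * b; }
```

The generated code for such functions is what would need checking against
the builtin-based versions during the QA step above.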

I propose to wait a couple of days for possible comments before we get
the ball rolling.

Uros.