[PATCH] Also fold bmi/bmi2/tbm bextr/bextri/bzhi/pext/pdep builtins

Marc Glisse <marc.glisse@inria.fr>
Sat Oct 22 17:44:00 GMT 2016


On Sat, 22 Oct 2016, Jakub Jelinek wrote:

> On Sat, Oct 22, 2016 at 01:46:30PM +0200, Uros Bizjak wrote:
>> On Fri, Oct 21, 2016 at 5:37 PM, Uros Bizjak <ubizjak@gmail.com> wrote:
>>> On Fri, Oct 21, 2016 at 5:26 PM, Jakub Jelinek <jakub@redhat.com> wrote:
>>>
>>>> This patch on top of the just posted patch adds folding for a couple more
>>>> builtins (though, hundreds or thousands of other md builtins remain unfolded
>>>> even though they actually could be folded for e.g. const arguments).
>>
>> Just a few words regarding other unfolded builtins. x86 intrinsics
>> (and consequently builtins) are considered a convenient way to emit
>> assembly instructions. So, the same rules as when writing assembly,
>> although slightly relaxed, should apply there. IMO, compiler
>> optimizations with intrinsics should be an exception, not the rule. As
>> an example, __builtin_ctz, __builtin_clz and functionally similar
>> target-builtins are rather messy w.r.t. "undefinedness", so I think
>> this fact warrants some help from the compiler. But there is no need
>> to handle every single builtin - only a competent person who knows
>> the background of these intrinsics should use them.
>
> Generally, constant folding what we can is a good thing; usually people will
> not use the intrinsics when they are passing constants directly, but
> constants could appear there through inlining and other optimizations.
> If we do constant fold the x86 intrinsics, we allow further constant folding
> and optimizations down the road.

+1

> For various x86 intrinsics we do some constant folding, but only late
> (during RTL optimizations), and only if the insn patterns don't contain
> UNSPECs.
>
> Besides the BMI/BMI2/TBM/LZCNT intrinsics that are already folded or I've
> posted patch for, intrinsics that IMHO would be nice to be folded are e.g.
> __builtin_ia32_bsr*, __builtin_ia32_ro[rl]*, maybe
> __builtin_ia32_{,r}sqrtps*, __builtin_ia32_rcpps, etc.
> For __builtin_ia32_addps and the like the question is why we have those
> builtins at all; it would be better to just use normal vector arithmetic.

Note that we do use operator+ directly in *intrin.h. We only keep the 
builtin __builtin_ia32_addps because the Ada maintainers asked us to. We 
could lower them to normal vector arithmetic early in gimple, but it 
doesn't seem worth touching them since they are legacy.

> __builtin_ia32_cmp*p[sd], __builtin_ia32_{min,max}[ps][sd] etc. are also
> nicely constant foldable, etc.

I think _mm_cmpeq_pd could use the vector extensions instead of 
__builtin_ia32_cmpeqpd if vector comparisons were ported from the C++ 
front end to C, and the same goes for a few more. Some others which don't 
have such a close match in the vector extensions could still be lowered 
(in gimple) to vector operations, which would allow constant folding as 
well as other optimizations.

-- 
Marc Glisse
