[PATCH] Also fold bmi/bmi2/tbm bextr/bextri/bzhi/pext/pdep builtins

Marc Glisse marc.glisse@inria.fr
Sat Oct 22 17:44:00 GMT 2016

On Sat, 22 Oct 2016, Jakub Jelinek wrote:

> On Sat, Oct 22, 2016 at 01:46:30PM +0200, Uros Bizjak wrote:
>> On Fri, Oct 21, 2016 at 5:37 PM, Uros Bizjak <ubizjak@gmail.com> wrote:
>>> On Fri, Oct 21, 2016 at 5:26 PM, Jakub Jelinek <jakub@redhat.com> wrote:
>>>> This patch on top of the just posted patch adds folding for a couple more
>>>> builtins (though, hundreds or thousands of other md builtins remain unfolded
>>>> even though they actually could be folded for e.g. const arguments).
>> Just a few words regarding other unfolded builtins. x86 intrinsics
>> (and consequently builtins) are considered a convenient way to emit
>> assembly instructions. So, the same rules as when writing assembly,
>> although slightly relaxed, should apply there. IMO, compiler
>> optimizations with intrinsics should be the exception, not the rule. As
>> an example, __builtin_ctz, __builtin_clz and functionally similar
>> target builtins are rather messy w.r.t. "undefinedness", so I think
>> this fact warrants some help from the compiler. But there is no need
>> to handle every single builtin - only a competent person who knows
>> the background of these intrinsics should use them.
> Generally constant folding what we can is a good thing, usually people will
> not use the intrinsics when they are passing constants directly, but
> constants could appear there through inlining and other optimizations.
> If we do constant fold the x86 intrinsics, we allow further constant folding
> and optimizations down the road.


> For various x86 intrinsics we do some constant folding, but only late
> (during RTL optimizations), and only if the insn patterns don't contain
> Besides the BMI/BMI2/TBM/LZCNT intrinsics that are already folded or I've
> posted patch for, intrinsics that IMHO would be nice to be folded are e.g.
> __builtin_ia32_bsr*, __builtin_ia32_ro[rl]*, maybe
> __builtin_ia32_{,r}sqrtps*, __builtin_ia32_rcpps, etc.
> For __builtin_ia32_addps and the like the question is why we have those
> builtins at all, it would be better to just use normal vector arithmetic.

Note that we do use operator+ directly in *intrin.h. We only keep the
builtin __builtin_ia32_addps because the Ada maintainers asked us to. We
could lower such builtins to normal vector arithmetic early in GIMPLE, but
it doesn't seem worth touching them since they are legacy.

> __builtin_ia32_cmp*p[sd], __builtin_ia32_{min,max}[ps][sd] etc. are also
> nicely constant foldable, etc.

I think _mm_cmpeq_pd could use the vector extensions instead of
__builtin_ia32_cmpeqpd if they were ported from C++ to C, and the same
goes for a few more. Some others, which don't have such a close match in
the vector extensions, could still be lowered (in GIMPLE) to vector
operations, which would allow constant folding as well as other
optimizations.

Marc Glisse
