This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



Re: [PATCH] Also fold bmi/bmi2/tbm bextr/bextri/bzhi/pext/pdep builtins


On Sat, 22 Oct 2016, Jakub Jelinek wrote:

On Sat, Oct 22, 2016 at 01:46:30PM +0200, Uros Bizjak wrote:
On Fri, Oct 21, 2016 at 5:37 PM, Uros Bizjak <ubizjak@gmail.com> wrote:
On Fri, Oct 21, 2016 at 5:26 PM, Jakub Jelinek <jakub@redhat.com> wrote:

This patch, on top of the just posted patch, adds folding for a couple more
builtins (though hundreds or thousands of other md builtins remain unfolded,
even though they could actually be folded for e.g. constant arguments).

Just a few words regarding other unfolded builtins. x86 intrinsics
(and consequently builtins) are considered a convenient way to emit
assembly instructions. So the same rules as when writing assembly,
although slightly relaxed, should apply there. IMO, compiler
optimizations with intrinsics should be an exception, not the rule. As
an example, __builtin_ctz, __builtin_clz and functionally similar
target builtins are rather messy w.r.t. "undefinedness", so I think
this fact warrants some help from the compiler. But there is no need
to handle every single builtin - these intrinsics should only be used
by a competent person who knows their background.
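To illustrate the "undefinedness" being discussed (a hedged sketch, not part of the original mail): __builtin_ctz has an undefined result for a zero argument, so portable code needs an explicit guard, whereas e.g. BMI's tzcnt instruction defines the zero case as the operand width. The wrapper name ctz32 below is hypothetical.

```c
/* __builtin_ctz(0) is undefined behavior, so guard the zero case
   explicitly; BMI's tzcnt instruction would return the operand
   width (32) for a zero input.  ctz32 is a hypothetical wrapper. */
static unsigned
ctz32 (unsigned x)
{
  return x ? (unsigned) __builtin_ctz (x) : 32u;
}
```

With the guard in place, ctz32 (0) is well defined by the wrapper itself, not by the builtin.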

Generally, constant folding what we can is a good thing; usually people will
not use the intrinsics when they are passing constants directly, but
constants can appear there through inlining and other optimizations.
If we do constant fold the x86 intrinsics, we allow further constant folding
and optimizations down the road.
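A small generic illustration of how constants reach such calls (the helper names here are hypothetical, not x86 builtins): once the helper is inlined, both of its arguments are literal constants, so the whole expression can fold to a single constant at compile time.

```c
/* rotl32 is a hypothetical inline helper.  After it is inlined into
   hash_seed, both arguments are compile-time constants, so the whole
   call can fold to the literal 0xadbeefde with no rotate emitted. */
static inline unsigned
rotl32 (unsigned x, unsigned n)
{
  return (x << n) | (x >> ((32u - n) & 31u));
}

unsigned
hash_seed (void)
{
  return rotl32 (0xdeadbeefu, 8u);
}
```

The same mechanism applies to intrinsics: the user never wrote a constant argument, but inlining created one.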

+1

For various x86 intrinsics we do some constant folding, but only late
(during RTL optimizations), and only if the insn patterns don't contain
UNSPECs.

Besides the BMI/BMI2/TBM/LZCNT intrinsics that are already folded or I've
posted patch for, intrinsics that IMHO would be nice to be folded are e.g.
__builtin_ia32_bsr*, __builtin_ia32_ro[rl]*, maybe
__builtin_ia32_{,r}sqrtps*, __builtin_ia32_rcpps, etc.
For __builtin_ia32_addps and the like, the question is why we have those
builtins at all; it would be better to just use normal vector arithmetic.

Note that we do use operator+ directly in *intrin.h. We only keep the builtin __builtin_ia32_addps because the Ada maintainers asked us to. We could lower them to normal vector arithmetic early in gimple, but it doesn't seem worth touching them since they are legacy.
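For reference, the generic vector arithmetic being discussed looks like this with GCC's vector extensions (a sketch, not the actual *intrin.h contents):

```c
/* A generic 4 x float vector type via GCC's vector extensions. */
typedef float v4sf __attribute__ ((vector_size (16)));

/* Plain operator+ on the vector type; on x86 this lowers to addps,
   but no target builtin is involved, so the compiler can constant
   fold and otherwise optimize it like ordinary arithmetic. */
v4sf
add_ps (v4sf a, v4sf b)
{
  return a + b;
}
```

Because the addition is ordinary GIMPLE arithmetic rather than an opaque builtin call, constant operands fold with no target-specific support.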

__builtin_ia32_cmp*p[sd], __builtin_ia32_{min,max}[ps][sd] etc. are also
nicely constant foldable, etc.

I think _mm_cmpeq_pd could use the vector extensions instead of __builtin_ia32_cmpeqpd if they were ported from C++ to C, same for a few more. Some others which don't have such a close match in the vector extensions could still be lowered (in gimple) to vector operations, which would allow constant folding as well as other optimizations.
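A sketch of what that lowering could look like with the vector extensions (assumed semantics: the vector comparison yields an integer mask vector of all-ones/all-zeros lanes, which matches what cmpeqpd produces):

```c
/* Generic 2 x double and 2 x 64-bit integer vector types. */
typedef double v2df __attribute__ ((vector_size (16)));
typedef long long v2di __attribute__ ((vector_size (16)));

/* Vector == yields a signed integer vector of the same width:
   -1 (all bits set) in lanes that compare equal, 0 elsewhere,
   i.e. the same mask that cmpeqpd produces. */
v2di
cmpeq_pd (v2df a, v2df b)
{
  return a == b;
}
```

Expressed this way, a comparison with constant operands is ordinary foldable arithmetic instead of an opaque __builtin_ia32_cmpeqpd call.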

--
Marc Glisse

