[PATCH, i386] V4DF __builtin_shuffle

Marc Glisse marc.glisse@inria.fr
Mon Apr 30 15:06:00 GMT 2012


Ping?

http://gcc.gnu.org/ml/gcc-patches/2012-04/msg01034.html

Since then, I've run a c,c++ bootstrap and:
make -k check RUNTESTFLAGS="--target_board=my-sde-sim"
where my-sde-sim is the dejagnu board posted by H.J. Lu to run tests
inside Intel's simulator. The results show no difference between before
and after my patch.
(If I understand correctly, the testsuite always compiles the AVX and AVX2 
tests, and uses cpuid (which I expect the simulator must fake) to 
determine if it should run them, so I don't need to pass any extra flag 
in RUNTESTFLAGS. If I am wrong, please tell me.)
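
To make concrete the kind of runtime gate I have in mind, here is a minimal
sketch. It is not the testsuite's actual code: the function name
cpu_reports_avx is made up for illustration, and a complete check would
also query XGETBV to confirm that the OS saves the YMM state. It only uses
__get_cpuid, bit_AVX and bit_OSXSAVE from <cpuid.h>, and returns 0 on
non-x86 hosts.

```c
#if defined(__i386__) || defined(__x86_64__)
#include <cpuid.h>
#endif

/* Hypothetical helper (not from the GCC testsuite): return 1 if CPUID
   leaf 1 reports both AVX and OSXSAVE, 0 otherwise.  A full check
   would additionally read XCR0 with XGETBV.  */
int
cpu_reports_avx (void)
{
#if defined(__i386__) || defined(__x86_64__)
  unsigned int eax, ebx, ecx, edx;

  /* __get_cpuid returns 0 if the requested leaf is unsupported.  */
  if (!__get_cpuid (1, &eax, &ebx, &ecx, &edx))
    return 0;

  return (ecx & bit_AVX) != 0 && (ecx & bit_OSXSAVE) != 0;
#else
  /* Not an x86 host: the AVX tests cannot run here.  */
  return 0;
#endif
}
```

A test would call this first and exit successfully without running the AVX
code when it returns 0, which is why a simulator has to report the feature
bits for the tests to actually execute.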

Adding in Cc: the 2 people who kindly commented on the other shuffle patch 
(the one that isn't finished).

On Tue, 17 Apr 2012, Marc Glisse wrote:

> Hello,
>
> this patch expands __builtin_shuffle for V4DF mode in at most 3 insn. It is 
> simple and works really well, and often generates only 2 insn. It is not very 
> generic, because other modes don't have an instruction equivalent to vshufpd. 
> For V8SF (and likely V4DI and V8SI with AVX2, but I still need to do that), 
> my "default case" patch in PR 52607 seems more interesting.
>
> I tried calling this new function after expand_vec_perm_vperm2f128_vblend 
> (instead of before as in the patch), but it generated more instructions for 
> some permutations, and never less. That function is still useful for V8SF 
> though.
>
> I bootstrapped gcc on a non-avx platform, compiled a program that tests all 
> 4096 shuffles with -mavx/-mavx2, and ran the result using Intel's emulator 
> (SDE).
>
> There are still a few V4DF permutations that don't generate an optimal 
> sequence (3 insn instead of 2), but not that many I think. Of course, I am 
> assuming a constant cost of 1 per insn, which is completely false, but seems 
> like a sensible first approximation.
>
> (note that I can't commit)
>
>
> 2012-04-17  Marc Glisse  <marc.glisse@inria.fr>
>
> 	PR target/52607
> 	* config/i386/i386.c (ix86_expand_vec_perm_const): Move code to ...
> 	(canonicalize_perm): ... new function.
> 	(expand_vec_perm_2vperm2f128_vshuf): New function.
> 	(ix86_expand_vec_perm_const_1): Call it.
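
For reviewers who want the idea without reading the diff, below is a
scalar sketch of why two vperm2f128 plus one vshufpd can reach any of the
4096 V4DF selectors. It is my own model of the instruction semantics, not
code from the patch: model_vperm2f128, model_vshufpd and count_mismatches
are illustrative names. Note also that the mask passed to
__builtin_shuffle here is a runtime value, so this does not exercise the
constant-permutation expander itself; for that, each mask has to be a
compile-time constant and the file has to be built with -mavx.

```c
/* GCC vector extensions: four doubles, and an integer mask vector with
   the same element count and element width.  */
typedef double    v4df __attribute__ ((vector_size (32)));
typedef long long v4di __attribute__ ((vector_size (32)));

/* Scalar model of vperm2f128: each half of the result is one of the
   four 128-bit lanes (index 0..3) of the concatenation {a, b}.  */
static void
model_vperm2f128 (const double *ab, int lo_lane, int hi_lane, double *out)
{
  out[0] = ab[2 * lo_lane];
  out[1] = ab[2 * lo_lane + 1];
  out[2] = ab[2 * hi_lane];
  out[3] = ab[2 * hi_lane + 1];
}

/* Scalar model of 256-bit vshufpd: even result elements come from x,
   odd ones from y, and bit i of imm picks the low or high double inside
   the 128-bit lane that result element i belongs to.  */
static void
model_vshufpd (const double *x, const double *y, int imm, double *out)
{
  out[0] = x[0 + ((imm >> 0) & 1)];
  out[1] = y[0 + ((imm >> 1) & 1)];
  out[2] = x[2 + ((imm >> 2) & 1)];
  out[3] = y[2 + ((imm >> 3) & 1)];
}

/* Try all 8^4 = 4096 selectors and return the number of elements where
   the three-step model, __builtin_shuffle and the expected value
   disagree.  Element k of {a, b} holds the value k, so the expected
   result element is simply its selector.  */
int
count_mismatches (void)
{
  const double ab[8] = { 0, 1, 2, 3, 4, 5, 6, 7 };
  const v4df a = { 0, 1, 2, 3 }, b = { 4, 5, 6, 7 };
  int bad = 0;

  for (int n = 0; n < 4096; n++)
    {
      int sel[4] = { n & 7, (n >> 3) & 7, (n >> 6) & 7, (n >> 9) & 7 };
      v4di mask = { sel[0], sel[1], sel[2], sel[3] };
      v4df ref = __builtin_shuffle (a, b, mask);
      double t1[4], t2[4], out[4];

      /* t1 gathers the lanes holding result elements 0 and 2,
         t2 the lanes holding result elements 1 and 3.  */
      model_vperm2f128 (ab, sel[0] >> 1, sel[2] >> 1, t1);
      model_vperm2f128 (ab, sel[1] >> 1, sel[3] >> 1, t2);

      /* The low bit of each selector picks the double within its lane.  */
      int imm = (sel[0] & 1) | ((sel[1] & 1) << 1)
                | ((sel[2] & 1) << 2) | ((sel[3] & 1) << 3);
      model_vshufpd (t1, t2, imm, out);

      for (int i = 0; i < 4; i++)
        if (out[i] != ref[i] || ref[i] != (double) sel[i])
          bad++;
    }
  return bad;
}
```

The 2-insn cases are then the selectors for which one of the two
vperm2f128 steps is the identity on an input operand, so that operand can
be fed to vshufpd directly.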

-- 
Marc Glisse


