The code GCC generates for combining 4 SFmode values into a V4SFmode vector could be improved. For example:

    #include <altivec.h>

    vector float combine (float a, float b, float c, float d)
    {
      return (vector float) { a, b, c, d };
    }

Generates:

            .file   "foo.c"
            .section        ".text"
            .align 2
            .p2align 4,,15
            .globl merge
            .section        ".opd","aw"
            .align 3
    merge:
            .quad   .L.merge,.TOC.@tocbase,0
            .previous
            .type   merge, @function
    .L.merge:
            addis 9,2,.LC0@toc@ha
            xxpermdi 34,2,1,0
            xxpermdi 32,4,3,0
            addi 9,9,.LC0@toc@l
            xvcvdpsp 32,32
            xvcvdpsp 34,34
            lxvd2x 33,0,9
            xxpermdi 33,33,33,2
            vperm 2,0,2,1
            blr
            .long 0
            .byte 0,0,0,0,0,0,0,0
            .size   merge,.-.L.merge
            .section        .rodata.cst16,"aM",@progbits,16
            .align 4
    .LC0:
            .byte   31
            .byte   30
            .byte   29
            .byte   28
            .byte   23
            .byte   22
            .byte   21
            .byte   20
            .byte   15
            .byte   14
            .byte   13
            .byte   12
            .byte   7
            .byte   6
            .byte   5
            .byte   4

If the 2 V2DF temporaries were built differently, the final combination could use the VMRGEW and VMRGOW instructions instead of loading a permute mask and executing a VPERM instruction.
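To illustrate the suggestion, here is a minimal sketch in portable C that models the word movement in software. It is not the committed fix and it does not use real vector registers. The type v4sf_model and the functions model_xvcvdpsp, model_vmrgew and combine_model are hypothetical names introduced only for this example. The sketch assumes big-endian word numbering (words 0..3), that xvcvdpsp leaves its two converted values in the even words, and that vmrgew interleaves the even words of its two inputs. Under those assumptions, pairing (a, c) and (b, d) in the temporaries lets a single merge produce { a, b, c, d } with no permute mask.

```c
/* Software model of a 128-bit vector register holding four 32-bit floats.
   Words are numbered 0..3 in big-endian (ISA) order.  Hypothetical type. */
typedef struct { float w[4]; } v4sf_model;

/* Model of xvcvdpsp: convert the two doubles of a V2DF value to single
   precision.  The results land in the even words (0 and 2).  The odd
   words carry nothing this example relies on, so the model zeroes them. */
static v4sf_model model_xvcvdpsp(double dw0, double dw1)
{
    v4sf_model r = { { (float)dw0, 0.0f, (float)dw1, 0.0f } };
    return r;
}

/* Model of vmrgew: interleave the even words of a and b. */
static v4sf_model model_vmrgew(v4sf_model a, v4sf_model b)
{
    v4sf_model r = { { a.w[0], b.w[0], a.w[2], b.w[2] } };
    return r;
}

/* Hypothetical helper: build the temporaries as (a, c) and (b, d)
   so that one even-word merge finishes the combination. */
v4sf_model combine_model(float a, float b, float c, float d)
{
    v4sf_model t0 = model_xvcvdpsp(a, c);   /* { a, -, c, - } */
    v4sf_model t1 = model_xvcvdpsp(b, d);   /* { b, -, d, - } */
    return model_vmrgew(t0, t1);            /* { a, b, c, d } */
}
```

The point of the pairing is that the merge only needs values sitting in matching even word positions, so the constant-pool load and the VPERM become unnecessary in this model.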
Fixed in Subversion revision 240272.
Author: meissner
Date: Wed Sep 21 20:17:32 2016
New Revision: 240332

URL: https://gcc.gnu.org/viewcvs?rev=240332&root=gcc&view=rev
Log:
Add PR target/71395 marker to 71395 fix

Modified:
    trunk/gcc/ChangeLog