Bug 71395 - PowerPC vec_init of 4 SFmode values could be improved on Power8
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: target
Version: 7.0
Importance: P3 enhancement
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords: missed-optimization
Depends on:
Blocks:
 
Reported: 2016-06-02 19:38 UTC by Michael Meissner
Modified: 2016-09-21 20:18 UTC

See Also:
Host:
Target: powerpc64le-*-*
Build:
Known to work:
Known to fail:
Last reconfirmed:


Description Michael Meissner 2016-06-02 19:38:20 UTC
The code GCC generates for combining four SFmode values into a V4SFmode vector could be improved.

For example:

#include <altivec.h>

vector float merge (float a, float b, float c, float d)
{
  return (vector float) { a, b, c, d };
}

Generates:

        .file   "foo.c"
        .section        ".text"
        .align 2
        .p2align 4,,15
        .globl merge
        .section        ".opd","aw"
        .align 3
merge:
        .quad   .L.merge,.TOC.@tocbase,0
        .previous
        .type   merge, @function
.L.merge:
        addis 9,2,.LC0@toc@ha
        xxpermdi 34,2,1,0
        xxpermdi 32,4,3,0
        addi 9,9,.LC0@toc@l
        xvcvdpsp 32,32
        xvcvdpsp 34,34
        lxvd2x 33,0,9
        xxpermdi 33,33,33,2
        vperm 2,0,2,1
        blr
        .long 0
        .byte 0,0,0,0,0,0,0,0
        .size   merge,.-.L.merge
        .section        .rodata.cst16,"aM",@progbits,16
        .align 4
.LC0:
        .byte   31
        .byte   30
        .byte   29
        .byte   28
        .byte   23
        .byte   22
        .byte   21
        .byte   20
        .byte   15
        .byte   14
        .byte   13
        .byte   12
        .byte   7
        .byte   6
        .byte   5
        .byte   4

If the two V2DF temporaries were built differently, the final combination could use the VMRGEW and VMRGOW instructions instead of loading a permute mask and executing a VPERM instruction.
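
To illustrate the idea, here is a minimal sketch in portable C that models the lanes in software rather than using real vector instructions. It makes three assumptions: XVCVDPSP leaves its two converted values in words 0 and 2 of the result, VMRGEW interleaves the even words of its two operands, and lanes are numbered in big-endian element order. The type and function names (v4sf, v2df, model_xvcvdpsp, model_vmrgew, combine_model) are hypothetical and exist only for this example; they are not GCC internals.

```c
#include <assert.h>

/* Four 32-bit lanes, numbered 0..3 in big-endian element order. */
typedef struct { float w[4]; } v4sf;

/* Two 64-bit lanes, numbered 0..1. */
typedef struct { double d[2]; } v2df;

/* Software model of XVCVDPSP (assumption): the two converted values
   land in words 0 and 2.  Words 1 and 3 are treated as don't-care
   and are filled with 0 here. */
static v4sf model_xvcvdpsp(v2df x)
{
    v4sf r = { { (float) x.d[0], 0.0f, (float) x.d[1], 0.0f } };
    return r;
}

/* Software model of VMRGEW (assumption): interleave the even words
   of a and b as { a0, b0, a2, b2 }. */
static v4sf model_vmrgew(v4sf a, v4sf b)
{
    v4sf r = { { a.w[0], b.w[0], a.w[2], b.w[2] } };
    return r;
}

/* Build the temporaries as {a, c} and {b, d} instead of {a, b} and
   {c, d}, so that a single even-word merge yields { a, b, c, d }
   without a permute mask. */
static v4sf combine_model(float a, float b, float c, float d)
{
    v2df ac = { { a, c } };
    v2df bd = { { b, d } };
    return model_vmrgew(model_xvcvdpsp(ac), model_xvcvdpsp(bd));
}

/* Small self-check of the lane ordering. */
static void check_combine_model(void)
{
    v4sf r = combine_model(1.0f, 2.0f, 3.0f, 4.0f);
    assert(r.w[0] == 1.0f && r.w[1] == 2.0f);
    assert(r.w[2] == 3.0f && r.w[3] == 4.0f);
}
```

In this model the don't-care lanes never reach the result, which is why no constant-pool load or VPERM is needed. On a little-endian target the element numbering is reversed, so the operand order and the choice between the even-word and odd-word merge may differ from this sketch.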
Comment 1 Michael Meissner 2016-09-21 20:09:53 UTC
Fixed in Subversion revision 240272.
Comment 2 Michael Meissner 2016-09-21 20:18:04 UTC
Author: meissner
Date: Wed Sep 21 20:17:32 2016
New Revision: 240332

URL: https://gcc.gnu.org/viewcvs?rev=240332&root=gcc&view=rev
Log:
Add PR target/71395 marker to 71395 fix

Modified:
    trunk/gcc/ChangeLog