Bug 85819 - conversion from __v[48]su to __v[48]sf should use FMA
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: target
Version: 9.0
Importance: P3 enhancement
Target Milestone: 12.0
Assignee: Not yet assigned to anyone
URL:
Keywords: missed-optimization
Depends on:
Blocks:
 
Reported: 2018-05-17 14:00 UTC by Matthias Kretz (Vir)
Modified: 2021-09-07 13:42 UTC
CC List: 3 users

See Also:
Host:
Target: x86_64-*-*, i?86-*-*
Build:
Known to work:
Known to fail:
Last reconfirmed: 2021-09-03 00:00:00


Description Matthias Kretz (Vir) 2018-05-17 14:00:59 UTC
Testcase (cf. https://godbolt.org/g/UoU3zj):

using T = float;
using To [[gnu::vector_size(32)]] = T;
using From [[gnu::vector_size(32)]] = unsigned;

#define A2(I) (T)a[I], (T)a[1+I]
#define A4(I) A2(I), A2(2+I)
#define A8(I) A4(I), A4(4+I)

To f(From a) {
    return To{A8(0)};
}

This compiles to:
  vpand .LC0(%rip), %ymm0, %ymm1
  vpsrld $16, %ymm0, %ymm0
  vcvtdq2ps %ymm0, %ymm0
  vcvtdq2ps %ymm1, %ymm1
  vmulps .LC1(%rip), %ymm0, %ymm0
  vaddps %ymm0, %ymm1, %ymm0
  ret

The last vmulps and vaddps can be contracted to vfmadd132ps .LC1(%rip), %ymm1, %ymm0.

The same is true for vector_size(16).
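
For reference, here is a scalar sketch of what the sequence above appears to compute per element. It assumes that .LC0 is the mask 0xffff and that .LC1 is 65536.0f; neither constant is shown in the listing, so both are inferred from the instruction sequence. The function names are hypothetical and are not taken from GCC.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Unfused form, mirroring the vmulps + vaddps pair.
// Assumption: .LC0 == 0xffff (mask) and .LC1 == 65536.0f (scale).
inline float uns_to_float_mul_add(std::uint32_t u) {
    // Each half fits in 16 bits, so the signed int -> float
    // conversions (vcvtdq2ps) are exact.
    float lo = static_cast<float>(static_cast<std::int32_t>(u & 0xffffu));
    float hi = static_cast<float>(static_cast<std::int32_t>(u >> 16));
    return hi * 65536.0f + lo;
}

// Fused form, mirroring the proposed vfmadd132ps.
inline float uns_to_float_fma(std::uint32_t u) {
    float lo = static_cast<float>(static_cast<std::int32_t>(u & 0xffffu));
    float hi = static_cast<float>(static_cast<std::int32_t>(u >> 16));
    return std::fma(hi, 65536.0f, lo);  // hi * 65536 + lo, rounded once
}

// Checks that both forms agree with the direct conversion for one input.
inline void check_conversions_agree(std::uint32_t u) {
    const float expected = static_cast<float>(u);
    assert(uns_to_float_mul_add(u) == expected);
    assert(uns_to_float_fma(u) == expected);
}
```

Multiplying hi by 65536 is an exact power-of-two scaling, so the only rounding in either form happens in the final addition. Under these assumptions the contraction does not change the result; it only removes one instruction.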
Comment 1 Richard Biener 2018-05-18 08:24:44 UTC
Confirmed.
Comment 2 Andrew Pinski 2021-09-04 06:12:26 UTC
ix86_expand_convert_uns_sisf_sse and ix86_expand_vector_convert_uns_vsivsf should check whether FMA is available and, if so, expand directly to an FMA instead of emitting MULT and PLUS separately.
Comment 3 GCC Commits 2021-09-07 12:36:21 UTC
The master branch has been updated by H.J. Lu <hjl@gcc.gnu.org>:

https://gcc.gnu.org/g:ad9fcb961c0705f56907a728c3748c011a0a8048

commit r12-3382-gad9fcb961c0705f56907a728c3748c011a0a8048
Author: H.J. Lu <hjl.tools@gmail.com>
Date:   Sat Sep 4 07:48:43 2021 -0700

    x86: Enable FMA in unsigned SI to SF expanders
    
    Enable FMA in scalar/vector unsigned SI to SF expanders.  Don't check
    TARGET_AVX512F which has vcvtusi2ss and vcvtudq2ps instructions.
    
    gcc/
    
            PR target/85819
            * config/i386/i386-expand.c (ix86_expand_convert_uns_sisf_sse):
            Enable FMA.
            (ix86_expand_vector_convert_uns_vsivsf): Likewise.
    
    gcc/testsuite/
    
            PR target/85819
            * gcc.target/i386/pr85819-1a.c: New test.
            * gcc.target/i386/pr85819-1b.c: Likewise.
            * gcc.target/i386/pr85819-2a.c: Likewise.
            * gcc.target/i386/pr85819-2b.c: Likewise.
            * gcc.target/i386/pr85819-2c.c: Likewise.
            * gcc.target/i386/pr85819-3.c: Likewise.
Comment 4 H.J. Lu 2021-09-07 13:42:53 UTC
Fixed for GCC 12.