85819 – conversion from __v[48]su to __v[48]sf should use FMA

Summary: conversion from __v[48]su to __v[48]sf should use FMA

Status:	RESOLVED FIXED

Alias:	None

Product:	gcc
Classification:	Unclassified
Component:	target
Version:	9.0

Importance:	P3 enhancement
Target Milestone:	12.0
Assignee:	Not yet assigned to anyone

URL:
Keywords:	missed-optimization

Depends on:
Blocks:

Reported:	2018-05-17 14:00 UTC by Matthias Kretz (Vir)
Modified:	2021-09-07 13:42 UTC
CC List:	3 users

See Also:
Host:
Target:	x86_64-*-*, i?86-*-*
Build:
Known to work:
Known to fail:
Last reconfirmed:	2021-09-03 00:00:00

Description Matthias Kretz (Vir) 2018-05-17 14:00:59 UTC

Testcase (cf. https://godbolt.org/g/UoU3zj):

using T = float;
using To [[gnu::vector_size(32)]] = T;
using From [[gnu::vector_size(32)]] = unsigned;

#define A2(I) (T)a[I], (T)a[1+I]
#define A4(I) A2(I), A2(2+I)
#define A8(I) A4(I), A4(4+I)

To f(From a) {
    return To{A8(0)};
}

This compiles to:
  vpand .LC0(%rip), %ymm0, %ymm1
  vpsrld $16, %ymm0, %ymm0
  vcvtdq2ps %ymm0, %ymm0
  vcvtdq2ps %ymm1, %ymm1
  vmulps .LC1(%rip), %ymm0, %ymm0
  vaddps %ymm0, %ymm1, %ymm0
  ret

The last vmulps and vaddps can be contracted to vfmadd132ps .LC1(%rip), %ymm1, %ymm0.

The same is true for vector_size(16).
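
For reference, a scalar sketch of what the sequence above does per lane. The constants are not shown in the listing, so this assumes .LC0 holds 0xffff and .LC1 holds 65536.0f in every lane; the function names are made up for illustration. Both 16-bit halves convert exactly, and the product hi * 65536 is exact in float, so the only rounding happens in the final add. The fused form should therefore give the same result as the separate multiply and add:

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Per-lane model of vpand / vpsrld / vcvtdq2ps / vmulps / vaddps.
// Assumption: .LC0 == 0xffff and .LC1 == 65536.0f (not shown in the report).
inline float convert_mul_add(std::uint32_t x) {
    // Both halves fit in 16 bits, so the signed conversion is exact.
    float lo = static_cast<float>(static_cast<std::int32_t>(x & 0xffffu));
    float hi = static_cast<float>(static_cast<std::int32_t>(x >> 16));
    return hi * 65536.0f + lo;  // vmulps followed by vaddps
}

// Same computation with the multiply and add fused (the vfmadd132ps form).
inline float convert_fma(std::uint32_t x) {
    float lo = static_cast<float>(static_cast<std::int32_t>(x & 0xffffu));
    float hi = static_cast<float>(static_cast<std::int32_t>(x >> 16));
    return std::fma(hi, 65536.0f, lo);
}

// Compare both variants against the language-level conversion
// (assumes the default round-to-nearest mode).
inline void self_check() {
    const std::uint32_t samples[] = {
        0u, 1u, 0xffffu, 0x10000u, 0x01000001u,
        0x7fffffffu, 0x80000000u, 0xfffffe7fu, 0xffffffffu,
    };
    for (std::uint32_t x : samples) {
        float expected = static_cast<float>(x);
        assert(convert_mul_add(x) == expected);
        assert(convert_fma(x) == expected);
    }
}
```

Calling self_check() exercises a few boundary values, including inputs with the top bit set, which a plain signed vcvtdq2ps would get wrong.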

Comment 1 Richard Biener 2018-05-18 08:24:44 UTC

Confirmed.

Comment 2 Andrew Pinski 2021-09-04 06:12:26 UTC

ix86_expand_convert_uns_sisf_sse and ix86_expand_vector_convert_uns_vsivsf should check whether FMA is available and, if so, expand directly to an FMA instead of emitting MULT and PLUS separately.

Comment 3 GCC Commits 2021-09-07 12:36:21 UTC

The master branch has been updated by H.J. Lu <hjl@gcc.gnu.org>:

https://gcc.gnu.org/g:ad9fcb961c0705f56907a728c3748c011a0a8048

commit r12-3382-gad9fcb961c0705f56907a728c3748c011a0a8048
Author: H.J. Lu <hjl.tools@gmail.com>
Date:   Sat Sep 4 07:48:43 2021 -0700

    x86: Enable FMA in unsigned SI to SF expanders
    
    Enable FMA in scalar/vector unsigned SI to SF expanders.  Don't check
    TARGET_AVX512F which has vcvtusi2ss and vcvtudq2ps instructions.
    
    gcc/
    
            PR target/85819
            * config/i386/i386-expand.c (ix86_expand_convert_uns_sisf_sse):
            Enable FMA.
            (ix86_expand_vector_convert_uns_vsivsf): Likewise.
    
    gcc/testsuite/
    
            PR target/85819
            * gcc.target/i386/pr85819-1a.c: New test.
            * gcc.target/i386/pr85819-1b.c: Likewise.
            * gcc.target/i386/pr85819-2a.c: Likewise.
            * gcc.target/i386/pr85819-2b.c: Likewise.
            * gcc.target/i386/pr85819-2c.c: Likewise.
            * gcc.target/i386/pr85819-3.c: Likewise.

Comment 4 H.J. Lu 2021-09-07 13:42:53 UTC

Fixed for GCC 12.