108804 – missed vectorization in presence of conversion from uint64_t to float

Status:	NEW

Alias:	None

Product:	gcc
Classification:	Unclassified
Component:	target
Version:	12.2.1

Importance:	P3 enhancement
Target Milestone:	---
Assignee:	Not yet assigned to anyone

URL:
Keywords:	missed-optimization

Depends on:
Blocks:	vectorizer

Reported:	2023-02-15 14:16 UTC by vincenzo Innocente
Modified:	2023-05-30 23:18 UTC
CC List:	2 users

See Also:	107283 90491
Host:
Target:	x86_64--
Build:
Known to work:
Known to fail:
Last reconfirmed:	2023-02-20 00:00:00

Attachments
Patch pending for GCC14 (3.02 KB, patch) 2023-03-09 05:52 UTC, Hongtao.liu

Description vincenzo Innocente 2023-02-15 14:16:14 UTC

In the following code [1], foo does not vectorize, while bar does.
Compiled with -march=haswell -Ofast --no-math-errno -Wall.
See
https://godbolt.org/z/E6xzfavxc

clang seems to do better.

[1]
#include<cstdint>


 
uint64_t d[512];
//uint32_t f[1024];
float f[1024];

void foo() {
    for (int i=0; i<512; ++i) {
        uint64_t k = d[i];
        auto x  = (k & 0x007FFFFF) |  0x3F800000;
        k = k >> 23;
        auto y  = (k & 0x007FFFFF) |  0x3F800000;
        f[i]=x; f[128+i] = y;

    }    
}

void bar() {
    for (int i=0; i<512; ++i) {
        uint64_t k = d[i];
        uint32_t x  = (k & 0x007FFFFF);
        x |= 0x3F800000;
        uint32_t y  = k >> 23;
        y  = (y & 0x007FFFFF) |  0x3F800000;
        f[i]=x; f[128+i] = y;

    }  
}

Comment 1 Andrew Pinski 2023-02-15 20:28:34 UTC

  _16 = (signed long) x_10;

Comment 2 Richard Biener 2023-02-20 11:14:28 UTC

EVRP does

@@ -38,16 +61,18 @@
   k_12 = k_10 >> 23;
   _2 = k_12 & 8388607;
   y_13 = _2 | 1065353216;
-  _3 = (float) x_11;
+  _17 = (signed long) x_11;
+  _3 = (float) _17;
   f[i_6] = _3;
   _4 = i_6 + 128;
-  _5 = (float) y_13;
+  _18 = (signed long) y_13;
+  _5 = (float) _18;
   f[_4] = _5;

because unsigned long -> float is even more difficult.  With -fno-tree-vrp
the conversion is still from uint64_t but that's not supported either.

So it's a target issue.  Shorter testcase:

#include<stdint.h>

uint64_t d[512];
float f[1024];

void foo() {
    for (int i=0; i<512; ++i) {
        uint64_t k = d[i];
        f[i]=k;
    }
}
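
For illustration only, here is a scalar sketch of why unsigned long -> float is harder than signed long -> float when only a signed conversion is available. The helper name u64_to_float_via_signed is hypothetical, and this is one commonly used lowering, not necessarily what GCC emits for any given target. It assumes the default round-to-nearest mode and the usual two's-complement behaviour of the unsigned-to-signed cast.

```cpp
#include <cstdint>

// Hypothetical helper (not from GCC): convert uint64_t to float using only
// a signed 64-bit -> float conversion.
float u64_to_float_via_signed(std::uint64_t k) {
    if (static_cast<std::int64_t>(k) >= 0) {
        // Top bit clear: the value is also a valid non-negative signed long,
        // so the signed conversion gives the right answer directly.
        return static_cast<float>(static_cast<std::int64_t>(k));
    }
    // Top bit set: halve the value so it fits in a signed long, OR the
    // dropped low bit back in as a sticky bit so rounding stays correct,
    // convert, then double the result.
    std::uint64_t half = (k >> 1) | (k & 1);
    float r = static_cast<float>(static_cast<std::int64_t>(half));
    return r + r;
}
```

The extra compare, shift, OR and add per element are what make the unsigned case more expensive to vectorize than the signed one.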

Comment 3 Hongtao.liu 2023-02-21 00:54:47 UTC

I think the point here is: although it's uint64_t -> float, the range of x and y, ((k & 0x007FFFFF) | 0x3F800000), can be represented as int32, so we can use the int32 -> float instructions which are supported by the backend.

So it looks like a middle-end issue to me.

A simple testcase for which clang generates vcvtdq2ps but gcc doesn't vectorize:

#include<stdint.h>

uint64_t d[512];
float f[1024];

void foo() {
    for (int i=0; i<512; ++i) {
        uint64_t k = d[i];
        f[i]=(k & 0x3F30FFFF);
    }
}

With a manually added conversion, gcc can also vectorize it:

#include<stdint.h>

uint64_t d[512];
float f[1024];

void foo() {
    for (int i=0; i<512; ++i) {
        uint64_t k = d[i];
        f[i]=(int)(k & 0x3F30FFFF);
    }
}
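
As a rough scalar check of the range argument (a sketch only; the names kMask, convert_wide and convert_narrow are made up for this example): since the mask 0x3F30FFFF is below 2^31, the masked value always fits in a non-negative int32, so going through int32 should give the same float as converting the uint64_t directly.

```cpp
#include <cstdint>

// Mask taken from the testcase above; the constant name is hypothetical.
constexpr std::uint64_t kMask = 0x3F30FFFF;

// The masked value can never exceed INT32_MAX, so narrowing is lossless.
static_assert(kMask <= static_cast<std::uint64_t>(INT32_MAX),
              "masked value must fit in int32");

// Direct uint64_t -> float conversion, as in the non-vectorized testcase.
float convert_wide(std::uint64_t k) {
    return static_cast<float>(k & kMask);
}

// Same value narrowed to int32 first, as in the testcase with the manual cast.
float convert_narrow(std::uint64_t k) {
    return static_cast<float>(static_cast<std::int32_t>(k & kMask));
}
```

Both functions convert the same integer value, so they round identically; only the source type the vectorizer sees is different.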

Comment 4 Hongtao.liu 2023-03-09 05:52:51 UTC

Created attachment 54613
Patch pending for GCC14

Comment 5 GCC Commits 2023-05-30 23:18:05 UTC

The master branch has been updated by hongtao Liu <liuhongt@gcc.gnu.org>:

https://gcc.gnu.org/g:3279b6223066d36d2e6880a137f80a46d3c82c8f

commit r14-1421-g3279b6223066d36d2e6880a137f80a46d3c82c8f
Author: liuhongt <hongtao.liu@intel.com>
Date:   Wed Feb 22 17:54:46 2023 +0800

    Enhance NARROW FLOAT_EXPR vectorization by truncating integer to lower precision.
    
    Similar like WIDEN FLOAT_EXPR, when direct_optab is not existed, try
    intermediate integer type whenever gimple ranger can tell it's safe.
    
    .i.e.
    When there's no direct optab for vector long long -> vector float, but
    the value range of integer can be represented as int, try vector int
    -> vector float if availble.
    
    gcc/ChangeLog:
    
            PR tree-optimization/108804
            * tree-vect-patterns.cc (vect_get_range_info): Remove static.
            * tree-vect-stmts.cc (vect_create_vectorized_demotion_stmts):
            Add new parameter narrow_src_p.
            (vectorizable_conversion): Enhance NARROW FLOAT_EXPR
            vectorization by truncating to lower precision.
            * tree-vectorizer.h (vect_get_range_info): New declare.
    
    gcc/testsuite/ChangeLog:
    
            * gcc.target/i386/pr108804.c: New test.

Comment 6 Hongtao.liu 2023-05-30 23:18:41 UTC

Fixed for GCC14.