This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: [PATCH v2 0/9] S/390: Use signaling FP comparison instructions
> On 30.08.2019 at 09:16, Richard Biener <richard.guenther@gmail.com> wrote:
>
> On Fri, Aug 30, 2019 at 9:12 AM Richard Biener
> <richard.guenther@gmail.com> wrote:
>>
>> On Thu, Aug 29, 2019 at 5:39 PM Ilya Leoshkevich <iii@linux.ibm.com> wrote:
>>>
>>>> On 22.08.2019 at 15:45, Ilya Leoshkevich <iii@linux.ibm.com> wrote:
>>>>
>>>> Bootstrap and regtest running on x86_64-redhat-linux and
>>>> s390x-redhat-linux.
>>>>
>>>> This patch series adds signaling FP comparison support (both scalar and
>>>> vector) to the s390 backend.
>>>
>>> I'm running into a problem on ppc64 with this patch, and it would be
>>> great if someone could help me figure out the best way to resolve it.
>>>
>>> The vector36.C test is failing because the gimplifier produces the following
>>>
>>> _5 = _4 > { 2.0e+0, 2.0e+0, 2.0e+0, 2.0e+0 };
>>> _6 = VEC_COND_EXPR <_5, { -1, -1, -1, -1 }, { 0, 0, 0, 0 }>;
>>>
>>> from
>>>
>>> VEC_COND_EXPR < (*b > { 2.0e+0, 2.0e+0, 2.0e+0, 2.0e+0 }) ,
>>> { -1, -1, -1, -1 } ,
>>> { 0, 0, 0, 0 } >
>>>
>>> Since the comparison tree code is now hidden behind a temporary, my code
>>> does not have anything to pass to the backend. The reason for creating
>>> a temporary is that the comparison can trap, and so the following check
>>> in gimplify_expr fails:
>>>
>>>   if (gimple_seq_empty_p (internal_post) && (*gimple_test_f) (*expr_p))
>>>     goto out;
>>>
>>> gimple_test_f is is_gimple_condexpr, and it eventually calls
>>> operation_could_trap_p (GT).
>>>
>>> My current solution is to simply state that the backend does not support
>>> SSA_NAME in vector comparisons. However, I don't like it, since it may
>>> cause performance regressions due to having to fall back to scalar
>>> comparisons.
>>>
>>> I was thinking about two other possible solutions:
>>>
>>> 1. Change the gimplifier to allow trapping vector comparisons. That's
>>> a bit complicated, because tree_could_throw_p checks not only for
>>> floating point traps, but also e.g. for array index out of bounds
>>> traps. So I would have to create a tree_could_throw_p version which
>>> disregards specific kinds of traps.
>>>
>>> 2. Change expand_vector_condition to follow SSA_NAME_DEF_STMT and use
>>> its tree_code instead of SSA_NAME. The potential problem I see with
>>> this is that there appears to be no guarantee that _5 will be inlined
>>> into _6 at a later point. So if we say that we don't need to fall
>>> back to scalar comparisons based on the availability of a vector >
>>> instruction and inlining does not happen, then what will actually
>>> be required is vector selection (vsel on S/390), which might not be
>>> available in the general case.
>>>
>>> What would be a better way to proceed here?
>>
>> On GIMPLE there isn't a good reason to split out trapping comparisons
>> from [VEC_]COND_EXPR - the gimplifier does this for GIMPLE_CONDs
>> where it is important because we'd have no way to represent EH info
>> when not done. It might be a bit awkward to preserve EH across RTL
>> expansion though in case the [VEC_]COND_EXPR are not expanded
>> as a single pattern, but I'm not sure.
>>
>> To go this route you'd have to split the is_gimple_condexpr check
>> I guess and eventually users turning [VEC_]COND_EXPR into conditional
>> code (do we have any?) have to be extra careful then.
>
> Oh, btw - the fact that we have an expression embedded in [VEC_]COND_EXPR
> is something that has bothered me for quite some time already, and it makes
> things like VN awkward and GIMPLE finicky. We've discussed alternatives
> to deal with this, the simplest being moving the comparison out to a separate
> stmt, and others like having four-operand [VEC_]COND_{EQ,NE,...}_EXPR
> codes or simply treating {EQ,NE,...}_EXPR as quaternary on GIMPLE
> with either optional 3rd and 4th operands (defaulting to boolean_true/false_node)
> or always explicit ones (and thus dropping [VEC_]COND_EXPR).
>
> What does LLVM do here?
For

void f(long long * restrict w, double * restrict x, double * restrict y, int n)
{
  for (int i = 0; i < n; i++)
    w[i] = x[i] == y[i] ? x[i] : y[i];
}

LLVM does

  %26 = fcmp oeq <2 x double> %21, %25
  %27 = extractelement <2 x i1> %26, i32 0
  %28 = select <2 x i1> %26, <2 x double> %21, <2 x double> %25

So they have separate operations for the comparison and for the ternary
operator (fcmp + select).
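
To make that split concrete, here is a rough scalar model in plain C of the
two-step form (a comparison that yields a per-lane mask, followed by a
selection driven only by that mask). It is only a sketch for illustration,
not GCC or LLVM code: the names vec_cmp_gt, vec_select and LANES are made up
for this example, and the lane count of 4 is an assumption taken from the
vector36.C dump quoted above.

```c
#include <stddef.h>
#include <stdint.h>

#define LANES 4  /* hypothetical lane count, matching the 4-element dump above */

/* Step 1: lane-wise comparison.  Each lane of the mask becomes -1 (all ones)
   when a[i] > b[i] and 0 otherwise.  This models
   "_5 = _4 > { 2.0, 2.0, 2.0, 2.0 }" in GIMPLE, or an fcmp in LLVM IR.  */
static void vec_cmp_gt(const float *a, const float *b, int32_t *mask)
{
    for (size_t i = 0; i < LANES; i++)
        mask[i] = a[i] > b[i] ? -1 : 0;
}

/* Step 2: lane-wise selection.  It looks only at the mask, never at the
   original floating-point operands.  This models
   "VEC_COND_EXPR <_5, t, f>" in GIMPLE, or a select in LLVM IR.  */
static void vec_select(const int32_t *mask, const int32_t *t,
                       const int32_t *f, int32_t *out)
{
    for (size_t i = 0; i < LANES; i++)
        out[i] = mask[i] ? t[i] : f[i];
}
```

In this form any floating-point exception can only come from step 1, while
step 2 is a pure data movement. That is also why a target needs both a vector
comparison and a vector selection instruction once the two are not fused.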