This is the mail archive of the
gcc-bugs@gcc.gnu.org
mailing list for the GCC project.
[Bug middle-end/31723] Use reciprocal and reciprocal square root with -ffast-math
- From: "ubizjak at gmail dot com" <gcc-bugzilla at gcc dot gnu dot org>
- To: gcc-bugs at gcc dot gnu dot org
- Date: 10 Jun 2007 17:34:24 -0000
- Subject: [Bug middle-end/31723] Use reciprocal and reciprocal square root with -ffast-math
- References: <bug-31723-11659@http.gcc.gnu.org/bugzilla/>
- Reply-to: gcc-bugzilla at gcc dot gnu dot org
------- Comment #18 from ubizjak at gmail dot com 2007-06-10 17:34 -------
(In reply to comment #14)
> The interesting difference between sqrtss, divss and rcpss, rsqrtss is that
> the former have throughput of 1/16 while the latter are 1/1 (latencies compare
> 21 vs. 3). This is on K10. The optimization guide only mentions calculating
> the reciprocal y = a/b via rcpss and the square root (!) via rsqrtss
> (sqrt a = 0.5 * a * rsqrtss(a) * (3.0 - a * rsqrtss(a) * rsqrtss(a)))
>
> So the optimization would be mainly to improve instruction throughput, not
> overall latency.
If this is the case, then the middle-end will need to fold sqrtss in a different
way for targets that prefer rsqrtss. According to Comment #16, it is better to
fold to 1.0/sqrt(c/b) instead of sqrt(b/c), because this way we save one
multiplication during the Newton-Raphson (NR) expansion by rsqrt [due to
sqrt(x) <=> x * (1.0 / sqrt(x))].
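
To illustrate the point about the extra multiplication, here is a minimal C sketch of the two foldings. All function names are hypothetical, and the bit-trick seed is only a portable stand-in for the rsqrtss estimate (it is less accurate than the real instruction); this is not what GCC emits.

```c
#include <stdint.h>
#include <string.h>

/* Assumption: portable stand-in for a low-precision hardware estimate
   such as rsqrtss. The well-known bit-trick seed is used only so that
   the sketch runs anywhere. */
float rsqrt_estimate(float x)
{
    uint32_t bits;
    float r;

    memcpy(&bits, &x, sizeof bits);
    bits = 0x5f3759dfu - (bits >> 1);
    memcpy(&r, &bits, sizeof r);
    return r;
}

/* One Newton-Raphson step for 1/sqrt(x): r' = 0.5 * r * (3 - x*r*r). */
float nr_rsqrt(float x)
{
    float r = rsqrt_estimate(x);
    return 0.5f * r * (3.0f - x * r * r);
}

/* sqrt(x) written as x * (1/sqrt(x)): in this straightforward form it
   needs one more multiplication than nr_rsqrt itself. */
float nr_sqrt(float x)
{
    return x * nr_rsqrt(x);
}

/* Folding as sqrt(b/c): goes through nr_sqrt and pays the extra multiply. */
float fold_as_sqrt(float b, float c)
{
    return nr_sqrt(b / c);
}

/* Folding as 1.0/sqrt(c/b): maps directly onto nr_rsqrt. */
float fold_as_rsqrt(float b, float c)
{
    return nr_rsqrt(c / b);
}
```

Both functions approximate the same value; with this crude seed and a single NR step the result is only good to a few parts in a thousand.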
IMO we need a new tree code to handle reciprocal sqrt - RSQRT_EXPR, together
with proper folding functionality that expands directly to the (NR-enhanced)
rsqrt optab. If we consider a*sqrt(b/c), then b/c will be expanded as
b * NR-rcp(c) [where NR-rcp stands for NR-enhanced rcp] and sqrt will be
expanded via NR-rsqrt. In this case, I see no RTL pass that would be able to
combine everything together and swap the (b/c) operands to produce the
NR-enhanced a*rsqrt(c/b) equivalent.
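
As a rough illustration of the two expansions of a*sqrt(b/c) discussed above, here is a hedged C sketch. The helper names are hypothetical, and both estimate functions are portable stand-ins for rcpss and rsqrtss (a truncated exact reciprocal and a bit-trick seed, respectively), not the real instructions or GCC's actual expansion.

```c
#include <stdint.h>
#include <string.h>

/* Assumption: stand-in for rcpss. The exact reciprocal is truncated to
   12 mantissa bits to mimic a low-precision estimate. */
float rcp_estimate(float x)
{
    float r = 1.0f / x;
    uint32_t bits;

    memcpy(&bits, &r, sizeof bits);
    bits &= 0xfffff800u;
    memcpy(&r, &bits, sizeof r);
    return r;
}

/* Assumption: stand-in for rsqrtss, using the well-known bit-trick seed. */
float rsqrt_estimate(float x)
{
    uint32_t bits;
    float r;

    memcpy(&bits, &x, sizeof bits);
    bits = 0x5f3759dfu - (bits >> 1);
    memcpy(&r, &bits, sizeof r);
    return r;
}

/* NR-rcp: one Newton-Raphson step for 1/x, r' = r * (2 - x*r). */
float nr_rcp(float x)
{
    float r = rcp_estimate(x);
    return r * (2.0f - x * r);
}

/* NR-rsqrt: one Newton-Raphson step for 1/sqrt(x). */
float nr_rsqrt(float x)
{
    float r = rsqrt_estimate(x);
    return 0.5f * r * (3.0f - x * r * r);
}

/* Piecewise expansion: b/c becomes b * NR-rcp(c), then sqrt(q) is
   formed as q * NR-rsqrt(q). */
float expand_separately(float a, float b, float c)
{
    float q = b * nr_rcp(c);
    return a * (q * nr_rsqrt(q));
}

/* Combined form with the operands swapped: a * NR-rsqrt(c/b), which
   avoids the extra multiplication by q. */
float expand_combined(float a, float b, float c)
{
    float q = c * nr_rcp(b);
    return a * nr_rsqrt(q);
}
```

The second function is the form a dedicated RSQRT_EXPR fold could produce directly, whereas recovering it from the first would require recognizing the whole pattern after expansion.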
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31723