This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.


Re: Optimize flag breaks code on many versions of gcc (not all)

From: tbp <tbptbp at gmail dot com>
To: "Richard Guenther" <richard dot guenther at gmail dot com>
Cc: "Seongbae Park" <seongbae dot park at gmail dot com>, "Dave Korn" <dave dot korn at artimi dot com>, "Paolo Carlini" <pcarlini at suse dot de>, "Zdenek Dvorak" <rakdver at atrey dot karlin dot mff dot cuni dot cz>, minus <minus at toneby dot com>, gcc at gcc dot gnu dot org
Date: Tue, 20 Jun 2006 04:07:22 +0200
Subject: Re: Optimize flag breaks code on many versions of gcc (not all)
References: <4495DBD5.4060802@suse.de> <016701c69380$87203650$a501a8c0@CAM.ARTIMI.COM> <ab3a61990606190859k7bea450cy69bc9a035641cf89@mail.gmail.com> <84fc9c000606190934y702499e4l554cf47522fc53a8@mail.gmail.com>

On 6/19/06, Richard Guenther <richard.guenther@gmail.com> wrote:

> Using -mfpmath=sse -msse2 is a workaround if you have a processor that
> supports SSE2 instructions.  As opposed to -ffloat-store, it works
> reliably and with no performance impact.

Such a slab test can be turned into a branchless sequence of SSE
min/max operations, even when filtering the infinities that appear
around dir ~= 0. It's much simpler and more efficient to intersect
4 rays against one box at once, though.
Without intrinsics, a NaN-oblivious version would look like this:

static float minf(const float a, const float b) { return (a < b) ? a : b; }
static float maxf(const float a, const float b) { return (a > b) ? a : b; }

bool_t intersect_ray_box(const aabb_t &box, const rt::mono::ray_t &ray,
                         float &lmin, float &lmax)
{
	float
		l1	= (box.min.x - ray.pos.x) * ray.inv_dir.x,
		l2	= (box.max.x - ray.pos.x) * ray.inv_dir.x;
	lmin	= minf(l1,l2);
	lmax	= maxf(l1,l2);

	l1	= (box.min.y - ray.pos.y) * ray.inv_dir.y;
	l2	= (box.max.y - ray.pos.y) * ray.inv_dir.y;
	lmin	= maxf(minf(l1,l2), lmin);
	lmax	= minf(maxf(l1,l2), lmax);
	
	l1	= (box.min.z - ray.pos.z) * ray.inv_dir.z;
	l2	= (box.max.z - ray.pos.z) * ray.inv_dir.z;
	lmin	= maxf(minf(l1,l2), lmin);
	lmax	= minf(maxf(l1,l2), lmax);

	return (lmax >= lmin) & (lmax >= 0.f);
}
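
For anyone who wants to try the routine above in isolation, here is a minimal self-contained sketch. The types vec3, box3 and ray3 are hypothetical stand-ins for aabb_t and rt::mono::ray_t, which are not shown in this message, and the sketch assumes IEEE 754 floats so that 1/0 yields an infinity. It is only meant to show how the infinities from axis-parallel rays fall through the min/max chain, not to be a definitive implementation.

```cpp
#include <cassert>
#include <cstdio>

// Hypothetical stand-ins for the types used in the message above.
struct vec3 { float x, y, z; };
struct box3 { vec3 min, max; };
struct ray3 {
    vec3 pos, inv_dir;  // inv_dir caches 1/dir per component
    // Assumes IEEE 754: a zero direction component gives an infinite inverse.
    ray3(vec3 p, vec3 d) : pos(p), inv_dir{1.0f / d.x, 1.0f / d.y, 1.0f / d.z} {}
};

static float minf(const float a, const float b) { return (a < b) ? a : b; }
static float maxf(const float a, const float b) { return (a > b) ? a : b; }

// Same slab test as above, written against the stand-in types.
static bool intersect(const box3 &box, const ray3 &ray, float &lmin, float &lmax)
{
    float l1 = (box.min.x - ray.pos.x) * ray.inv_dir.x;
    float l2 = (box.max.x - ray.pos.x) * ray.inv_dir.x;
    lmin = minf(l1, l2);
    lmax = maxf(l1, l2);

    l1 = (box.min.y - ray.pos.y) * ray.inv_dir.y;
    l2 = (box.max.y - ray.pos.y) * ray.inv_dir.y;
    lmin = maxf(minf(l1, l2), lmin);
    lmax = minf(maxf(l1, l2), lmax);

    l1 = (box.min.z - ray.pos.z) * ray.inv_dir.z;
    l2 = (box.max.z - ray.pos.z) * ray.inv_dir.z;
    lmin = maxf(minf(l1, l2), lmin);
    lmax = minf(maxf(l1, l2), lmax);

    // Bitwise & keeps the final test free of a short-circuit branch.
    return (lmax >= lmin) & (lmax >= 0.f);
}

// Shoots a few rays at the box [-1,1]^3 and prints hit or miss for each.
static void demo()
{
    const box3 unit = {{-1.f, -1.f, -1.f}, {1.f, 1.f, 1.f}};
    const ray3 rays[] = {
        ray3({-5.f, 0.f, 0.f}, {1.f, 0.f, 0.f}),   // along +x, through the box
        ray3({-5.f, 2.f, 0.f}, {1.f, 0.f, 0.f}),   // along +x, passes above the box
        ray3({-5.f, 0.f, 0.f}, {-1.f, 0.f, 0.f}),  // pointing away from the box
    };
    for (const ray3 &r : rays) {
        float lmin, lmax;
        const bool hit = intersect(unit, r, lmin, lmax);
        assert(!hit || lmin <= lmax);
        std::printf("%s\n", hit ? "hit" : "miss");
    }
}
```

Calling demo() from main() runs the three rays. For the first ray the y and z slabs produce -inf and +inf, which the min/max chain discards, so the interval stays at the x slab's [4, 6]. The sketch is NaN-oblivious like the original: a ray origin lying exactly on a slab plane with a zero direction component along that axis gives 0 * inf = NaN, which this code makes no attempt to handle.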

References:
- Re: Optimize flag breaks code on many versions of gcc (not all)
  - From: Paolo Carlini
- RE: Optimize flag breaks code on many versions of gcc (not all)
  - From: Dave Korn
- Re: Optimize flag breaks code on many versions of gcc (not all)
  - From: Seongbae Park
- Re: Optimize flag breaks code on many versions of gcc (not all)
  - From: Richard Guenther
