[Patch ARM 0/3] Neon intrinsics TLC - Replace intrinsics with GNU C implementations where possible and remove dead code.

Ramana Radhakrishnan ramrad01@arm.com
Mon Apr 28 10:44:00 GMT 2014


Hi,

	I was investigating a performance issue with Neon intrinsics and 
realized this needed to happen.

	Patch 1/3 does this. I've special-cased -ffast-math for the 
_f32 intrinsics to prevent the auto-vectorizer from coming along and 
vectorizing addv2sf and addv4sf type operations, which we don't want to 
happen by default. Patch 1/3 causes apparent "regressions" in the rather 
ineffective neon intrinsics tests that we currently carry, which will 
hopefully soon be replaced by Christophe Lyon's rewrite that is being 
reviewed. On the whole I deem this patch stack to be safe to go in if 
necessary. These "regressions" are at -O0 with the vbic and vorn 
intrinsics, which no longer get combined and, well, so be it.
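
	For anyone who wants a picture of what "GNU C implementation" 
means here, a minimal sketch follows. It is not the patch itself: the 
sketch_* names and typedefs are hypothetical stand-ins for the 
arm_neon.h types, built with the generic vector_size attribute so that 
it compiles on any GCC target, and the #else branch is only a portable 
placeholder for where the real header would presumably keep calling the 
target builtin. __FAST_MATH__ is the macro GCC predefines under 
-ffast-math.

```c
#include <stdint.h>

/* Hypothetical stand-ins for int32x2_t / float32x2_t, using the GNU C
   vector extension rather than the real arm_neon.h types.  */
typedef int32_t sketch_int32x2_t __attribute__ ((vector_size (8)));
typedef float sketch_float32x2_t __attribute__ ((vector_size (8)));

/* Integer add written as plain GNU C, so the generic optimizers can
   see through it instead of treating it as an opaque builtin call.  */
static inline sketch_int32x2_t
sketch_vadd_s32 (sketch_int32x2_t a, sketch_int32x2_t b)
{
  return a + b;
}

/* vbic is "bit clear" (a & ~b) and vorn is "or not" (a | ~b).  Written
   this way each is two operations, which the compiler has to fuse back
   into one instruction; that fusing does not happen at -O0.  */
static inline sketch_int32x2_t
sketch_vbic_s32 (sketch_int32x2_t a, sketch_int32x2_t b)
{
  return a & ~b;
}

static inline sketch_int32x2_t
sketch_vorn_s32 (sketch_int32x2_t a, sketch_int32x2_t b)
{
  return a | ~b;
}

/* Float add: the GNU C form is used only under -ffast-math.  */
static inline sketch_float32x2_t
sketch_vadd_f32 (sketch_float32x2_t a, sketch_float32x2_t b)
{
#ifdef __FAST_MATH__
  return a + b;
#else
  /* Assumption: the real header would call the target builtin here.
     This element-wise add is only a portable placeholder.  */
  sketch_float32x2_t r = { a[0] + b[0], a[1] + b[1] };
  return r;
#endif
}
```

	Comparing the -O0 and -O2 assembly for sketch_vbic_s32 on an ARM 
target should show the kind of difference the old tests trip over.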

	This then left us in the happy position of being able to delete code, 
but I was worried about LTO streaming, as these "builtins" are 
essentially streamed out in LTO object code format. However, since we 
make no promises about LTO compatibility across releases, that's safe. 
Even so, I have structured the dead code elimination as Patch 2/3. This 
will be committed separately in case folks want to backport Patch 1/3 
on its own and want to assure their users of LTO compatibility within a 
release branch (if that even works :) ).

	Patch 3/3 removes the ML used to generate the Neon intrinsics and 
the documentation, and updates the comments in the files to show that 
these are now hand-crafted rather than auto-generated. We've had these 
for many years now and I think it's time we got rid of this. Not everyone 
groks ML and it doesn't help that only one or two folks can actually do 
this properly every time. Instead of having these bottlenecks, and given 
the fact that the intrinsics are pretty stable now, there's no point in 
retaining the generator interface. I'd rather get rid of it. The only 
bits left are neon-schedgen.ml, neon.ml and neon-testgen.ml. I think we 
can safely remove neon-testgen.ml once Christophe's testsuite is done, 
and we'll probably just have to carry neon-schedgen.ml / neon.ml as it 
still generates the neon descriptions for both a8 and a9.

	The patch stack was caught up in the C++ type info mess recently, and 
I've tested this on a cross arm-linux-gnueabihf testsuite run and it 
looks ok modulo the issues mentioned for Patch 1/3. I've deliberately 
resisted deleting the entire gcc.target/arm/neon and neon-testgen.ml in 
the hope that Christophe's testsuite will do the honours at that point 
:). Given we're in stage 1 and that I think we're getting somewhere 
with clyon's testsuite, I feel it is reasonably practical to just 
carry the noise from these extra failures. Christophe and I will 
test-drive his testsuite work in this space with these patches to see how 
the conversion process works and if there are any issues with these patches.

If there are issues I'm happy to hear about them.

Will apply to trunk in a couple of days if there are no regressions with 
clyon's testsuite for these intrinsics.


regards
Ramana
-- 
Ramana Radhakrishnan
Principal Engineer
ARM Ltd.


