Bug 43724 - GCC produces suboptimal ARM NEON code for zero vector assignment
Status: NEW
Alias: None
Product: gcc
Classification: Unclassified
Component: target
Version: 4.4.3
Importance: P3 enhancement
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords: missed-optimization
Depends on:
Blocks: 47562
Reported: 2010-04-12 05:39 UTC by Liran Nuna
Modified: 2015-08-19 09:38 UTC
CC List: 5 users

See Also:
Host: x86_64-linux-gnu
Target: arm-linux-gnueabi
Build: x86_64-linux-gnu
Known to work:
Known to fail: 4.5.0
Last reconfirmed: 2010-04-15 09:17:28


Description Liran Nuna 2010-04-12 05:39:25 UTC
GCC generates suboptimal code for the vdupq_n_XXX family of intrinsics when the argument is 0.

The code generated is:

	mov	r0, #0
	vdup.32	q8, r0

instead of the faster:

	veor.32	q8, q8, q8

Note that on x86[_64] GCC uses xorps for SSE when _mm_setzero_ps() or _mm_set1_ps(0) is used.
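
The report does not include a test case. A minimal sketch of one might look like the following. The function names zero_vec and zero_vec_self_test are hypothetical, the choice of the u32 variant is an assumption (the report only says vdupq_n_XXX), and the non-NEON fallback is there only so the file also builds on other hosts.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>

/* Hypothetical test case: a zero splat through the vdupq_n_u32 intrinsic.
   This is the kind of call the report says is compiled to
   "mov r0, #0; vdup.32 q8, r0" on the affected GCC versions. */
void zero_vec(uint32_t out[4])
{
    uint32x4_t v = vdupq_n_u32(0);   /* all four lanes set to 0 */
    vst1q_u32(out, v);               /* store the lanes so they can be inspected */
}
#else
/* Portable fallback (assumption, not part of the report): same observable
   result, so the sketch still compiles and runs without NEON. */
void zero_vec(uint32_t out[4])
{
    memset(out, 0, 4 * sizeof out[0]);
}
#endif

/* Checks that every lane written by zero_vec() is zero. */
void zero_vec_self_test(void)
{
    uint32_t lanes[4] = { 1, 2, 3, 4 };
    int i;

    zero_vec(lanes);
    for (i = 0; i < 4; i++)
        assert(lanes[i] == 0);
}
```

Compiling this with something like "arm-linux-gnueabi-gcc -O2 -march=armv7-a -mfpu=neon -mfloat-abi=softfp -S" and reading the assembly for zero_vec should show which instruction sequence a given GCC version picks. The register numbers and the surrounding store will likely differ from the snippet quoted above.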
Comment 1 Siarhei Siamashka 2010-04-12 06:17:52 UTC
Or just "vmov.i32 q8, #0", which would be better because it avoids any potential data dependency on the previous contents of q8.
Comment 2 Ramana Radhakrishnan 2010-04-15 09:17:28 UTC
Confirmed.