This is the mail archive of the
gcc-bugs@gcc.gnu.org
mailing list for the GCC project.
[Bug tree-optimization/52252] An opportunity for x86 gcc vectorizer (gain up to 3 times)
- From: "evstupac at gmail dot com" <gcc-bugzilla at gcc dot gnu dot org>
- To: gcc-bugs at gcc dot gnu dot org
- Date: Wed, 29 Feb 2012 12:32:20 +0000
- Subject: [Bug tree-optimization/52252] An opportunity for x86 gcc vectorizer (gain up to 3 times)
- Auto-submitted: auto-generated
- References: <bug-52252-4@http.gcc.gnu.org/bugzilla/>
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52252
--- Comment #2 from Stupachenko Evgeny <evstupac at gmail dot com> 2012-02-29 12:32:20 UTC ---
The two vectorizer dumps, produced with
Arm: gcc -O3 -mfpu=neon test.c -S -ftree-vectorizer-verbose=12
x86: gcc -O3 -m32 -msse3 test.c -S -ftree-vectorizer-verbose=12
start to differ at:
For Arm (can use vec_load_lanes):
6: === vect_make_slp_decision ===
6: === vect_detect_hybrid_slp ===
6: === vect_analyze_loop_operations ===
6: examining phi: in_35 = PHI <in_22(7), in_5(D)(4)>
...
6: can use vec_load_lanes<CI><V16QI>
6: vect_model_load_cost: unaligned supported by hardware.
6: vect_model_load_cost: inside_cost = 2, outside_cost = 0 .
For x86 (no array mode for V16QI[3]):
6: === vect_make_slp_decision ===
6: === vect_detect_hybrid_slp ===
6: === vect_analyze_loop_operations ===
6: examining phi: in_35 = PHI <in_22(7), in_5(D)(4)>
...
6: no array mode for V16QI[3]
6: the size of the group of strided accesses is not a power of 2
6: not vectorized: relevant stmt not supported: r_8 = *in_35;
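For readers without test.c at hand, the kind of loop that triggers this message can be sketched as follows. This is only an illustration under assumptions: the function name rgb_to_gray, the averaging arithmetic, and the parameter names are hypothetical and are not taken from the test case in the bug. What matters is that each iteration reads a group of three consecutive bytes through one pointer, which gives the "V16QI[3]" group whose size is not a power of 2.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical kernel (not the test.c from the bug report): each iteration
 * consumes three interleaved bytes, so the loads form a strided group of
 * size 3, which is not a power of 2. */
void rgb_to_gray(const uint8_t *in, uint8_t *out, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        uint8_t r = *in++;   /* first byte of the group */
        uint8_t g = *in++;   /* second byte of the group */
        uint8_t b = *in++;   /* third byte of the group */
        out[i] = (uint8_t)((r + g + b) / 3);
    }
}
```

Compiling a loop of this shape with the two command lines above should be enough to compare how each target treats the group of three loads.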
As I mentioned before, x86 is able to handle this case as well (Arm can
shuffle while loading; x86 can shuffle after loading with pshufb).
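To make the pshufb idea concrete, here is a rough scalar model in portable C. It is a sketch under assumptions, not what the vectorizer does or should emit: shuffle_bytes is a hypothetical helper that mimics the byte-shuffle semantics of pshufb (select a source byte by the low 4 bits of the control byte, or produce zero when the control byte's top bit is set), and extract_channel is a hypothetical routine that de-interleaves one channel from three 16-byte vectors using three shuffles ORed together. Note also that pshufb is an SSSE3 instruction, so real code would need -mssse3 rather than -msse3.

```c
#include <stdint.h>
#include <string.h>

/* Scalar model of a pshufb-style shuffle (hypothetical helper): output
 * byte i is 0 if ctl[i] has its top bit set, else src[ctl[i] & 15]. */
static void shuffle_bytes(uint8_t dst[16], const uint8_t src[16],
                          const uint8_t ctl[16])
{
    uint8_t tmp[16];
    for (int i = 0; i < 16; i++)
        tmp[i] = (ctl[i] & 0x80) ? 0 : src[ctl[i] & 0x0F];
    memcpy(dst, tmp, 16);
}

/* De-interleave channel c (0, 1 or 2) from 48 interleaved bytes, treated
 * as three 16-byte vectors. Each vector is shuffled so that its bytes of
 * the wanted channel land in their final positions and every other lane
 * is zero; OR-ing the three results gives the full 16-byte channel. */
void extract_channel(const uint8_t in[48], int c, uint8_t out[16])
{
    memset(out, 0, 16);
    for (int v = 0; v < 3; v++) {
        uint8_t ctl[16], part[16];
        for (int k = 0; k < 16; k++) {
            /* Element k of the channel lives at byte 3*k + c overall;
             * src is its offset inside vector v, if it falls there. */
            int src = 3 * k + c - 16 * v;
            ctl[k] = (src >= 0 && src < 16) ? (uint8_t)src : 0x80;
        }
        shuffle_bytes(part, in + 16 * v, ctl);
        for (int k = 0; k < 16; k++)
            out[k] |= part[k];
    }
}
```

In a real SSE implementation the control bytes would be compile-time constants, so each channel would cost three loads, three shuffles and two ORs per 16 output bytes.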