Created attachment 47431 [details] simplified testcase Clang is able to vectorize LightPixel which leads to about 10% improvements in rasterflood-svg Firefox benchmark.
t.C:260:16: missed: Build SLP failed: incompatible vector types for: patt_288 = (unsigned char) patt_290; t.C:260:16: note: old vector type: vector(8) unsigned char t.C:260:16: note: new vector type: vector(4) unsigned char we're also re-trying for all vector sizes even though the above issue will prevail. But one of the main issues is likely that we don't pattern match static inline unsigned umax(unsigned a, unsigned b) { return a - ((a - b) & -(a < b)); } static inline unsigned umin(unsigned a, unsigned b) { return a - ((a - b) & -(a > b)); } I wonder if they desperately searched for sth not handled by GCC ;) Btw, my copy of clang (OK, 7.0...) doesn't vectorize the color.components[] compute but only the leading three-component compute involving reverse sqrt, not vectorized by us because we don't handle BB vectorization of reductions. There's duplicate PRs about this. Which part is important to vectorize? Also - what flags did you use? (tried -Ofast -mavx2)
Created attachment 47436 [details] Clang assembly from perf It is clang9 build https://treeherder.mozilla.org/#/jobs?repo=try&revision=7d7ee02817ab1ea39a6415862ab7889f5e416598&selectedJob=278948829 it has full logs and binary, too /builds/worker/fetches/sccache/sccache /builds/worker/fetches/clang/bin/clang++ -o Unified_cpp_gfx_2d2.o -c -flto=thin -I/builds/worker/workspace/build/src/obj-firefox/dist/system_wrappers -include /builds/worker/workspace/build/src/config/gcc_hidden.h -U_FORTIFY_SOURCE -D_FORTIFY_SOURCE=2 -fstack-protector-strong -DMOZILLA_CLIENT -include /builds/worker/workspace/build/src/obj-firefox/mozilla-config.h -Qunused-arguments -Qunused-arguments -Wall -Wbitfield-enum-conversion -Wempty-body -Wignored-qualifiers -Woverloaded-virtual -Wpointer-arith -Wshadow-field-in-constructor-modified -Wsign-compare -Wtype-limits -Wunreachable-code -Wunreachable-code-return -Wwrite-strings -Wno-invalid-offsetof -Wclass-varargs -Wfloat-overflow-conversion -Wfloat-zero-conversion -Wloop-analysis -Wc++1z-compat -Wc++2a-compat -Wcomma -Wimplicit-fallthrough -Werror=non-literal-null-conversion -Wstring-conversion -Wtautological-overlap-compare -Wtautological-unsigned-enum-zero-compare -Wtautological-unsigned-zero-compare -Wno-error=tautological-type-limit-compare -Wno-inline-new-delete -Wno-error=type-limits -Wno-error=pessimizing-move -Wno-error=nonnull -Wno-error=deprecated-declarations -Wno-error=array-bounds -Wno-error=backend-plugin -Wno-error=return-std-move -Wno-error=atomic-alignment -Wformat -Wformat-security -Wno-gnu-zero-variadic-macro-arguments -Wno-unknown-warning-option -Wno-return-type-c-linkage -D_GLIBCXX_USE_CXX11_ABI=0 -fno-sized-deallocation -fno-aligned-new -fcrash-diagnostics-dir=/builds/worker/artifacts -fno-strict-aliasing -fno-strict-aliasing -fno-exceptions -fno-rtti -fno-exceptions -fno-math-errno -pthread -pipe -I/builds/worker/workspace/build/src/obj-firefox/dist/stl_wrappers -DNDEBUG=1 -DTRIMMED=1 -DUSE_SSE2 -DOS_POSIX=1 -DOS_LINUX=1 -DUSE_CAIRO -DMOZ2D_HAS_MOZ_CAIRO -DMOZ_ENABLE_FREETYPE -DSTATIC_EXPORTABLE_JS_API -DMOZ_HAS_MOZGLUE -DMOZILLA_INTERNAL_API -DIMPL_LIBXUL -I/builds/worker/workspace/build/src/gfx/2d -I/builds/worker/workspace/build/src/obj-firefox/gfx/2d -I/builds/worker/workspace/build/src/obj-firefox/ipc/ipdl/_ipdlheaders -I/builds/worker/workspace/build/src/ipc/chromium/src -I/builds/worker/workspace/build/src/ipc/glue -I/builds/worker/workspace/build/src/gfx/skia -I/builds/worker/workspace/build/src/gfx/skia/skia -I/builds/worker/workspace/build/src/obj-firefox/dist/include -I/builds/worker/workspace/build/src/obj-firefox/dist/include/nspr -I/builds/worker/workspace/build/src/obj-firefox/dist/include/nss -fPIC -g -Xclang -load -Xclang /builds/worker/workspace/build/src/obj-firefox/build/clang-plugin/libclang-plugin.so -Xclang -add-plugin -Xclang moz-check -O2 -fno-omit-frame-pointer -funwind-tables -Werror -Wno-error=shadow -I/builds/worker/workspace/build/src/obj-firefox/dist/include/cairo -I/usr/include/freetype2 -MD -MP -MF .deps/Unified_cpp_gfx_2d2.o.pp Unified_cpp_gfx_2d2.cpp
Created attachment 47437 [details] gcc10-pr92834.patch This untested patch just undoes the fancy way of writing blend operation and turns it into ?: that we are able to optimize better.
Author: jakub Date: Mon Dec 9 10:13:18 2019 New Revision: 279113 URL: https://gcc.gnu.org/viewcvs?rev=279113&root=gcc&view=rev Log: PR tree-optimization/92834 * match.pd (A - ((A - B) & -(C cmp D)) -> (C cmp D) ? B : A, A + ((B - A) & -(C cmp D)) -> (C cmp D) ? B : A): New simplifications. * gcc.dg/tree-ssa/pr92834.c: New test. Added: trunk/gcc/testsuite/gcc.dg/tree-ssa/pr92834.c Modified: trunk/gcc/ChangeLog trunk/gcc/match.pd trunk/gcc/testsuite/ChangeLog
I think this is now the standard GCC SLP does not vectorize starting at loads but rather starts at stores issue.
So there is a store and we vectorize it: _188 = {_10, _14, _18, _22}; MEM[(union ._anon_7 *)&color] = _188; _43 = color.bgra; color ={v} {CLOBBER}; return _43; but we fail to vectorize more because there's a mix of min/max operations here. In fact we even recognize the reduction on the 'A' component but again we fail to be clever enough to exploit it (it's also 3 lanes and thus difficult to handle).