[RFC][Vectorizer, AArch64] Fix PR/61114 by redefining REDUC_xxx_EXPR tree codes to return scalars

Fri Aug 1 10:19:00 GMT 2014

This fixes PR/61114 by redefining the REDUC_{MIN,MAX,PLUS}_EXPR tree codes.

These are presently documented as producing a vector with the result in element 
0, and this is inconsistent with their use in tree-vect-loop.c (which on 
bigendian targets pulls the bits out of the other end of the vector result). 
This leads to bugs on bigendian targets - see 
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61114 for a small testcase.

I discounted "fixing" the vectorizer (to always read from element 0) and then 
making the optab reverse the result on bigendian targets (whose architectural 
insn produces the result in lane N-1), as optimization of vectors in RTL seems 
unlikely to remove such a reverse/permute and so this would lead to a 
performance regression (specifically on PowerPC).

Instead it seems more natural for the tree code to produce a scalar result 
(producing a vector with the result in lane 0 has already caused confusion, e.g. 
https://gcc.gnu.org/ml/gcc-patches/2012-10/msg01100.html).

This patch preserves the meaning of the existing optab (producing a result in 
lane 0 on little-endian architectures or N-1 on bigendian), thus generally 
avoiding the need to change backends. Hence, expr.c extracts an 
endianness-dependent element from the optab result to give the result expected 
for the tree code.

Significant complication in the AArch64 backend stems from the existence of 
builtins for reduction operations, which are gimple_fold'd to the tree code. 
Hence, I introduce new define_expands, and map the existing 
__builtin_aarch64_reduc_s{plus,min,max}_<mode> functions to those, with scalar 
result types, matching the result of the tree code to which these are still 
gimple_folded.

If the above/proposed solution is acceptable, I'd make a longer patch series, 
including some cleanup to tree-vect-loop.c (vect_create_epilog_for_reduction now 
has only one case where extract_scalar_result == true), and separating out 
AArch64 changes. Further, I'd like to propose creating a new optab that directly 
outputs a scalar, as a migration path away from the existing optab whose meaning 
is endianness-dependent, i.e. such that expand_unop falls back to the existing 
optab only if the new one is not defined.

Patch as it stands has been bootstrapped on x86_64 and regression tested on 
aarch64 and aarch64_be without regressions. On x86_64 there is a regression in 
gcc.target/i386/pr51235.c, where it seems my check in tree-cfg.c is too strict - 
we end up with a reduction from "vector (4) unsigned long int" to "void *". 
(Even if I modify tree-vect-loop.c to build the REDUC_..._EXPR as returning the 
element type of the input vector, its return type is later changed.) It seems I 
can "get away with" a less-strict check in tree-cfg.c, i.e. allowing the case 
where the modes of the expected and actual result types match (rather than 
"useless_type_conversion_p" holding between said types), but if anyone can 
suggest an alternative/better check then it'd be great to hear it...

--Alan
-------------- next part --------------
A non-text attachment was scrubbed...
Name: vectorizer_rfc.patch
Type: text/x-patch
Size: 17695 bytes
Desc: not available
URL: <http://gcc.gnu.org/pipermail/gcc-patches/attachments/20140801/8b84612e/attachment.bin>