Test case:

using T = unsigned char; // or ushort, or uint
using V [[gnu::vector_size(8)]] = T;
V f(V x) { return x >> 8 * sizeof(T); }

GCC 10 compiles this to either xor or a shift (which should preferably be xor as well).
GCC 9.2 compiles to:

  vmovq rax, xmm0
  mov ecx, 64
  shr rax, cl
  sal rax, (64 - 8*sizeof(T))
  vmovq xmm0, rax

The `shr rax, cl`, where cl == 64, is a nop, because shr (and shrx, which is used when BMI2 is enabled) masks the count with 0x3f. Consequently, the last element of the input vector is unchanged in the output.

In any case, the use of shr/shrx with shift counts >= 64 (or >= 32 in the case of the 32-bit variant) should not occur.
https://godbolt.org/z/zxmCTz
So the issue started with r8-349-gcc5b8f3d568e95ce74e03d8d87ada71117a6c106 and disappeared in r10-2445-g5fbc8ab48a57a75e0ce064befc30dee3dc63327a.
For unsigned int this would be undefined behavior, since it attempts to shift by 32 (but we fail to emit the corresponding warning). For narrower types (char, short) this seems well-defined.
Good point. Since gnu::vector_size(N) types are defined by you, you might be able to say that for char and short this is also UB. After all, the left operand isn't actually promoted to int. Consequently, the right operand is equal to the width of the "promoted" left operand in all cases.
Ah, indeed, it should be explicitly UB, and the documentation should mention that, as well as the fact that implicit integer promotion does not happen for vector shifts and other operations.
FWIW, I'd prefer gnu::vector_size(N) to not introduce any additional UB over the scalar arithmetic types, i.e. to behave as if promotion happened, just with a final assignment back to T (truncation). While I hate integer promotion, in this case I'd vote for consistency.
(In reply to Martin Liška from comment #2)
> So the issue started with r8-349-gcc5b8f3d568e95ce74e03d8d87ada71117a6c106
> and disappeared in r10-2445-g5fbc8ab48a57a75e0ce064befc30dee3dc63327a.

Hi Martin,

How do I build gcc around those commits? I have tried building x86_64 around 5b8f3d568e95ce74e03d8d87ada71117a6c106 and it keeps failing for the commits around there with a libgcc compile error:

In file included from /data/tamchr01/write-access/gcc-git/libgcc/unwind-dw2.c:403:0:
./md-unwind-support.h: In function 'x86_64_fallback_frame_state':
./md-unwind-support.h:65:47: error: dereferencing pointer to incomplete type 'struct ucontext'
   sc = (struct sigcontext *) (void *) &uc_->uc_mcontext;

and disabling libgcc doesn't get me very far because of the dependencies of things on it.
(In reply to Tamar Christina from comment #7)
> In file included from
> /data/tamchr01/write-access/gcc-git/libgcc/unwind-dw2.c:403:0:
> ./md-unwind-support.h: In function 'x86_64_fallback_frame_state':
> ./md-unwind-support.h:65:47: error: dereferencing pointer to incomplete type
> 'struct ucontext'
>    sc = (struct sigcontext *) (void *) &uc_->uc_mcontext;
>
> and disabling libgcc doesn't get me very far because of the dependencies of
> things on it.

You must be using too new a glibc; g:883312dc79806f513275b72502231c751c14ff72 is what fixed libgcc to support newer glibc.
Thanks Andrew, managed to build it now. Testing a patch.
The master branch has been updated by Tamar Christina <tnfchris@gcc.gnu.org>:

https://gcc.gnu.org/g:e60b1e23626701939e8a2f0cf6fc1e48abdf867b

commit r10-6371-ge60b1e23626701939e8a2f0cf6fc1e48abdf867b
Author: Tamar Christina <tamar.christina@arm.com>
Date:   Fri Jan 31 14:39:38 2020 +0000

    middle-end: Fix logical shift truncation (PR rtl-optimization/91838)

    This fixes a fall-out from a patch I had submitted two years ago which
    started allowing simplify-rtx to fold logical right shifts by offsets a
    followed by b into >> (a + b).

    However this can generate inefficient code when the resulting shift count
    ends up being the same as the size of the shift mode.  This will create
    some undefined behavior on most platforms.

    This patch changes the code to truncate to 0 if the shift amount goes out
    of range.  Before my older patch this used to happen in combine when it
    saw the two shifts.  However since we combine them here combine never gets
    a chance to truncate them.

    The issue mostly affects GCC 8 and 9, since on 10 the back-end knows how
    to deal with this shift constant, but it's better to do the right thing
    in simplify-rtx.

    Note that this doesn't take care of the arithmetic shift, where you could
    replace the constant with MODE_BITS (mode) - 1, but that's not a
    regression so punting on it.

    gcc/ChangeLog:

            PR rtl-optimization/91838
            * simplify-rtx.c (simplify_binary_operation_1): Update LSHIFTRT
            case to truncate if allowed or reject combination.

    gcc/testsuite/ChangeLog:

            PR rtl-optimization/91838
            * g++.dg/pr91838.C: New test.
The master branch has been updated by Jakub Jelinek <jakub@gcc.gnu.org>:

https://gcc.gnu.org/g:5910b14503dd82772dfeca5336a0176f9b1d260a

commit r10-6381-g5910b14503dd82772dfeca5336a0176f9b1d260a
Author: Jakub Jelinek <jakub@redhat.com>
Date:   Fri Jan 31 19:35:11 2020 +0100

    testsuite: Fix up pr91838.C test [PR91838]

    The test FAILs on i686-linux with:
    FAIL: g++.dg/pr91838.C (test for excess errors)
    Excess errors:
    /home/jakub/src/gcc/gcc/testsuite/g++.dg/pr91838.C:7:8: warning: MMX vector return without MMX enabled changes the ABI [-Wpsabi]
    /home/jakub/src/gcc/gcc/testsuite/g++.dg/pr91838.C:7:3: warning: MMX vector argument without MMX enabled changes the ABI [-Wpsabi]
    and on x86_64-linux with -m32 testing it fails to match the expected
    pattern (or both, with e.g. -m32/-mno-mmx/-mno-sse testing).
    The test is also in the wrong directory and has a non-standard way of
    specifying that it requires c++11 or later.

    2020-01-31  Jakub Jelinek  <jakub@redhat.com>

            PR rtl-optimization/91838
            * g++.dg/pr91838.C: Moved to ...
            * g++.dg/opt/pr91838.C: ... here.  Require c++11 target instead
            of dg-skip-if for c++98.  Pass -Wno-psabi -w to avoid psabi style
            warnings on vector arg passing or return.  Add -masm=att on
            i?86/x86_64.  Only check for pxor %xmm0, %xmm0 on lp64
            i?86/x86_64.
The releases/gcc-8 branch has been updated by Tamar Christina <tnfchris@gcc.gnu.org>:

https://gcc.gnu.org/g:0e35a433f8ec02ac46eb5ceb4a9bc6a25e88b05c

commit r8-9975-g0e35a433f8ec02ac46eb5ceb4a9bc6a25e88b05c
Author: Tamar Christina <tamar.christina@arm.com>
Date:   Tue Feb 11 10:47:19 2020 +0000

    middle-end: Fix logical shift truncation (PR rtl-optimization/91838) (gcc-8 backport)

    This fixes a fall-out from a patch I had submitted two years ago which
    started allowing simplify-rtx to fold logical right shifts by offsets a
    followed by b into >> (a + b).

    However this can generate inefficient code when the resulting shift count
    ends up being the same as the size of the shift mode.  This will create
    some undefined behavior on most platforms.

    This patch changes the code to truncate to 0 if the shift amount goes out
    of range.  Before my older patch this used to happen in combine when it
    saw the two shifts.  However since we combine them here combine never gets
    a chance to truncate them.

    The issue mostly affects GCC 8 and 9, since on 10 the back-end knows how
    to deal with this shift constant, but it's better to do the right thing
    in simplify-rtx.

    Note that this doesn't take care of the arithmetic shift, where you could
    replace the constant with MODE_BITS (mode) - 1, but that's not a
    regression so punting on it.

    gcc/ChangeLog:

            Backport from mainline
            2020-01-31  Tamar Christina  <tamar.christina@arm.com>

            PR rtl-optimization/91838
            * simplify-rtx.c (simplify_binary_operation_1): Update LSHIFTRT
            case to truncate if allowed or reject combination.

    gcc/testsuite/ChangeLog:

            Backport from mainline
            2020-01-31  Tamar Christina  <tamar.christina@arm.com>
                        Jakub Jelinek  <jakub@redhat.com>

            PR rtl-optimization/91838
            * g++.dg/opt/pr91838.C: New test.
The releases/gcc-9 branch has been updated by Tamar Christina <tnfchris@gcc.gnu.org>:

https://gcc.gnu.org/g:f6e9ae4da8f040ab2ef2eb37d0fb4da6f823bf81

commit r9-8210-gf6e9ae4da8f040ab2ef2eb37d0fb4da6f823bf81
Author: Tamar Christina <tamar.christina@arm.com>
Date:   Tue Feb 11 10:50:12 2020 +0000

    middle-end: Fix logical shift truncation (PR rtl-optimization/91838) (gcc-9 backport)

    This fixes a fall-out from a patch I had submitted two years ago which
    started allowing simplify-rtx to fold logical right shifts by offsets a
    followed by b into >> (a + b).

    However this can generate inefficient code when the resulting shift count
    ends up being the same as the size of the shift mode.  This will create
    some undefined behavior on most platforms.

    This patch changes the code to truncate to 0 if the shift amount goes out
    of range.  Before my older patch this used to happen in combine when it
    saw the two shifts.  However since we combine them here combine never gets
    a chance to truncate them.

    The issue mostly affects GCC 8 and 9, since on 10 the back-end knows how
    to deal with this shift constant, but it's better to do the right thing
    in simplify-rtx.

    Note that this doesn't take care of the arithmetic shift, where you could
    replace the constant with MODE_BITS (mode) - 1, but that's not a
    regression so punting on it.

    gcc/ChangeLog:

            Backport from mainline
            2020-01-31  Tamar Christina  <tamar.christina@arm.com>

            PR rtl-optimization/91838
            * simplify-rtx.c (simplify_binary_operation_1): Update LSHIFTRT
            case to truncate if allowed or reject combination.

    gcc/testsuite/ChangeLog:

            Backport from mainline
            2020-01-31  Tamar Christina  <tamar.christina@arm.com>
                        Jakub Jelinek  <jakub@redhat.com>

            PR rtl-optimization/91838
            * g++.dg/opt/pr91838.C: New test.
Fixed in master and backported to GCC 9 and GCC 8.
This test returns different results, depending on whether a V8QImode shift pattern is present in the target *.md files.

For x86, if the following expander is added:

+(define_expand "<insn>v8qi3"
+  [(set (match_operand:V8QI 0 "register_operand")
+	(any_shift:V8QI (match_operand:V8QI 1 "register_operand")
+			(match_operand:DI 2 "nonmemory_operand")))]
+  "TARGET_MMX_WITH_SSE"
+{
+  ix86_expand_vecop_qihi_partial (<CODE>, operands[0],
+				  operands[1], operands[2]);
+  DONE;
+})

then the tree optimizers produce:

V f (V x)
{
  V _2;

  <bb 2> [local count: 1073741824]:
  _2 = x_1(D) >> 8;
  return _2;
}

otherwise, without the named expander:

V f (V x)
{
  <bb 2> [local count: 1073741824]:
  return { 0, 0, 0, 0, 0, 0, 0, 0 };
}
So the testcase g++.dg/opt/pr91838.C depends on the outcome of the discussion at https://gcc.gnu.org/pipermail/gcc-patches/2023-June/620378.html, which I think is saying that this testcase really is undefined.
Interestingly, even with -mno-sse we somehow have a shift for V2QImode. When a scalar shift by precision is in the IL (for example via veclower), it is CCP's bit-value tracking that makes it zero; if that's disabled, then VRP will compute it as zero. But we don't have any pattern that would consistently do this. I'm going to add one.
(In reply to Richard Biener from comment #17)
> Interestingly even with -mno-sse we somehow have a shift for V2QImode.

This is implemented by a combination of shl rl,cl and shl rh,cl, so no XMM registers are needed.
The master branch has been updated by Richard Biener <rguenth@gcc.gnu.org>:

https://gcc.gnu.org/g:d1c072a1c3411a6fe29900750b38210af8451eeb

commit r14-2821-gd1c072a1c3411a6fe29900750b38210af8451eeb
Author: Richard Biener <rguenther@suse.de>
Date:   Thu Jul 27 13:08:32 2023 +0200

    tree-optimization/91838 - fix FAIL of g++.dg/opt/pr91838.C

    The following fixes the lack of simplification of a vector shift
    by an out-of-bounds shift value.  For scalars this is done both by
    CCP and VRP but vectors are not handled there.  This results in
    PR91838 differences in outcome dependent on whether a vector shift
    ISA is available and thus vector lowering does or does not expose
    scalar shifts here.

    The following adds a match.pd pattern to catch uniform out-of-bound
    shifts, simplifying them to zero when not sanitizing shift amounts.

            PR tree-optimization/91838
            * gimple-match-head.cc: Include attribs.h and asan.h.
            * generic-match-head.cc: Likewise.
            * match.pd (([rl]shift @0 out-of-bounds) -> zero): New pattern.
The testcase is again fixed in GCC 14.
The releases/gcc-13 branch has been updated by Andre Simoes Dias Vieira <avieira@gcc.gnu.org>:

https://gcc.gnu.org/g:c6e171ff827f8ff1bd160babac0dd24933972664

commit r13-8498-gc6e171ff827f8ff1bd160babac0dd24933972664
Author: Richard Biener <rguenther@suse.de>
Date:   Thu Jul 27 13:08:32 2023 +0200

    tree-optimization/91838 - fix FAIL of g++.dg/opt/pr91838.C

    The following fixes the lack of simplification of a vector shift
    by an out-of-bounds shift value.  For scalars this is done both by
    CCP and VRP but vectors are not handled there.  This results in
    PR91838 differences in outcome dependent on whether a vector shift
    ISA is available and thus vector lowering does or does not expose
    scalar shifts here.

    The following adds a match.pd pattern to catch uniform out-of-bound
    shifts, simplifying them to zero when not sanitizing shift amounts.

            PR tree-optimization/91838
            * gimple-match-head.cc: Include attribs.h and asan.h.
            * generic-match-head.cc: Likewise.
            * match.pd (([rl]shift @0 out-of-bounds) -> zero): New pattern.

    (cherry picked from commit d1c072a1c3411a6fe29900750b38210af8451eeb)
The releases/gcc-12 branch has been updated by Andre Simoes Dias Vieira <avieira@gcc.gnu.org>:

https://gcc.gnu.org/g:1ddd9f9e53bd649d3d236f7941106d8cc46e7358

commit r12-10293-g1ddd9f9e53bd649d3d236f7941106d8cc46e7358
Author: Richard Biener <rguenther@suse.de>
Date:   Thu Jul 27 13:08:32 2023 +0200

    tree-optimization/91838 - fix FAIL of g++.dg/opt/pr91838.C

    The following fixes the lack of simplification of a vector shift
    by an out-of-bounds shift value.  For scalars this is done both by
    CCP and VRP but vectors are not handled there.  This results in
    PR91838 differences in outcome dependent on whether a vector shift
    ISA is available and thus vector lowering does or does not expose
    scalar shifts here.

    The following adds a match.pd pattern to catch uniform out-of-bound
    shifts, simplifying them to zero when not sanitizing shift amounts.

            PR tree-optimization/91838
            * gimple-match-head.cc: Include attribs.h and asan.h.
            * generic-match-head.cc: Likewise.
            * match.pd (([rl]shift @0 out-of-bounds) -> zero): New pattern.

    (cherry picked from commit d1c072a1c3411a6fe29900750b38210af8451eeb)