I would like to share a [simple example](https://gcc.godbolt.org/z/dbTsb3YMG) In this, as can be seen in the _second function_ 1. GCC is able to take advantage of builtin_unreachable (because removing it would change the line count) 2. GCC is able to take advantage of __builtin_assume_aligned. (Aligned Loads and Stores) Both of these seem fair However, in the _first function_:- 1. [Assume Keyword](https://en.cppreference.com/w/cpp/language/attributes/assume) is used 2. Unaligned Load Store is generated by GCC 3. Modulo operation advantage is not taken by GCC. Clang seems to be taking advantage of the same though. It seems that thanks to the assume keyword there is scope for further optimization gains. The following syntax also seems to work in Clang ```cpp [[assume(__builtin_assume_aligned(x_array, 32))]]; [[assume(__builtin_assume_aligned(mul_array, 32))]]; [[assume(__builtin_assume_aligned(add_array, 32))]]; ``` Why I wanted to use the assume keyword? 1. There is scope for increased const correctness 2. Looks nicer ```cpp /// @note This prototype works void MulAddLoop(const float* const __restrict mul_array, const float*const __restrict add_array, const ::std::size_t size, float* const __restrict x_array) { ``` PS:- 1. This is my first time filing a GCC Issue Report (I wouldn't really call this a bug) 2. I hope I linked to the correct component as I am not as well versed as I hope I could be with the internals of GCC.
Created attachment 58135 [details] testcase
I suspect the syntax you want instead is: [[assume(uintptr_t(x_array) & (32-1) == 0]]; Becuase __builtin_assume_aligned takes a pointer and returns a pointer that has the assumed alignment
Yeah definitely. My bad. Sorry. @Andrew Pinski [however even that change does not seem to change the results for GCC with Aligned Loads not being used](https://gcc.godbolt.org/z/9WbMbePc1) Added the more corrected 2 Power Trick function at the very end Any other mistake I could have made?
Oh this is the more correct syntax: [[assume((uintptr_t(x_array) & (32-1)) == 0)]]; [[assume((uintptr_t(mul_array) & (32-1)) == 0)]]; [[assume((uintptr_t(add_array) & (32-1)) == 0)]];
Right now we don't always prop back what information we get from the assume attributes. Maybe with the recent prange addition, it can for pointers ...
> [[assume((uintptr_t(x_array) & (32-1)) == 0)]]; The Parans in the & have definitely given someone sleepless nights LOL. I myself was saved by the warnings. > Right now we don't always prop back what information we get from the assume attributes. Maybe with the recent prange addition, it can for pointers ... Aah Guess we will switch to assume in the future. I tried [something else just about now](https://gcc.godbolt.org/z/d8aPjzMhq) I think its a bit wrong. Clang seems to be able to handle it. Is this syntax even valid? ```cpp if(reinterpret_cast<::std::uintptr_t>(__builtin_assume_aligned((void*)(mul_array), 32)) != reinterpret_cast<::std::uintptr_t>(mul_array)) { __builtin_unreachable(); } if(reinterpret_cast<::std::uintptr_t>(__builtin_assume_aligned((void*)(add_array), 32)) != reinterpret_cast<::std::uintptr_t>(add_array)) { __builtin_unreachable(); } if(reinterpret_cast<::std::uintptr_t>(__builtin_assume_aligned((void*)(x_array), 32)) != reinterpret_cast<::std::uintptr_t>(x_array)) { __builtin_unreachable(); } ``` I have my doubts on the previous one But this should ideally be valid ```cpp if((reinterpret_cast<::std::uintptr_t>(__builtin_assume_aligned((void*)(mul_array), 32)) & (32-1)) != 0) { __builtin_unreachable(); } if((reinterpret_cast<::std::uintptr_t>(__builtin_assume_aligned((void*)(add_array), 32)) & (32-1)) != 0) { __builtin_unreachable(); } if((reinterpret_cast<::std::uintptr_t>(__builtin_assume_aligned((void*)(x_array), 32)) & (32-1)) != 0) { __builtin_unreachable(); } ``` But either of them are unable to change the Load Stores from Unaligned to Aligned. Maybe victims of aggressive Dead Code elimination here? GCC intrinsics believe that there can be no case its false and the code is deleted for the same? Because __builtin_assume_aligned should always be a multiple of 32 in the above cases. Or __builtin_assume_aligned does not support usage like this in GCC and in Clang it does? And due to that difference, we have a difference in behavior. Its pretty interesting either way.
The above examples just show misunderstanding what __builtin_assume_aligned is and what it is not. You need to use the result of the built-in function in the accesses to be able to use the alignment information, if you just try to compare __builtin_assume_aligned (x, 32) == x, it will just fold as always true. The design of the builtin is to attach the alignment information to the result of the builtin function only. CCing Aldy/Andrew for whether prange can or could be taught to handle the assume cases with uintptr_t and bitwise and + comparison.
> if you just try to compare __builtin_assume_aligned (x, 32) == x, it will just fold as always true Aah. Dead code elimination. > CCing Aldy/Andrew for whether prange can or could be taught to handle the assume cases with uintptr_t and bitwise and + comparison. Yeah this could be very helpful in cases such as this. @Jakub @Andrew I think [this](https://gcc.godbolt.org/z/MEre8hr71) also has scope for taking advantage of the same. ```cpp void MulAddLoopWorksWithBuitInUnreachableAndConst(const float* const __restrict mul_array, const float* const __restrict add_array, const ::std::size_t size, float* const __restrict x_array) { if((reinterpret_cast<::std::uintptr_t>(mul_array) & (32-1)) != 0) { __builtin_unreachable(); } if((reinterpret_cast<::std::uintptr_t>(add_array) & (32-1)) != 0) { __builtin_unreachable(); } if((reinterpret_cast<::std::uintptr_t>(x_array) & (32-1)) != 0) { __builtin_unreachable(); } if ((size & (32 - 1)) != 0) __builtin_unreachable(); for (::std::size_t i = 0; i != size; i++) [[likely]] { const auto mul = *(mul_array + i); const auto add = *(add_array + i); x_array[i] = x_array[i] * mul + add; // x_array[i] *= mul; // x_array[i] += add; } } ``` Here we are working under the assumption that the memory addresses themselves are multiples of 32 if aligned for AVX2. Clang seems to be able to take advantage of the same here. If the __builtin_assume_aligned is kinda not supported due to dead code elimination, then this looks like a nice enough alternative. It also retains const correctness for me.
(In reply to Jakub Jelinek from comment #7) > The above examples just show misunderstanding what __builtin_assume_aligned > is and what it is not. You need to use the result of the built-in function > in the accesses to be able to use the alignment information, if you just try > to compare __builtin_assume_aligned (x, 32) == x, it will just fold as > always true. The design of the builtin is to attach the alignment > information to the result of the builtin function only. > > CCing Aldy/Andrew for whether prange can or could be taught to handle the > assume cases with uintptr_t and bitwise and + comparison. All the pieces are there to make it work, both with the assume aligned and with the uintptr_t case. And we could probably get it all without prange. For example: #include <cstdint> void foo (const float *); void bar1 (const float *array) { [[assume(array != nullptr)]]; const float *aligned = (const float *) __builtin_assume_aligned (array, 32); foo (aligned); } The __builtin_assume_aligned hasn't been expanded by evrp, so we should be able to add a range-op entry for it. This is what evrp sees: void bar1 (const float * array) { const float * aligned; <bb 2> : aligned_2 = __builtin_assume_aligned (array_1(D), 32); foo (aligned_2); return; } All we need is a range-op implementation for builtin_assume_aligned. The attached crude implementation does it. =========== BB 2 ============ <bb 2> : aligned_2 = __builtin_assume_aligned (array_1(D), 32); foo (aligned_2); return; aligned_2 : [prange] const float * [0, +INF] MASK 0xffffffff00000000 VALUE 0x0 That is, the bottom 32 bits are cleared. Andrew will have to comment on the uintptr_t idiom, because it gets expanded into an .ASSUME() function which I'm unfamiliar with. For this small function: void bar2 (const float *array) { [[assume((uintptr_t (array) & (32 - 1)) == 0)]]; foo (array); } evrp expands to: =========== BB 2 ============ Partial equiv (array.0_3 pe64 array_2(D)) <bb 2> : array.0_3 = (long unsigned int) array_2(D); _4 = array.0_3 & 31; _5 = _4 == 0; return _5; _4 : [irange] long unsigned int [0, 31] MASK 0x1f VALUE 0x0 I don't see any reason why we couldn't get that array.0_3 and array_2 are aligned to 32-bits. Maybe we don't set the value/mask pair for the bitwise_and::op1_range? The value/mask stuff is not very fleshed out, especially for the op1_range operators.
Created attachment 58202 [details] proof of concept implementing a range-op entry for builtin_assume_aligned Something like this (properly coded and enhanced) would work. The pointers_handled_p() cruft is temporary boilerplate to indicate that we only support a fold-range() of PRANGE = IRANGE op PRANGE for this operation. This is going away when the dust settles.
Just to clarify. prange as well as irange keep a value/mask pair where we can store alignment info, so every calculation (range-op) along the way can track this properly. Now, for pointers we loose this information across passes because of the union in SSA_NAME_RANGE_INFO (struct tree_ssa_name) which keeps the range info in an ptr_info_def, not a proper vrange_storage. It's in my long term TODO list to propose we properly track pointer ranges once prange comes live. Note that IPA keeps alignment info on the side, so part of this info is kept across passes. But I assume they're doing this, because originally ranges didn't have alignment (value/mask) info associated with them.
The master branch has been updated by Aldy Hernandez <aldyh@gcc.gnu.org>: https://gcc.gnu.org/g:c400b2100719d0a9e5989c63e0827b9e98919df3 commit r15-504-gc400b2100719d0a9e5989c63e0827b9e98919df3 Author: Aldy Hernandez <aldyh@redhat.com> Date: Tue May 14 16:21:50 2024 +0200 [prange] Default pointers_handled_p() to false. The pointers_handled_p() method is an internal range-op helper to help catch dispatch type mismatches for pointer operands. This is what caught the IPA mismatch in PR114985. This method is only a temporary measure to catch any incompatibilities in the current pointer range-op entries. This patch returns true for any *new* entries in the range-op table, as the current ones are already fleshed out. This keeps us from having to implement this boilerplate function for any new range-op entries. PR tree-optimization/114995 * range-op-ptr.cc (range_operator::pointers_handled_p): Default to true.
(In reply to Aldy Hernandez from comment #10) > Created attachment 58202 [details] > proof of concept implementing a range-op entry for builtin_assume_aligned > > Something like this (properly coded and enhanced) would work. > > The pointers_handled_p() cruft is temporary boilerplate to indicate that we > only support a fold-range() of PRANGE = IRANGE op PRANGE for this operation. > This is going away when the dust settles. FYI, I've removed the pointers_handled_p requirement for new range-op entries.