In the straightforward code: ``` unsigned gcd(unsigned a, unsigned b) { if (b == 0) return a; return gcd(b, a % b); } ``` GCC for x86 at -O2 (or -O3) generates the following assembly, which has a number of redundant register moves (marked with ?): ```asm gcd(unsigned int, unsigned int): mov eax, edi ; eax (return value) = a test esi, esi ; is b == 0? jne .L2 ; if b != 0; goto starting point L2 jmp .L10 ; if b = 0; goto special case .L11: ; loop top mov esi, edx ; set b to be a % b from the previous iteration .L2: xor edx, edx ; set up for divide div esi ; eax = edx:eax (0:a) / esi (b) ; edx = edx:eax (0:a) % esi (b) mov eax, esi ; eax = b test edx, edx ; was a % b zero? jne .L11 ; if not, loop around mov eax, esi ; return esi (? eax is already esi) ret ; returns a .L10: ; This seems daft; when we get here, eax already has ; the right value, so the compiler has overcomplicated things. mov esi, edi ; (?) esi = a mov eax, esi ; (?) eax (return value) = esi ret ; returns a ``` Nothing obvious (to me, anyway) in the optimisation report. Clang 21 produces much tighter code (unannotated here): ``` gcd(unsigned int, unsigned int): mov eax, edi test esi, esi je .LBB0_4 mov edx, esi .LBB0_2: mov ecx, edx xor edx, edx div ecx mov eax, ecx test edx, edx jne .LBB0_2 mov eax, ecx .LBB0_4: ret ``` CE link: https://godbolt.org/z/34jfKrW57
99% sure this is a dup of bug 106688. The number of extra moves is fixed with -fno-tree-coalesce-vars even. Basically out of ssa sometimes with coalescing turned on will chose the wrong coalescing and that causes extra moves for returns.
Related to the other bug I am working on.