$ cat test.c
int smmul(int a, int b)
{
    return ((long long)a * b) >> 32;
}

$ arm-none-linux-gnueabi-gcc -O2 -S -mcpu=cortex-a8 test.c
$ cat test.s
        .cpu cortex-a8
        .fpu softvfp
        .eabi_attribute 20, 1
        .eabi_attribute 21, 1
        .eabi_attribute 23, 3
        .eabi_attribute 24, 1
        .eabi_attribute 25, 1
        .eabi_attribute 26, 2
        .eabi_attribute 30, 2
        .eabi_attribute 18, 4
        .file   "test.c"
        .text
        .align  2
        .global smmul
        .type   smmul, %function
smmul:
        @ args = 0, pretend = 0, frame = 0
        @ frame_needed = 0, uses_anonymous_args = 0
        @ link register save eliminated.
        smull   r0, r1, r0, r1
        mov     r0, r1
        bx      lr
        .size   smmul, .-smmul
        .ident  "GCC: (GNU) 4.7.0 20110624 (experimental)"
        .section        .note.GNU-stack,"",%progbits
And clang 2.9 has no problems optimizing this code:

$ cat test.c
int smmul(int a, int b)
{
    return ((long long)a * b) >> 32;
}

$ clang -ccc-host-triple arm-none-linux -O2 -mcpu=cortex-a8 -S test.c
$ cat test.s
        .syntax unified
        .cpu cortex-a8
        .eabi_attribute 6, 10
        .eabi_attribute 7, 65
        .eabi_attribute 8, 1
        .eabi_attribute 9, 2
        .fpu neon
        .eabi_attribute 10, 3
        .eabi_attribute 12, 1
        .eabi_attribute 20, 1
        .eabi_attribute 21, 1
        .eabi_attribute 23, 3
        .eabi_attribute 24, 1
        .eabi_attribute 25, 1
        .file   "test.c"
        .text
        .globl  smmul
        .align  2
        .type   smmul,%function
smmul:
        smmul   r0, r1, r0
        bx      lr
.Ltmp0:
        .size   smmul, .Ltmp0-smmul
Confirmed. Also need patterns for the accumulate and subtract variants, plus rounding variants.
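For reference, here is a rough C sketch of what I understand the variants to compute, based on my reading of the ARM architecture manual (SMMUL, SMMLA, SMMLS, and the rounding form SMMULR). The function names and the high_word helper are made up for illustration; this is not taken from any patch, and it says nothing about which source forms GCC would actually match. The arithmetic is done on unsigned 64-bit values to avoid undefined behaviour on signed shifts.

```c
#include <stdint.h>

/* Hypothetical helper: bits 63:32 of x, reinterpreted as signed 32-bit.
   The uint32_t -> int32_t conversion is implementation-defined in C,
   but wraps as expected with GCC and clang. */
static int32_t high_word(uint64_t x)
{
    return (int32_t)(uint32_t)(x >> 32);
}

/* SMMUL: high word of the signed 64-bit product. */
int32_t smmul_ref(int32_t a, int32_t b)
{
    return high_word((uint64_t)((int64_t)a * b));
}

/* SMMLA: high word of (acc << 32) + a * b. */
int32_t smmla_ref(int32_t acc, int32_t a, int32_t b)
{
    return high_word(((uint64_t)(uint32_t)acc << 32)
                     + (uint64_t)((int64_t)a * b));
}

/* SMMLS: high word of (acc << 32) - a * b.  Because of the borrow from
   the low word, this is not simply acc - smmul_ref(a, b). */
int32_t smmls_ref(int32_t acc, int32_t a, int32_t b)
{
    return high_word(((uint64_t)(uint32_t)acc << 32)
                     - (uint64_t)((int64_t)a * b));
}

/* SMMULR: as SMMUL, but 0x80000000 is added before taking the high word. */
int32_t smmulr_ref(int32_t a, int32_t b)
{
    return high_word((uint64_t)((int64_t)a * b) + 0x80000000u);
}
```

Note that the subtract form matters for pattern matching: smmls_ref(5, 1, 1) gives 4, whereas 5 - smmul_ref(1, 1) gives 5, so the obvious "acc - high part" source form is not equivalent to the instruction.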
A patch for the smmul, smmla and smmls instructions is proposed at:
https://gcc.gnu.org/ml/gcc-patches/2015-11/msg00686.html

Adding patterns for the rounding variants seems to expose an LRA ICE, which I'll analyse and report separately.
I'm clearly not working on this one...
So... what happened to this patch? Why was it never applied?