[PATCH v3] [aarch64] Correct the maximum shift amount for shifted operands.

Tue Nov 27 20:54:00 GMT 2018

> On 27.11.2018, at 14:04, Sam Tebbs <Sam.Tebbs@arm.com> wrote:
> 
> 
> On 11/26/18 7:50 PM, Christoph Muellner wrote:
>> The aarch64 ISA specification allows a left shift amount to be applied
>> after extension in the range of 0 to 4 (encoded in the imm3 field).
>> 
>> This is true for at least the following instructions:
>> 
>>  * ADD (extend register)
>>  * ADDS (extended register)
>>  * SUB (extended register)
>> 
>> The result of this patch can be seen, when compiling the following code:
>> 
>> uint64_t myadd(uint64_t a, uint64_t b)
>> {
>>   return a+(((uint8_t)b)<<4);
>> }
>> 
>> Without the patch the following sequence will be generated:
>> 
>> 0000000000000000 <myadd>:
>>    0:d37c1c21 ubfizx1, x1, #4, #8
>>    4:8b000020 addx0, x1, x0
>>    8:d65f03c0 ret
>> 
>> With the patch the ubfiz will be merged into the add instruction:
>> 
>> 0000000000000000 <myadd>:
>>    0:8b211000 addx0, x0, w1, uxtb #4
>>    4:d65f03c0 ret
> 
> Hi Christoph,
> 
> Thanks for this, I'm not a maintainer but this looks good to me. A good
> point to mention would be if it affects shifts smaller than 4 bits,
> since I don't see any testing for that in the files you have changed.
> 

Hi Sam,

shifts smaller than 4 bits are already tested in gcc/testsuite/gcc.target/aarch64/extend.c.
E.g. subdi_sxtw() does so for shift-by-3.

All existing test cases where executed and did not show any regressions.
In other words: shifts smaller than 4 bits are not affected.

>> Tested with "make check" and no regressions found.
> Could you perhaps elaborate on your testing? So what triplets you
> tested, if you bootstrapped successfully etc.

For the "make check" regression test, I compiled native on an AArch64 machine
running CentOS 7.5. Configure flags were "--enable-bootstrap".
I did so with and without the patch and compared the results for differences.
I think that's the essential requirement to get an OK for this patch.

Besides that we've had this change in our aarch64-linux-gnu toolchain since 2014.
This toolchain has been used to compile a wide range of OSS projects, benchmarks
as well as proprietary code over the years.

For example we ran the SPEC CPU2000, CPU2006, CPU2017 benchmarks,
built Linux kernels (at least from 4.4 to 4.19), glibc (from 2.17 to 2.28), binutils
(ancient times till 2.30), TCmalloc (2.6 and 2.7), jemalloc (4 and 5), buildroot (2017, 2018),
U-Boot (2015-2018), Tianocore, HHVM, OpenSSL, MySQL,...

Our work involved interwork with existing libraries (e.g. HHVM and its dependencies
to 100 external shared libraries) on several operating systems (Ubuntu, Fedora,
CentOS, OpenSuse, Oracle Linux, Debian, ...).

Besides LP64 we also used (and still use) the ILP32 ABI.

The binaries created by the compilers using that change were running
at least on APM Xgene processors, Ampere Computer's eMAG processors,
as well as on ARM's Cortex-A53, Cortex-A72 cores.

If you want me to test something specific, then just let me know.

Thanks,
Christoph