54816 – [avr] shift is better than widening mul

Status:	RESOLVED FIXED

Alias:	None

Product:	gcc
Classification:	Unclassified
Component:	target
Version:	4.8.0

Importance:	P3 normal
Target Milestone:	13.0
Assignee:	Not yet assigned to anyone

URL:
Keywords:	missed-optimization

Depends on:
Blocks:

Reported:	2012-10-04 18:27 UTC by Georg-Johann Lay
Modified:	2023-04-22 20:02 UTC
CC List:	3 users

See Also:
Host:
Target:	avr
Build:
Known to work:	13.0, 4.7.2
Known to fail:	12.2.1
Last reconfirmed:

Description Georg-Johann Lay 2012-10-04 18:27:57 UTC

The following C test case 

int wmul (char a, char b)
{
    return a * (char) (b << 3);
}

$ avr-gcc wmul.c -S -Os -mmcu=atmega8 -dp

produces with current avr-gcc:


wmul:
	ldi r25,lo8(8)	 ;  25	movqi_insn/2	[length = 1]
	muls r22,r25	 ;  26	mulqihi3	[length = 3]
	movw r22,r0
	clr __zero_reg__
	muls r24,r22	 ;  17	mulqihi3	[length = 3]
	movw r24,r0
	clr __zero_reg__
	ret	 ;  29	return	[length = 1]
	.ident	"GCC: (GNU) 4.8.0 20121004 (experimental)"


avr-gcc-4.7 was smarter with its code:

wmul:
	lsl r22	 ;  10	*ashlqi3/5	[length = 3]
	lsl r22
	lsl r22
	muls r24,r22	 ;  12	mulqihi3	[length = 3]
	movw r22,r0
	clr __zero_reg__
	movw r24,r22	 ;  31	*movhi/1	[length = 1]
	ret	 ;  30	return	[length = 1]
	.ident	"GCC: (GNU) 4.7.2"


The 4.7 code is faster and smaller, and has lower register pressure.
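
The two sequences compute the same value, which is why the shorter one is preferable. The following host-side sketch models both: the 4.7 form shifts the 8-bit operand in place, while the 4.8 form does a widening multiply by 8 and then uses only the low byte. The helper names (sext8, wmul_shift, wmul_widening, count_mismatches) are made up for this illustration and do not come from GCC; the model uses fixed-width types and unsigned arithmetic so that it does not depend on the host's char signedness.

```c
#include <stdint.h>

/* Keep the low 8 bits of v and reinterpret them as a signed 8-bit value,
   without relying on implementation-defined narrowing conversions. */
static int16_t sext8(unsigned v)
{
    v &= 0xFFu;
    return (int16_t)(v < 0x80u ? (int)v : (int)v - 0x100);
}

/* Model of the 4.7 code: three LSLs on the 8-bit register,
   then one signed 8x8->16 multiply. */
static int16_t wmul_shift(int8_t a, int8_t b)
{
    int16_t t = sext8((unsigned)(uint8_t)b << 3);
    return (int16_t)(a * t);
}

/* Model of the 4.8 code: widening multiply of b by the constant 8,
   keep only the low byte of that product, then the same final multiply. */
static int16_t wmul_widening(int8_t a, int8_t b)
{
    int16_t wide = (int16_t)(b * 8);   /* ldi r25,lo8(8) ; muls r22,r25 */
    int16_t t = sext8((unsigned)wide); /* only the low byte (r22) is used */
    return (int16_t)(a * t);
}

/* Exhaustively compare both models over all 8-bit operand pairs. */
int count_mismatches(void)
{
    int mismatches = 0;
    for (int a = -128; a <= 127; ++a)
        for (int b = -128; b <= 127; ++b)
            if (wmul_shift((int8_t)a, (int8_t)b)
                != wmul_widening((int8_t)a, (int8_t)b))
                ++mismatches;
    return mismatches;
}
```

Calling count_mismatches() from a small host program should report zero differences, since the low byte of b * 8 is by construction the same as the 8-bit shift result.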

Comment 1 Wilhelm M 2023-04-11 05:22:57 UTC

The following code has the same problem:

#include <avr/io.h>
#include <stdint.h>

uint16_t b;
uint8_t a;

template<typename A, typename B>
B Mul(const A a, const B b) {
    static constexpr uint8_t shift = (sizeof(B) - sizeof(A)) * 8;
    return static_cast<A>(b >> shift) * a ;
}

int main() {
    return Mul(a, b);
}

With 4.6.4 it produces:

main:
        lds r24,a
        lds r25,b+1
        mul r25,r24
        movw r24,r0
        clr r1
        ret

With the current 12.2 release, the optimization is missed:

main:
        lds r24,b+1
        ldi r25,0
        lds r18,a
        movw r20,r24
        mul r18,r20
        movw r24,r0
        mul r18,r21
        add r25,r0
        clr __zero_reg__
        ret

Interestingly, the following code also produces optimal code with 12.2:

template<typename A, typename B>
B MulX(const A a, const B b) {
    static const uint8_t shift = (sizeof(B) - sizeof(A)) * 8;
    return static_cast<A>((b >> shift) + 1) * a ;
}
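
To see why the single MUL of the 4.6.4 output is sufficient for Mul, here is a plain C model for A = uint8_t and B = uint16_t. The function names (mul_hi8, mul_hi8_two_step, count_mul_mismatches) are invented for this sketch. The second function mimics the 12.2 sequence, which zero-extends the high byte to 16 bits and then forms a 16x8 product from two 8x8 partial products; since the upper byte of the zero-extended value is always 0, the second partial product contributes nothing.

```c
#include <stdint.h>

/* C counterpart of Mul<uint8_t, uint16_t>: high byte of b times a.
   255 * 255 = 65025 fits in 16 bits, so one 8x8->16 MUL is enough. */
uint16_t mul_hi8(uint8_t a, uint16_t b)
{
    uint8_t hi = (uint8_t)(b >> 8);
    return (uint16_t)(hi * a);
}

/* Model of the 12.2 sequence: 16x8 multiply built from two partial products. */
uint16_t mul_hi8_two_step(uint8_t a, uint16_t b)
{
    uint16_t wide = (uint16_t)(b >> 8);     /* lds r24,b+1 ; ldi r25,0 */
    uint8_t lo = (uint8_t)(wide & 0xFFu);   /* r20 after movw r20,r24 */
    uint8_t hi = (uint8_t)(wide >> 8);      /* r21, always 0 */
    uint16_t result = (uint16_t)(a * lo);   /* mul r18,r20 ; movw r24,r0 */
    /* mul r18,r21 ; add r25,r0: low byte of a*hi is added to the high byte */
    result = (uint16_t)(result + (((unsigned)(a * hi) & 0xFFu) << 8));
    return result;
}

/* Exhaustively compare both models over all operand values. */
long count_mul_mismatches(void)
{
    long mismatches = 0;
    for (unsigned a = 0; a <= 0xFFu; ++a)
        for (unsigned long b = 0; b <= 0xFFFFul; ++b)
            if (mul_hi8((uint8_t)a, (uint16_t)b)
                != mul_hi8_two_step((uint8_t)a, (uint16_t)b))
                ++mismatches;
    return mismatches;
}
```

Both models agree for every input, so the extra mul/add pair in the 12.2 output is redundant work rather than a correctness requirement.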

Comment 2 Roger Sayle 2023-04-15 13:27:03 UTC

The original problem looks to be fixed on mainline.  Can you confirm this Wilhelm?  If so we can close this PR.

With -Os -mmcu=atmega8, we currently generate (the desired):
wmul:   lsl r22
        lsl r22
        lsl r22
        muls r22,r24
        movw r24,r0
        clr __zero_reg__
        ret

Comment 3 Wilhelm M 2023-04-15 15:48:10 UTC

(In reply to Roger Sayle from comment #2)
> The original problem looks to be fixed on mainline.  Can you confirm this
> Wilhelm?  If so we can close this PR.
> 
> With -Os -mmcu=atmega8, we currently generate (the desired):
> wmul:   lsl r22
>         lsl r22
>         lsl r22
>         muls r22,r24
>         movw r24,r0
>         clr __zero_reg__
>         ret

Yes, this seems to be fixed in mainline.

Comment 4 GCC Commits 2023-04-16 12:04:41 UTC

The master branch has been updated by Roger Sayle <sayle@gcc.gnu.org>:

https://gcc.gnu.org/g:f006d1a5a1e136be29c78b96c8742ebd3710f4d0

commit r13-7197-gf006d1a5a1e136be29c78b96c8742ebd3710f4d0
Author: Roger Sayle <roger@nextmovesoftware.com>
Date:   Sun Apr 16 13:03:10 2023 +0100

    [Committed] New test case gcc.target/avr/pr54816.c
    
    PR target/54816 is now fixed on mainline.  This adds a test case to
    check that it doesn't regress in future.  Tested with a cross compiler
    to avr-elf.  Committed as obvious.
    
    2023-04-16  Roger Sayle  <roger@nextmovesoftware.com>
    
    gcc/testsuite/ChangeLog
            PR target/54816
            * gcc.target/avr/pr54816.c: New test case.

Comment 5 Roger Sayle 2023-04-16 12:11:11 UTC

This is now fixed on mainline [but was present in GCC 12.2], and a new test case added to ensure this stays fixed.

Comment 6 Georg-Johann Lay 2023-04-21 20:09:11 UTC

(In reply to Roger Sayle from comment #5)
> This is now fixed on mainline [but was present in GCC 12.2], and a new test
> case added to ensure this stays fixed.

Hi Roger,

I am having a problem with your new test case in gcc.target/avr/pr54816.c :

When we run the testsuite for any device other than ATmega8, it will fail due to the explicit -mmcu=atmega8 in dg-options:

xgcc: error: specified option '-mmcu' more than once
compiler exited with status 1
FAIL: gcc.target/avr/pr54816.c (test for excess errors)

Usually, one would run the testsuite several times for a variety of different devices like ATmega128, ATtiny40, etc., so an explicit -mmcu in dg-options should be avoided. (The -mmcu will be provided by the board description file, like atmega128-sim.exp.)

If a test requires a specific device, then place it in gcc.target/avr/mmcu/. The avr-mmcu.exp will take care of removing any unwanted -mmcu so that test cases there can set -mmcu as they wish.

In your case, as you scan assembly for "muls" instruction, you need some -mmcu that supports MULS (like ATmega8).

Hence, could you move pr54816.c to the gcc.target/avr/mmcu subfolder?
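
For illustration, a test file suited to the mmcu subfolder might look roughly like the sketch below. The contents of the committed pr54816.c are not quoted in this thread, so this is an assumption pieced together from what is stated here: the function from the original report, the options -Os -mmcu=atmega8 from comment 2, and a scan for "muls". The real file may contain different or additional scan patterns.

```c
/* Sketch of a device-specific test; belongs under gcc.target/avr/mmcu/
   because dg-options names an explicit -mmcu. */
/* { dg-do compile } */
/* { dg-options "-Os -mmcu=atmega8" } */

/* Test case from the original report. */
int wmul (char a, char b)
{
    return a * (char) (b << 3);
}

/* The shift should stay a shift, followed by a single signed multiply. */
/* { dg-final { scan-assembler "muls" } } */
```

The dg- directives are ordinary C comments, so the file is also valid C outside the testsuite harness.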

Alternatively, you can extend lib/target-supports.exp with a new feature like check_effective_target_avr_mul. The new function could be similar to the existing check_effective_target_avr_tiny, but check for the built-in macro __AVR_HAVE_MUL__. Then use the new function as a filter, as in

/* { dg-do compile { target { avr_mul } } } */

Comment 7 GCC Commits 2023-04-22 20:02:57 UTC

The master branch has been updated by Roger Sayle <sayle@gcc.gnu.org>:

https://gcc.gnu.org/g:911db256258004b2eec9a0ca3fa47f9bcb5c5856

commit r14-168-g911db256258004b2eec9a0ca3fa47f9bcb5c5856
Author: Roger Sayle <roger@nextmovesoftware.com>
Date:   Sat Apr 22 20:57:28 2023 +0100

    [Committed] Move new test case to gcc.target/avr/mmcu/pr54816.c
    
    AVR test cases that specify a specific -mmcu option need to be placed
    in the gcc.target/avr/mmcu subdirectory.  Moved thusly.
    
    2023-04-22  Roger Sayle  <roger@nextmovesoftware.com>
    
    gcc/testsuite/ChangeLog
            PR target/54816
            * gcc.target/avr/pr54816.c: Move to...
            * gcc.target/avr/mmcu/pr54816.c: ... here.