Bug 39250 - unsigned char times 64U produces long slow loop
Summary: unsigned char times 64U produces long slow loop
Status: RESOLVED DUPLICATE of bug 49687
Alias: None
Product: gcc
Classification: Unclassified
Component: target (show other bugs)
Version: 4.3.2
Importance: P3 normal
Target Milestone: 4.7.0
Assignee: aesok
URL:
Keywords: missed-optimization
Depends on:
Blocks:
 
Reported: 2009-02-19 17:51 UTC by Michael Hennebry
Modified: 2011-08-15 17:42 UTC (History)
4 users (show)

See Also:
Host:
Target: avr-*-*
Build:
Known to work:
Known to fail:
Last reconfirmed: 2009-02-19 18:37:06


Description Michael Hennebry 2009-02-19 17:51:18 UTC
Multiplying an unsigned char by 64U produces bigger slower code than necessary.

avr-gcc (WinAVR 20081205) 4.3.2
Copyright (C) 2008 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

Windows XP

avr-gcc -c -mmcu=atmega168 -save-temps -Wall -std=gnu99 -Os ../64.c
No terminal output.

64.i:
# 1 "../64.c"
# 1 "<built-in>"
# 1 "<command-line>"
# 1 "../64.c"
unsigned mult(unsigned char arg)
{
    return arg*64U;
}

compiled into this:
Code:
    mov r18,r24
    ldi r19,lo8(0)
    ldi r24,6
1:  lsl r18
    rol r19
    dec r24
    brne 1b
    movw r24,r18
    ret
Not this:
Code:
    mov r19,r24
    ldi r18,0
    lsr r19
    ror r18
    lsr r19
    ror r18
    movw r24,r18
    ret
or this:
Code:
    mov r25,r24
    ldi r24,0
    lsr r25
    ror r24
    lsr r25
    ror r24
    ret 

Each example is faster than the previous.
If R0 and R1 had been deemed available,
using MUL would have been even faster,
but MUL doesn't get used even in that case.
Comment 1 aesok 2009-02-23 19:10:27 UTC
Hi.

GCC always uses a shift to optimize multiplication by a power-of-2 constant.

expr.c:expand_expr_real_1:8680
....
      /* Check for a multiplication with matching signedness.  */
      else if (TREE_CODE (TREE_OPERAND (exp, 0)) == NOP_EXPR
	  && TREE_CODE (type) == INTEGER_TYPE
	  && (TYPE_PRECISION (TREE_TYPE (TREE_OPERAND (TREE_OPERAND (exp, 0), 0)))
	      < TYPE_PRECISION (TREE_TYPE (TREE_OPERAND (exp, 0))))
	  && ((TREE_CODE (TREE_OPERAND (exp, 1)) == INTEGER_CST
	       && int_fits_type_p (TREE_OPERAND (exp, 1),
				   TREE_TYPE (TREE_OPERAND (TREE_OPERAND (exp, 0), 0)))
	       /* Don't use a widening multiply if a shift will do.  */
	       && ((GET_MODE_BITSIZE (TYPE_MODE (TREE_TYPE (TREE_OPERAND (exp, 1))))
		    > HOST_BITS_PER_WIDE_INT)
		   || exact_log2 (TREE_INT_CST_LOW (TREE_OPERAND (exp, 1))) < 0))

expmed.c:expand_mult
...
      if (coeff != 0)
	{
	  /* Special case powers of two.  */
	  if (EXACT_POWER_OF_2_OR_ZERO_P (coeff))
	    return expand_shift (LSHIFT_EXPR, mode, op0,
				 build_int_cst (NULL_TREE, floor_log2 (coeff)),
				 target, unsignedp);


For the AVR target, multiplying by 2 with a shift gives better code, but for
multiplying by 4, 8, ..., using a shift is worse both for code size and for
speed.

I think this optimization should not be hard-coded, but should be chosen
based on the insn cost data. Perhaps there are other targets for which it is
better to use a multiplication rather than a shift.

Anatoly.
Comment 2 Georg-Johann Lay 2011-08-11 21:43:47 UTC
This is solved in 4.7

*** This bug has been marked as a duplicate of bug 49687 ***
Comment 3 Michael Hennebry 2011-08-15 17:34:46 UTC
(In reply to comment #2)
> This is solved in 4.7
> 
> *** This bug has been marked as a duplicate of bug 49687 ***

49687 is still unassigned.
Did you mean that it will be solved in 4.7?
Comment 4 Michael Hennebry 2011-08-15 17:42:04 UTC
(In reply to comment #3)
> (In reply to comment #2)
> > This is solved in 4.7
> > 
> > *** This bug has been marked as a duplicate of bug 49687 ***
> 
> 49687 is still unassigned.
> Did you mean that it will be solved in 4.7?

Oops.
I should learn to scroll.