Bug#: 39250
Status: ASSIGNED
Resolution:
Assigned To: aesok@gcc.gnu.org
Reporter: Michael Hennebry <hennebry@web.cs.ndsu.nodak.edu>

Bug 39250 depends on: 36467

Description:   Last confirmed: 2009-02-19 18:37 Opened: 2009-02-19 17:51
Multiplying an unsigned char by 64U produces bigger, slower code than necessary.

avr-gcc (WinAVR 20081205) 4.3.2
Copyright (C) 2008 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

Windows XP

avr-gcc -c -mmcu=atmega168 -save-temps -Wall -std=gnu99 -Os ../64.c
No terminal output.

64.i:
# 1 "../64.c"
# 1 "<built-in>"
# 1 "<command-line>"
# 1 "../64.c"
unsigned mult(unsigned char arg)
{
    return arg*64U;
}

compiled into this:
Code:
    mov r18,r24
    ldi r19,lo8(0)
    ldi r24,6
1:  lsl r18
    rol r19
    dec r24
    brne 1b
    movw r24,r18
    ret
Not this:
Code:
    mov r19,r24
    ldi r18,0
    lsr r19
    ror r18
    lsr r19
    ror r18
    movw r24,r18
    ret
or this
Code:
    mov r25,r24
    ldi r24,0
    lsr r25
    ror r24
    lsr r25
    ror r24
    ret 

Each example is faster than the previous.
If R0 and R1 had been deemed available,
using MUL would have been even faster,
but MUL doesn't get used even in that case.
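
The two hand-written listings above rely on an identity: for an 8-bit argument held in a 16-bit result, moving the argument into the high byte is a free multiply by 256, and two right shifts then bring it down to a multiply by 64. A minimal C sketch of that equivalence follows; the function names are hypothetical and uint16_t stands in for the 16-bit unsigned of the AVR target.

```c
#include <stdint.h>

/* Straightforward form: shift a 16-bit value left by six bits,
   which is what the generated loop does one bit at a time. */
uint16_t mult64_shift_left(uint8_t arg)
{
    return (uint16_t)((uint16_t)arg << 6);
}

/* Hand-written form: place arg in the high byte (x256),
   then shift right twice (/4), giving x64 overall. */
uint16_t mult64_via_high_byte(uint8_t arg)
{
    uint16_t hi = (uint16_t)((uint16_t)arg << 8);
    return (uint16_t)(hi >> 2);
}
```

Both functions agree for every 8-bit input, because the largest product, 255*64, still fits in 16 bits.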

------- Comment #1 From aesok@gcc.gnu.org 2009-02-23 19:10 -------
Hi.

GCC always uses a shift to optimize multiplication by a power-of-2 constant.

expr.c:expand_expr_real_1:8680
....
      /* Check for a multiplication with matching signedness.  */
      else if (TREE_CODE (TREE_OPERAND (exp, 0)) == NOP_EXPR
          && TREE_CODE (type) == INTEGER_TYPE
          && (TYPE_PRECISION (TREE_TYPE (TREE_OPERAND (TREE_OPERAND (exp, 0), 0)))
              < TYPE_PRECISION (TREE_TYPE (TREE_OPERAND (exp, 0))))
          && ((TREE_CODE (TREE_OPERAND (exp, 1)) == INTEGER_CST
               && int_fits_type_p (TREE_OPERAND (exp, 1),
                                   TREE_TYPE (TREE_OPERAND (TREE_OPERAND (exp, 0), 0)))
               /* Don't use a widening multiply if a shift will do.  */
               && ((GET_MODE_BITSIZE (TYPE_MODE (TREE_TYPE (TREE_OPERAND (exp, 1))))
                    > HOST_BITS_PER_WIDE_INT)
                   || exact_log2 (TREE_INT_CST_LOW (TREE_OPERAND (exp, 1))) < 0))

expmed.c:expand_mult
...
      if (coeff != 0)
        {
          /* Special case powers of two.  */
          if (EXACT_POWER_OF_2_OR_ZERO_P (coeff))
            return expand_shift (LSHIFT_EXPR, mode, op0,
                                 build_int_cst (NULL_TREE, floor_log2 (coeff)),
                                 target, unsignedp);
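
For readers unfamiliar with the special case quoted above, the power-of-two test and the shift count can be sketched with the usual bit tricks. These are stand-alone illustrative helpers with hypothetical names, not GCC's own definitions.

```c
#include <stdint.h>

/* True for 0 and for exact powers of two: such values have
   at most one bit set, so clearing the lowest set bit gives 0. */
int is_pow2_or_zero(uint32_t x)
{
    return (x & (x - 1u)) == 0;
}

/* Index of the highest set bit; the caller must pass x != 0. */
int floor_log2_u32(uint32_t x)
{
    int n = 0;
    while (x >>= 1)
        ++n;
    return n;
}
```

For the coefficient 64 in this report, the test succeeds and the shift count is 6, which matches the six-iteration loop in the generated code.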


On the AVR target, using a shift for a multiply by 2 gives better code,
but for a multiply by 4, 8, ... using a shift is bad for both code size and
speed.

I think this optimization should not be hard-coded, but should be chosen
based on the insn cost data. Perhaps there are other targets where it is better
to use a multiplication rather than a shift.

Anatoly.
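
The cost-based choice suggested in the comment above could look roughly like the following C sketch. Everything in it is an assumption made for illustration: the cost table, its fields, and the function name are hypothetical and do not correspond to GCC's actual insn cost interfaces. It also assumes a target with no barrel shifter, so that a shift by n bits costs n single-bit shifts, as on AVR.

```c
/* Hypothetical per-target cost table; fields and values are illustrative only. */
struct target_costs {
    int shift_one_bit;   /* cost of shifting the operand left by one bit */
    int multiply;        /* cost of one multiply of the same width */
};

enum strategy { USE_SHIFT, USE_MULTIPLY };

/* Pick the cheaper way to multiply by 2**log2_coeff under the given costs.
   Ties go to the shift, which matches the current behaviour. */
enum strategy choose_pow2_mult(const struct target_costs *c, int log2_coeff)
{
    int shift_cost = c->shift_one_bit * log2_coeff;
    return shift_cost <= c->multiply ? USE_SHIFT : USE_MULTIPLY;
}
```

With made-up costs of 2 per one-bit shift and 5 per multiply, a multiply by 2 would still use a shift, while a multiply by 64 would switch to the multiply instruction.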
