This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.



Re: How to avoid de-optimization


On Sun, Aug 26, 2001 at 05:17:32PM +0200, Jan Hubicka wrote:
> > On Sun, Aug 26, 2001 at 10:13:21AM +0200, Jan Hubicka wrote:
> > > Hi,
> > > Actually the MUL->arithmetic conversion is controlled by cost information
> > > near the beginning of the i386.c file and is CPU model specific.
> > > For instance, the K6 cost is 3, while the cost of a simple operation is 1.
> > > This means that gcc will replace a mul by one or two simple operations.
> > > 
> > > In the Athlon case it is set to 5, for PentiumII to 4 and for Pentium4 to 30,
> > > always representing the latency of the imul instruction relative to simple
> > > arithmetic.
> > > 
> > > In what CPU are you experiencing slowdown?
> > > 
> > Athlon.
> > 
> > IMUL takes 2 clocks, shift operations/adds something around 0.6...0.7
> > clocks.
> I've just cross-checked the Athlon Optimization Manual:
> 

> Use Alternative Code When Multiplying by a Constant
> 
> A 32-bit integer multiply by a constant has a latency
> of five cycles. Therefore, use alternative code when multiplying by certain
> constants. In addition, because there is just one multiply unit, the
> replacement code may provide better throughput.  The following code samples are
> designed such that the original source also receives the final result. Other
> sequences are possible if the result is in a different register. Adds have been
> favored over shifts to keep code size small. Generally, there is a fast
> replacement if the constant has very few 1 bits in binary.  More constants are
> found in the file multiply_by_constants.txt located in the "opt_utilities"
> directory of the documentation CDROM.
> 
> So the latency is 5, and the gcc optimization is one that is directly
> recommended by the optimization manual.
>
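To make the quoted cost rule concrete, here is a minimal sketch in C of the decision as it is described above: a shift/add sequence is chosen only when it is strictly cheaper than the per-CPU imul cost. The table values are the ones quoted in this thread; the struct, the function names and the CPU name strings are hypothetical and are not taken from i386.c.

```c
#include <string.h>

/* Hypothetical per-CPU cost entry; a simple operation costs 1. */
struct cpu_cost {
    const char *name;   /* illustrative CPU name, not a real -mcpu value */
    int mult_cost;      /* cost of imul relative to a simple operation */
};

/* Values as quoted in the thread. */
static const struct cpu_cost cpu_costs[] = {
    { "k6",        3 },
    { "pentium2",  4 },
    { "athlon",    5 },
    { "pentium4", 30 },
};

/* Return the imul cost for the given CPU, or -1 if it is not in the table. */
int mult_cost_for(const char *cpu)
{
    size_t n = sizeof cpu_costs / sizeof cpu_costs[0];
    for (size_t i = 0; i < n; i++)
        if (strcmp(cpu_costs[i].name, cpu) == 0)
            return cpu_costs[i].mult_cost;
    return -1;
}

/* Prefer the shift/add sequence only if it is strictly cheaper than imul.
   n_simple_ops is the number of simple operations in the replacement. */
int prefer_shift_add(const char *cpu, int n_simple_ops)
{
    int mc = mult_cost_for(cpu);
    return mc >= 0 && n_simple_ops < mc;
}
```

Under this rule a K6 build accepts replacements of one or two simple operations, while an Athlon build accepts up to four, which is where sequences with a longer dependency chain than imul can slip in.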
instruction                        throughput            latency

imul0x03                        :  2.17011 clocks        5.00795 clocks
imul0x7F                        :  2.17012 clocks        5.00795 clocks
imul0x7FFFFFFF                  :  2.17011 clocks        5.00794 clocks
imulvar                         :  2.17010 clocks        4.00633 clocks
imul64                          :  6.00947 clocks        6.00947 clocks
fast2                           :  0.50079 clocks        1.00158 clocks
fast8                           :  0.66772 clocks        1.00157 clocks
fast0x80000000                  :  0.83464 clocks        1.00158 clocks
fast5                           :  0.66771 clocks        2.00319 clocks
fast10                          :  1.14406 clocks        3.00475 clocks
fast80                          :  1.16851 clocks        3.00476 clocks
fast25                          :  1.16853 clocks        4.00634 clocks
fast50                          :  1.39109 clocks        5.00844 clocks
fast100                         :  1.58585 clocks        5.00794 clocks
fast125                         :  1.58585 clocks        6.00957 clocks
fast250                         :  2.00319 clocks        7.01115 clocks
fast1000                        :  2.00801 clocks        7.01115 clocks
fast10000                       :  2.59737 clocks        9.01427 clocks


(CPU: Athlon 700)
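
For readers without the attachment, the kind of replacement being timed can be sketched in C as below. This is an assumption about the general technique, not the measured code: the function names are made up, and the actual sequences in the attachment may differ. Unsigned arithmetic is used so that overflow wraps the same way as a 32-bit multiply.

```c
#include <stdint.h>

/* x * 5 = x*4 + x: one shift and one add. */
static inline uint32_t mul5(uint32_t x)
{
    return (x << 2) + x;
}

/* x * 10 = (x*5) * 2: the x*5 sequence followed by one more shift. */
static inline uint32_t mul10(uint32_t x)
{
    return mul5(x) << 1;
}

/* x * 25 = (x*5) * 5: the x*5 sequence applied twice. */
static inline uint32_t mul25(uint32_t x)
{
    return mul5(mul5(x));
}

/* x * 100 = (x*25) * 4: the x*25 sequence followed by one more shift. */
static inline uint32_t mul100(uint32_t x)
{
    return mul25(x) << 2;
}
```

Each step depends on the previous one, so the chains here are 2, 3, 4 and 5 operations long. That lines up with the latencies measured for fast5, fast10, fast25 and fast100, although it may be a coincidence.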

See the code in the attachment.

example2.zip

