This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.


[Bug rtl-optimization/26244] [4.2 Regression] FAIL: gcc.c-torture/execute/builtin-bitops-1.c execution, -O3 -fomit-frame-pointer -funroll-loops

From: "dave at hiauly1 dot hia dot nrc dot ca" <gcc-bugzilla at gcc dot gnu dot org>
To: gcc-bugs at gcc dot gnu dot org
Date: 24 Jun 2006 23:08:25 -0000
Subject: [Bug rtl-optimization/26244] [4.2 Regression] FAIL: gcc.c-torture/execute/builtin-bitops-1.c execution, -O3 -fomit-frame-pointer -funroll-loops
References: <bug-26244-276@http.gcc.gnu.org/bugzilla/>
Reply-to: gcc-bugzilla at gcc dot gnu dot org


------- Comment #20 from dave at hiauly1 dot hia dot nrc dot ca  2006-06-24 23:08 -------
Subject: Re:  [4.2 Regression] FAIL: gcc.c-torture/execute/builtin-bitops-1.c
execution,  -O3 -fomit-frame-pointer -funroll-loops

> The transformations in the invariant motion pass look correct to me, as well as
> the code immediately after it.  Maybe it is a latent bug in some of the later
> passes, but the code produced for the shift is looong and I was not able to
> find where things go wrong.

I totally agree ;(

I think things go wrong in this snippet of code:

0x00010760 <main+244>:  mtsar r20
0x00010764 <main+248>:  shrpw r0,r25,sar,r4
0x00010768 <main+252>:  ldo 1(r20),r21
0x0001076c <main+256>:  extrw,u r4,31,1,ret1
0x00010770 <main+260>:  bb,*< r21,1a,0x10850 <main+484>
0x00010774 <main+264>:  add,l r5,ret1,r20

The last instruction is one of many places where "count" is incremented.
r25 contains the most significant 32 bits of the long long that fails
(i.e., 1).  The mtsar and shrpw perform a lshiftrt operation.  However,
the sar register is special.  It only holds 5- or 6-bit shift amounts,
depending on the instruction.  In the case of shrpw, only 5 bits are used
(i.e., the leftmost bit of the sar register is ignored).  Thus, valid
shift amounts range between 0 and 31.
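
As a rough illustration of the masking described above, here is a minimal
C sketch.  It is a model under stated assumptions, not PA-RISC semantics
taken from the architecture manual: the function names are hypothetical,
the high word of the shifted pair is assumed to be r0 (zero) as in the
snippet, and the shift amount is assumed to be reduced to its low 5 bits.

```c
#include <stdint.h>

/* Hypothetical model of "mtsar rX ; shrpw r0,src,sar,dst".
   Assumption: the pair (0:src) is shifted right and only the low
   5 bits of the value placed in sar are used as the shift amount. */
static uint32_t shrpw_zero_high(uint32_t src, uint32_t sar)
{
    uint32_t amount = sar & 31u;  /* anything above the low 5 bits is ignored */
    return src >> amount;         /* high word is zero, so this is a logical shift */
}

/* Hypothetical model of "extrw,u reg,31,1,dst": take the least
   significant bit, zero-extended. */
static uint32_t extract_lsb(uint32_t reg)
{
    return reg & 1u;
}
```

Under this model, shrpw_zero_high(1, 32) returns 1, exactly as a shift
by 0 would, so extract_lsb sees a set bit even though a true 32-bit
shift of the high word would be expected to test a different bit.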

The first time this code is entered, r20 contains the value 32.  Since
shrpw uses only the low 5 bits, the effective shift amount is 0.  As
a result, r25 is simply copied to r4 (i.e., no shift occurs).  So,
"count" gets incremented.  This bit is also counted at the beginning
of the function due to extracting loop invariants and various other
optimizations:

0x0001069c <main+48>:   ldw 0(r7),r25
0x000106a0 <main+52>:   ldw 4(r7),r26
0x000106a4 <main+56>:   copy r25,r5
0x000106a8 <main+60>:   ldi 3,r19
0x000106ac <main+64>:   or r5,r26,r8
0x000106b0 <main+68>:   depw,z r5,30,31,r31
0x000106b4 <main+72>:   ldi 1,r20
0x000106b8 <main+76>:   cmpb,= r6,r20,0x10800 <main+404>
0x000106bc <main+80>:   extrw,u r8,31,1,r5

The long long value is loaded in the first two instructions.  The
leftmost 32 bits are copied to r5.  Don't ask me why the left and
right halves are or'ed and put in r8 because I don't know.  The last
instruction extracts the least significant bit from the or'ed value.
This is the initial value for "count".  This all seems very broken
to me and it doesn't happen in the non-inlined version of my_parityll.

I get the same assembly code with an x86 cross compiler built today.

I looked at the documentation for lshiftrt, but it doesn't say
whether the shift amount range handled by the PA insn for variable
shifts is ok or not.  On the other hand, there's no obvious reason
why we should need to support logical right shifts larger than 31
bits, and I have to wonder why the generated code does this.
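
For context on where shift counts of 32 and above can come from, the
following C sketch shows one common way to build a 64-bit logical right
shift from 32-bit halves.  It is an assumption-laden illustration, not
the code GCC generates for this test: the function name and the case
split are hypothetical.  The point is that every word-sized shift stays
in the range 0 to 31 only if counts of 32 or more are handled as a
separate case.

```c
#include <stdint.h>

/* Hypothetical sketch: logical right shift of the 64-bit value (hi:lo)
   by count, where count is assumed to be in 0..63.  Each 32-bit shift
   below uses an amount in 0..31. */
static uint64_t lshiftrt64_from_halves(uint32_t hi, uint32_t lo, unsigned count)
{
    uint32_t out_hi, out_lo;

    if (count >= 32) {
        /* The low word comes entirely from the high word. */
        out_hi = 0;
        out_lo = hi >> (count - 32);
    } else if (count == 0) {
        /* Handled separately so that "32 - count" is never 32. */
        out_hi = hi;
        out_lo = lo;
    } else {
        out_hi = hi >> count;
        out_lo = (lo >> count) | (hi << (32 - count));
    }
    return ((uint64_t)out_hi << 32) | out_lo;
}
```

If the count >= 32 case were instead passed straight to a word shift
that keeps only 5 bits of the count, a shift by 32 would behave like a
shift by 0, which is the symptom described in this comment.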

Dave


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=26244
