This is the mail archive of the
gcc-bugs@gcc.gnu.org
mailing list for the GCC project.
[Bug target/39942] Nonoptimal code - leaveq; xchg %ax,%ax; retq
- From: "vvv at ru dot ru" <gcc-bugzilla at gcc dot gnu dot org>
- To: gcc-bugs at gcc dot gnu dot org
- Date: 14 May 2009 09:01:09 -0000
- Subject: [Bug target/39942] Nonoptimal code - leaveq; xchg %ax,%ax; retq
- References: <bug-39942-17483@http.gcc.gnu.org/bugzilla/>
- Reply-to: gcc-bugzilla at gcc dot gnu dot org
------- Comment #30 from vvv at ru dot ru 2009-05-14 09:01 -------
Created an attachment (id=17863)
--> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=17863&action=view)
Testing tool.
Here are the results of my testing.
Code:
align 128
test_cikl:
rept 14 ; 14 if SH=0, 15 if SH=1, 16 if SH=2
{
nop
}
cmp al,0 ; 2 bytes
jz $+10h+NOPS ; 2 bytes offset=xxxx0
cmp al,1 ; 2 bytes offset=xxxx2
jz $+0Ch+NOPS ; 2 bytes offset=xxxx4
cmp al,2 ; 2 bytes offset=xxxx6
jz $+08h+NOPS ; 2 bytes offset=xxxx8
cmp al,3 ; 2 bytes offset=xxxxA
match =1, NOPS
{
nop
}
match =2, NOPS
{
xchg eax,eax ; 2-bytes NOP
}
jz $+04h ; 2 bytes offset=xxxxC
ja $+02h ; 2 bytes offset=xxxxE
mov eax,ecx
and eax,7h
loop test_cikl
This code was tested on Core 2, Xeon and P4 CPUs. Results are in RDTSC ticks.
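Before reading the numbers, it can help to see where the five conditional jumps land for each SH/NOPS combination. The following is a minimal Python sketch, not part of the attached tool. The function name `branch_offsets` is hypothetical, the 2-byte sizes are taken from the comments in the listing, and each leading `nop` is assumed to be the 1-byte encoding.

```python
def branch_offsets(sh, nops):
    """Start offsets of the five conditional jumps, relative to test_cikl.

    Sizes come from the listing's comments: every cmp, jz and ja is 2 bytes.
    Assumption: each leading nop is the 1-byte encoding.
    """
    offset = 14 + sh              # rept count: 14, 15 or 16 one-byte nops
    starts = []
    for i in range(4):
        offset += 2               # cmp al,imm8
        if i == 3:
            offset += nops        # optional padding before the last jz
        starts.append(offset)     # jz
        offset += 2
    starts.append(offset)         # ja
    return starts


if __name__ == "__main__":
    for sh in range(3):
        for nops in range(3):
            row = " ".join(f"{o:02X}h" for o in branch_offsets(sh, nops))
            print(f"SH={sh} NOPS={nops}: {row}")
```

For SH=0 and NOPS=0 the jumps start at 10h, 14h, 18h, 1Ch and 1Eh, which matches the offset=xxxx0 ... offset=xxxxE comments in the listing.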
; Core 2 Duo
;       NOPS/tick/Max  NOPS/tick/Max  NOPS/tick/Max
; SH=0  0/571/729      1/306/594      2/315/630
; SH=1  0/338/612      1/338/648      2/339/648
; SH=2  0/339/666      1/339/675      2/333/693
; Xeon 3110
;       NOPS/tick/Max  NOPS/tick/Max  NOPS/tick/Max
; SH=0  0/586/693      1/310/675      2/310/675
; SH=1  0/333/657      1/330/648      2/464/630
; SH=2  0/333/657      1/470/594      2/474/603
; P4
;       NOPS/tick/Max  NOPS/tick/Max  NOPS/tick/Max
; SH=0  0/1027/1317    1/1094/1258    2/1028/1207
; SH=1  0/1151/1377    1/1068/1352    2/902/1275
; SH=2  0/1124/1275    1/1148/1335    2/979/1139
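The effect of padding in the SH=0 rows can be put into numbers with a few lines of Python. This sketch only copies the "tick" values from the SH=0 rows above. Treating that column as the representative timing is an assumption, and the names `ticks_sh0` and `speedup` are made up for the illustration.

```python
# "tick" values of the SH=0 rows, keyed by CPU and then by NOPS.
ticks_sh0 = {
    "Core 2 Duo": {0: 571, 1: 306, 2: 315},
    "Xeon 3110": {0: 586, 1: 310, 2: 310},
    "P4": {0: 1027, 1: 1094, 2: 1028},
}


def speedup(cpu, nops):
    """Ratio of the unpadded (NOPS=0) time to the padded time."""
    return ticks_sh0[cpu][0] / ticks_sh0[cpu][nops]


if __name__ == "__main__":
    for cpu in ticks_sh0:
        for nops in (1, 2):
            print(f"{cpu}: NOPS={nops} speedup {speedup(cpu, nops):.2f}x")
```

A ratio above 1 means the padded loop needed fewer ticks than the unpadded one.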
Conclusion:
1. Core2 and Xeon show similar results. P4 behaves strangely.
For Core2 & Xeon, padding is very effective: with SH=0 the padded code is almost
2 times faster (571 -> 306 ticks on Core2). Perhaps it makes no sense for P4?
2. My previous statement
VVV> 1. AMD limitation for 16-bytes page (memory range XXX0 - XXXF),but
VVV> Intel limitation for 16-bytes chunk (memory range XXXX - XXXX+10h)
is wrong, at least for Core2 & Xeon. For these CPUs "16-byte chunk" means the
memory range XXX0 - XXXF.
Unfortunately, I can't test AMD.
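The two readings of "16-byte chunk" can be compared on the NOPS=0 layouts with a small Python sketch. It is only a model under stated assumptions: the jump offsets are derived from the 2-byte sizes given in the listing's comments, a jump is counted only when both of its bytes lie inside the window, and the function names are hypothetical.

```python
BRANCH_LEN = 2  # every jz/ja in the test loop is 2 bytes


def starts_for(sh):
    """Start offsets of the five jumps for NOPS=0 (14 + SH leading nops)."""
    first_jz = 14 + sh + 2
    return [first_jz + d for d in (0, 4, 8, 12, 14)]


def max_in_aligned_chunk(starts):
    """Most jumps lying entirely inside one aligned chunk XXX0 - XXXF."""
    counts = {}
    for s in starts:
        first, last = s // 16, (s + BRANCH_LEN - 1) // 16
        if first == last:  # assumption: a jump that straddles is not counted
            counts[first] = counts.get(first, 0) + 1
    return max(counts.values(), default=0)


def max_in_sliding_window(starts):
    """Most jumps lying entirely inside any window XXXX - XXXX+10h."""
    best = 0
    for lo in starts:  # a best window can always begin at a jump
        inside = sum(1 for s in starts if lo <= s and s + BRANCH_LEN <= lo + 16)
        best = max(best, inside)
    return best


if __name__ == "__main__":
    for sh in range(3):
        s = starts_for(sh)
        print(f"SH={sh}: aligned={max_in_aligned_chunk(s)} "
              f"sliding={max_in_sliding_window(s)}")
```

Under these assumptions the sliding reading finds five jumps for every SH, while the aligned reading finds five only for SH=0. SH=0 is also the only slow NOPS=0 case on Core2 and Xeon. The model does not account for every entry in the tables, for example Xeon with SH=2 and NOPS=1.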
PS. My testing tool is in the attachment. It starts under MS-DOS, switches to 32-bit mode,
then to 64-bit mode, and measures RDTSC ticks for the test code.
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39942