This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.
[Bug inline-asm/39847] 16 symbolic register names generates error: more than 30 operands in 'asm'
- From: "pinskia at gcc dot gnu dot org" <gcc-bugzilla at gcc dot gnu dot org>
- To: gcc-bugs at gcc dot gnu dot org
- Date: 22 Apr 2009 15:45:44 -0000
- Subject: [Bug inline-asm/39847] 16 symbolic register names generates error: more than 30 operands in 'asm'
- References: <bug-39847-17610@http.gcc.gnu.org/bugzilla/>
- Reply-to: gcc-bugzilla at gcc dot gnu dot org
------- Comment #7 from pinskia at gcc dot gnu dot org 2009-04-22 15:45 -------
(In reply to comment #6)
> Pinski: Actually, no. I started with the intrinsics and looked hard at what the
> code scheduler was doing before settling on rewriting this in inline assembly.
>
> The intrinsics have several problems that affect the code quality in this case.
>
> 1) They don't issue a request from memory for many instructions, such as
> cvtps2pd. Doing one-liners for stuff like that is feasible but even harder to
> understand and debug than pure assembly. GCC also seems to have a misguided
> sense of how many clocks cvtX2Y instructions take.
Are you using the correct -mtune= value for the processor you are targeting?
Instruction latencies differ from one processor to another. If you have an
issue with the optimizers, I would rather see bugs filed against them than see
you work around the problem with inline asm.
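
For reference, the conversion from point 1 can be written with SSE2 intrinsics
roughly as below. This is only a sketch: widen2 is a hypothetical name, the
scalar fallback is there just so it builds on non-SSE2 targets, and whether GCC
folds the load into the memory operand of cvtps2pd depends on the GCC version
and on the -O and -mtune options, so check the output of -S.

```c
#include <assert.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Hypothetical helper: widen two floats in memory to two doubles. */
void widen2(const float *src, double *dst)
{
    assert(src && dst);
#if defined(__SSE2__)
    /* Load 64 bits (two floats) into the low half of an XMM register. */
    __m128 f = _mm_castsi128_ps(_mm_loadl_epi64((const __m128i *)src));
    /* cvtps2pd: convert the two low floats to two doubles. */
    __m128d d = _mm_cvtps_pd(f);
    _mm_storeu_pd(dst, d);
#else
    /* Scalar fallback for targets without SSE2. */
    dst[0] = src[0];
    dst[1] = src[1];
#endif
}
```

Comparing the assembly from something like "gcc -O2 -mtune=<your cpu> -S" against
the hand-written version is the kind of evidence that makes an optimizer bug
report useful.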
>
> 2) The combination of intrinsics, C, and assembly gcc was generating included a
> lot of extra instructions, promoting ints to longs, leas, etc.
Int-to-long promotion is normal. It is a different issue, and you really
should have filed a bug for it.
>
> 3) The optimizer tends to push prefetches to the end of the loop when they really
> need to happen as early as possible. This particular bit of code *might*
> benefit from prefetching (it is not a very predictable access pattern) but at
> the end of the loop prefetches hurt more than they help.
File a bug.
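
A report like that is easiest to act on with a reduced test case. A minimal
sketch, assuming a gather-style loop with an irregular access pattern (the name
gather_sum and the prefetch distance of 8 are made up for illustration and are
not taken from the reporter's code):

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reduced test case: sum data[] through an index table. */
double gather_sum(const double *data, const size_t *idx, size_t n)
{
    assert(data && idx);
    double sum = 0.0;
    for (size_t i = 0; i < n; i++) {
        /* Written first in the loop body: request the element that will be
           needed 8 iterations from now (read access, low temporal locality). */
        if (i + 8 < n)
            __builtin_prefetch(&data[idx[i + 8]], 0, 1);
        sum += data[idx[i]];
    }
    return sum;
}
```

Compiling this with -O2 -S and noting where the prefetch instruction lands
relative to the load and the add would show whether the scheduler moves it.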
>
> 4) This code is right up against the edge of the x86_64 register set (all the
> xmm registers (for 8 channel resampling) and 7 integer registers)
Try 4.4.0, which was just released; it has a better register allocator.
> I can show you oprofiles of the gcc generated code, but the larger point
> remains that doing complex vectorized operations tends to use up a lot of
> registers and doing it well requires hand optimized assembly... and to do that
> well, it would be helpful to have as many named parameters available as in the
> register set.
No, GCC should be doing a better job with the intrinsics, and fixing that is
much better than you doing the work manually in inline asm. Inline asm should
be used only when there is no intrinsic for the instruction, or for something
that you really cannot do using intrinsics.
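
For readers unfamiliar with the symbolic operand names this bug is about, here
is a minimal sketch (add_named, acc and rhs are hypothetical names; the asm is
x86 AT&T syntax, with a plain C fallback elsewhere). GCC's extended asm
documentation states that an operand with the "+" modifier counts as two
operands toward the limit of 30. It is an assumption here, not something
confirmed in this comment, that this is how 16 names can exceed the limit.

```c
#include <assert.h>

/* Hypothetical example of symbolic (named) asm operands. */
int add_named(int a, int b)
{
    assert(a >= 0 && b >= 0);   /* keep the sketch away from overflow cases */
#if defined(__x86_64__) || defined(__i386__)
    int acc = a;
    /* [acc] is read-write ("+r"), so it counts as both an input and an
       output; [rhs] is input only. AT&T order: addl source, destination. */
    __asm__("addl %[rhs], %[acc]"
            : [acc] "+r"(acc)
            : [rhs] "r"(b)
            : "cc");
    return acc;
#else
    /* Non-x86 fallback so the sketch still builds. */
    return a + b;
#endif
}
```

The intrinsic equivalent of real code like this names no registers at all,
which leaves the allocation to the compiler.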
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39847