This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.



[Bug inline-asm/39847] 16 symbolic register names generates error: more than 30 operands in 'asm'



------- Comment #7 from pinskia at gcc dot gnu dot org  2009-04-22 15:45 -------
(In reply to comment #6)
> Pinski: Actually, no. I started with the intrinsics and looked hard at what the
> code scheduler was doing before settling on rewriting this in inline assembly. 
> 
> The intrinsics have several problems that affect the code quality in this case.
> 
> 1) They don't issue a request from memory for many instructions, such as
> cvtps2pd. Doing one-liners for stuff like that is feasible but even harder to
> understand and debug than pure assembly.  Gcc also seems to have a misguided
> sense of how many clocks cvtX2Y instructions take.

Are you using the correct -mtune= value for the processor you are tuning for?
Different processors take different numbers of clock cycles for the same
instruction.  If you have an issue with the optimizers, I would rather see bugs
filed against them than have you work around the problem with inline-asm.

> 
> 2) The combination of intrinsics, C, and assembly gcc was generating included a
> lot of extra instructions, promoting ints to longs, leas, etc. 

Int-to-long promotion is normal and a separate issue, and really you should
have filed a bug for this one.
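
For context, here is a minimal sketch of where such promotions tend to come
from on x86_64. A 32-bit int index has to be widened to 64 bits before it can
be used in an address, while a pointer-width index needs no widening. The
function names are hypothetical, and whether the extra instructions actually
appear depends on the compiler version and flags.

```c
#include <stddef.h>

/* Hypothetical example: 32-bit signed index. On x86_64 the compiler may
   have to sign-extend i to 64 bits before using it to address buf. */
float sum_int_index(const float *buf, int n)
{
    float acc = 0.0f;
    for (int i = 0; i < n; i++)
        acc += buf[i];
    return acc;
}

/* Same loop with a pointer-width index: no widening is needed. */
float sum_size_index(const float *buf, size_t n)
{
    float acc = 0.0f;
    for (size_t i = 0; i < n; i++)
        acc += buf[i];
    return acc;
}
```

Comparing the output of gcc -O2 -S for the two functions is one way to see
whether the promotion shows up in a given case.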

> 
> 3) The optimizer tends to push prefetches to the end of the loop when it really
> needs to happen as early as possible. This particular bit of code *might*
> benefit from prefetching (it is not a very predictable access pattern) but at
> the end of the loop prefetches hurt more than they help.

File a bug.
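
A reduced test case along the following lines is the sort of thing such a
report could start from. This is only a sketch: the function name and the
PREFETCH_AHEAD distance are assumptions, not taken from the original code.
It uses GCC's __builtin_prefetch at the top of the loop body; where the
prefetch ends up in the generated code is up to the scheduler, which is the
behaviour being complained about.

```c
#include <stddef.h>

/* Assumed prefetch distance in elements; the right value depends on the
   access pattern and the target processor. */
#define PREFETCH_AHEAD 64

float sum_with_prefetch(const float *buf, size_t n)
{
    float acc = 0.0f;
    for (size_t i = 0; i < n; i++) {
        /* Request the line early, before the work of this iteration.
           Arguments: address, 0 = read access, 3 = high temporal locality. */
        if (i + PREFETCH_AHEAD < n)
            __builtin_prefetch(&buf[i + PREFETCH_AHEAD], 0, 3);
        acc += buf[i];
    }
    return acc;
}
```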

> 
> 4) This code is right up against the edge of the x86_64 register set (all the
> xmm registers (for 8 channel resampling) and 7 integer registers).

Try 4.4.0, which was just released; it has a better register allocator.

> I can show you oprofiles of the gcc generated code, but the larger point
> remains that doing complex vectorized operations tends to use up a lot of
> registers and doing it well requires hand optimized assembly... and to do that
> well, it would be helpful to have as many named parameters available as in the
> register set.

No, GCC should be doing a better job with the intrinsics, and that is much
better than you doing it manually in inline-asm.  Inline-asm should be used
only when there are no intrinsics for the instruction, or for something which
you really cannot do using intrinsics.
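
As an illustration of the intrinsics route, here is a minimal sketch of a
float-to-double conversion loop built on _mm_cvtps_pd (the intrinsic for
cvtps2pd). The function name and the loop structure are assumptions for the
example, not the code from the bug report. Whether the compiler folds the
load into the conversion instruction is left to the optimizer.

```c
#include <emmintrin.h>  /* SSE2 intrinsics; also pulls in the SSE header */
#include <stddef.h>

/* Hypothetical helper: widen n floats to doubles, four at a time. */
void floats_to_doubles(const float *src, double *dst, size_t n)
{
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        __m128 v = _mm_loadu_ps(src + i);            /* load 4 floats */
        /* cvtps2pd converts the low two floats of its operand. */
        _mm_storeu_pd(dst + i, _mm_cvtps_pd(v));
        /* Move the high two floats down, then convert those as well. */
        _mm_storeu_pd(dst + i + 2, _mm_cvtps_pd(_mm_movehl_ps(v, v)));
    }
    for (; i < n; i++)                               /* scalar tail */
        dst[i] = (double)src[i];
}
```

If the code generated for a loop like this is poor, the loop itself is a
compact test case to attach to a bug against the optimizer.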


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39847

