This is the mail archive of the java@gcc.gnu.org mailing list for the Java project.
Re: Stack ILP issues (Was: Slow recursive functions
- From: Andrew Haley <aph at redhat dot com>
- To: "Mladen Adamovic" <adamm at etfbl dot net>
- Cc: java at gcc dot gnu dot org
- Date: Wed, 17 Aug 2005 10:05:14 +0100
- Subject: Re: Stack ILP issues (Was: Slow recursive functions
- References: <20050817080950.M41443@etfbl.net>
Mladen Adamovic writes:
>
> Andrew Haley wrote on Fri, 15 Apr 2005 11:43:39 +0100:
>
> (the original thread was about poor performance of recursive functions:
> the Ackermann test ran slowly)
>
> Andrew> We use the x86 system calling convention throughout gcj,
> Andrew> and this is slower than passing args in registers. There's
> Andrew> also the possibility that some JITs might be optimized for
> Andrew> this kind of benchmark.
>
> I guess this means that stack instructions (push, pop, etc.) don't exploit ILP.
It's not just that: stack access explicitly touches memory, so even
push a; pop d has a side effect. The interesting question for me is
whether moving to a different ABI that passes args in registers
really would help. I'm just guessing here.
> I found that this might be true:
> http://csdl2.computer.org/persagen/DLAbsToc.jsp?resourcePath=/dl/proceedings/&toc=comp/proceedings/dsd/2004/2203/00/2203toc.xml&DOI=10.1109/DSD.2004.1333267
>
> But the real performance issue is that, as far as I remember, stack
> instructions account for about 8% of instructions in the SPEC2000 tests.
I don't quite understand your point here.
> I don't have the book "Modern Computer Design" at hand to check
> the exact percentage.
>
> I think that for compilers the easiest way to compile expressions like
> (expr1 * expr2) / expr3
> is to use the stack extensively.
That's true, but gcj doesn't do that -- it converts everything into
SSA form, which should reduce data dependencies to the minimum that is
really needed.
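To make the stack-versus-SSA contrast concrete, here is a small Java sketch. It is only a conceptual model, not a description of gcj internals: evalStack mimics JVM-style bytecode by routing every intermediate value through an explicit operand stack (an ArrayDeque standing in for it), while evalSsa writes the same (expr1 * expr2) / expr3 with single-assignment temporaries, so the only remaining dependency is the division's real dependency on the product. The class and method names are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical illustration: stack-machine evaluation vs. SSA-style temporaries.
public class StackVsSsa {

    // Stack-machine style: every intermediate value goes through an explicit
    // operand stack, roughly like the bytecode sequence iload, iload, imul, iload, idiv.
    static int evalStack(int e1, int e2, int e3) {
        Deque<Integer> stack = new ArrayDeque<>();
        stack.push(e1);                        // like iload e1
        stack.push(e2);                        // like iload e2
        int b = stack.pop(), a = stack.pop();  // operands come back in reverse order
        stack.push(a * b);                     // like imul
        stack.push(e3);                        // like iload e3
        int d = stack.pop(), n = stack.pop();
        stack.push(n / d);                     // like idiv
        return stack.pop();
    }

    // SSA style: each temporary is assigned exactly once, so the only
    // dependency left is the real one (t2 needs t1 and e3).
    static int evalSsa(int e1, int e2, int e3) {
        final int t1 = e1 * e2;
        final int t2 = t1 / e3;
        return t2;
    }

    public static void main(String[] args) {
        System.out.println(evalStack(6, 7, 3)); // prints 14
        System.out.println(evalSsa(6, 7, 3));   // prints 14
    }
}
```

Both methods compute the same value; the difference is that the stack version serializes everything through push/pop, while the SSA version exposes only the data dependencies a scheduler actually has to respect.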
> So a way to avoid using the stack might be an important performance issue.
> MOV is a better idea because it can exploit ILP better.
> Somebody might check stack performance on x86-64.
>
> Anyway, can one of the developers say which techniques they used to
> exploit ILP in GCJ?
Nor here. What exactly are you asking? Instruction Level Parallelism
is exploited in gcc in several ways, but mostly by the scheduling pass.
> Also, maybe it would be a good idea to ask on the gcc mailing list about
> stack ILP issues?
>
> The speed of gcj might be important because JVMs have awful performance
> in matrix multiplication and nested loops. Probably they don't use
> compiler techniques to exploit ILP in loops.
Probably not, no. gcc is getting better at this, but in gcj we still
don't hoist bounds checks out of loops, so we still have difficulty
doing aggressive scheduling. Once we get bounds checks hoisted,
things will be much better.
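A minimal Java sketch of what hoisting a bounds check out of a loop means, assuming a simple summation loop. The class and method names are hypothetical, and this is a source-level illustration, not gcj's actual transformation. In sumNaive every a[i] carries its own implicit check; sumHoisted tests the whole index range once before the loop, which is the kind of fact an optimizer needs before it can delete the per-iteration checks and schedule the loop body freely. Java source cannot actually switch the implicit checks off; the JVM or compiler still inserts them unless its optimizer proves them redundant.

```java
// Hypothetical illustration of bounds-check hoisting at the source level.
public class BoundsHoist {

    // Naive loop: conceptually, each a[i] performs its own bounds check,
    // a compare-and-branch inside the loop body on every iteration.
    static long sumNaive(int[] a, int from, int to) {
        long s = 0;
        for (int i = from; i < to; i++) {
            s += a[i];
        }
        return s;
    }

    // Hoisted version: one range check before the loop proves that every
    // index in [from, to) is valid, so a compiler that sees this check is
    // free to drop the per-iteration checks. The check only fires when the
    // loop would actually run, matching when sumNaive would throw.
    static long sumHoisted(int[] a, int from, int to) {
        if (from < to && (from < 0 || to > a.length)) {
            throw new ArrayIndexOutOfBoundsException("range " + from + ".." + to);
        }
        long s = 0;
        for (int i = from; i < to; i++) {
            s += a[i]; // provably in range here
        }
        return s;
    }

    public static void main(String[] args) {
        int[] data = {1, 2, 3, 4, 5};
        System.out.println(sumNaive(data, 0, data.length)); // prints 15
        System.out.println(sumHoisted(data, 1, 4));         // prints 9
    }
}
```

Both methods throw in the same cases (the exception details may differ); the point is only that the hoisted form leaves a loop body with no branches other than the loop test, which is what gives the scheduler room to work.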
> The JVM language was designed in 1995, so maybe it will have a lot of
> problems with ILP and TLP in the future, because in 1995 only a few
> people were thinking about that.
I don't see why -- a JIT translates its source code, which is
bytecode, into object code, and it can do loop optimizations like any
other compiler.
> I'm new here; I'm a graduate student with some involvement in ILP and
> compiler issues. If you think I can help with gcj development on
> these issues, please let me know.
I'm not going to turn down a generous offer like that one! It all depends
on how deeply you want to get involved in the details of gcc.
Andrew.