This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.


Re: [lkcl@lkcl.net: has gcc been reworked so that code/templates can be "outsourced" e.g. to perl yet?]

From: Luke Kenneth Casson Leighton <lkcl at lkcl dot net>
To: Mike Stump <mrs at apple dot com>
Cc: gcc at gcc dot gnu dot org
Date: Mon, 16 May 2005 00:11:16 +0100
Subject: Re: [lkcl@lkcl.net: has gcc been reworked so that code/templates can be "outsourced" e.g. to perl yet?]
References: <20050515201434.GB9949@lkcl.net> <F3CE5F19-C584-11D9-B874-003065BDF310@apple.com>

On Sun, May 15, 2005 at 02:04:40PM -0700, Mike Stump wrote:
> On Sunday, May 15, 2005, at 01:14  PM, Luke Kenneth Casson Leighton 
> wrote:
> > i think you may find that a less stringent goal - of doing
> > "outsourcing" - may result in an intermediate useable compromise
> > that would keep most people happy or at least a whole damn lot
> > more happy than they are at the moment.
> 
> My take, the numbers of people that face 50+ wide things is very small. 

 *grin* :)

 hey, the ASP is _verry_ cool - the fact that you can do 3-bit
 video interpolation or 5-bit averaging etc. makes it just
 _stupidly_ fast.  in that specialised area.

 the only other people that i know of who give anything other
 than 8-bit processors _any_ serious massively parallel
 consideration these days are elixent, with their 4-bit
 ALUs times god-knows-what.  and last thing i heard, they're
 programming it in a modified version of VLSI *gibber*.

> The compiler is _part_ of the architecture costs

 i have to be honest - i think that's one of the things that
 aspex does _not_ understand, they've been working with their
 architecture for so long in assembly code, they _just_ don't
 understand why everyone else in the world finds it so awkward,
 and because it's a proprietary tool-chain, they can't take
 that leap-of-faith to make it free software, for fear of
 giving away too many architectural details.

> > and the key reason why all these things are a pain in the
> > arse is because you cannot "abstract" it out to c++
> 
> I'm not familiar with that proof.  

 ah - it's not a proof, i apologise for giving the impression
 that it was.

 it was an empirical observation, based solely on my own
 experience.

> I've seen very many odd things done 
> with C++ and it seems to me that it would be possible to abstract it 
> out.

 yes, in the form of the valarray STL: most definitely.

 see tail-end of this message for details.

> > - you  _have_ to go to assembler, you _have_ to make use of macros
> > (which don't mix with c++ templates).
> 
> ?  I don't know that.  I'd prefer you went to comp.lang.c++ and 
> challenged them with I bet this can't be done, and had someone sketch 
> out how to do it, or do it.  Certainly we designed it (C++) in part 
> with the idea that one could have exotic hardware behind some of the 
> classes (valarray) and get the speed.

 [btw i'm re-reading this section _after_ what i wrote below it]

 *click* - so you .... you... ooooooo :)

 holy cow.

 you looked at valarray, and went "how could this be automatically
 speeded up by gcc, if gcc had access to a hardware vector processing
 unit"?

 i'm... genuinely impressed.

> See the altivec 

 _altivec_ - _that's_ the other well-known vector processor thing
 i was racking my brains to think of.

> intrinsics or the mmx/sse/sse2/sse3 intrinsics for a 
> sketch of an existence proof on why one never would need assembler to 
> do anything.

 sorry - assembler, wrong word.  well, almost the wrong word.

 i was referring to the concept of using inline assembly -
 "asm { ....  }" rather than _actual_ assembly-level hard-wired
 instructions.

 i meant "write your own language that is specially interspersed
 with c code in a manner where you would typically need to process
 the stuff with a different compiler tool".

 this isn't quite as mad as it sounds.

 having your own in-line assembly instructions, esp. when you have
 support for c++ templates, means that you can write a c++ class that
 "wraps" an absolutely minimal amount of your in-line assembly code...

 ... basically, instead of doing what you envisage doing _inside_ gcc,
 it is done _outside_, using c++ and using a well-known standard
 template library.

 with the advantage that you _don't_ end up with the expertise
 hard-coded into the gcc compiler.

 [and the disadvantage that it will take effing ages to compile].

> >well, i do, but it's many _many_ steps removed from becoming a reality 
> >- funding,
> 
> No, funding is a _primary_ concern.  Either, people that want the 
> architecture to succeed pay for it, convince grad students that it 
> would be a good research area, or, well, you write in assembler.  This 
> list is a poor place, when it comes to funding issues.

 there's a project i've been consulted on: it's a parallel processor
 architecture - funding is being sought elsewhere, don't worry :)

 i'm just "Mr Techie" :)

> >  ... you can tell that i really loved the processor design and the
> > opportunity to work with something that radical, though.
> 
> There have been neat architectures that I liked, but, that failed the 
> market reality testcase, in the end, it isn't about being neat, it is 
> about price/performance and giving a customer what they want.

 yes.

 one of the things that was great about the ASP is that compared to the
 processors _at the time_ it knocked the stuffing out of the available
 hardware by a factor of ... 20:1 in performance.

 ... of course, 18 months later, that was reduced to 5:1, which when
 you factor in development lead time ... forget it.

 ... and then, the _next_ generation of the ASP, that takes it back
 _up_ by a factor of 16, and whilst memory access is stunting
 the speed of "serial" processors, ASPs just ... keep on expanding
 up and up.

 every doubling in clock speed as you go down the microns
 in gate size results in a factor of EIGHT times performance -
 a factor of four (the shrink, squared) for number of APEs per
 area, and double for clock speed.
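
 the arithmetic behind that factor of eight can be sketched as below.
 the function name is hypothetical, and the scaling rules are
 assumptions taken from the paragraph above rather than measured
 figures: APE count per area grows with the square of the linear
 shrink, and clock speed grows in proportion to it.

```cpp
#include <cassert>

// relative performance after shrinking gate size by a linear factor `shrink`.
// assumed scaling (illustrative only, not measured):
//   - number of APEs per unit area grows as shrink * shrink
//   - clock speed grows in proportion to shrink
constexpr unsigned relative_performance(unsigned shrink) {
    return (shrink * shrink) * shrink;  // density gain times clock gain
}

// one halving of gate size (shrink == 2): 4x the APEs, 2x the clock.
static_assert(relative_performance(2) == 8, "4x density * 2x clock");
```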

 ... but it's a _reeeall_ close run thing, and exactly like you say:
 the toolchain is what's _really_ holding things back.

> > it's because i am looking to recommend to a company that is
> > doing a parallel processor design that they also provide a
> > vector processor unit.
> 
> :-)  If they only support constructs that can easily be used by gcc, 
> and if their customers need it, and if there is a large market for the 
> architecture where they can beat ia32 and AMD-64 now, and in the next 3 
> years, then, they might have a chance to not fail.  

 yeh.  qty 16 of 500mhz risc cpus on a chip, plus a few vector-processor
 units...  reckon that ought to do it?

 hmmm.. 3 years... probably not.

 damn, back to drawing board.

> Otherwise, well, 
> you can do anything you want, because they will fail, just be sure to 
> get your money up front, don't work for free or stock...  :-)

 .. or make sure it's GPL'd!

> > if gcc don't make the grade as a viable vector processing aware
> > compiler or as part of a vector processing aware toolchain,
> > the recommendation ain't gonna happen - i've seen what happens
> > when you don't have a good enough development toolchain.
> >
> > companies fail.
> 
> Yup.
> 
> 
> So, while it may seem like you can escape out to perl and code up a 
> compiler assist in perl, 

 [well, in aspex's case, it's not perl, it's a pre-existing
  well-developed and stable toolchain written over the past
  15 years, in modula-2, but i know what you mean... ]

> our best sense tells us to dissuade you from 
> that idea.  

 *lol*

> Better to recognize the code (autovec) or use tagging 
> techniques (OpenMP/altivec/sse) and then just do up that support 
> directly in the compiler, 

 yes - for the project i have been asked about, _yes_ i'd recommend it
 be done as _hardware_ assembly instructions - that there exist an
 assembly op-code for doing a vector add.

 that _would_ make "plural int x;" - tagging - a viable option.

 in aspex's case, that's _just_ not viable - the c++ option is about the
 only _sane_ option :)

 can you _imagine_ the number of different tags you'd need to say
 "i want this register to be 1-bit wide, spread across 16 processors each,
  i want _this_ register array to be 4-bits wide, spread across 32 processors.."

 ... it just goes _nuts_.

> plus, now-a-days, I would recommend doing up 
> a template library that makes the architecture rock; to make that work, 
> it also needs to be ported to all the market relevant architectures as 
> well and have a massive user following.  That part is beyond the scope 
> of this list.

 well, the approach taken by aspex _makes_ it portable, already
 [because it's a macro pre-processing step, turning inline-asp
  instructions into c-code].

 it just sucks :)

 ... hang on, i just re-read what you said.

 valarray is part of the ISO/ANSI _standard_ c++ library.

 you declare a valarray<int> x(20) or something.

 you then do x += 5 and all 20 integers in the array x get 5 added to
 them.
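
 a minimal sketch of that, using only the standard <valarray> header.
 the helper name add_to_all is made up for illustration; the
 operator+= it calls is the standard element-wise one.

```cpp
#include <cassert>
#include <cstddef>
#include <valarray>

// add_to_all is a hypothetical helper, not part of any library.
// it takes the array by value, adds k to every element and returns it.
std::valarray<int> add_to_all(std::valarray<int> x, int k) {
    x += k;  // standard valarray operator+=: applies to all elements
    return x;
}
```

 calling add_to_all(std::valarray<int>(20), 5) starts from 20
 zero-initialised integers and adds 5 to each, with no explicit loop
 in user code.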

 i believe it to be quite straightforward to modify valarray on a
 per-vector-based-architecture basis to provide support for whatever
 accelerated instructions are available.

 in fact, you could probably do it right _now_ for MMX, altivec,
 Sony Playstation and MasPar hardware: all of these have hardware-based
 assembly instruction opcodes, yes?

 [just not the ASP, because of their proprietary assembler-based toolchain]

 so, if you've got assembler opcodes, you simply do a version of the
 valarray template library that divides the for-loops up by the
 vector-unit length, and goes into inline assembly code.

 instead of doing
 for (i = 0; i < this->get_size(); i++)
 	this->data[i] += op1->data[i];

 you'd do

 for (i = 0; i < this->get_size(); i+= vector_unit->get_size())
 {
 	asm { .... }
 }

 remembering to take into account the last bit of the loop, of course :)
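
 a rough sketch of that loop structure, including the tail.
 everything named here is an assumption for illustration: kVectorWidth
 stands in for the vector-unit length, and vector_add_block is a plain
 c++ stand-in for where the inline assembly (or an intrinsic) would go,
 so this shows the chunking only, not any real hardware speed-up.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// hypothetical vector-unit length; a real port would take it from the hardware.
constexpr std::size_t kVectorWidth = 4;

// stand-in for a single vector-add instruction over kVectorWidth elements.
// a real port would replace this body with inline asm or an intrinsic.
inline void vector_add_block(int* dst, const int* src) {
    for (std::size_t j = 0; j < kVectorWidth; ++j)
        dst[j] += src[j];
}

// dst[i] += src[i] for all i, processed in vector-width chunks.
void add_in_place(std::vector<int>& dst, const std::vector<int>& src) {
    assert(dst.size() == src.size());
    const std::size_t n = dst.size();
    const std::size_t full = n - (n % kVectorWidth);  // end of the whole chunks

    std::size_t i = 0;
    for (; i < full; i += kVectorWidth)
        vector_add_block(dst.data() + i, src.data() + i);

    // the last bit of the loop: fewer than kVectorWidth elements left over.
    for (; i < n; ++i)
        dst[i] += src[i];
}
```

 with 10 elements and a width of 4, the first loop handles elements
 0 to 7 in two blocks and the scalar loop handles elements 8 and 9.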

 i imagine this to be a _whole_ lot less grief than putting support
 in gcc for vectors / autodetection / tagging.

 ... don't get me wrong - i'd be _delighted_ to see vector
 autodetection and tagging in gcc!

 l.

 p.s. mike - thank you _ever_ so much for responding, you have given me
 much hope [for the future project], much guidance, and made me think.

-- 
--
http://lkcl.net
--

Follow-Ups:
- Re: [lkcl@lkcl.net: has gcc been reworked so that code/templates can be "outsourced" e.g. to perl yet?]
  - From: Mike Stump

References:
- Re: [lkcl@lkcl.net: has gcc been reworked so that code/templates can be "outsourced" e.g. to perl yet?]
  - From: Luke Kenneth Casson Leighton
- Re: [lkcl@lkcl.net: has gcc been reworked so that code/templates can be "outsourced" e.g. to perl yet?]
  - From: Mike Stump
