hash_map<> and string...

george@moberg.com
Wed Nov 22 09:05:00 GMT 2000


While reading my response to this, keep in mind my somewhat extreme
position:

	ANYTHING that I can do to improve the reliability of my software, I
WILL DO.

Also, please take this e-mail with a Monty-Python's-Life-of-Brian-style
sense of humor.

> His point was that the class template hash<> should also be specialized
> for std::string since it is the C++ politically correct way of spelling
> "character string".

EXACTLY.  Now that we're using 2.90.8 internally for development, what
we do here is SMACK UPSIDE THE HEAD anyone who even THINKS about using
char* as a string.  "char*" as a string type is NOT
MAINTAINABLE or DEBUGGABLE in any meaningful way in a large C++
program.  std::string is the character string type going forward and we
are extremely committed to its use, as we are not C programmers using a
better compiler.  Using std::string is even more important knowing that
we have to internationalize our programs.

I cannot possibly stress enough how much the Standard C++ Library, and
in particular std::string and the container template classes have
improved the reliability and maintainability of our large multithreaded
programs.  We were using C-style string manipulation and traditional
pointers for composite data structures and I was READY TO QUIT MY JOB
because writing correct code and debugging said code was TOO DIFFICULT. 
Using the V3 of the Standard C++ Library with gcc has been a revelation
of near religious significance to our group.  The Standard C++ Library,
and in particular std::string and the container classes, complete the
C++ language in a way that it hasn't been before.  I'm not quite sure
that "the world at large" realizes how significant this is.

So, as we religious zealots are wont to do (;^), we are prepared to
request an unusually strict level of compliance with doctrine.  (;^)

(As an aside, we're _very_ interested in correctness, as we work on
medical monitoring products.  Interested to the point that we have a
template for when we use pointers that compiles to the same code as
pointers with -O2 but with -O1 -DCHECKED_POINTERS=1 will protect every
pointer operation with an assertion.  I oughta donate this to you guys,
if you want it, as it has helped immensely in debugging our programs. 
We regularly search our source code for uses of bare pointers and
C-style strings and make the author of such code justify, individually,
each new use.)
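For readers wondering what such a template might look like, here is a
minimal sketch.  It is NOT the actual template described above, which has
not been posted; the names checked_ptr and PTR_CHECK are hypothetical, and
this version only guards dereferences against null rather than "every
pointer operation".  It assumes the standard assert() macro, so the checks
also vanish if NDEBUG is defined.

```cpp
#include <cassert>

// Checks are compiled in only when built with -DCHECKED_POINTERS=1.
// Otherwise PTR_CHECK expands to nothing, and an optimizing compiler
// can inline the wrapper down to plain pointer code.
#if CHECKED_POINTERS
#  define PTR_CHECK(cond) assert(cond)
#else
#  define PTR_CHECK(cond) ((void)0)
#endif

// Hypothetical wrapper: a non-owning pointer whose dereferences are
// guarded by an assertion in checked builds.
template <typename T>
class checked_ptr {
public:
    checked_ptr(T* p = 0) : p_(p) {}

    // Dereference; asserts non-null in checked builds.
    T& operator*() const  { PTR_CHECK(p_ != 0); return *p_; }
    T* operator->() const { PTR_CHECK(p_ != 0); return p_; }

    // Access to the raw pointer, e.g. for passing to C interfaces.
    T* get() const { return p_; }

private:
    T* p_;
};
```

With this shape, "checked_ptr<int> p(&x); *p = 7;" behaves like a bare
pointer in an unchecked build, while a checked build aborts with a
file/line message on a null dereference instead of crashing later.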

> | I suspect this will get you where you want to go. If not, you're
> | pretty much out of luck, as only fundamental types are going in this
> | file, apparently.
> 
> Well, any reasonable hash table library should come with a hash
> function for fundamental types or nearby types (i.e. pointer types).
> It is also understandable to require the library to come with hash
> functions defined for any type it happens to define or use, for
> completeness purpose.
>
> | Is it really that much extra work to define this yourself?
> 
> It is not really hard to add support for hash<std::string> but that has
> a low priority on my TODO list.  And if we start providing support
> for std::string, there is no reason we should stop there... 

As far as defining this myself, I do that now.  My reasons for bringing
this up are those of religious zealotry.
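For anyone who wants to do the same, a user-defined hash for std::string
can be sketched roughly as follows.  This is an illustration under
assumptions, not the code we actually use: the names string_hash,
reading_table, store and lookup_or_default are hypothetical, the mixing
loop is an FNV-1a-style hash chosen for brevity (not what the library's
hash<const char*> does), and std::unordered_map stands in for hash_map<>
so that the example compiles with a current standard library.  The same
functor could be passed as the hash argument of hash_map<>.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>

// Hypothetical hash functor for std::string (FNV-1a-style mixing with
// the 32-bit FNV constants; any reasonable string hash would do).
struct string_hash {
    std::size_t operator()(const std::string& s) const {
        std::size_t h = 2166136261u;            // FNV offset basis
        for (std::string::size_type i = 0; i < s.size(); ++i) {
            h ^= static_cast<unsigned char>(s[i]);
            h *= 16777619u;                     // FNV prime
        }
        return h;
    }
};

// Stand-in for hash_map<std::string, int, string_hash>.
typedef std::unordered_map<std::string, int, string_hash> reading_table;

// Store a value under a non-empty key (the precondition is this
// sketch's own choice, not a requirement of the container).
void store(reading_table& t, const std::string& key, int value) {
    assert(!key.empty());
    t[key] = value;
}

// Look a key up without inserting it; return fallback if absent.
int lookup_or_default(const reading_table& t, const std::string& key,
                      int fallback) {
    reading_table::const_iterator it = t.find(key);
    return it == t.end() ? fallback : it->second;
}
```

Note that no char* appears anywhere in the interface; the key type is
std::string from end to end, which is the whole point.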

std::string is effectively a _fundamental type_.  Moving towards a
pervasive use of std::string for character strings is the "Right
Thing(tm) to do" (IMHO), and it is my intention (NSH) to nudge the
development of this library in service of that goal.  For example, I
believe that constructors for iostreams will eventually take
std::string for pathnames in future revisions of the relevant
standard(s).  

Forget about pointers to char in C++.  Accept that it's a "Really Bad
Thing(tm) to do" to represent strings this way except in rare,
performance-critical, isolated sections that you can hide away from any
public interface.

Take a hard look at the Standard C++ Library and your implementation of
it, and imagine that everywhere you see char*, it will eventually be
more common to use std::string, and that those places will eventually be
changed.  _That_ is the reason I want to see support for std::string in
places like hash_map<> in the library now.






I'd also like to make a couple of unrelated observations on performance
of user-level code transitioning to this library:

When we changed over to std::string, we were worried about the fact that
std::string uses heap-based allocation for strings.  As one might
expect, our program was in fact slower after the transition.  After
profiling, we changed 3 methods in our program (out of well over a
thousand) in an isolated manner not visible to the callers to use
internal buffers and traditional C string manipulation, and got about
98% of the lost performance back.  This is analogous to profiling and
rewriting about 1-5% of your C program in assembly to get a 90% speed
improvement.  So std::string _works_ in a way that the transition to
compiled languages (i.e. C instead of assembly language) _worked_, and
as a software designer, I can apply my skills to the transition in the
same way I did in previous computing paradigm shifts.
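To illustrate the kind of isolated change described above, here is a
rough sketch.  It is not one of our three methods; the function
format_reading and its format string are hypothetical, and it assumes
snprintf is available (std::snprintf in <cstdio>).  The caller still
sees std::string in and out, while the hot path does its work in a
fixed stack buffer with C-style formatting and touches the heap at most
once, when the result is constructed.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical hot-path function: builds "ch<N>=<value>" without any
// intermediate std::string temporaries.
std::string format_reading(int channel, double value) {
    char buf[64];                                // internal stack buffer
    int n = std::snprintf(buf, sizeof buf, "ch%d=%.2f", channel, value);

    assert(n >= 0);                  // no encoding error expected here
    if (n < 0) return std::string(); // defensive path for NDEBUG builds

    // snprintf reports the length it wanted; clamp if it was truncated.
    if (n >= static_cast<int>(sizeof buf)) n = sizeof buf - 1;

    return std::string(buf, static_cast<std::string::size_type>(n));
}
```

The C string manipulation stays private to the function body, so the
bare buffer never leaks into a public interface and the rest of the
program keeps using std::string.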

As far as performance of the container classes goes, our programs
actually got _faster_ and performed more predictably, speed wise, when
we transitioned to the Standard C++ Library and got rid of our
home-grown templates.  How cool is that?






Given that you've gotta ship this thing soon, I know you can't
necessarily afford the time to be bothered with this.  Having said my
piece, I will now shut the **** up on this issue.  ;^)

--
George T. Talbot
<george at moberg dot com>

P.S.  Please let me know if these sorts of user-level observations are
useful.  It's sort of a shot in the dark for me writing this.

