970910 C++ optimization error causes segfault on i386 Linux
Derek Upham
sand@celia.serv.net
Tue Sep 16 14:25:00 GMT 1997
At the end of this messsage is a C++ program that implements part of
an anagram generator. The code provided simply loads in words that
will form the search space. The command line to run the program is
./badtest 'My Basis String' < /usr/dict/words
If the code is compiled without optimization, it works perfectly,
printing the line "collection has 332 entries" on my Linux box.
However, if the code is compiled with optimization flag "-O", it
raises a SEGFAULT signal during the dictionary map lookup (search for
the words "BUG OCCURS HERE").
My environment has the following particulars:
System: i586-pc-linux-gnu, Linux 2.0.30
Compiler: cc1plus, EGCS snapshot 970910
Binutils: version 2.8.1
C Library: glibc-2.0.5c
C++ Library: libg++-2.8.0b6 with H. J. Lu's libc6 patches
Derek
--
Derek Upham
sand@celia.serv.net http://www.serv.net/~sand
"Ha! Your Leaping Tiger Kung Fu is no match for my Frightened Piglet style!"
--------------------------------- cut here ----------------------------------
/*
BADTEST.CC
Look for collections of words whose letters, when combined, exactly
match the letters in a given string.
[Explanation skipped. The bug occurs in the dictionary loading phase.]
The current implementation of the program assumes that the
search-space dictionary will be available on stdin, in a format
described in the "istream &operator>>(istream &in, DElem &obj)"
function. Typically, this dictionary is "/usr/dict/words", which
means the command line to run the program is:
./badtest 'foo bar baz' < /usr/dict/words
*/
#include <algo.h>
#include <stdlib.h>
#include <function.h>
#include <iostream.h>
#include <iterator.h>
#include <list.h>
#include <map.h>
#include <string>
#include <vector.h>
//
// Global, constant values.
//
// Don't bother trying to perform a match if the word is less than
// this many characters long. This value must always be at least
// one.
const size_t MINIMAL_WORD_SIZE = 3;
// Use vectors of characters as words. Strings don't quite fit into
// the STL interface scheme (no "push_back()", for example).
typedef vector<char> word;
// Takes an input sequence of characters and converts it into a sorted
// vector of lowercase alphanumeric characters.
template <class InputIterator>
word
canonicalize(InputIterator first, InputIterator last)
{
word tmp;
transform(first, last, back_inserter(tmp), ptr_fun(tolower));
tmp.erase(remove_if(tmp.begin(), tmp.end(), not1(ptr_fun(isalpha))),
tmp.end());
sort(tmp.begin(), tmp.end());
return tmp;
}
// We will be matching on sorted vectors of characters, but we will
// need to print them in their original format. We choose to trade
// space for time, and keep track of both in the DElem structure.
class DElem
{
public:
word original, canonical;
};
istream &operator>>(istream &in, DElem &obj)
{
// The program expects a dictionary of words to be available on
// stdin. This dictionary should be in the form
//
// word1
// word2
// word3
// ...
//
// with the words separated by whitespace and containing no
// whitespace themselves. "/usr/dict/words" is typically a valid
// dictionary.
string s;
if (in >> s)
{
obj.original = word(s.begin(), s.end());
obj.canonical = canonicalize(s.begin(), s.end());
}
return in;
}
typedef map< char, list<DElem> > dict_t;
// Load a collection of words from the input stream, storing the
// original and canonical forms in a list. The particular list to use
// is determined by the first character of the canonical form.
//
// "name" is the string which will provide the source letters for the
// anagrams. It allows us to ignore words that can never help form an
// anagram.
void
load(istream &in_stream, const word &name, dict_t &dict)
{
DElem temp;
long count = 0;
while (in_stream >> temp)
if (MINIMAL_WORD_SIZE <= temp.canonical.size() &&
includes(name.begin(), name.end(),
temp.canonical.begin(), temp.canonical.end()))
{
// BUG OCCURS HERE
list<DElem> &collection = dict[temp.canonical[0]];
collection.push_back(temp);
++count;
}
cerr << "collection has " << count << " entries" << endl;
}
int
main(int argc, char *argv[])
{
if (argc != 2)
return EXIT_FAILURE;
dict_t dict;
load(cin, canonicalize(argv[1], argv[1]+strlen(argv[1])), dict);
return EXIT_SUCCESS;
}
More information about the Gcc-bugs
mailing list