Random name generation

From RogueBasin
There are many different ways to tackle random name generation. Some games, such as [[ADOM]], simply pick a random name from a hard-coded list. However, many developers pursue a generator that can procedurally produce an arbitrarily large number of names that:
* are pronounceable
* have an appropriate "feel"
Many such generators have been written, using different approaches. Markov chains are one of them, although methods that work with bigger building blocks, such as syllables or even entire words, have proven easier to manage. The articles listed below cover random name generation.

==Related articles==

* [[Markov chains-based name generation]]
* [[Markov chains name generator in Python]]
* [[Finite state name generator]]
* [[Syllable-based name generation]]
* [[Cluster chaining name generator]]

==See also==

* [http://groups.google.com/group/rec.games.roguelike.development/msg/905b2028961e4c76 (from rgrd) Phonemes based name generator]
* [http://www.seventhsanctum.com/library.php (www.seventhsanctum.com) Word based random generators, code and advice]
* [http://www.seventhsanctum.com/writings.php?Writnum=2 (www.seventhsanctum.com) Sleight of Mind: Why do generators work]
* [http://mistermishap.com/rlnews/dev00038.html Random names]
* [http://mistermishap.com/rlnews/dev00040.html Random Name Generation Using Regular Expressions]

[[Category:Articles]]

==Markov chain example==

Markov chains are a general statistical tool, used in fields such as thermodynamics, but here I will use a simplified ("fudged") version of one to generate random names. A Markov chain models a sequence in which the probability of the next event depends only on the current one. With a big enough data set we can estimate those probabilities and use them to predict, or randomly generate, the next event in a seemingly random series: here, the next letter of a name. So first we need a data set of the kind of names we want to generate. Lucky for you, I have already made one. Here is the [[List of Names]].

This list contains every male and gender-neutral name from babynames.com. Next we need a program that parses this list for its statistics: for each letter, how often every other letter follows it. To make this more interesting, I have split the statistics into three groups:
# letter combinations that begin names,
# letter combinations that end names, and
# letter combinations that fall between the beginning and the end.
This should give us slightly better names. Below are my code samples in C++; once compiled, the program prints some randomly generated names.

<TT>
// CWordFrequency.h
#ifndef CWORDFREQUENCY_H_
#define CWORDFREQUENCY_H_
class CWordFrequency
{
    private:
        int countBeginning;
        int countEnd;
        int countWithin;
    public:
        CWordFrequency();
        ~CWordFrequency();
        void incrementCountBeginning();
        void incrementCountEnd();
        void incrementCountWithin();
        int returnCountBeginning();
        int returnCountEnd();
        int returnCountWithin();
 
};
 
#endif
 
//CWordFrequency.cpp
#include "CWordFrequency.h"
 
CWordFrequency::CWordFrequency() : countBeginning(0), countEnd(0),
countWithin(0)
{
}
 
CWordFrequency::~CWordFrequency()
{
}
 
void CWordFrequency::incrementCountBeginning()
{
    ++countBeginning;
}
 
void CWordFrequency::incrementCountEnd()
{
    ++countEnd;
}
 
void CWordFrequency::incrementCountWithin()
{
    ++countWithin;
}
int CWordFrequency::returnCountBeginning()
{
    return countBeginning;
}
int CWordFrequency::returnCountEnd()
{
    return countEnd;
}
 
int CWordFrequency::returnCountWithin()
{
    return countWithin;
}
//CRandomName.h
#ifndef CRANDOMNAME_H_
#define CRANDOMNAME_H_
#include <fstream>
#include <map>
#include <string>
#include <cstdlib>
#include <ctime>
#include <vector>
#include <algorithm>
#include "CWordFrequency.h"
class CRandomName
{
    private:
        std::string errorMessage;
        std::ifstream *fileStreamIn;
        std::ofstream *fileStreamOut;
        std::map<char, std::map<char, CWordFrequency> > baseMap;
        std::map<char, CWordFrequency> sequenceFrequencyMap;
        CWordFrequency tempFrequency;
    public:
        CRandomName();
        ~CRandomName();
        void inputFile(std::ifstream &streamHandle);
        void processFile();
        void outputList(std::ofstream &streamHandle);
        std::string outputName(double minLength, double maxLength);
};
#endif
//CRandomName.cpp
#include <iostream>
#include "CRandomName.h"
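CRandomName::CRandomName() : fileStreamIn(NULL), fileStreamOut(NULL)
{
  srand(time(NULL));
}
CRandomName::~CRandomName()
{
  // The streams belong to the caller (see main.cpp), which closes them itself.
  // The pointers may also never have been set, so nothing is closed here.
}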
void CRandomName::inputFile(std::ifstream &streamHandle)
{
    fileStreamIn = &streamHandle;
}
void CRandomName::processFile()
{
    std::string word;
    // Reading inside the loop condition stops cleanly at end of file; testing
    // eof() before reading can process the last word twice.
    while (*fileStreamIn >> word)
    {
        std::string::size_type length = word.length();
        for (std::string::size_type wordPosition = 0; wordPosition + 1 < length; wordPosition++)
        {
            char base = word[wordPosition];
            char sequence = word[wordPosition + 1];
            CWordFrequency &wf = baseMap[base][sequence];

            // Classify the pair: the first pair, the pair ending on the last
            // letter, or a pair in between.
            if (wordPosition == 0) {wf.incrementCountBeginning();}
            else if (wordPosition + 2 == length) {wf.incrementCountEnd();}
            else {wf.incrementCountWithin();}
        }
    }
}
 
void CRandomName::outputList(std::ofstream &streamHandle)
{
    fileStreamOut = &streamHandle;
 
    std::map<char, std::map<char, CWordFrequency> >::iterator itr;
 
    std::map<char, CWordFrequency>::iterator itr2;
 
    for (itr = baseMap.begin(); itr != baseMap.end(); itr++)
    {
        sequenceFrequencyMap = itr->second;
        for (itr2 = sequenceFrequencyMap.begin(); itr2 != sequenceFrequencyMap.end(); itr2++)
        {
            tempFrequency = itr2->second;
            *fileStreamOut << itr->first << " " << itr2->first << " " << tempFrequency.returnCountBeginning() << " " << tempFrequency.returnCountWithin() << " " << tempFrequency.returnCountEnd() << std::endl;
        }
    }
}
 
std::string CRandomName::outputName(double minLength, double maxLength)
{
    std::string name;
    std::vector<char> freqVector;
    // Pick a length between minLength and maxLength. RAND_MAX + 1.0 is used
    // because RAND_MAX + 1 can overflow an int.
    double range = (maxLength - minLength) + 1;
    int rangeLength = static_cast<int>(minLength + range * (rand() / (RAND_MAX + 1.0)));
    char a = static_cast<char>('A' + static_cast<int>(26 * (rand() / (RAND_MAX + 1.0))));
//I made this only go to
//'Z' because I haven't finished compiling my list of names
    name += a;

    for (int counter = 1; counter < rangeLength; counter++)
    {
        // Collect the candidates for this letter only, so successors of
        // earlier letters do not leak into this choice.
        freqVector.clear();
        if (baseMap.find(a) != baseMap.end())
        {
            std::map<char, CWordFrequency> &successors = baseMap[a];
            for (char b = 'A'; b <= 'Z'; b++)
            {
                std::map<char, CWordFrequency>::iterator found = successors.find(b);
                if (found == successors.end())
                    continue;
                // Same positions as processFile(): the first pair, the pair
                // that ends on the last letter, or one in between.
                int count;
                if (counter == 1)
                    count = found->second.returnCountBeginning();
                else if (counter == rangeLength - 1)
                    count = found->second.returnCountEnd();
                else
                    count = found->second.returnCountWithin();
                // One entry per occurrence, so a uniform pick is weighted by frequency.
                for (int cc = 0; cc < count; cc++)
                    freqVector.push_back(b);
            }
        }
        // Stop early if no letter ever followed the current one in this position.
        if (freqVector.empty())
            break;
        // A random index is enough; shuffling the vector first is unnecessary.
        int c = static_cast<int>(freqVector.size() * (rand() / (RAND_MAX + 1.0)));
        a = freqVector[c];
        name += a;
    }
    return name;
}
 
//main.cpp
#include <iostream>
#include "CRandomName.h"
int main()
{
    CRandomName name;

    std::ifstream inFile("NameList.txt");
    if (!inFile)
    {
        std::cerr << "Could not open NameList.txt\n";
        return 1;
    }
    std::ofstream outFile("Stats.txt", std::ios_base::trunc);

    name.inputFile(inFile);
    name.processFile();
    name.outputList(outFile);

    // Print five names of 3 to 9 letters each.
    for (int i = 0; i < 5; i++)
        std::cout << name.outputName(3, 9) << '\n';
}
</TT>
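For comparison, the same three-way statistics can be sketched more compactly in C++11 or later. This is a minimal illustration under stated assumptions, not a drop-in replacement for the classes above: the names Position, PairCounts, countPairs, nextLetter and generateName are hypothetical, the word list is passed in as a std::vector<std::string> rather than read from NameList.txt, and the weighted choice uses std::discrete_distribution from <random> instead of expanding every count into a vector.

```cpp
#include <array>
#include <cstddef>
#include <map>
#include <random>
#include <string>
#include <utility>
#include <vector>

// Where a letter pair sits in a name, classified the same way as processFile():
// the first pair, the pair that ends on the last letter, or anything in between.
enum Position { Beginning = 0, Within = 1, End = 2 };

// counts[(a, b)][pos] = how many times letter b followed letter a at that position.
typedef std::map<std::pair<char, char>, std::array<int, 3> > PairCounts;

PairCounts countPairs(const std::vector<std::string>& names) {
    PairCounts counts;
    for (const std::string& name : names) {
        for (std::size_t i = 0; i + 1 < name.size(); ++i) {
            Position pos = (i == 0) ? Beginning
                         : (i + 2 == name.size()) ? End
                         : Within;
            // operator[] value-initializes a new std::array, so counts start at 0.
            ++counts[std::make_pair(name[i], name[i + 1])][pos];
        }
    }
    return counts;
}

// Chooses the letter that follows `prev` at position `pos`, with probability
// proportional to how often that pair was seen there. Returns '\0' if none was.
char nextLetter(const PairCounts& counts, char prev, Position pos, std::mt19937& rng) {
    std::vector<char> letters;
    std::vector<int> weights;
    for (const auto& entry : counts) {
        if (entry.first.first == prev && entry.second[pos] > 0) {
            letters.push_back(entry.first.second);
            weights.push_back(entry.second[pos]);
        }
    }
    if (letters.empty()) return '\0';
    std::discrete_distribution<int> pick(weights.begin(), weights.end());
    return letters[pick(rng)];
}

// Builds a name of up to `length` letters starting with `first`. It stops early
// if no recorded pair can continue the name at the current position.
std::string generateName(const PairCounts& counts, char first, std::size_t length,
                         std::mt19937& rng) {
    std::string name(1, first);
    for (std::size_t i = 1; i < length; ++i) {
        // The pair being added is (name[i - 1], name[i]), classified as in countPairs().
        Position pos = (i == 1) ? Beginning
                     : (i + 1 == length) ? End
                     : Within;
        char next = nextLetter(counts, name[i - 1], pos, rng);
        if (next == '\0') break;
        name += next;
    }
    return name;
}
```

For example, countPairs({"ANNA"}) records AN as a beginning pair, NN as an in-between pair and NA as an end pair; with only those counts, generateName(counts, 'A', 4, rng) can only return ANNA. std::discrete_distribution picks an index with probability proportional to its weight, which is the same weighting that pushing each letter into freqVector once per occurrence achieves, without the extra memory.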
 
And here is some sample output from main.cpp:
 
ONMY
TERLEMDA
ZENEEVML
CHIA
SAWH
SPTKIVCDE
ILAN
COONR
KETADAN
VIAN
XAIIN
URRIRERON
FRTOKRR
YONNAN
IDANSZEDS
CAALLSTF
FLVAHE
XHAM
DUSEXNCST
DEBAD
SHEUO
WARFA
MUASS
INEE
 
As you can see, a few of these are usable but most aren't. This is because the dataset, although big, isn't quite big enough, and because each letter is chosen by looking only at the one letter before it. To compensate, you could implement artificial rules that make the names more readable, but I will leave that as an exercise for the reader.
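As one possible starting point for that exercise, here is a minimal sketch of such rules. The rules themselves (at least one vowel, no more than two consonants in a row, no letter three times in a row), the choice to count Y as a vowel, and the function names isVowel and looksReadable are illustrative assumptions, not part of the program above.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Assumption: A, E, I, O, U and Y count as vowels; every other character is
// treated as a consonant.
bool isVowel(char c) {
    switch (std::toupper(static_cast<unsigned char>(c))) {
        case 'A': case 'E': case 'I': case 'O': case 'U': case 'Y':
            return true;
        default:
            return false;
    }
}

// Rejects names with no vowel, three consonants in a row, or the same letter
// three times in a row. These thresholds are arbitrary starting points.
bool looksReadable(const std::string& name) {
    bool hasVowel = false;
    int consonantRun = 0;
    for (std::size_t i = 0; i < name.size(); ++i) {
        if (isVowel(name[i])) {
            hasVowel = true;
            consonantRun = 0;
        } else if (++consonantRun >= 3) {
            return false;
        }
        if (i >= 2 && name[i] == name[i - 1] && name[i] == name[i - 2]) {
            return false;
        }
    }
    return hasVowel;
}
```

Against the sample output, these rules would keep names such as ILAN and ONMY and reject FRTOKRR and SPTKIVCDE. One way to apply them is to call outputName() in a loop until looksReadable() accepts the result, with a cap on the number of attempts so that a small data set cannot keep the loop running forever.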

Revision as of 18:53, 29 August 2011
