Difference between revisions of "Random name generation"

From RogueBasin
Jump to navigation Jump to search
m
(Remove dead links)
 
(8 intermediate revisions by 8 users not shown)
Line 1: Line 1:
There are a lot of different ways to tackle random name generation. Some use existing lists of names to generate variants of these names, such as the Markov-Chain Approach. Other variants use lists of syllables to randomly select names. And others combine different words or parts of words into new random names.
There are a lot of different ways to tackle random name generation. There are games, such as [[ADOM]], that simply pick a random name from a hard coded list, however, what many developers pursue is a generator that will procedurally generate an arbitrarily large number of names that will:
* be pronounceable
* have an adequate "feel"
Many attempts have been made to write such generators, and different approaches have been used. Markov chains are one of the methods that have been used, although methods involving bigger data blocks, such as syllables or even entire words, have proven to be easier to manage. The articles listed below refer to the subject of random name generation.


==Markov-Chains Approach==
==Related articles==


Markov-Chains are usually used in the field of thermodynamics, but I will be using a fudged version of Markov-Chains to generate some random names. The basic premise of Markov-Chains is that if we have a big enough data set we should be able to predict the next event in a series of seemingly random events. So using this we need to create a dataset of names that we want to generate, lucky for you I have already done this. Here is the [[List of Names]].
* [[Markov chains-based name generation]]
* [[Markov chains name generator in Python]]
* [[Finite state name generator]]
* [[Syllable-based name generation]]
* [[Cluster chaining name generator]]
* [[Names from a high order Markov Process and a simplified Katz back-off scheme]]


This list is every male and gender-neutral name from babynames.com. Now we need to create a program to parse this list for it's statistics. To make this interesting I have split the statistics up, 1. Letter combinations that begin names, 2. Letter combinations that end names, and 3. Letter combinations that are in between the beginning and end. This should give us slightly better names. The following is my code samples in C++, after compilation this program will output some randomly generated names.
==See also==
 
----
 
// CWordFrequency.h
#ifndef CWORDFREQUENCY_H_
#define CWORDFREQUENCY_H_
class CWordFrequency
{
    private:
        int countBeginning;
        int countEnd;
        int countWithin;
    public:
        CWordFrequency();
        ~CWordFrequency();
        void incrementCountBeginning();
        void incrementCountEnd();
        void incrementCountWithin();
        int returnCountBeginning();
        int returnCountEnd();
        int returnCountWithin();
};
#endif
----
//CWordFrequency.cpp
#include "CWordFrequency.h"
CWordFrequency::CWordFrequency() : countBeginning(0), countEnd(0),
countWithin(0)
{
}
CWordFrequency::~CWordFrequency()
{
}
void CWordFrequency::incrementCountBeginning()
{
    ++countBeginning;
}
void CWordFrequency::incrementCountEnd()
{
    ++countEnd;
}
void CWordFrequency::incrementCountWithin()
{
    ++countWithin;
}
int CWordFrequency::returnCountBeginning()
{
    return countBeginning;
}
int CWordFrequency::returnCountEnd()
{
    return countEnd;
}
int CWordFrequency::returnCountWithin()
{
    return countWithin;
}
----
//CRandomName.h
#include <fstream>
#include <map>
#include <string>
#include <cstdlib>
#include <ctime>
#include <vector>
#include <algorithm>
#include "CWordFrequency.h"
#ifndef CRANDOMNAME_H_
#define CRANDOMNAME_H_
class CRandomName
{
    private:
        std::string errorMessage;
        std::ifstream *fileStreamIn;
        std::ofstream *fileStreamOut;
        std::map<char, std::map<char, CWordFrequency> > baseMap;
        std::map<char, CWordFrequency> sequenceFrequencyMap;
        std::vector<char> startChars;
        std::vector<char> availableChars;
        CWordFrequency tempFrequency;
    public:
        CRandomName();
        ~CRandomName();
        void inputFile(std::ifstream &streamHandle);
        void processFile();
        void outputList(std::ofstream &streamHandle);
        std::string outputName(double minLength, double maxLength);
};
#endif
 
----
 
//CRandomName.cpp
#include <iostream>
#include <string>
#include <set>
#include <iterator>
#include "CRandomName.h"
using namespace std;
CRandomName::CRandomName()
{
  srand(time(NULL));
}
CRandomName::~CRandomName()
{
  fileStreamOut->close();
  fileStreamIn->close();
}
void CRandomName::inputFile(std::ifstream &streamHandle)
{
    fileStreamIn = &streamHandle;
}
//Set the boolean to true if you have a file that contains names with whitespaces. This does mean that the
//end of each name should be marked by a end of line character.
//Also the words will be converted to uppercase automatically
void CRandomName::processFile(bool NoWhiteSpaceSkip)
{
    std::string word;
    std::set<char> startCharsSet; //these sets are used to list the different starting and available chars.
    std::set<char> availableCharsSet; //they will be converted to vectors later.
                                        //I needed sets for their key ability, and vectors later because they
                                        //offer better access to the elements contained in the vector.
    char base;
    char sequence;
    int wordPosition;
    startChars.clear();
    availableChars.clear();
    while(fileStreamIn->good())
    {
        //Ignore whitespaces between words. Different names should be seperated by a '\n'
        if(NoWhiteSpaceSkip) {
          getline(*fileStreamIn,word);
        } else
            *fileStreamIn >> word;
       
        if(word.length() > 1)
        {
            startCharsSet.insert(word[0]);
            availableCharsSet.insert(word[0]);
       
            for (wordPosition = 0; (wordPosition + 1) < (word.length()); wordPosition++)
            {
              availableCharsSet.insert(word[wordPosition+1]);
              base = word[wordPosition];
              sequence = word[wordPosition + 1];
              CWordFrequency &wf = baseMap[base][sequence];
       
              if (wordPosition == 0) {wf.incrementCountBeginning();}
              else if ((wordPosition + 1) >= (word.length() - 1)) {wf.incrementCountEnd();}
              else if ((wordPosition > 0) && ((wordPosition + 1) < (word.length() - 1))) {wf.incrementCountWithin();}           
            }
        }
    }
    set<char>::iterator it;
    for ((it = startCharsSet.begin()); it != startCharsSet.end(); it++)
        startChars.push_back(*it);
    for ((it = availableCharsSet.begin()); it != availableCharsSet.end(); it++)
        availableChars.push_back(*it);
}
void CRandomName::outputList(std::ofstream &streamHandle)
{
    fileStreamOut = &streamHandle;
    std::map<char, std::map<char, CWordFrequency> >::iterator itr;
    std::map<char, CWordFrequency>::iterator itr2;
    for (itr = baseMap.begin(); itr != baseMap.end(); itr++)
    {
        sequenceFrequencyMap = itr->second;
        for (itr2 = sequenceFrequencyMap.begin(); itr2 != sequenceFrequencyMap.end(); itr2++)
        {
            tempFrequency = itr2->second;
            *fileStreamOut << itr->first << " " << itr2->first << " "
                << tempFrequency.returnCountBeginning() << " " <<
                tempFrequency.returnCountWithin() << " "
                << tempFrequency.returnCountEnd() << std::endl;
        }
    }
}
//Known bugs, if there are not enough names in the seed the program will crash.
std::string CRandomName::outputName(double minLength, double maxLength)
{
    std::string name;
    std::vector<char> freqVector;
    double range = static_cast<double>((maxLength - minLength) + 1);
    int rangeLength = static_cast<int>(minLength + (range * ((double)rand() / (double)(RAND_MAX + 1))));
    char a =  startChars.at(static_cast<int> (startChars.size() * rand() / ( RAND_MAX + 1.0 ) ));
    name += a;
   
    for(int counter = 1; counter < rangeLength; counter++)
    {
        int cdc = 0;
        if(baseMap.find(a) != baseMap.end())
        {
        for (int count = 0; count < availableChars.size(); count++)
        {
            char b = availableChars.at(count);
            if(baseMap[a].find(b) != baseMap[a].end())
            {
                if(counter == 1)
                {
                    for(int cc = 0; cc < (baseMap[a][b].returnCountBeginning()); cc++)
                    {
                        freqVector.push_back(b);
                        cdc++;
                    }
                }
                else if((counter + 1) >= (rangeLength - 1))
                {
                    for(int cc = 0; cc < baseMap[a][b].returnCountEnd();cc++)
                    {
                        freqVector.push_back(b);
                        cdc++;
                    }
                }
                else
                {
                    for(int cc = 0; cc < baseMap[a][b].returnCountWithin();cc++)
                    {
                        freqVector.push_back(b);
                        cdc++;
                    }
                }
            }
        }
    }
        std::random_shuffle(freqVector.begin(), freqVector.end());
        std::random_shuffle(freqVector.begin(), freqVector.end());
        std::random_shuffle(freqVector.begin(), freqVector.end());
        int c = (int)(((cdc) * rand() / ( RAND_MAX + 1.0 )));
        name += freqVector.at(c);
        a = freqVector.at(c);
    }
    return name;
}
 
----
 
//main.cpp
#include <iostream>
#include "CRandomName.h"
int main()
{
    CRandomName name;
   
    std::ifstream inFile("NameList.txt");
    std::ofstream outFile("Stats.txt", std::ios_base::trunc);
   
    name.inputFile(inFile);
    name.processFile();
    name.outputList(outFile);
   
    std::cout << name.outputName(3, 9) << '\n' <<
        name.outputName(3, 9) << '\n' <<
        name.outputName(3, 9) << '\n' <<
        name.outputName(3, 9) << '\n' <<
        name.outputName(3, 9) << '\n';
}
 
(The code is public domain)
 
And here is some sample output:
 
ONMY
TERLEMDA
ZENEEVML
CHIA
SAWH
SPTKIVCDE
ILAN
COONR
KETADAN
VIAN
XAIIN
URRIRERON
FRTOKRR
YONNAN
IDANSZEDS
CAALLSTF
FLVAHE
XHAM
DUSEXNCST
DEBAD
SHEUO
WARFA
MUASS
INEE
 
As you can see, some are ok but most aren't. This is because the dataset although big isn't quite big enough. To compensate for this, you could try to implement artificial rules to make the names more readable, but I will leave that as an exercise for the reader.
 
==Markov-Chains Approach in Python==
Here is the piece of code used in [[Mines of Elderlore]] to generate random names:
 
import random
# from http://www.geocities.com/anvrill/names/cc_goth.html
PLACES = ['Adara', 'Adena', 'Adrianne', 'Alarice', 'Alvita', 'Amara', 'Ambika', 'Antonia', 'Araceli', 'Balandria', 'Basha',
'Beryl', 'Bryn', 'Callia', 'Caryssa', 'Cassandra', 'Casondrah', 'Chatha', 'Ciara', 'Cynara', 'Cytheria', 'Dabria', 'Darcei',
'Deandra', 'Deirdre', 'Delores', 'Desdomna', 'Devi', 'Dominique', 'Drucilla', 'Duvessa', 'Ebony', 'Fantine', 'Fuscienne',
'Gabi', 'Gallia', 'Hanna', 'Hedda', 'Jerica', 'Jetta', 'Joby', 'Kacila', 'Kagami', 'Kala', 'Kallie', 'Keelia', 'Kerry',
'Kerry-Ann', 'Kimberly', 'Killian', 'Kory', 'Lilith', 'Lucretia', 'Lysha', 'Mercedes', 'Mia', 'Maura', 'Perdita', 'Quella',
'Riona', 'Safiya', 'Salina', 'Severin', 'Sidonia', 'Sirena', 'Solita', 'Tempest', 'Thea', 'Treva', 'Trista', 'Vala', 'Winta']
###############################################################################
# Markov Name model
# A random name generator, by Peter Corbett
# http://www.pick.ucam.org/~ptc24/mchain.html
# This script is hereby entered into the public domain
###############################################################################
class Mdict:
    def __init__(self):
        self.d = {}
    def __getitem__(self, key):
        if key in self.d:
            return self.d[key]
        else:
            raise KeyError(key)
    def add_key(self, prefix, suffix):
        if prefix in self.d:
            self.d[prefix].append(suffix)
        else:
            self.d[prefix] = [suffix]
    def get_suffix(self,prefix):
        l = self[prefix]
        return random.choice(l) 
class MName:
    """
    A name from a Markov chain
    """
    def __init__(self, chainlen = 2):
        """
        Building the dictionary
        """
        if chainlen > 10 or chainlen < 1:
            print "Chain length must be between 1 and 10, inclusive"
            sys.exit(0)
   
        self.mcd = Mdict()
        oldnames = []
        self.chainlen = chainlen
   
        for l in PLACES:
            l = l.strip()
            oldnames.append(l)
            s = " " * chainlen + l
            for n in range(0,len(l)):
                self.mcd.add_key(s[n:n+chainlen], s[n+chainlen])
            self.mcd.add_key(s[len(l):len(l)+chainlen], "\n")
   
    def New(self):
        """
        New name from the Markov chain
        """
        prefix = " " * self.chainlen
        name = ""
        suffix = ""
        while True:
            suffix = self.mcd.get_suffix(prefix)
            if suffix == "\n" or len(name) > 9:
                break
            else:
                name = name + suffix
                prefix = prefix[1:] + suffix
        return name.capitalize() 
for i in range(100):
    print MName().New()
 
==Finite State Approach==
 
[http://groups.google.com/group/rec.games.roguelike.development/msg/185c90b7e8986d30 source] : Original author [[Jeff Lait]]
 
This generator uses a very simple finite state machine to select between individual letters in a fashion that creates pronounceable player names.
 
The comments at the top explain the nature of the beast. The only syllable based component is where I trigger the termination timer using a quickly accelerating random chance to keep names short.
 
Following code is in [[C|C++]]
 
#include <stdio.h>
#include <stdlib.h>
#include <ctype.h>
#include <string.h>
#include <time.h>
//this function returns true if a random roll (1-100) is lower then c.
//So: rand_chance(20) has a 20% chance of returning true.
bool rand_chance(int c) {
return( (rand() % 100 + 1) <= c);
}
//this function returns a random int between 0 and i-1.
int rand_choice(int i) {
return rand() % i;
}
void rand_name(char *text, int len)
{
    // Very simple markov generator.
    // We repeat letters to make them more likely.
    const char *vowels = "aaaeeeiiiooouuyy'";
    const char *frictive = "rsfhvnmz";
    const char *plosive = "tpdgkbc";
    const char *weird = "qwjx";
    // State transitions..
    // v -> f, p, w, v'
    // v' -> f, p, w
    // f -> p', v
    // p -> v, f'
    // w, p', f' -> v
    int        syllables = 0;
    char        state;
    int        pos = 0;
    bool        prime = false;
    // Initial state choice
    if (rand_chance(30))
        state = 'v';
    else if (rand_chance(40))
        state = 'f';
    else if (rand_chance(70))
        state = 'p';
    else
        state = 'w';
    while (pos < len-1)
    {
        // Apply current state
        switch (state)
        {
            case 'v':
                text[pos++] = vowels[rand_choice(strlen(vowels))];
                if (!prime)
                    syllables++;
                break;
            case 'f':
                text[pos++] = frictive[rand_choice(strlen(frictive))];
                break;
            case 'p':
                text[pos++] = plosive[rand_choice(strlen(plosive))];
                break;
            case 'w':
                text[pos++] = weird[rand_choice(strlen(weird))];
                break;
        }
        // Chance to stop..
        if (syllables && pos >= 3)
        {
            if (rand_chance(20+pos*4))
                break;
        }
        // Transition...
        switch (state)
        {
            case 'v':
                if (!prime && rand_chance(10))
                {
                    state = 'v';
                    prime = true;
                    break;
                }
                else if (rand_chance(40))
                    state = 'f';
                else if (rand_chance(70))
                    state = 'p';
                else
                    state = 'w';
                prime = false;
                break;
            case 'f':
                if (!prime && rand_chance(50))
                {
                    prime = true;
                    state = 'p';
                    break;
                }
                state = 'v';
                prime = false;
                break;
            case 'p':
                if (!prime && rand_chance(10))
                {
                    prime = true;
                    state = 'f';
                    break;
                }
                state = 'v';
                prime = false;
                break;
            case 'w':
                state = 'v';
                prime = false;
                break;
        }
    }
    text[0] = toupper(text[0]);
    text[pos++] = '\0';
}
//Arguments:    First is max length of random name
//              Second is numer of random names
int main(int argc, char *argv[]) {
srand ( time(NULL) );
int j = 1;
if(argc > 2) {
j = atoi(argv[2]);
}
while(j > 0) {
if(argc > 1) {
int i = atoi(argv[1]);
char * a = (char*) malloc(sizeof(char) * i);
rand_name(a, i);
printf("%s\n",a);
} else {
char a [5];
rand_name(a, 5);
printf("%s\n",a);
}
j--;
}
return 0;
}
(The code is public domain)
 
This program accepts the following arguments, the first is the max length of the name, but most will be smaller. And the second is the number of names to generate. Defaults are 5 1.
 
Results:
./randomName 10 20
'tosezivc
Fdo
'dybze
Wask
Bof
Yqa
'vymi
Fatyope
Xar
Qow
Midu
Tezk
Tucik
Hgy
Eqyhp
Hti
Cevidi
Sd'gaquz
Ubeqy
Spa
 
As you can see the algorithm has a bias for short names, and is not always pronounceable. But it is a start. Also as this is a proof of concept implementation, the [[random number generator]] is not properly seeded. Running multiple types after each other could result in the same random names (depending on the operating system).
 
==Syllable-based Approach==
 
If you have one or more information about it, add it here.
 
==Other Approaches==


* [http://groups.google.com/group/rec.games.roguelike.development/msg/905b2028961e4c76 (from rgrd) Phonemes based name generator]
* [http://groups.google.com/group/rec.games.roguelike.development/msg/905b2028961e4c76 (from rgrd) Phonemes based name generator]
* [http://www.seventhsanctum.com/library.php (www.seventhsanctum.com) Word based random generators, code and advice]
* [https://github.com/Tw1ddle/MarkovNameGenerator Markov Process name generator] based on [http://www.roguebasin.com/index.php?title=Names_from_a_high_order_Markov_Process_and_a_simplified_Katz_back-off_scheme Jeff Lund's article]
* [http://www.seventhsanctum.com/writings.php?Writnum=2 (www.seventhsanctum.com) Sleight of Mind: Why do generators work]
[[Category:Articles]]
[[Category:Articles]]

Latest revision as of 21:41, 20 November 2015

There are a lot of different ways to tackle random name generation. There are games, such as ADOM, that simply pick a random name from a hard coded list, however, what many developers pursue is a generator that will procedurally generate an arbitrarily large number of names that will:

  • be pronounceable
  • have an adequate "feel"

Many attempts have been made to write such generators, and different approaches have been used. Markov chains are one of the methods that have been used, although methods involving bigger data blocks, such as syllables or even entire words, have proven to be easier to manage. The articles listed below refer to the subject of random name generation.

Related articles

See also