Use Deep Learning to Generate Fantasy Names: Build a Language Model from Scratch

To truly grasp the intricacies of Language Models (LMs) and become familiar with their underlying principles, there is no better way than rolling up our sleeves and writing code. In this article, I present the creation of a Recurrent Neural Network (RNN) built entirely from scratch, without the aid of any deep learning library.
TensorFlow, Keras, and PyTorch make building deep and complex neural networks effortless. This is undoubtedly a great advantage for Machine Learning practitioners; however, it has the massive downside of leaving the inner workings of those networks unclear, since everything happens "under the hood".
This is why today we will perform the inspiring exercise of building a Language Model using only the Numpy Python library!
Understanding Recurrent Neural Networks and Language Models
Standard fully connected neural networks are not suited for Natural Language Processing (NLP) tasks like text generation. The main reasons are:
- For NLP tasks, inputs and outputs can have different lengths and dimensions from example to example.
- Standard neural networks don't share features learned at different positions in the text.
The main breakthrough of AI in the NLP field is undeniably represented by Recurrent Neural Networks (RNNs).
RNNs are a class of artificial neural networks particularly well suited for NLP tasks and text generation. The reason for their efficacy lies in their ability to capture sequential dependencies in data. Human language deeply relies on considering the context and linking the first words in a sentence to the last ones. Consider these sentences:
- He said, "Teddy Roosevelt was the US President."
- He said, "Teddy bears are on sale!"
The word "Teddy" has a completely different meaning in the two sentences. We humans easily understand that by considering the context and the words written in the opposite part of the sentence. Surprisingly enough, RNNs can do the same!
RNN Architecture
The architecture of RNNs is fairly straightforward. They are made of sequential cells, each one of them takes as input a word x (or a single character), outputs a word/character y, and passes an activation a to the next cell.

What happens inside an RNN cell is more interesting. The steps are the following:
- The activation of the previous cell is multiplied by some weights W_aa
- The input x is multiplied by some weights W_ax
- The results of the previous steps are summed together with a bias b_a
- An activation function, such as the hyperbolic tangent, is applied to compute the activation that is passed to the next cell
- The activation is also multiplied by some weights W_ya and summed with a bias b_y
- Finally, a softmax function is applied to the resulting vector and the output y_hat is returned

To summarize, the formulas for the activation and the output are:

a_next = tanh(W_ax · x + W_aa · a_prev + b_a)
y_hat = softmax(W_ya · a_next + b_y)

(In the code below, the bias b_a is simply called b.)
A comprehensive theoretical introduction to RNNs and LMs is beyond the scope of this article. For that, I refer you to the resources listed at the end.
Let's dive now into the actual Python code. The most important parts of the code will be explained in detail in this article. In order to keep the article concise, some intuitive parts are omitted. The entire commented code can be accessed in my GitHub repository:
articles/names-generator-RNN at main · andreoniriccardo/articles
Data Preparation
Our goal is to teach a Language Model to invent novel Fantasy character names. For this reason, the first thing we need to provide our Language Model with is a dataset of actual fantasy names, which the LM will be trained on and take inspiration from.
Thanks to this Wikipedia page, we can easily access a list of all the characters mentioned in The Lord of the Rings and The Hobbit books. Using the BeautifulSoup and re (Regex) libraries, the following lines of code collect the data:
from bs4 import BeautifulSoup
import requests
import re

url = "https://en.wikipedia.org/wiki/List_of_Middle-earth_characters"
# Get the contents of the webpage in text format
data = requests.get(url).text
soup = BeautifulSoup(data, "html.parser")
# Parse the HTML code to isolate rows containing character names
names = soup.find_all('li')
# Apply a second filter to isolate actual names
# (the Regex below is illustrative: it keeps the text that precedes the
# dash separating each name from its description in the list items)
names_2 = []
for i in names:
    match = re.match(r"^([A-Za-z' -]+?)\s*[–—-]", i.get_text())
    if match:
        names_2.append(match.group(1).strip().lower())
# Join the names into a single newline-separated (and lowercase) string
lotr_names = '\n'.join(names_2)
For the explanation of the Regex functions, I leave you with this comprehensive guide.
At this point, our dataset will look like this:

Next, I simplify our vocabulary by replacing unwanted characters such as ä, é, and î:
replace_dict = {'a': ['â', 'ä', 'á'],
                'e': ['ê', 'ë', 'é'],
                'i': ['î', 'í'],
                'o': ['ô', 'ö', 'ó'],
                'u': ['ú', 'û'],
                ' ': ['-']
                }
for new_char in replace_dict.keys():
    for old_char in replace_dict[new_char]:
        lotr_names = lotr_names.replace(old_char, new_char)
Finally, our vocabulary consists of 27 characters:
['x', 'j', 'f', 't', 'c', 'b', 'o', 'l', 'y', 'w', 'i', 'e', 'g', 'm', 'k', 'd', 'v', 'n', 'u', 'a', 'z', 'r', '\n', 's', ' ', 'p', 'h']
Note that the special character '\n' (newline) serves as an End of Name token, indicating when the generated name should terminate.
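Later in the training code, each character of this vocabulary is mapped to an integer index (and back) through two small helpers, encode_chars() and decode_chars(). Their implementations are not shown in this article's snippets; a minimal sketch could look like this:
def encode_chars(chars):
    # Map each character to an integer index (sorted for reproducibility)
    return {ch: i for i, ch in enumerate(sorted(chars))}

def decode_chars(chars):
    # Inverse mapping: from integer index back to character
    return {i: ch for i, ch in enumerate(sorted(chars))}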
The dataset is ready and we can now focus on modeling the Recurrent Neural Network.
Building the Language Model
In this section, I present the Python implementation of the Language Model. The whole code can be divided into 2 sections:
- Forward Propagation
- Backprop
After those, I will present the code that actually trains the model and shows how to generate fantasy character names.
Forward Propagation
As illustrated above, an RNN is a network composed of multiple cells. It is convenient to model a single RNN cell in Python first, and then iterate it over multiple timesteps.
The code that models a single RNN cell is the following:
import numpy as np

def RNN_forward_prop_step(parameters, a_prev, x):
    W_aa = parameters['W_aa']
    W_ax = parameters['W_ax']
    W_ya = parameters['W_ya']
    b_y = parameters['b_y']
    b = parameters['b']
    # Compute the hidden state
    a_next = np.tanh(np.dot(W_ax, x) + np.dot(W_aa, a_prev) + b)
    # Compute the probabilities for the next character
    p_t = softmax(np.dot(W_ya, a_next) + b_y)
    return a_next, p_t
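Note that softmax() is not a NumPy built-in, and its definition is not shown in the snippets above. A minimal, numerically stable version could look like this:
def softmax(z):
    # Subtract the maximum for numerical stability, then normalize
    e_z = np.exp(z - np.max(z))
    return e_z / e_z.sum(axis=0)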
As you can see, it is not complex at all. We simply take the model's parameters, multiply them by the cell's input and the previous activation, and apply the softmax function to return a vector of probabilities for the output character.
The next step is to iterate through several RNN cells. This is exactly what the RNN_forward_prop() function does.
def RNN_forward_prop(X, Y, a0, parameters, vocab_size):
    x = {}
    a = {}
    y_hat = {}
    a[-1] = np.copy(a0)
    loss = 0
    # Iterate over the T timesteps
    for t in range(len(X)):
        # One-hot representation of the t-th character
        x[t] = np.zeros((vocab_size, 1))
        if X[t] is not None:
            x[t][X[t]] = 1
        # Run one timestep of forward prop
        a[t], y_hat[t] = RNN_forward_prop_step(parameters, a[t-1], x[t])
        # Update the loss function (cross-entropy)
        loss -= np.log(y_hat[t][Y[t], 0])
    cache = (y_hat, a, x)
    return loss, cache
It calls the previous RNN_forward_prop_step() function T times, where T is the number of characters of the input word. At the end, it returns the loss together with a cache containing the outputs, activations, and inputs of every timestep.
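To make the data flow concrete, here is a minimal usage sketch with toy shapes and values (the parameter initialization, the encoded example, and the target indices below are made up for illustration; they are not the article's helpers):
# Toy setup: vocabulary of 27 characters, hidden state of size 50
vocab_size, n_a = 27, 50
parameters = {'W_ax': np.random.randn(n_a, vocab_size) * 0.01,
              'W_aa': np.random.randn(n_a, n_a) * 0.01,
              'W_ya': np.random.randn(vocab_size, n_a) * 0.01,
              'b': np.zeros((n_a, 1)),
              'b_y': np.zeros((vocab_size, 1))}

# Hypothetical encoded name: None makes the first input a vector of zeros,
# and the targets are the same characters shifted by one position
X = [None, 5, 11, 3]
Y = [5, 11, 3, 0]  # 0 stands for the End of Name token in this toy example
a0 = np.zeros((n_a, 1))

loss, cache = RNN_forward_prop(X, Y, a0, parameters, vocab_size)
print(loss)  # cross-entropy loss of this single, untrained example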
Backprop
Backprop, short for backward propagation, is the process of adjusting the network's weights to get closer and closer to the desired output, i.e. reducing the model loss function. This is done by applying gradient descent.
Detailing the formulas of gradient descent or other viable optimization algorithms is out of this article's scope. Instead, I will leave you with this article, which deals exactly with the selection of the right Optimization Algorithm for your Deep Networks.
Choose the Right Optimization Algorithm for your Neural Network
The strategy I used to model the backprop flow is the same as the one for the forward propagation: code the backprop of a single RNN cell and iterate it multiple times.
def RNN_back_prop_step(d_y, grads, parameters, x, a, a_prev):
    grads['dW_ya'] += np.dot(d_y, a.T)
    grads['db_y'] += d_y
    da = np.dot(parameters['W_ya'].T, d_y) + grads['da_next']
    # Backprop through the tanh non-linearity: tanh'(z) = 1 - tanh(z)^2
    da_raw = (1 - a * a) * da
    grads['db'] += da_raw
    grads['dW_ax'] += np.dot(da_raw, x.T)
    grads['dW_aa'] += np.dot(da_raw, a_prev.T)
    grads['da_next'] = np.dot(parameters['W_aa'].T, da_raw)
    return grads
The above function takes as input the gradient of the loss with respect to the cell's output, the results of the forward propagation, and the gradients accumulated so far, and it computes the updated gradients.
The iteration of the RNN_back_prop_step() function over all the timesteps is performed with these lines of code:
def RNN_back_prop(X, Y, parameters, cache):
    # Initialize gradients as an empty dictionary
    grads = {}
    # Retrieve values from cache and parameters
    (y_hat, a, x) = cache
    W_aa = parameters['W_aa']
    W_ax = parameters['W_ax']
    W_ya = parameters['W_ya']
    b_y = parameters['b_y']
    b = parameters['b']
    # Initialize gradients
    grads['dW_ax'], grads['dW_aa'], grads['dW_ya'] = np.zeros_like(W_ax), np.zeros_like(W_aa), np.zeros_like(W_ya)
    grads['db'], grads['db_y'] = np.zeros_like(b), np.zeros_like(b_y)
    grads['da_next'] = np.zeros_like(a[0])
    # Backpropagate through the timesteps, from the last to the first
    for t in reversed(range(len(X))):
        # Gradient of the softmax cross-entropy loss w.r.t. the scores: y_hat - one_hot(Y)
        dy = np.copy(y_hat[t])
        dy[Y[t]] -= 1
        grads = RNN_back_prop_step(dy, grads, parameters, x[t], a[t], a[t-1])
    return grads, a
Training the Language Model
At this point, all the model's components are set and ready to be executed. I wrote the following code to put together the functions we saw above.
def RNN_optimization(X, Y, a_prev, parameters, alpha, vocab_size):
    # 1. Forward propagation
    loss_now, cache = RNN_forward_prop(X, Y, a_prev, parameters, vocab_size)
    # 2. Backward propagation
    grads, a = RNN_back_prop(X, Y, parameters, cache)
    # 3. Clip gradients
    grads = clip_grads(grads, 10)
    # 4. Update parameters
    parameters = update_parameters(parameters, grads, alpha)
    return loss_now, parameters, a[len(X)-1]
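Two helper functions appear here that are not shown above: clip_grads(), which limits the magnitude of the gradients to mitigate the exploding-gradient problem typical of RNNs, and update_parameters(), which applies one step of gradient descent. The exact implementations are in the repository; minimal sketches could be:
def clip_grads(grads, max_value):
    # Clip every gradient element to the range [-max_value, max_value], in place
    for key in ['dW_ax', 'dW_aa', 'dW_ya', 'db', 'db_y']:
        np.clip(grads[key], -max_value, max_value, out=grads[key])
    return grads

def update_parameters(parameters, grads, alpha):
    # One step of plain gradient descent with learning rate alpha
    parameters['W_ax'] -= alpha * grads['dW_ax']
    parameters['W_aa'] -= alpha * grads['dW_aa']
    parameters['W_ya'] -= alpha * grads['dW_ya']
    parameters['b'] -= alpha * grads['db']
    parameters['b_y'] -= alpha * grads['db_y']
    return parameters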
The actual magic happens in the following code snippet. This is the main part of the whole algorithm: once this function is called, the Language Model gets trained.
def train_model(data, n_a=50, max_iter=100000):
    # Get the list of characters
    chars = list(set(data))
    # Get the dictionary size (number of characters)
    vocab_size = len(chars)
    # Get encoding and decoding dictionaries
    chars_to_encoding = encode_chars(chars)
    encoding_to_chars = decode_chars(chars)
    # Get the dataset as a list of names, strip whitespace, then shuffle it
    data = data.split('\n')
    data = [x.strip() for x in data]
    np.random.shuffle(data)
    # Define the n_x, n_y parameters
    n_x, n_y = vocab_size, vocab_size
    # Initialize the hidden state
    a_prev = np.zeros((n_a, 1))
    # Initialize the parameters
    parameters = initialize_parameters(n_a, n_x, n_y)
    # Get the initial loss function value
    loss_now = get_initial_loss(vocab_size, len(data))
    # Perform max_iter iterations to train the model's parameters
    for iter in range(max_iter):
        # Get the index of the name to pick
        name_idx = iter % len(data)
        example = data[name_idx]
        # Convert the example into a list of characters and encode it
        example_chars = [char for char in example]
        example_encoded = [chars_to_encoding[char] for char in example]
        # Create the training input X. The value None is used to feed the first
        # cell with a vector of zeros
        X = [None] + example_encoded
        # Create the label vector Y by appending the '\n' encoding to the end of the vector
        Y = example_encoded + [chars_to_encoding['\n']]
        # Perform one step of the optimization cycle:
        # 1. Forward propagation
        # 2. Backward propagation
        # 3. Gradient clipping
        # 4. Parameters update
        loss_tmp, parameters, a_prev = RNN_optimization(X, Y, a_prev, parameters, alpha=0.01, vocab_size=vocab_size)
        loss_now = smooth(loss_now, loss_tmp)
    return parameters
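A few more helpers appear in train_model() and are not shown above: initialize_parameters(), get_initial_loss(), and smooth(). Their exact implementations are in the repository; plausible minimal versions are the following:
def initialize_parameters(n_a, n_x, n_y):
    # Small random weights and zero biases (a common initialization choice)
    return {'W_ax': np.random.randn(n_a, n_x) * 0.01,
            'W_aa': np.random.randn(n_a, n_a) * 0.01,
            'W_ya': np.random.randn(n_y, n_a) * 0.01,
            'b': np.zeros((n_a, 1)),
            'b_y': np.zeros((n_y, 1))}

def get_initial_loss(vocab_size, seq_length):
    # Loss of a model that predicts every character uniformly at random
    return -np.log(1.0 / vocab_size) * seq_length

def smooth(loss, cur_loss):
    # Exponential moving average of the loss, for a more readable learning curve
    return loss * 0.999 + cur_loss * 0.001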
The result is a model able to mimic Tolkien's creative process and effortlessly produce unique character names.
Generating Fantasy Character Names
To sample novel character names from our trained Language Model, I developed two functions.
The sample() function takes as input the network's parameters and the vocabulary mapping characters to numbers. The idea is to apply the forward propagation steps repeatedly until either the '\n' special character is returned, or we reach an upper bound on the number of generated characters (in this case set to 50).
Finally, it returns a list of indices that encode the generated fantasy character name.
def sample(parameters, chars_to_encoding):
    W_aa = parameters['W_aa']
    W_ax = parameters['W_ax']
    W_ya = parameters['W_ya']
    b_y = parameters['b_y']
    b = parameters['b']
    vocab_size = b_y.shape[0]
    n_a = W_aa.shape[1]
    # Start from a zero input and a zero hidden state
    x = np.zeros((vocab_size,))
    a_prev = np.zeros((n_a,))
    indices = []
    idx = -1
    counter = 0
    newline_character = chars_to_encoding['\n']
    while (idx != newline_character and counter != 50):
        # One step of forward propagation
        a = np.tanh(np.dot(W_ax, x) + np.dot(W_aa, a_prev) + np.ravel(b))
        z = np.dot(W_ya, a) + np.ravel(b_y)
        y = softmax(z)
        # Sample the next character index according to the predicted probabilities
        idx = np.random.choice(list(chars_to_encoding.values()), p=np.ravel(y))
        indices.append(idx)
        # Feed the sampled character back as the next input
        x = np.zeros((vocab_size,))
        x[idx] = 1
        a_prev = a
        counter += 1
    if (counter == 50):
        indices.append(chars_to_encoding['\n'])
    return indices
To print the generated fantasy names in a human-readable form, we need to call the get_sample() function, which takes as input the list of indices previously generated and the decoding dictionary mapping indices to characters.
def get_sample(sample_ix, encoding_to_chars):
    # Decode the indices and join them into a string
    txt = ''.join(encoding_to_chars[ix] for ix in sample_ix)
    # Capitalize the first letter of the name
    txt = txt[0].upper() + txt[1:]
    return txt
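Putting all the pieces together, an end-to-end run could look like the sketch below, where lotr_names is the newline-separated string of names built during data preparation and the number of sampled names is arbitrary:
# Train the model on the scraped names
parameters = train_model(lotr_names, n_a=50, max_iter=100000)

# Rebuild the encoding/decoding dictionaries used for sampling
chars = list(set(lotr_names))
chars_to_encoding = encode_chars(chars)
encoding_to_chars = decode_chars(chars)

# Sample and print a handful of brand-new fantasy names
for _ in range(10):
    sampled_indices = sample(parameters, chars_to_encoding)
    print(get_sample(sampled_indices, encoding_to_chars))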
With everything in place, you can now appreciate some original fantasy character names, as shown below:

Evaluating Model Performance
During the first phases of the training process, the model is still unable to effectively mimic Tolkien's style. If we sample random names at this stage we obtain pure gibberish such as:
- "Orvnnvfufufiiubx" at iteration 2000 and loss 27.96
- "Aotvux" at iteration 4000 and loss 26.43
We can clearly see that these names don't match the iconic Middle-earth patterns and sounds. However, by letting the model learn the dataset's features for longer, we start to obtain progressively more plausible generated names:
- "Furun I" at iteration 14000 and loss 21.53
- "Flutto Balger" at iteration 16000 and loss 21.11
Finally, the trained model seems to be able to emulate the characters' name style.
The improvement I described qualitatively can also be visualized quantitatively.
By plotting the loss function over the iterations, we can clearly see how the optimization algorithm progressively tunes the weights and biases in the right direction.

The oscillating behavior of the loss function is motivated by the fact that we are using a single training example in each iteration step to train the model. Using a larger batch would result in a smoother loss curve.
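A plot like this can be produced with a few lines of Matplotlib, assuming the smoothed loss has been recorded at regular intervals during training (the values below are simply the checkpoints quoted above, used purely for illustration; train_model() as written does not store them):
import matplotlib.pyplot as plt

# Smoothed loss values quoted in the text, recorded every few thousand iterations
iterations = [2000, 4000, 14000, 16000]
loss_history = [27.96, 26.43, 21.53, 21.11]

plt.plot(iterations, loss_history, marker='o')
plt.xlabel('Iteration')
plt.ylabel('Smoothed loss')
plt.show()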
Conclusion
In conclusion, building a Language Model for fantasy name generation has brought us several insights.
We discovered how RNNs and LMs are capable of capturing sequential dependencies in the data, making them essential elements for NLP and all the tasks that involve text.
Moreover, I cannot emphasize enough the importance of a hands-on approach to gain experience in any Data Science related subject. Studying the theory is fundamental, but it provides only half of what you need.
Finally, I want to stress again the flexibility of the tool we just created. I recommend training it with a different set of names. Try using Disney character names or typical pet names as the input dataset and share what you generate with me!
If you liked this story, consider following me to be notified of my upcoming projects and articles!
Here are some of my past projects:
Ensemble Learning with Scikit-Learn: A Friendly Introduction
Euro Trip Optimization: Genetic Algorithms and Google Maps API Solve the Traveling Salesman Problem
The Birth of Data Science: History's First Hypothesis Test & Python Insights