Craft A Customized Word Cloud Trivia Game with Python

Author:Murphy  |  View: 29616  |  Time: 2025-03-22 22:29:32

Quick Success Data Science

Game Night word cloud (by the author)

Are you tired of the same old board games on game night? If you know a little Python, you can easily make a customized trivia game with word clouds.

A word cloud is a visual representation of text data. On websites, word clouds are often used to display keyword metadata, called tags. In a word cloud, font size or color shows the importance of each tag or word.

Here are three example word cloud quizzes. Can you guess the two movies and the one song? (The answers are at the end of this article.)

A movie word cloud (by the author)
Another movie word cloud (by the author)
A pop song word cloud (by the author)

You can use this technique to generate customized quizzes for movies, music, novels, historical events, and more. The quizzes are also adaptable to more serious applications, such as training exercises. Best of all, you can tailor them to any subject you like.

In this Quick Success Data Science project, we'll make quiz cards and answer sheets for a movie trivia game. As part of the process, we'll scrape the data straight off Wikipedia movie pages. Using this basic template, you should be able to adapt the program to other uses.


Installing Libraries

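In addition to Python, you'll need the following libraries: NumPy, Matplotlib, Pillow, Requests, Beautiful Soup (beautifulsoup4), and wordcloud.

You can find installation instructions for pip on each library's documentation site.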

If you're an Anaconda user, create and activate a conda environment and enter the following in the command line:

conda install numpy matplotlib pillow requests beautifulsoup4

followed by:

pip install wordcloud
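
Before moving on, it can help to confirm that everything installed correctly. The following is a minimal sketch, not part of the original project: the helper name missing_packages is hypothetical, and it relies only on the standard library's importlib. Note that two packages are imported under names that differ from their install names (pillow as PIL and beautifulsoup4 as bs4).

```python
from importlib.util import find_spec


def missing_packages(import_names):
    """Return the import names that Python cannot locate."""
    return [name for name in import_names if find_spec(name) is None]


# Import names (not install names) for this project's libraries:
required = ['numpy', 'matplotlib', 'PIL', 'requests', 'bs4', 'wordcloud']
print('Missing:', missing_packages(required) or 'none')
```

If the list is not empty, rerun the install commands above in the active environment.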


The Code

The following code was written in JupyterLab and can be downloaded from this Gist.

Importing Libraries and Adding Links and Stop Words

The following code imports the third-party libraries, creates a dictionary that maps movie names to the links for their Wikipedia pages, and creates a set of stop words. Stop words are short, non-contextual words (such as "so," "if," and "the") that we don't want to clutter our word clouds.

import matplotlib.pyplot as plt
from matplotlib import patches  
import requests
from bs4 import BeautifulSoup
from wordcloud import WordCloud, STOPWORDS

# Create a dictionary of movie Wikipedia pages:
urls = {'avengers infinity war': 'https://w.wiki/3hxu',
        'avengers end game': 'https://w.wiki/3hHY',
        'deathly hallows 1': 'https://w.wiki/9PuP',
        'deathly hallows 2': 'https://w.wiki/8u8Y'}

# Capture stopwords for later removal:
stopwords = set(STOPWORDS)
# stopwords.update(['us', 'one'])  # Add additional stopwords if needed.

# colormap = 'Dark2'  # Option to change default color scheme.

To get the URLs, I navigated to the appropriate Wikipedia page and then clicked the Tools menu, followed by Get shortened URL. This keeps the line lengths from becoming long and unwieldy.

Note also that you can add stop words to the pre-populated list obtained from the word cloud library. Just uncomment the stopwords.update() line in the previous cell and add the new words to the list. You'll only need to do this if you find a useless word has slipped through in some of the word clouds.
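
To see the effect of stop words, here is a simplified illustration that uses only the standard library. It is not how the wordcloud library works internally (its tokenizer also handles details such as plurals and two-word phrases); the function name top_words and the sample sentence are made up for this sketch.

```python
import re
from collections import Counter


def top_words(text, stop_words, n=3):
    """Count lowercase words in text, skipping any found in stop_words."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(w for w in words if w not in stop_words)
    return counts.most_common(n)


sample = "The boy and the dragon train so the dragon can fly"
print(top_words(sample, stop_words=set()))  # 'the' is the most common word.
print(top_words(sample, stop_words={'the', 'and', 'so', 'can'}))
```

With the stop words removed, the words that actually hint at the subject rise to the top, which is what we want in a quiz card.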

Finally, if you want to change the word cloud's color palette, uncomment the last line and provide a color map. You can find example color maps at this site.

Defining a Function to Extract Text from Wikipedia

The following code defines a function to load a Wikipedia page URL, extract the film synopsis under the "Plot" header, and return the text as a string. We'll use this text to build the word cloud.

def extract_plot_text(url):
    """Extract text from 'Plot' section of Wikipedia film page."""
    response = requests.get(url, timeout=10)  # 10 second timeout.
    soup = BeautifulSoup(response.content, 'html.parser')    
    plot_header = soup.find('span', {'id': 'Plot'})

    if plot_header:
        plot_text = ''
        next_element = plot_header.find_next()

        while next_element and next_element.name != "h2":
            if next_element.name == "p":
                plot_text += next_element.get_text() + "\n"
            next_element = next_element.find_next()

        return plot_text.strip()

The function uses the requests library to fetch the page and BeautifulSoup to parse the response. We then use BeautifulSoup to find the span element with an ID of "Plot". If this plot header is found, we initialize an empty string (plot_text) to hold the text.

Next, we loop through the elements that follow the header until we reach the next h2 heading, concatenating the text of each paragraph (tagged as p) to the string. We then use the strip() string method to remove leading and trailing whitespace characters before returning the string.
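
One detail worth noting: if no "Plot" span is found, the function falls through and returns None rather than a string, which is likely to cause an error when the result is later passed to the word cloud generator. The sketch below shows one possible guard. The helper name require_plot_text is hypothetical, and it takes the extractor as an argument so that it can be tried out without a network connection.

```python
def require_plot_text(extractor, url):
    """Return extractor(url), raising ValueError if no text came back."""
    text = extractor(url)
    if not text:  # Catches both None and an empty string.
        raise ValueError(f"No plot text found for {url}")
    return text


# Stand-in extractors for demonstration (no web request is made):
print(require_plot_text(lambda url: 'A hero saves the day.',
                        'https://example.org/a'))
try:
    require_plot_text(lambda url: None, 'https://example.org/b')
except ValueError as err:
    print(err)
```

In the project, you could call require_plot_text(extract_plot_text, value) in place of the direct call to extract_plot_text(value).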

Defining a Function to Make the Word Cloud

Next, we define a function that uses the word cloud library to make a word cloud from the extracted text.

def make_wordcloud(text):
    """Return a word cloud object for a corpus."""
    return WordCloud(max_words=50, 
                     width=800,
                     height=500,
                     relative_scaling=0.2, 
                     mask=None,
                     background_color='white', 
                     stopwords=stopwords, 
                     margin=5, 
                     random_state=1).generate(text)

Some of the key parameters here are max_words, which determines the number of words in the cloud, stopwords, which filters out our set of stop words, and random_state, which sets the random seed number so that we can reproduce the word cloud. For a list and description of all the parameters available, visit this site.

If you choose to use a non-default colormap, as mentioned previously, you'll want to add the following argument when calling WordCloud():

colormap=colormap

Defining a Function to Draw a Figure Outline

Now we define a function that uses Matplotlib patches to draw a rectangular outline around each word cloud.

def add_outline_to_figure(fig):
    """Add a black outline to the given figure."""
    rect = patches.Rectangle((0, 0), 1, 1, 
                             transform=fig.transFigure, 
                             fill=False, 
                             color='black', 
                             linewidth=2, 
                             zorder=1000)
    fig.add_artist(rect)

This function works directly on an existing figure and thus returns nothing.

Defining a Function to Generate the Quiz

The final function brings it all together to make the word cloud cards and an answer key. The cards are saved as PNG files and the answer key as a text file.

def make_quiz(url_dict):
    """Generate final figures and return answer key."""
    answers = []

    for i, (key, value) in enumerate(url_dict.items()):
        answers.append((i + 1, key))
        plot = extract_plot_text(value)
        wc = make_wordcloud(plot)

        # Convert cloud into NumPy array to use with matplotlib:
        colors = wc.to_array()  

        # Make the word cloud figure:
        fig = plt.figure()
        plt.title(f'Quiz #{i + 1}')
        plt.imshow(colors, interpolation="bilinear")
        plt.axis("off")
        plt.tight_layout()

        # Add outline with dimensions of the figure:
        add_outline_to_figure(fig)

        # Save and show figure:
        fig.savefig(f'{key}.png', dpi=600)
        plt.show()

    return answers

Note that we add a title to each card, such as "Quiz #1," so that we can match it to the answer key.
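
Because each card is saved under its dictionary key, the filename itself gives away the answer. If you plan to share the image files directly, one option is to name them by quiz number instead. The following is a minimal sketch of that idea; the helper name card_filename and the quiz_01.png pattern are assumptions, not part of the original code.

```python
def card_filename(number, prefix='quiz', ext='png'):
    """Build a spoiler-free filename such as 'quiz_01.png'."""
    return f'{prefix}_{number:02d}.{ext}'


titles = ['avengers infinity war', 'avengers end game']
for number, title in enumerate(titles, start=1):
    print(number, title, '->', card_filename(number))
```

Inside make_quiz(), this would replace f'{key}.png' in the call to fig.savefig(), with i + 1 passed as the number.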

Running the Program

The following code runs the program by first calling the previous function and then saving the answer key as a text file.

# Generate the figures and answer key:
answer_key = make_quiz(urls)

# Save the answers as a text file:
with open('answer_key.txt', 'w') as f:
    for item in answer_key:
        print(f"Quiz {item[0]}: {item[1]}", file=f)

Here's an example of the answer key:

The answer key (by the author)
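
In plain text, the key follows the pattern produced by the print() call above. The short sketch below reproduces that formatting in memory with io.StringIO, so no file is written; the helper name format_answer_key is hypothetical.

```python
import io


def format_answer_key(answers):
    """Return the answer key text for a list of (number, title) tuples."""
    buffer = io.StringIO()
    for number, title in answers:
        print(f"Quiz {number}: {title}", file=buffer)
    return buffer.getvalue()


print(format_answer_key([(1, 'avengers infinity war'),
                         (2, 'avengers end game')]))
```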

And here are the word cloud cards:

Avengers Infinity War word cloud (by the author)
Avengers End Game word cloud (by the author)
Harry Potter and the Deathly Hallows Part 1 word cloud (by the author)
Harry Potter and the Deathly Hallows Part 2 word cloud (by the author)

Summary

Word clouds provide a quick way to generate customizable quizzes for games or training exercises. With Python, you can easily create a pipeline for scraping data off the internet and using it to populate the word clouds.


Answers

  1. How to Train Your Dragon
  2. Prince of Persia
  3. Donald Fagen's "I.G.Y."

Thanks!

Thanks for reading and please follow me for more Quick Success Data Science projects in the future.

