How Many Keys Are Enough to Play the Piano?

Author:Murphy  |  View: 26830  |  Time: 2025-03-22 23:38:12

Learning to play the piano is challenging and fun, but every novice faces the same dilemma: what kind of instrument to buy? The market offers a wide choice, from tiny two-octave instruments like the one in the photo above to 70 kg full-size keyboards in a wooden cabinet. On the one hand, extra keys never hurt; on the other hand, there is always a trade-off between size, weight, and price.

The obvious way is to ask a teacher for a recommendation, but most musicians have probably never thought about music from that perspective. Is there a more quantitative way to find an answer? Yes: we can easily build a distribution of musical notes using Python.

This tutorial can be useful for beginners in Data Science; it does not require any complex math or libraries, and the results are easy to interpret. It can also be useful for those who want to learn to play music but have not yet decided what kind of instrument to buy.

Let's get into it!

Data Source

For data analysis, I will be using MIDI files. It's a pretty old format; the first MIDI (Musical Instrument Digital Interface) specification was published in 1983. The key feature of MIDI files is that they store the music not as raw audio but in the "original" notation form. Every record in a MIDI file contains the instrument type, pitch, timing, and other parameters. For example, I can open a Bagatelle in C minor written by Beethoven in the free MuseScore application and see something like this:

MuseScore editor, Screenshot by author

Let's open the same file in Python and dump its content:

import mido  # pip3 install mido

mid = mido.MidiFile("Beethoven/Bagatelle.mid", clip=True)
for ind, track in enumerate(mid.tracks):
    print(f"Track {ind}")
    for item in track[:10]:
        print(item)

The output looks like this (a full file is longer; here, I print only the first lines from each track):

Track 0
MetaMessage('track_name', name='L.v.Beethoven Bagatelle in C minor WoO 52', time=0)
MetaMessage('text', text='XFhd:2001/08/21:JP:Classical::::L.v.Beethoven::', time=0)
MetaMessage('time_signature', numerator=3, denominator=4, clocks_per_click=96, notated_32nd_notes_per_beat=8, time=0)
MetaMessage('sequencer_specific', data=(0, 0, 65), time=0)
...
Track 1
MetaMessage('track_name', name='Right', time=0)
MetaMessage('key_signature', key='Eb', time=0)
note_on channel=0 note=72 velocity=80 time=3072
note_off channel=0 note=72 velocity=100 time=380
note_on channel=0 note=72 velocity=80 time=4
...
Track 2
MetaMessage('track_name', name='Left', time=0)
MetaMessage('key_signature', key='Eb', time=0)
note_on channel=1 note=43 velocity=100 time=768
note_off channel=1 note=43 velocity=100 time=380
note_on channel=1 note=43 velocity=100 time=4
...

As we can see, the file contains three tracks: the first has only metadata, and the other two contain some extra data and the musical notes. To find how many different notes the piece has, we will use only the note_on records. The MIDI standard allows note codes from 0 to 127; the 88 keys of a full-size piano correspond to codes 21..108:

MIDI note codes, Image source Wikipedia
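
To make the numbers in the dump easier to read, a MIDI code can be converted back to a key name. The sketch below is not part of the original code: `midi_to_name` and `NOTE_NAMES` are hypothetical names, and it assumes the common convention in which code 60 is C4 (so C0 is 12).

```python
# Hypothetical helper (not from the article): MIDI code -> key name.
# Assumes the convention where MIDI code 60 is C4, i.e. C0 is code 12.
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def midi_to_name(code: int) -> str:
    """Convert a MIDI note code to a name such as 'C4' or 'G#2'."""
    octave, index = divmod(code, 12)
    return f"{NOTE_NAMES[index]}{octave - 1}"


# The first notes of the two tracks in the dump above
print(midi_to_name(72))  # right hand
print(midi_to_name(43))  # left hand
```

Under this convention, the piano range 21..108 runs from A0 to C8.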

Using this information, let's make a function that extracts all notes from a MIDI file into the Python list:

from typing import List

import mido


def extract_midi_notes(file_path: str) -> List[int]:
    """ Get all notes from a MIDI file """
    notes = []
    mid = mido.MidiFile(file_path, clip=True)
    for track in mid.tracks:
        for item in track:
            # Example:
            # {'type': 'note_on', 'time': 241, 'note': 67, 'velocity': 56, 'channel': 0}
            item_data = item.dict()
            # Meta messages and note_off records are skipped
            if item_data["type"] == "note_on":
                note = item_data["note"]
                velocity = item_data["velocity"]
                # A note_on with zero velocity acts as a note_off, so it is not counted
                if velocity > 0:
                    notes.append(note)
    return notes
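
The `velocity > 0` check is there because MIDI allows a `note_on` message with zero velocity to stand in for a `note_off`; counting those would register every such note twice. Here is a minimal sketch of the same filter on plain dictionaries shaped like the example in the comment above. The name `is_note_start` and the sample messages are made up for illustration.

```python
from typing import Dict, List


def is_note_start(msg: Dict) -> bool:
    """True only for messages that actually start a note."""
    return msg.get("type") == "note_on" and msg.get("velocity", 0) > 0


# Made-up sample messages in the same shape that item.dict() produces
messages: List[Dict] = [
    {"type": "note_on", "time": 0, "note": 67, "velocity": 56, "channel": 0},
    {"type": "note_on", "time": 240, "note": 67, "velocity": 0, "channel": 0},  # acts as note_off
    {"type": "note_on", "time": 0, "note": 72, "velocity": 80, "channel": 0},
    {"type": "note_off", "time": 380, "note": 72, "velocity": 100, "channel": 0},
]

started = [m["note"] for m in messages if is_note_start(m)]
print(started)
```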

As a next step, let's create another function that processes all MIDI files in a folder:

import glob


def get_notes(files_mask: str):
    """ Get all notes from several files """
    midi_notes = []
    for file_path in glob.glob(files_mask):
        midi_notes += extract_midi_notes(file_path)
    return midi_notes

We will also need some MIDI files. These files, especially for classical music (which is in the public domain), are often available for free, and it is easy to find them. For my analysis, I was using files from http://www.piano-midi.de (CC BY-SA German license).

Distribution

At this point, we have a method that extracts all musical notes from a set of MIDI files. We can easily see which notes are used more often using a histogram in Matplotlib:

import matplotlib.pyplot as plt

# Get data
files_mask = "Tchaikovsky/*.mid"
midi_notes = get_notes(files_mask)

# Draw
fig, ax = plt.subplots(figsize=(7, 5))
# One bin per key: 88 unit-wide bins centered on the integer codes 21..108
plt.hist(midi_notes, density=False, bins=88, range=(20.5, 108.5), edgecolor='#505070')
plt.ylabel('Count')
plt.xlabel('MIDI Notes')
ax.spines['top'].set_color('#EEEEEE')
ax.spines['right'].set_color('#EEEEEE')
ax.spines['bottom'].set_color('#888888')
ax.spines['left'].set_color('#888888')
plt.tight_layout()
plt.title(files_mask.split("/")[0])
plt.show()

The output looks like this:

Notes distribution, Image by author

The plt.hist method does all the work of calculating the value distribution from the list and plotting it. As we can see, the distribution is roughly bell-shaped: the most used keys are located in the center of the keyboard, which makes sense.
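
The histogram shows the shape, but the same note list can also give a number. As a rough sketch (not from the original code), the function below finds the narrowest span of keys that contains a chosen share of all played notes. The name `central_range`, the 95% threshold, and the sample data are all assumptions made for illustration.

```python
from collections import Counter
from typing import List, Optional, Tuple


def central_range(notes: List[int], coverage: float = 0.95) -> Optional[Tuple[int, int]]:
    """Return the narrowest (low, high) MIDI range holding `coverage` of the notes."""
    counts = Counter(notes)
    codes = sorted(counts)
    needed = coverage * len(notes)
    best = None
    left = 0
    running = 0  # number of notes inside the current window
    for code in codes:
        running += counts[code]
        # Shrink the window from the left while it still holds enough notes
        while running - counts[codes[left]] >= needed:
            running -= counts[codes[left]]
            left += 1
        if running >= needed:
            if best is None or code - codes[left] < best[1] - best[0]:
                best = (codes[left], code)
    return best


# Made-up data: most notes around middle C, a few outliers at both ends
sample = [60] * 50 + [62] * 30 + [64] * 17 + [30] * 1 + [100] * 2
print(central_range(sample))
```

With real data, `central_range(midi_notes)` would show how much of the keyboard carries the bulk of the music, ignoring rare extreme notes.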

Visualization

Keyboard

Technically speaking, the problem is solved: using only about 20 lines of Python code, we can see the note distribution for different music pieces. But in practice a histogram is not convenient, especially for musicians, and we can do much better. The Matplotlib library can draw more than just scientific graphs. Let's draw a piano keyboard:

from matplotlib.patches import Rectangle

fig, ax = plt.subplots(figsize=(14, 3))

def draw_patch(ax, x: int, y: int, w: int, h: int, color: str, alpha=0.6):
    """ Draw a patch in specific coordinates and color """
    #  H
    #  |
    # (xy)--W
    ax.add_patch(Rectangle((x, y), w, h,
                            facecolor=color, edgecolor=None,
                            alpha=alpha, fill=True,
                            linewidth=1, zorder=2))

def draw_black_note(ax, x_pos: int, key_width: int, keyb_height: int):
    """ Draw a black note in a specific X-position """
    note_w = 0.4*key_width
    ax.plot([x_pos - note_w, x_pos - note_w, x_pos + note_w, x_pos + note_w],
            [keyb_height-1, 0.25*keyb_height, 0.25*keyb_height, keyb_height-1], color="#111111")

    draw_patch(ax, x_pos - note_w, 0.25*keyb_height, 2*note_w, 0.75*keyb_height, color="#000000")

num_octaves = 9
note_per_octave = 7
key_width, key_height = 20, 160

# Key names
key_names = ["C", "D", "E", "F", "G", "A", "B"]

# Draw frame
keyb_width = num_octaves*note_per_octave*key_width
ax.plot([0, keyb_width, keyb_width, 0, 0], [0, 0, key_height, key_height, 0], color="#0000AA")

# Draw octaves
for oct_num in range(num_octaves):
    oct_xpos = oct_num*key_width*7
    # Octave separator
    ax.plot([oct_xpos, oct_xpos], [0, key_height-1], color="#0000AA")
    # Note separator
    for note_num in range(1, 7):
        x_pos = oct_num*key_width*7 + note_num*key_width
        ax.plot([x_pos, x_pos], [1, key_height-1], color="#EEEEEE")

    # Black keys
    for black_pos_ind in [1, 2, 4, 5, 6]:
        x_pos = oct_xpos + black_pos_ind*key_width
        draw_black_note(ax, x_pos, key_width, key_height)

    # White keys
    for note_ind in range(note_per_octave):
        x_pos = oct_num*key_width*7 + note_ind*key_width
        key_name = f"{key_names[note_ind]}{oct_num}"

        # Text
        ax.text(x_pos + key_width/2, 0.1*key_height, key_name, verticalalignment='bottom',
                horizontalalignment='center', color='black', fontsize=9)

plt.axis("off")
plt.xlim(5*key_width - 1, 57*key_width + 1)
plt.tight_layout()
plt.show()

The output looks like this, which is much better than a histogram:

Piano keyboard in Matplotlib, Image by author

Heatmap

The next step is to visualize the distribution. A convenient way to do it is to use a heatmap drawn over our keyboard. First, we need a method to convert key names (C3, D3, etc.) to MIDI codes. It's just a simple linear conversion:

def get_midi_code(key_name: str) -> int:
    """ Convert key name to the MIDI code. 'E3' => 52 """
    # MIDI codes: C0 - 12, D0 - 14, etc; 12 keys in octave
    key_names = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]
    # "E1" => E, 1; "C#4" => C#, 4 (the octave is the last character)
    name, octave = key_name[:-1], int(key_name[-1])
    return (octave + 1)*12 + key_names.index(name)

Second, we can use Python's Counter class to know how often each note was used:

from collections import Counter

midi_notes = get_notes("Tchaikovsky/*.mid")
notes_counter = Counter(midi_notes)

most_common = notes_counter.most_common(1)[0]
print(f"Most common MIDI code {most_common[0]}, was used {most_common[1]} times")
#> Most common MIDI code 61, was used 171 times

midi_code = get_midi_code("C4")
print(f"C4 note was used {notes_counter[midi_code]} times")
#> C4 note was used 123 times

With these counts, we can use the relative frequency of each note as the alpha channel of the heatmap. The following loop runs inside the per-octave loop of the keyboard code:

for note_ind in range(note_per_octave):
    x_pos = oct_num*key_width*7 + note_ind*key_width
    key_name = f"{key_names[note_ind]}{oct_num}"

    midi_code = get_midi_code(key_name)
    # Number as a heatmap
    use_count = notes_counter[midi_code]
    if use_count > 0:
        color = "#FF0000"
        max_note_count = most_common[1]
        alpha = use_count/max_note_count
        margin = 0.1*key_width
        draw_patch(ax, x_pos + margin, 0.02*key_height, key_width - 2*margin,
                   0.2*key_height, color, alpha=alpha)

    # Number as text
    ax.text(x_pos + key_width/2, 0.05*key_height, str(use_count),
            verticalalignment='bottom', horizontalalignment='center',
            color='#333333', fontsize=7)

The same logic applies to the black keys; for the sake of simplicity, I'll omit this code here.
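
For readers who want to fill in that part, here is one possible sketch of the missing piece: mapping the black-key positions used in the drawing loop (`[1, 2, 4, 5, 6]`) to MIDI codes. This is an assumption about how it could be done, not the author's code. `BLACK_KEY_NAMES` and `black_key_codes` are hypothetical names, and `get_midi_code` is restated here in a form that accepts sharps so that the block runs on its own.

```python
from collections import Counter
from typing import List

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

# Position index between white keys -> black key name (an assumed mapping):
# 1 sits between C and D, 2 between D and E, 4 between F and G, and so on.
BLACK_KEY_NAMES = {1: "C#", 2: "D#", 4: "F#", 5: "G#", 6: "A#"}


def get_midi_code(key_name: str) -> int:
    """'E3' => 52, 'C#4' => 61; the octave is the last character."""
    name, octave = key_name[:-1], int(key_name[-1])
    return (octave + 1) * 12 + NOTE_NAMES.index(name)


def black_key_codes(oct_num: int) -> List[int]:
    """MIDI codes of the five black keys in one octave, left to right."""
    return [get_midi_code(f"{name}{oct_num}") for name in BLACK_KEY_NAMES.values()]


# Made-up counts, only to show the alpha calculation
notes_counter = Counter({61: 4, 63: 2, 66: 1})
max_note_count = max(notes_counter.values())
alphas = [notes_counter[code] / max_note_count for code in black_key_codes(4)]
print(black_key_codes(4), alphas)
```

Each alpha value could then be passed to `draw_patch` at the black key's x-position, in the same way as for the white keys.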

Results

Let's finally answer the question in the title of the story. How many piano keys do we need to learn to play the piano?

To get an answer, we first need to find a proper repertoire. Lots of books with piano lessons are available on Amazon, and usually, their first pages can be viewed for free, which is enough to collect the list of titles. With a title, it is easy to find a MIDI file. Well, let's see some results.

The first year of study: Good King Wenceslas (a Christmas carol):

Screenshot by author

As we can see, the melody is simple, it mostly uses only white keys, and the real note range is about 3 octaves. But there are no pianos with a 3-octave range available, and the closest one has four octaves.

A second year of study: Bagatelle in C minor, WoO 52 (Beethoven)

Screenshot by author

As we can see, it's more complex, more black keys are used, and the range of this piece is about 4.5 octaves.

A third year of study: Consolations, S. 172 (Franz Liszt)

Screenshot by author

In this case, the range is wider; almost all 7 octaves are required to play this piece.
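
The ranges quoted in this section can also be computed from the note list instead of being read off the picture. The sketch below is an illustration under stated assumptions: `note_range` and `smallest_keyboard` are hypothetical helpers, and the list of keyboard sizes (25 to 88 keys) reflects sizes commonly sold, not data from this article. It also ignores where a given keyboard's lowest key sits, so the result is an optimistic lower bound.

```python
from typing import List, Optional, Tuple

# Commonly sold keyboard sizes, in keys (an assumption, not from the article)
KEYBOARD_SIZES = [25, 37, 49, 61, 76, 88]


def note_range(notes: List[int]) -> Tuple[int, int, int, float]:
    """Return (lowest code, highest code, keys spanned, span in octaves)."""
    low, high = min(notes), max(notes)
    return low, high, high - low + 1, (high - low) / 12


def smallest_keyboard(span_keys: int) -> Optional[int]:
    """Smallest listed keyboard with at least `span_keys` keys, or None."""
    for size in KEYBOARD_SIZES:
        if size >= span_keys:
            return size
    return None


# Made-up note list; with real data, pass the result of extract_midi_notes()
notes = [43, 72, 60, 96]
low, high, span_keys, octaves = note_range(notes)
print(f"Range {low}..{high}: {span_keys} keys, {octaves:.1f} octaves")
print(f"Smallest keyboard that could fit it: {smallest_keyboard(span_keys)} keys")
```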

Conclusion

In this article, I showed how to process MIDI files and make a musical note distribution in the form of a histogram and piano keyboard heatmap. It can be useful not only to answer a question like "How many keys do we need?". Converting a melody or the list of files to a heatmap allows us to compare the styles of different pieces, different composers, or different genres, like classical and pop. It can be part of a larger analysis in the musicology or cultural anthropology domains. For example, this is a note distribution in J.S. Bach's music, and it is clearly visible that the range is smaller:

J.S. Bach note distribution, Image by author

Readers may wonder why, and the answer is easy: Bach was writing music for the harpsichord, and the modern piano was not invented yet at that time. It is interesting that now, more than 300 years later, we can easily see this in the heatmap!

As for piano lessons, I must admit that my initial idea, that an instrument with a 4-octave range is enough to learn to play the piano, was wrong. It can be okay for the first year of study, but as we can see from the heatmaps, at least a 6-octave range is required to play music, even during the second year.

Thanks for reading.
