A Weekend AI Project: Running Speech Recognition and a LLaMA-2 GPT on a Raspberry Pi

Raspberry Pi running a LLaMA model, Image by author

Nowadays, nobody is surprised by running a deep learning model in the cloud. But the situation can be much more complicated in the edge or consumer-device world, for several reasons. First, using cloud APIs requires devices to always be online. This is not a problem for a web service, but it can be a dealbreaker for a device that needs to work without Internet access. Second, cloud APIs cost money, and customers will likely not be happy to pay yet another subscription fee. Last but not least, after several years, the project may be discontinued, the API endpoints will be shut down, and the expensive hardware will turn into a brick, which is not friendly to customers, the ecosystem, or the environment. That's why I am convinced that end-user hardware should be fully functional offline, without extra costs or mandatory online APIs (online features can be optional, but not required).

In this article, I will show how to run a LLaMA GPT model and automatic speech recognition (ASR) on a Raspberry Pi. That will allow us to ask the Raspberry Pi questions and get answers. And as promised, all this will work fully offline.

Let's get into it!

The code presented in this article is intended to work on the Raspberry Pi. But most of the methods (except the "display" part) will also work on a Windows, OSX, or Linux laptop. So, those readers who don't have a Raspberry Pi can easily test the code without any problems.

Hardware

For this project, I will be using a Raspberry Pi 4. It is a single-board computer running Linux; it is small, requires only 5V DC power, and needs no fan or active cooling:

Raspberry Pi 4, Image source Wikipedia

A newer 2023 model, the Raspberry Pi 5, should be even better; according to benchmarks, it's almost 2x faster. But it is also almost 50% more expensive, and for our test, the model 4 is good enough.

As for the RAM size, we have two options:

  • A Raspberry Pi with 8 GB of RAM allows us to run a 7B LLaMA-2 GPT model, whose memory footprint in a 4-bit quantization mode is about 5 GB.
  • A 2 or 4 GB device allows us to run a smaller model like TinyLlama-1B. As a bonus, this model is also faster, but as we will see later, its answers can be a bit less "smart."

Both models can be downloaded from HuggingFace, and in general, almost no code changes will be required.
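
For example, the huggingface_hub package can fetch a quantized model file in a couple of lines. This is only a sketch: the repository and file names below are illustrative assumptions, so check HuggingFace for the exact quantization you need.

from huggingface_hub import hf_hub_download

# Illustrative repo/file names -- verify them on HuggingFace before use
model_path = hf_hub_download(
    repo_id="TheBloke/Llama-2-7B-Chat-GGUF",   # for an 8 GB Raspberry Pi
    filename="llama-2-7b-chat.Q4_K_M.gguf",    # 4-bit quantized file, about 4 GB
)
# For a 2-4 GB device, a TinyLlama repository can be used instead
print("Model saved to:", model_path)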

The Raspberry Pi is a full-fledged Linux computer, and we can easily see the output in the terminal via SSH. But that is not much fun and is not suitable for a mobile device like a robot. For the Raspberry Pi, I will be using a monochrome 128×64 I2C OLED display. This display needs only four wires to connect:

I2C OLED display connection, Image made by author in Fritzing

A display and wires can be obtained on Amazon for $5–10; no soldering skills are required. The I2C interface must be enabled in the Raspberry Pi settings; there are plenty of tutorials about that. For simplicity, I will omit the hardware part here and focus only on the Python code.
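
As a quick optional sanity check, we can verify that the display is visible on the I2C bus from Python. This is a minimal sketch assuming the Adafruit Blinka stack is installed; SSD1306 displays usually appear at address 0x3C, but yours may differ.

import board

# Scan the I2C bus; an SSD1306 display typically shows up as 0x3c
i2c = board.I2C()
while not i2c.try_lock():
    pass
try:
    print("I2C devices found:", [hex(addr) for addr in i2c.scan()])
finally:
    i2c.unlock()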

Display

I will start with the display because it's better to see something on the screen during the tests. The Adafruit_CircuitPython_SSD1306 library allows us to display any image on the OLED display. This library has a low-level interface; it can only draw pixels or a monochrome bitmap from a memory buffer. To implement scrollable text, I created a list that stores the text buffer and a method _display_update that draws the text:

from PIL import Image, ImageDraw, ImageFont

# Display geometry: 128x64 pixels, 22 characters per line, 5 lines of text
pixels_size = (128, 64)
char_h = 11
max_x, max_y = 22, 5
rpi_font_path = "DejaVuSans.ttf"
font = ImageFont.truetype(rpi_font_path, char_h)
display_lines = [""]

try:
    import board
    import adafruit_ssd1306
    i2c = board.I2C()
    oled = adafruit_ssd1306.SSD1306_I2C(pixels_size[0], pixels_size[1], i2c)
except ImportError:
    # No display libraries available, e.g. when running on a laptop
    oled = None

def _display_update():
    """ Show lines on the screen """
    image = Image.new("1", pixels_size)
    draw = ImageDraw.Draw(image)
    for y, line in enumerate(display_lines):
        draw.text((0, y * char_h), line, font=font, fill=255, align="left")

    if oled:
        oled.fill(0)
        oled.image(image)
        oled.show()

Here, the (22, 5) values are the number of characters per line and the number of lines we can display. The oled variable can also be None if an ImportError occurs, for example, when we run this code on a laptop instead of the Raspberry Pi.

To emulate text scrolling, I also created two helper methods:

def add_display_line(text: str):
    """ Add a new line with scrolling """
    global display_lines
    # Split the line into chunks according to the screen width
    text_chunks = [text[i: i + max_x] for i in range(0, len(text), max_x)]
    for chunk in text_chunks:
        for line in chunk.split("\n"):
            display_lines.append(line)
            display_lines = display_lines[-max_y:]
    _display_update()

def add_display_tokens(text: str):
    """ Append a token to the last line, without an extra line break """
    global display_lines
    last_line = display_lines.pop()
    new_line = last_line + text
    add_display_line(new_line)

The first method adds a new line to the display; if the string is too long, the method automatically splits it into several lines. The second method appends a text token to the last line without a "carriage return"; I will use it to display the streaming answer from a GPT model. The add_display_line method allows us to use code like this:

import time
from datetime import datetime

for p in range(20):
    add_display_line(f"{datetime.now().strftime('%H:%M:%S')}: Line-{p}")
    time.sleep(0.2)

If everything was done correctly, the output should look like this:

OLED Display, Image by author

There are also dedicated libraries for external displays, like luma.oled, but our solution is good enough for the task.
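
For reference, a minimal sketch with luma.oled might look like the following, assuming the library is installed and the display sits on I2C bus 1 at the default 0x3C address:

from luma.core.interface.serial import i2c
from luma.core.render import canvas
from luma.oled.device import ssd1306

# Connect to the SSD1306 display on I2C bus 1, address 0x3C
serial = i2c(port=1, address=0x3C)
device = ssd1306(serial, width=128, height=64)

# Draw a single line of text; the canvas is flushed to the display on exit
with canvas(device) as draw:
    draw.text((0, 0), "Hello from luma.oled", fill="white")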

Automatic Speech Recognition (ASR)

For ASR, I will be using the Transformers library from HuggingFace.
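
As a minimal sketch of what that can look like, a small speech recognition checkpoint can be loaded through the pipeline API; the model name openai/whisper-tiny.en below is only an assumption for illustration, and any compact ASR model supported by the pipeline will do on a Raspberry Pi.

from transformers import pipeline

# A small English-only Whisper checkpoint keeps the memory footprint low
asr = pipeline("automatic-speech-recognition", model="openai/whisper-tiny.en")

# Transcribe a WAV file recorded from the microphone
result = asr("question.wav")
print(result["text"])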
