Running Local LLMs is More Useful and Easier Than You Think


Image generated by AI by Author

1 Why Local LLMs

ChatGPT is great, no doubt about that, but it comes with a significant drawback: everything you write or upload is stored on OpenAI's servers. This may be fine in many cases, but it can become a problem when dealing with sensitive data.

For this reason, I started exploring open-source LLMs which can be run locally on personal computers. As it turns out, there are actually many more reasons why they are great.

  1. Data Privacy: your information stays on your machine.
  2. Cost-Effective: no subscription fees or API costs; they are free to use.
  3. Customization: models can be fine-tuned with your specific system prompts or datasets.
  4. Offline Functionality: no internet connection is required.
  5. Unrestricted Use: free from limitations imposed by external APIs.

Now, setting up a local LLM is surprisingly straightforward. This article provides a step-by-step guide to help you install and run an open-source model on your local machine.

Let's get started!


2 Installing Ollama and Running Llama 3

Ollama is an open-source project which makes it easy to run Large Language Models (LLMs) locally on personal computers. It is known for being very user-friendly and lightweight, and it offers a wide range of pre-trained models – including the latest and greatest from Meta (Llama 3) and Google (Gemma 2). All these companies spent millions of dollars training these models so that we can play around with them locally on our own machines. Isn't that amazing?

Ollama by itself is nothing more than an empty shell: it needs an off-the-shelf LLM to function.

Before diving into the installation process let's have a look at the available models:

Non exhaustive list of available models in Ollama – Screenshot from Ollama.com

And there are many more!

For the purpose of this article, I am focusing on the latest model by Meta, called Llama 3, which promises amazing performance. At the time of writing, it is the most popular model on the platform, with 4.4M pulls.

Llama 3 model – Screenshot from Ollama.com

The following steps show how to install Ollama on your computer, feed it with Llama 3 and finally use that model as you would use ChatGPT.

STEPS 1 & 2:

  1. Go to ollama.com and click "Download" – I'm on macOS so I'll focus on this option in the rest of the tutorial, although it should not be very different on Linux or Windows.
  2. Click "Download for macOS".
Steps 1 and 2 – Screenshots from Ollama.com

STEPS 3, 4 & 5: As with any other app, just follow the very straightforward installation steps.

  1. Click "Install".
  2. Click "Next".
  3. Run "ollama run llama3" in your terminal.
Steps 3, 4 and 5 – Screenshots from Ollama.com

This last step will first download the 8B version of Llama 3 to your computer (about 4.7GB) and will then run it. As simple as that!
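Note that the plain llama3 tag corresponds to the 8B variant. If your machine has enough memory, you can also try the larger version with ollama run llama3:70b – the exact tags and download sizes are listed on the model's page on ollama.com.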

Downloading and running Llama 3 – Image by Author

And this article could stop right here. A few clicks and a line of code later, here we are running an LLM locally!

You can ask it anything, like explaining the differences between the 8-billion and 70-billion-parameter versions of Llama 3.

Example prompt with llama3 – Image by Author

The response time of the model will typically depend on the GPU / RAM of your computer.


3 A Few Useful Commands

If you want to keep using LLMs locally within the terminal, there are a few basic commands worth keeping in mind:

  • ollama run llama3 – Run the model, in this case llama3

  • ollama list – List all the models already installed locally

  • ollama pull mistral – Pull another model available on the platform, in this case mistral

  • /clear (once the model is running) – Clear the context of the session to start fresh

  • /bye (once the model is running) – Exit Ollama

  • /? (once the model is running) – List all the available commands

Many more commands exist for more complex use cases like creating new fine-tuned models.
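As an illustration, a customized model with its own system prompt and parameters can be defined in a small Modelfile and built with ollama create. Here is a minimal sketch – the name my-assistant is just a placeholder:

# Modelfile: build a customized model on top of llama3
FROM llama3

# Lower temperature for more focused, less random answers
PARAMETER temperature 0.3

# System prompt baked into the new model
SYSTEM """You are a concise assistant who always answers in two sentences."""

Once saved, ollama create my-assistant -f Modelfile builds the model and ollama run my-assistant starts it, just like any other model.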

The GitHub repo of Ollama provides very thorough documentation.

GitHub – ollama/ollama: Get up and running with Llama 3, Mistral, Gemma 2, and other large language…

For basic use cases the CLI might be enough. However, there's more…


4 Llama 3 in Jupyter Notebook

Using LLMs through the terminal is fine, but interacting with the models through Python code opens up many more possibilities.

In order to do so, we need to install the langchain_community library with pip (pip install langchain_community) and import the Ollama package.

Let's say I want to create a short bio for a person providing her name, age and occupation. In this example the code will look like this:

# !pip install langchain_community

# Import the necessary package
from langchain_community.llms import Ollama

# Create a model instance
llm = Ollama(model="llama3")

# Use the model with a prompt
llm.invoke("Generate a short, 2-sentence bio for Alice, who is 25 years old and works as a Engineer")

Pretty straightforward!

Here is the result:

"Here is a possible bio for Alice:nnAlice is a 25-year-old engineer with a passion for innovative problem-solving and a knack for bringing complex ideas to life. With a strong foundation in mechanical engineering and a keen eye for detail, she's dedicated to crafting solutions that make a real-world impact."

This output could be further refined by removing the introductory piece with regular expressions.
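For instance, here is a minimal sketch that keeps only the text after the first blank line – it assumes the model separates its preamble from the bio with an empty line, which may not always be the case:

import re

raw = llm.invoke("Generate a short, 2-sentence bio for Alice, who is 25 years old and works as an Engineer")

# Drop a preamble such as "Here is a possible bio for Alice:"
# by keeping only the text after the first blank line (if any)
bio = re.split(r"\n\s*\n", raw, maxsplit=1)[-1].strip()
print(bio)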

Generating a bio for a single person can easily be done within the terminal. However, doing the same task for a list of many people becomes tedious without Python. With Python, the prompts can be parameterized and the process can be automated for many individuals.

For example:

import pandas as pd

# Create a sample DataFrame
df = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie'],
    'age': [25, 30, 35],
    'occupation': ['Engineer', 'Teacher', 'Artist']
})

# Create a function which can be applied to the dataframe
def generate_bio(name, age, occupation):
    prompt = f"Generate a short, 2-sentence bio for {name}, who is {age} years old and works as a {occupation}"
    return llm.invoke(prompt)

# Apply the function to the dataframe
df['bio'] = df.apply(lambda row: generate_bio(row['name'], row['age'], row['occupation']), axis=1)

df.head()

Now, for each row of the DataFrame, the model generates a bio!
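One last remark: the generated bios will differ from one run to the next. If you prefer more stable outputs, the Ollama wrapper also accepts sampling parameters – for example, the temperature can be lowered when creating the model instance (assuming your version of langchain_community exposes it):

# A lower temperature makes the output less random
# (parameter availability depends on your langchain_community version)
llm = Ollama(model="llama3", temperature=0.2)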


5 Final Thoughts

The intent of this article was to highlight the simplicity of implementing a fully functional LLM locally thanks to Ollama.

Whether we use this model through the terminal for simple requests or with Python for more complex, automated tasks, the process is straightforward.

You could even have your own ChatGPT-like graphical interface thanks to open-source projects like Open WebUI.

I personally find it remarkable that with just a few clicks and lines of code, we can get our hands on something so useful! Hopefully you enjoyed it too.

Thanks for reading all the way to the end of the article. Follow for more! Feel free to leave a message below, or reach out to me through LinkedIn / X if you have any questions / remarks!


