10 Exciting Project Ideas Using Large Language Models (LLMs) for Your Portfolio
One common piece of advice I often hear for job applicants is to have a portfolio showcasing your work. This doesn't only apply to artists or models but also to software developers and data scientists.
A portfolio of your projects acts as public evidence of your skills. This public evidence can be anything from a blog to open-source contributions to an active engagement on forums such as StackOverflow. But these types of public evidence take a long time to build.
Another type of evidence showcasing your skills is with smaller end-to-end projects. For data scientists, these can be projects such as exploratory data analysis and data visualization, classical Machine Learning on tabular data, or Deep Learning to classify images.
With the advent of large language models (LLMs) in the form of pre-trained foundation models, such as OpenAI's GPT-3, the opportunities to build cool things with LLMs are endless. And with the emergence of developer tools, the technical barrier is getting lower.
Thus, now is a great time to add a new LLM-powered project to your portfolio!
This article will share 10 side project ideas that utilize LLMs for downstream tasks. Wherever you are in your career, I hope these will inspire you to build something fun while learning about this new technology.
- Cover letter generator
- Chatbot with a personality
- YouTube summarizer
- Information extraction from job postings
- Custom web scraper
- Searchable database of your documents
- Question answering over documents
- Clustering social media posts and podcast episodes into topics
- Classify business inquiries from e-mails
- Where is Waldo?
Projects Based on Text Generation
LLMs are most widely known for their generative capabilities. In this section, we will first discuss some project ideas based on use cases related to them:
- Generative: cover letter generator
- Conversational: chatbot with a personality
- Summarization: YouTube summarizer
- Extraction: Information extraction from job postings
- Rewriting: Custom web scraper
Project Idea 1: Cover Letter Generator
LLMs' main superpower is their ability to generate coherent bodies of text. While there is a lot of discussion in the media about how people are exploiting this technology, from creating fully AI-generated blogs to students cheating on homework, this technology is already being widely adopted in copywriting and programming to increase productivity.
Did you say you were looking for a new job opportunity? Then this project might be right up your alley: A cover letter generator.
While you could technically build this only by engineering the perfect prompt and filling it with the relevant information about the role, this will become repetitive work if you want to apply for multiple roles.
Thus, this is a great small project to practice prompt engineering and using prompt templates.
prompt_template = """
Write a cover letter to {contact_person}
from {your_name} for a {role} job at {company_name}.
I have experience in {personal_exp}.
I am excited about the job because {job_desc}.
I am passionate about {passion}.
"""
If you are feeling ready for a challenge, you could also try to extract the relevant information from the job posting to feed to the prompt template (see extraction).
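To make the template idea concrete, here is a minimal sketch that fills the prompt template above for several roles with plain Python string formatting. The names, companies, and field values are hypothetical placeholders, and `build_prompt` is a helper I made up for illustration. The resulting strings are what you would send to the LLM of your choice.

```python
prompt_template = """
Write a cover letter to {contact_person}
from {your_name} for a {role} job at {company_name}.
I have experience in {personal_exp}.
I am excited about the job because {job_desc}.
I am passionate about {passion}.
"""


def build_prompt(template: str, **fields: str) -> str:
    """Fill the template; str.format raises KeyError if a field is missing."""
    return template.format(**fields).strip()


# Hypothetical details that stay the same across applications.
about_me = {
    "your_name": "Jane Doe",
    "personal_exp": "building data pipelines in Python",
    "passion": "open-source tooling",
}

# Hypothetical roles; only these fields change per application.
roles = [
    {"contact_person": "Ms. Smith", "role": "data engineer",
     "company_name": "XYZ Co.", "job_desc": "it combines integration work with Python"},
    {"contact_person": "Mr. Lee", "role": "senior software engineer",
     "company_name": "ABC Inc.", "job_desc": "it focuses on autonomous mobility"},
]

prompts = [build_prompt(prompt_template, **about_me, **role) for role in roles]
print(prompts[0])
```

Keeping the personal details separate from the per-role fields means you only edit one small dictionary per application.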

Project Idea 2: Customized Chatbot
You've heard of ChatGPT. I don't need to go into detail here. Its conversational capabilities are pretty impressive. But it lacks personality and has limited information. What if you could give it access to specific knowledge or even a full personality?
The first example is not only a cute and whimsical idea, but it also serves a therapeutic purpose. Michelle Huang built a chatbot based on her diaries to chat with her childhood self.
What's especially cool about this project is that Michelle used this chatbot for her inner child work.
In a "Black Mirror" episode called "Be Right Back" from 2013, the grieving protagonist reconnects with her late boyfriend after learning about a service that lets people stay in touch with the deceased.
Ten years later, you could technically build this on your laptop as a weekend project…
Although this example is a bit morbid, who's to say we won't see this technology help us grieve in the future?
Here are the rough steps you would follow to realize a project like this:
- Collect data from your old diaries or chat history and load into documents
- Feed an LLM the contextual information in the prompt
- Add conversational memory
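The steps above can be sketched roughly as follows. This is only an illustration under a few assumptions: `PersonaChatbot` and `echo_llm` are names I made up, the "memory" is simply the last few turns pasted back into the prompt, and the `llm` argument is any function that maps a prompt string to a reply (swap in a real model call there).

```python
from typing import Callable, List, Tuple


class PersonaChatbot:
    """Keeps persona notes plus a short window of past turns in the prompt."""

    def __init__(self, llm: Callable[[str], str], persona_notes: List[str],
                 max_turns: int = 5):
        self.llm = llm
        self.persona_notes = persona_notes  # e.g. diary excerpts
        self.max_turns = max_turns          # how many past turns to remember
        self.history: List[Tuple[str, str]] = []

    def build_prompt(self, user_message: str) -> str:
        context = "\n".join(f"- {note}" for note in self.persona_notes)
        recent = self.history[-self.max_turns:]
        turns = "\n".join(f"User: {u}\nBot: {b}" for u, b in recent)
        return (
            "Answer in the voice of the person who wrote these notes:\n"
            f"{context}\n\n"
            f"{turns}\n"
            f"User: {user_message}\nBot:"
        )

    def chat(self, user_message: str) -> str:
        reply = self.llm(self.build_prompt(user_message))
        self.history.append((user_message, reply))
        return reply


def echo_llm(prompt: str) -> str:
    # Stand-in for a real model so the sketch runs offline.
    return f"(stub reply to a prompt of {len(prompt.splitlines())} lines)"


bot = PersonaChatbot(echo_llm, ["I love drawing horses.", "I want to be a vet."])
print(bot.chat("What do you want to be when you grow up?"))
```

A fixed window of recent turns is the simplest form of memory. Summarizing older turns is a common next step once conversations get long.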

Project Idea 3: YouTube or Podcast Summarizer
Summarization is another great use case of LLMs – especially with the vast amount of (AI-)generated content (see generator). But information doesn't only come as text: audio (e.g., podcasts) and video need summarization, too.
Content creators often reference older episodes you might have missed. Because most of us don't consume every episode a content creator produces, it might be difficult to get the reference.
For this case, I have often thought that it would be awesome to search for the relevant episode and either listen to the full episode or get the relevant key points from that episode.
Here is an example of how you can summarize YouTube videos:
Create Your Own YouTube Video Summarizer App in Just 3 Easy Steps
If you want to extend this idea, you could make the episodes searchable (see search). This would enable you to ask a content creator's database questions like "In which episode did [content creator] talk about [topic]?"
Here are the rough steps you would follow to realize this project:
- Download the video or podcast transcript and load into documents
- Split long documents into chunks
- Summarize the transcript with an LLM
- Optional: Wrap it all in a user-friendly command line interface or even a web application.
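Here is a rough sketch of the chunk-and-summarize steps above, assuming you already have the transcript as a string. The approach shown (summarize each chunk, then summarize the summaries) is one common pattern, not the only one. `split_into_chunks`, `summarize`, and the stub `first_words_llm` are hypothetical names, and the word-based chunk sizes are arbitrary defaults.

```python
from typing import Callable, List


def split_into_chunks(text: str, max_words: int = 200, overlap: int = 20) -> List[str]:
    """Split text into overlapping word windows so each fits a model's context."""
    step = max_words - overlap
    if step <= 0:
        raise ValueError("overlap must be smaller than max_words")
    words = text.split()
    chunks, start = [], 0
    while start < len(words):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break
        start += step
    return chunks


def summarize(transcript: str, llm: Callable[[str], str], max_words: int = 200) -> str:
    """Summarize each chunk, then combine the partial summaries."""
    chunks = split_into_chunks(transcript, max_words)
    if not chunks:
        return ""
    partial = [llm(f"Summarize this transcript excerpt:\n{chunk}") for chunk in chunks]
    if len(partial) == 1:
        return partial[0]
    return llm("Combine these partial summaries into one summary:\n" + "\n".join(partial))


def first_words_llm(prompt: str) -> str:
    # Stand-in for a real model: returns the first words after the instruction line.
    body = prompt.split("\n", 1)[1]
    return " ".join(body.split()[:5]) + " ..."


print(summarize("Welcome back to the show. Today we talk about embeddings.", first_words_llm))
```

The overlap between chunks helps avoid cutting a thought in half at a chunk boundary.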

Project Idea 4: Information Extraction
Another useful use case of LLMs is information extraction. For example, you can provide an LLM with a few examples that contain text and the information you want it to extract.
Remember the cover letter generator from earlier? You could extend it with a component to extract the relevant information from a job posting directly:
prompt = """
This program will extract relevant information from a job posting.
Here are some examples:
Job posting:
Lead engineer for software integration (remote possible)
At XYZ Co. we are making the world a better place.
To do so we are looking for a lead engineer with experience in Python and JIRA.
Extracted Text:
Role: Lead engineer for software integration.
Company: XYZ Co.
Requirements: Python, JIRA
--
Job posting:
Senior software engineer - Autonomous Mobility
ABC Inc. is a great company.
We are looking for someone with great ability to write complex C code.
Extracted Text:
"""
Here are the rough steps you would follow to realize this project:
- Load job description from job posting into a document
- Extract the relevant information with the LLM by writing a few-shot prompt that includes examples
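As a sketch of how the few-shot prompt could be assembled programmatically and how the model's answer could be turned into structured data: the helpers `build_few_shot_prompt` and `parse_extraction` are hypothetical, and the `completion` string below is a made-up example of what a model might return, not real model output.

```python
from typing import Dict, List


def build_few_shot_prompt(examples: List[Dict[str, str]], new_posting: str) -> str:
    """Assemble instruction, solved examples, and the new posting to complete."""
    parts = [
        "This program will extract relevant information from a job posting.",
        "Here are some examples:",
    ]
    for example in examples:
        parts += ["Job posting:", example["posting"],
                  "Extracted Text:", example["extracted"], "--"]
    parts += ["Job posting:", new_posting, "Extracted Text:"]
    return "\n".join(parts)


def parse_extraction(completion: str) -> Dict[str, str]:
    """Turn 'Key: value' lines into a dictionary, skipping malformed lines."""
    fields = {}
    for line in completion.splitlines():
        key, sep, value = line.partition(":")
        if sep and value.strip():
            fields[key.strip().lower()] = value.strip()
    return fields


examples = [{
    "posting": "Lead engineer for software integration (remote possible)\n"
               "At XYZ Co. we are making the world a better place.",
    "extracted": "Role: Lead engineer for software integration.\n"
                 "Company: XYZ Co.\nRequirements: Python, JIRA",
}]
prompt = build_few_shot_prompt(examples, "Senior software engineer - Autonomous Mobility")

# Hypothetical completion, used here only to demonstrate the parsing step.
completion = "Role: Senior software engineer\nCompany: ABC Inc.\nRequirements: C"
print(parse_extraction(completion))
```

The parsed dictionary can then be fed straight into the cover letter prompt template.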

Project Idea 5: Web Scraper
LLMs are exceptional at rewriting (transforming) texts, such as
- rewriting text in a specific style (e.g., the style of "The Economist" or "New Yorker")
- rewriting text at a specific reading level (e.g., grade 6 for easier readability)
- reformatting information from any format to any other format
- text correction (e.g., spelling and grammar)
- translations
It is very common to use LLMs to convert text from one form to another.
A creative idea to apply this rewriting capability is to use it for web scraping. If you have ever written a web scraper, you know how tedious it is. What if you could use LLMs to build a more generic solution to extract data from unstructured websites?
This is exactly what mangotree has done:
I built an LLM-powered tool that can comprehend any website structure and extract the desired data…
Here are the rough steps you would follow to realize this project:
- Scrape the website's source code and load into a document
- Split long documents into chunks
- Extract the relevant data from the source code using the LLM (see extraction)
- Reformat the extracted data into the desired format with the LLM by writing a few-shot prompt that includes examples
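One way to prepare a page for the LLM is sketched below. To be clear, this is my own assumption about a reasonable preprocessing step, not a description of how the tool mentioned above works: it strips scripts, styles, and tags with Python's built-in `html.parser` so that less markup is sent to the model, then builds an extraction prompt. The helper names and the sample HTML are hypothetical.

```python
from html.parser import HTMLParser
from typing import List


class VisibleTextExtractor(HTMLParser):
    """Collects text nodes, ignoring everything inside <script> and <style>."""

    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.parts: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.parts.append(data.strip())


def html_to_text(html: str) -> str:
    parser = VisibleTextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.parts)


def build_scrape_prompt(page_text: str, fields: List[str]) -> str:
    """Ask the model for JSON with a fixed set of keys."""
    return (
        "Extract the following fields from the page text and answer as JSON "
        f"with exactly these keys: {', '.join(fields)}.\n\n"
        f"Page text:\n{page_text}\n\nJSON:"
    )


sample_html = ("<html><head><style>p{}</style><script>var x=1;</script></head>"
               "<body><h1>Blue Mug</h1><p>Price: 12 EUR</p></body></html>")
print(build_scrape_prompt(html_to_text(sample_html), ["name", "price"]))
```

If the page structure itself carries meaning (tables, nested lists), you may prefer to send the raw source in chunks instead of the stripped text.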

Projects Based on Text Representation
The project ideas so far were based on the idea of generating new text. But another use case of LLMs is based on the idea of text representations. You can input the text to an embeddings model and extract the numerical representation of this text – the "text embeddings".
These text embeddings enable you to perform mathematical operations, including similarity calculations, or apply Machine Learning algorithms.
In this section, we will discuss some project ideas based on use cases related to them:
- Search and similarity: searchable database of your documents
- Question answering: question answering over documents or code base
- Clustering: clustering social media posts and podcast episodes into topics
- Classification: classify business inquiries from e-mails
Project Idea 6: Searchable Database of Your Documents
Embeddings can help us search for content based on similarity. In contrast to keyword-based search engines, we can calculate the similarity of a document's embeddings to the embeddings of a search query.
For example, you could turn your personal documents into a searchable database:
How I Turned My Company's Docs into a Searchable Database with OpenAI
Another neat project is Andrej Karpathy's weekend hack that enables you to search for a specific movie.
Here are the rough steps you would follow to realize a project like these:
- Load the files into documents
- Split long documents into chunks
- Generate and store the embeddings from the documents with an embeddings model
- Define the index query to retrieve the relevant files
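The retrieval step boils down to ranking documents by the similarity between their embeddings and the query embedding. The sketch below shows that ranking with cosine similarity. Big caveat: `toy_embed` is a word-count stand-in I use only so the example runs offline. It behaves like keyword matching, whereas a real embeddings model would also match texts that share meaning but not words. All names and sample documents are hypothetical.

```python
import math
import re
from collections import Counter
from typing import List, Tuple


def toy_embed(text: str) -> Counter:
    # Stand-in for a real embeddings model: sparse word-count vector.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def build_index(documents: List[str]) -> List[Tuple[str, Counter]]:
    """Embed every document once and keep the vectors next to the text."""
    return [(doc, toy_embed(doc)) for doc in documents]


def search(query: str, index: List[Tuple[str, Counter]],
           top_k: int = 3) -> List[Tuple[float, str]]:
    """Return the top_k documents ranked by cosine similarity to the query."""
    query_vec = toy_embed(query)
    scored = [(cosine(query_vec, vec), doc) for doc, vec in index]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


documents = [
    "How to request vacation days",
    "Setting up the VPN on your laptop",
    "Expense report guidelines",
]
index = build_index(documents)
print(search("vpn setup laptop", index, top_k=1))
```

For more than a few thousand documents, you would typically store the vectors in a vector database instead of a Python list.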

Project Idea 7: Question Answering over Documents
Question answering can be viewed as a combination of search (see search) and summarization (see summarization). It can help work through any document in a more intuitive way.
You can use it to chat with your documents or any code base.
Here are the rough steps you would follow to realize this project:
- Load source code into documents
- Split long documents into chunks
- Generate and store the embeddings from the documents with an embeddings model
- Define the index query to retrieve context and prompt the LLM on it
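The last step, retrieving context and prompting the LLM on it, can be sketched like this. The retriever and the model are passed in as plain functions so you can plug in whatever embeddings index and LLM you use. `keyword_retrieve` and the lambda standing in for the model are stubs for illustration only, and the code snippets in `chunks` are invented.

```python
from typing import Callable, List


def build_qa_prompt(question: str, context_chunks: List[str]) -> str:
    """Number the retrieved chunks and instruct the model to stay grounded."""
    numbered = "\n".join(f"[{i}] {chunk}" for i, chunk in enumerate(context_chunks, 1))
    return (
        "Answer the question using only the context below. "
        "If the context does not contain the answer, say you don't know.\n\n"
        f"Context:\n{numbered}\n\n"
        f"Question: {question}\nAnswer:"
    )


def answer_question(question: str,
                    retrieve: Callable[[str, int], List[str]],
                    llm: Callable[[str], str],
                    top_k: int = 3) -> str:
    chunks = retrieve(question, top_k)
    return llm(build_qa_prompt(question, chunks))


# Invented chunks standing in for a split code base.
chunks = [
    "def load_config(path): reads settings from a YAML file.",
    "def train(model, data): runs the training loop.",
    "README: install with pip install -e .",
]


def keyword_retrieve(question: str, top_k: int) -> List[str]:
    # Stub retriever based on shared words; replace with an embeddings query.
    words = set(question.lower().split())
    ranked = sorted(chunks, key=lambda c: len(words & set(c.lower().split())),
                    reverse=True)
    return ranked[:top_k]


print(build_qa_prompt("Which function reads settings?",
                      keyword_retrieve("Which function reads settings?", 2)))
print(answer_question("Which function reads settings?", keyword_retrieve,
                      lambda prompt: "(stub answer)"))
```

Numbering the chunks makes it easy to ask the model to cite which source it used.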

Project Idea 8: Clustering Documents into Topics
Aside from querying documents or information from said documents, you can also use embeddings to put documents into categories by using clustering (unsupervised learning).
For example, you can use clustering to find topics in a podcast episode.
Or you can cluster posts on an online forum into topics.
Here are the rough steps you would follow to realize a project like this:
- Load content into documents
- Split long documents into chunks
- Generate and store the embeddings from the documents with an embeddings model
- Use the embeddings as the input for a clustering algorithm
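To illustrate the final step, here is a small k-means implementation in plain Python. I picked k-means as an example clustering algorithm (the steps above don't prescribe one), with a deterministic farthest-point initialization so the result is reproducible. The two-dimensional vectors are made-up stand-ins for real embeddings, which usually have hundreds of dimensions. In a real project you would more likely reach for a library implementation.

```python
from typing import List, Sequence


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_point_init(points: List[Sequence[float]], k: int) -> List[List[float]]:
    """Start with the first point, then repeatedly add the farthest remaining one."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        farthest = max(points,
                       key=lambda p: min(squared_distance(p, c) for c in centroids))
        centroids.append(list(farthest))
    return centroids


def kmeans(points: List[Sequence[float]], k: int, max_iter: int = 100) -> List[int]:
    """Return one cluster label per point."""
    centroids = farthest_point_init(points, k)
    labels: List[int] = []
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda j: squared_distance(p, centroids[j]))
                      for p in points]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:
                centroids[j] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels


# Made-up 2D "embeddings" of six forum posts forming two obvious groups.
embeddings = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
print(kmeans(embeddings, k=2))
```

Once you have the cluster labels, you could ask an LLM to name each cluster based on a few of its member posts.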

Project Idea 9: Classifying Inquiries
Similarly to clustering, you can put documents into categories in a supervised fashion with classification.
For example, you could classify eCommerce inquiries into categories such as shipping, returns, and tracking.
Here are the rough steps you would follow to realize this project:
- Load e-mails into documents
- Generate and store the embeddings from the documents with an embeddings model
- Use the embeddings to train a classifier
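As a minimal sketch of training a classifier on embeddings, the example below uses a nearest-centroid classifier: it averages the embeddings per category and assigns a new e-mail to the closest average. This is just one simple choice that fits in a few lines. Any standard classifier (for example, logistic regression) would slot into the same place. The three-dimensional vectors are invented stand-ins for real embeddings.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def train_centroids(vectors: List[Sequence[float]],
                    labels: List[str]) -> Dict[str, List[float]]:
    """Average the embeddings of each category into one centroid."""
    grouped = defaultdict(list)
    for vector, label in zip(vectors, labels):
        grouped[label].append(vector)
    return {label: [sum(dim) / len(members) for dim in zip(*members)]
            for label, members in grouped.items()}


def predict(centroids: Dict[str, List[float]], vector: Sequence[float]) -> str:
    """Return the category whose centroid is closest to the vector."""
    return min(centroids,
               key=lambda label: sum((a - b) ** 2
                                     for a, b in zip(centroids[label], vector)))


# Invented embeddings of labeled inquiries (real ones come from an embeddings model).
train_vectors = [
    (0.9, 0.1, 0.0), (0.8, 0.2, 0.1),   # shipping
    (0.1, 0.9, 0.0), (0.2, 0.8, 0.1),   # returns
    (0.0, 0.1, 0.9), (0.1, 0.2, 0.8),   # tracking
]
train_labels = ["shipping", "shipping", "returns", "returns", "tracking", "tracking"]

centroids = train_centroids(train_vectors, train_labels)
print(predict(centroids, (0.15, 0.85, 0.05)))
```

With real data, hold back some labeled e-mails as a test set so you can measure how often the classifier gets the category right.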

Advanced Projects (Project Idea 10: Where is Waldo)
If the above projects sound too easy for you, how about combining an LLM with the recently released Segment Anything model (SAM) from Meta AI to find Waldo?
Summary
Now it's your turn to get crackin'!
While most of the above examples use OpenAI's GPT-3 or newer models, keep in mind that the OpenAI API is a paid service. For small experiments, it is affordable, but there will be ongoing inference costs if you decide to deploy your project to a web server.
If you want a more cost-efficient solution, you can use open-source foundation models hosted on HuggingFace. However, you will most likely notice some performance loss.
Once you have decided on a foundation model, you can start by digging deeper into the following concepts:
- loading documents and processing them,
- prompt engineering and prompt templates (including few-shot prompt templates),
- conversational memory,
- embeddings, and vector databases
depending on the project you chose.
Many different tools are emerging at the moment to help you build LLM-powered applications, such as LlamaIndex or LangChain. Below you can find a quick introduction to LangChain to get started building your first LLM-powered application:
Getting Started with LangChain: A Beginner's Guide to Building LLM-Powered Applications
I hope this list of LLM-powered applications has inspired you to build your own. I'd love to hear what creative ideas you come up with!
