The Three Essential Methods to Evaluate a New Language Model
What is this about? New LLMs are released every week, and if you’re like me, you might ask yourself: Does this one finally fit all the use cases I want to utilise an LLM for? In this tutorial, I will share the techniques that I use to evaluate new L- 25946Murphy ≡ DeepGuide
Building a maintainable and modular LLM application stack with Hamilton
LLM applications are dataflows; use a tool specifically designed to express them.
LLMOps: Production prompt engineering patterns with Hamilton
An overview of production-grade ways to iterate on prompts with Hamilton.
How to Measure the Success of Your RAG-based LLM System
Including a novel method for judging answers with a qualitative score and detailed explanation.
Line-By-Line, Let's Reproduce GPT-2: Section 3 – Training
This blog post will go line-by-line through the code in Section 3 of Andrej Karpathy's "Let's reproduce GPT-2 (124M)".
Retrieval Augmented Generation (RAG) Inference Engines with LangChain on CPUs
Exploring scale, fidelity, and latency in AI applications with RAG
Exploring mergekit for Model Merge, AutoEval for Model Evaluation, and DPO for Model Fine-tuning
My observations from experimenting with model merging, evaluation, and two model fine-tuning techniques
Building an LLMOPs Pipeline
Utilize SageMaker Pipelines, JumpStart, and Clarify to Fine-Tune and Evaluate a Llama 7B Model
Top Evaluation Metrics for RAG Failures
Troubleshoot LLMs and Retrieval Augmented Generation with Retrieval and Response Metrics
A Humanitarian Crises Situation Report AI Assistant: Exploring LLMOps with Prompt Flow
Exploring some techniques for safe deployment of LLM solutions
How to Make the Most Out of LLM Production Data: Simulated User Feedback
A novel approach to using production data to simulate user feedback for testing and evaluating your LLM app.
Productionize LLM RAG App in Django – Part I: Celery
Automate a Daily Pinecone Upsert Task with Celery and Slack Monitoring
Reducing the Size of Docker Images Serving Large Language Models
Have you encountered a problem where a 1 GB transformer-based model grows to as much as 8 GB when deployed in a Docker container?
Reducing the Size of Docker Images Serving Large Language Models (part 2)
How to reduce a "small" Docker image by another 10%.
Building an Observable arXiv RAG Chatbot with LangChain, Chainlit, and Literal AI
A tutorial on building a semantic paper engine using RAG with LangChain, Chainlit copilot apps, and Literal AI observability.
Supercharge Your LLM Apps using DSPy and Langfuse
Build Production Grade LLM Apps with Ease
What Did I Learn from Building LLM Applications in 2024? – Part 2
An engineer's journey to building LLM-powered applications
We look at an implementation of the HyperLogLog cardinality estimati
Using clustering algorithms such as K-means is one of the most popul
Level up Your Data Game by Mastering These 4 Skills
Learn how to create an object-oriented approach to compare and evalu
When I was a beginner using Kubernetes, my main concern was getting
Tutorial and theory on how to carry out forecasts with moving averag