The Three Essential Methods to Evaluate a New Language Model
What is this about? New LLMs are released every week, and if you’re like me, you might ask yourself: Does this one finally fit all the use cases I want to utilise an LLM for? In this tutorial, I will share the techniques that I use to evaluate new L
Building a maintainable and modular LLM application stack with Hamilton
LLM applications are dataflows; use a tool specifically designed to express them.
LLMOps: Production prompt engineering patterns with Hamilton
An overview of the production-grade ways to iterate on prompts with Hamilton.
How to Measure the Success of Your RAG-based LLM System
Including a novel method for judging answers with a qualitative score and detailed explanation.
Line-By-Line, Let's Reproduce GPT-2: Section 3 – Training
This blog post will go line-by-line through the code in Section 3 of Andrej Karpathy's "Let's reproduce GPT-2 (124M)".
Retrieval Augmented Generation (RAG) Inference Engines with LangChain on CPUs
Exploring scale, fidelity, and latency in AI applications with RAG
Exploring mergekit for Model Merge, AutoEval for Model Evaluation, and DPO for Model Fine-tuning
My observations from experimenting with model merge, evaluation, and two model fine-tuning techniques
Building an LLMOps Pipeline
Utilize SageMaker Pipelines, JumpStart, and Clarify to Fine-Tune and Evaluate a Llama 7B Model
Top Evaluation Metrics for RAG Failures
Troubleshoot LLMs and Retrieval Augmented Generation with Retrieval and Response Metrics
A Humanitarian Crises Situation Report AI Assistant: Exploring LLMOps with Prompt Flow
Exploring some techniques for safe deployment of LLM solutions
How to Make the Most Out of LLM Production Data: Simulated User Feedback
A novel approach to using production data to simulate user feedback for testing and evaluating your LLM app.
Productionize LLM RAG App in Django – Part I: Celery
Automate Pinecone Daily Upsert Task with Celery and Slack monitoring
Reducing the Size of Docker Images Serving Large Language Models
Have you encountered a problem where a 1 GB transformer-based model increases even up to 8 GB when deployed using Docker containerization?
Reducing the Size of Docker Images Serving Large Language Models (part 2)
How to reduce a "small" Docker image by another 10%.
Building an Observable arXiv RAG Chatbot with LangChain, Chainlit, and Literal AI
A tutorial on building a semantic paper engine using RAG with LangChain, Chainlit copilot apps, and Literal AI observability.
Supercharge Your LLM Apps using DSPy and Langfuse
Build Production-Grade LLM Apps with Ease
What Did I Learn from Building LLM Applications in 2024? – Part 2
An engineer's journey to building LLM-powered applications