Create Stronger Decision Trees with bootstrapping and genetic algorithms
A technique to better allow decision trees to be used as interpretable models - 21242, Murphy, 2025-03-23
We Need to Raise the Bar for AI Product Managers
How to Stop Blaming the 'Model' and Start Building Successful AI Products
Pre-Commit & Git Hooks: Automate High Code Quality
How to improve your code quality with pre-commit and git hooks
Structured Outputs and How to Use Them
Building robustness and determinism in LLM applications
Improving Code Quality During Data Transformation with Polars
Optimize your data workflows with Polars by improving code quality and refining transformations with these best practices.
Running a SOTA 7B Parameter Embedding Model on a Single GPU
In this post I will explain how to run a state-of-the-art 7B parameter LLM-based embedding model on just a single 24GB GPU. I will cover some theory and then show how to run it with the HuggingFace Transformers library in Python in just a few lines of code.
Algorithm-Agnostic Model Building with MLflow
A beginner-friendly step-by-step guide to creating generic ML pipelines using mlflow.pyfunc
LLMOps – Serve a Llama-3 model with BentoML
Quickly set up LLM APIs with BentoML and Runpod
Data Scaling 101: Standardization and Min-Max Scaling Explained
When to use MinMaxScaler vs StandardScaler vs something else
Which Regression technique should you use?
Here's a taxonomy of the best regression techniques for your specific dataset
Denormalisation: Thoughtful Optimisation or Irrational Avant-Garde?
Perspective on Performance Optimisation and Data Quality
AI for the Absolute Novice – Intuitively and Exhaustively Explained
From "I've never coded" to making an AI model from scratch.
VAE for Time Series
Generate realistic sequential data with this easy-to-train model
KernelSHAP can be misleading with correlated predictors
A concrete case study
Introduction to Support Vector Machines - Motivation and Basics
Learn the basic concepts that make the Support Vector Machine a powerful linear classifier
Must-Know Techniques for Handling Big Data in Hive
HQL's unique features: PARTITIONED BY, STORED AS, DISTRIBUTE BY / CLUSTER BY, LATERAL VIEW with EXPLODE, and COLLECT_SET
Unleashing the Power of Triton: Mastering GPU Kernel Optimization in Python
Accelerating AI/ML Model Training with Custom Operators - Part 2
Accelerating AI/ML Model Training with Custom Operators
On the potential benefits of creating model-specific GPU kernels and their application to optimizing the use of dynamically shaped tensors
Avoid Building a Data Platform in 2024
Why articles about 'Building a Data Platform' are mostly misleading
Four Visualisation Libraries That Seamlessly Integrate With Pandas Dataframe
Use the Pandas plotting backend for the easiest plotting
The current state of continual learning in AI
Why is ChatGPT only trained up until 2021?
Optimizing Pandas Code: The Impact of Operation Sequence
Learn how to rearrange your code to achieve significant speed improvements.