Create Stronger Decision Trees with bootstrapping and genetic algorithms
A technique to better allow decision trees to be used as interpretable models - 21282 Murphy 2025-03-23
We Need to Raise the Bar for AI Product Managers
How to Stop Blaming the 'Model' and Start Building Successful AI Products
Pre-Commit & Git Hooks: Automate High Code Quality
How to improve your code quality with pre-commit and git hooks
Structured Outputs and How to Use Them
Building robustness and determinism in LLM applications
Improving Code Quality During Data Transformation with Polars
Optimize your data workflows with Polars by improving code quality and refining transformations with these best practices.
Running a SOTA 7B Parameter Embedding Model on a Single GPU
In this post I will explain how to run a state-of-the-art 7B parameter LLM-based embedding model on just a single 24GB GPU. I will cover some theory and then show how to run it with the HuggingFace Transformers library in Python in just a few lines of code
Algorithm-Agnostic Model Building with MLflow
A beginner-friendly step-by-step guide to creating generic ML pipelines using mlflow.pyfunc
LLMOps – Serve a Llama-3 model with BentoML
Quickly set up LLM APIs with BentoML and Runpod
Data Scaling 101: Standardization and Min-Max Scaling Explained
When to use MinMaxScaler vs StandardScaler vs something else
Which Regression technique should you use?
Here's a taxonomy of what is the best regression technique based on your specific dataset
Denormalisation: Thoughtful Optimisation or Irrational Avant-Garde?
Perspective on Performance Optimisation and Data Quality
AI for the Absolute Novice – Intuitively and Exhaustively Explained
From "I've never coded" to making an AI model from scratch.
VAE for Time Series
Generate realistic sequential data with this easy-to-train model
KernelSHAP can be misleading with correlated predictors
A concrete case study
Introduction to Support Vector Machines - Motivation and Basics
Learn basic concepts that make Support Vector Machine a powerful linear classifier
Must-Know Techniques for Handling Big Data in Hive
HQL's Unique Features: PARTITIONED BY, STORED AS, DISTRIBUTE BY / CLUSTER BY, LATERAL VIEW with EXPLODE and COLLECT_SET
Unleashing the Power of Triton: Mastering GPU Kernel Optimization in Python
Accelerating AI/ML Model Training with Custom Operators - Part 2
Accelerating AI/ML Model Training with Custom Operators
On the potential benefits of creating model-specific GPU kernels and their application to optimizing the use of dynamically shaped tensors
Avoid Building a Data Platform in 2024
Why articles about 'Building a Data Platform' are mostly misleading
Four Visualisation Libraries That Seamlessly Integrate With Pandas Dataframe
Make use of Pandas plotting backend for the easiest plotting
Genius Cliques: Mapping out the Nobel Network
Combining Network Science, Data Visualization, and Wikipedia to uncover hidden connections between all the Nobel laureates.
Data Science Expertise Comes in Many Shapes and Forms
Our weekly selection of must-read Editors' Picks and original features
