Since founding Towards Data Science in 2016, we’ve built the largest publication on Medium with a dedicated community of readers and contributors… 29683 · Murphy · 2025-03-22
In the world of machine learning, we obsess over model architectures, training pipelines, and hyper-parameter tuning, yet often overlook a fundamental aspect: how our features live and breathe throughout their lifecycle. From in-memory calculations that…
Stable Diffusion 1.5/2.0/2.1/XL 1.0, DALL-E, Imagen… In recent years, diffusion models have shown stunning quality in image generation. However, while they produce excellent results on generic concepts, they struggle to generate high-quality images of more specific…
Speed is important when dealing with large amounts of data. If you are handling data in a cloud data warehouse or…
Decision tree algorithms have always fascinated me. They are easy to implement and achieve good results on various classification and regression tasks. Combined with boosting, decision trees are still state-of-the-art in many applications. Frameworks such as…
It’s been more than 15 years since I finished my master’s degree, but I’m still haunted by the hair-pulling frustration of managing my R scripts. As a (recovering) perfectionist, I named each script very systematically by date (think: …)
There seems to be a consensus that leveraging data, analytics, and AI to create a data-driven organization requires a clear strategic approach. However, there is less clarity and agreement on exactly what this strategic approach should look like in practice…
This article is aimed at those who want to understand exactly how diffusion models work, with no prior knowledge expected. I’ve tried to use illustrations wherever possible to provide visual intuition for each part of these models. I’ve kept mathematical…
Python has grown to dominate data science, and the Pandas package has become the go-to tool for data analysis. It is great for tabular data and handles data files of up to 1GB if you have plenty of RAM. Within these size limits, it is also good with time-series…
Developers build applications that are meant to be deployed on a server so that anyone can use them. Typically, on the machine where these apps live, developers set up environment variables that the app needs in order to run. These variables…
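As a minimal sketch of the pattern described above, assuming a Python app: configuration is read from environment variables at startup, required values are checked up front, and optional ones get defaults. The helper `load_settings` and the variable names `APP_DATABASE_URL`, `APP_PORT`, and `APP_DEBUG` are hypothetical, not taken from the article.

```python
import os
from typing import Mapping, Optional


def load_settings(env: Optional[Mapping[str, str]] = None) -> dict:
    """Read app configuration from environment variables.

    The variable names used here are hypothetical examples.
    """
    # Fall back to the real process environment when no mapping is passed in.
    env = os.environ if env is None else env

    # Required setting: fail fast with a clear message if it is missing.
    if "APP_DATABASE_URL" not in env:
        raise RuntimeError("APP_DATABASE_URL is not set")

    return {
        "database_url": env["APP_DATABASE_URL"],
        # Environment values are always strings, so convert them explicitly.
        "port": int(env.get("APP_PORT", "8000")),
        "debug": env.get("APP_DEBUG", "false").strip().lower() in {"1", "true", "yes"},
    }
```

Accepting a plain dict as well as `os.environ` keeps the function easy to test; on the server you would export the variables and call `load_settings()` with no arguments.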
Minimum cost flow optimization minimizes the cost of moving flow through a network of nodes and edges. Nodes include sources (supply) and sinks (demand), and the edges connecting them have different costs and capacity limits. The aim is to find the least costly way to move volume from sources to sinks…
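A small, self-contained sketch of the idea, assuming a single source, a single sink, and integer capacities and costs. It uses the successive-shortest-path method (repeatedly route flow along the currently cheapest path in the residual network); the article may well use a different algorithm or an off-the-shelf solver, and the function name `min_cost_flow` is our own.

```python
from typing import List, Tuple

Edge = Tuple[int, int, int, int]  # (from_node, to_node, capacity, cost_per_unit)


def min_cost_flow(n: int, edges: List[Edge], source: int, sink: int, demand: int) -> int:
    """Cheapest total cost of sending `demand` units from `source` to `sink`."""
    graph: List[List[int]] = [[] for _ in range(n)]  # node -> indices of outgoing residual edges
    head: List[int] = []  # head[e]: node that residual edge e points to
    cap: List[int] = []   # cap[e]: remaining capacity of edge e
    cost: List[int] = []  # cost[e]: unit cost of edge e

    def add_edge(u: int, v: int, c: int, w: int) -> None:
        # Forward edge gets an even index and its reverse the next odd index,
        # so e ^ 1 always finds the partner edge.
        graph[u].append(len(head))
        head.append(v); cap.append(c); cost.append(w)
        graph[v].append(len(head))
        head.append(u); cap.append(0); cost.append(-w)

    for u, v, c, w in edges:
        add_edge(u, v, c, w)

    flow, total_cost = 0, 0
    while flow < demand:
        # Bellman-Ford shortest path by cost (reverse edges can have negative cost).
        dist = [float("inf")] * n
        via = [-1] * n  # residual edge used to reach each node
        dist[source] = 0
        for _ in range(n - 1):
            changed = False
            for u in range(n):
                if dist[u] == float("inf"):
                    continue
                for e in graph[u]:
                    if cap[e] > 0 and dist[u] + cost[e] < dist[head[e]]:
                        dist[head[e]] = dist[u] + cost[e]
                        via[head[e]] = e
                        changed = True
            if not changed:
                break
        if dist[sink] == float("inf"):
            raise ValueError("demand exceeds the network's capacity")

        # Bottleneck: the most we can push along this cheapest path.
        push, v = demand - flow, sink
        while v != source:
            push = min(push, cap[via[v]])
            v = head[via[v] ^ 1]
        # Update residual capacities along the path.
        v = sink
        while v != source:
            cap[via[v]] -= push
            cap[via[v] ^ 1] += push
            v = head[via[v] ^ 1]
        flow += push
        total_cost += push * dist[sink]
    return total_cost
```

For example, with edges `[(0, 1, 2, 1), (0, 2, 2, 2), (1, 3, 2, 1), (2, 3, 2, 1), (1, 2, 1, 1)]`, sending 3 units from node 0 to node 3 costs 7: two units take 0→1→3 at cost 2 each and one unit takes 0→2→3 at cost 3. Several sources and sinks can be handled by connecting them to an extra super-source and super-sink.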
The basic principle of Large Language Models (LLMs) is very simple: to predict the next word (or token) in a sequence of words based on statistical patterns in their training data. However, this seemingly simple capability turns out to be incredibly sophisticated…
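To make "predict the next token from statistical patterns" concrete, here is a deliberately tiny sketch: a bigram model that counts which word follows which in a toy corpus and turns those counts into next-word probabilities. Real LLMs learn such patterns with neural networks over long contexts rather than by counting word pairs, so this illustrates only the prediction objective, not how an LLM is built; the corpus and function names are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Optional


def train_bigram(tokens: List[str]) -> Dict[str, Counter]:
    # Count how often each token follows each other token in the training text.
    counts: Dict[str, Counter] = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return counts


def next_token_probs(counts: Dict[str, Counter], prev: str) -> Dict[str, float]:
    # Turn raw follow-up counts into a probability distribution over the next token.
    followers = counts.get(prev)
    if not followers:
        return {}
    total = sum(followers.values())
    return {tok: n / total for tok, n in followers.items()}


def predict_next(counts: Dict[str, Counter], prev: str) -> Optional[str]:
    # Greedy prediction: the most frequent follower, or None for an unseen context.
    followers = counts.get(prev)
    return followers.most_common(1)[0][0] if followers else None


corpus = "the cat sat on the mat and the cat ate".split()
model = train_bigram(corpus)
print(next_token_probs(model, "the"))
print(predict_next(model, "the"))
```

In this corpus "the" is followed by "cat" twice and "mat" once, so the model assigns "cat" a probability of 2/3 and predicts it.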
Metric collection is an essential part of every machine learning project, enabling us to track model performance and monitor training progress. Ideally, metrics should be collected and computed without introducing any additional overhead to the training…
Let’s say you are in a customer care center and would like to know the probability distribution of the number of calls per minute. In other words, you want to answer the question: what is the probability of receiving zero, one, two, and so on, calls per minute…
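Under the usual assumptions that calls arrive independently and at a constant average rate λ per minute, the number of calls in a minute follows a Poisson distribution, P(X = k) = λ^k e^(-λ) / k!. A minimal sketch, where the rate of 3 calls per minute is a hypothetical value chosen for illustration, not a figure from the article:

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    """P(X = k) for a Poisson distribution with an average of `lam` events per interval."""
    if k < 0:
        return 0.0
    return lam ** k * math.exp(-lam) / math.factorial(k)


# Hypothetical average: 3 calls per minute.
rate = 3.0
for k in range(6):
    print(f"P({k} calls in a minute) = {poisson_pmf(k, rate):.4f}")
```

Summing `poisson_pmf(k, rate)` over all k gives 1, and the probabilities peak around the average rate (here k = 2 and k = 3 are equally likely).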
I am sure the quantum hype has reached everyone in tech (and, most probably, people outside it too). There have been some over-the-top claims, like “some company has proved quantum supremacy,” “the quantum revolution is here,” or my favorite, “quantum computers are here…”
How Reliable Are Your Predictions? To be considered reliable, a model must be calibrated so that its confidence in each decision closely reflects how often such decisions are actually correct. In this blog post we’ll take a look at the most commonly used definition for calibration…
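One common way to quantify how far a classifier is from calibrated is the expected calibration error (ECE): group predictions into equal-width confidence bins and average the gap between mean confidence and accuracy in each bin, weighted by bin size. This is a minimal sketch under that binned definition, not necessarily the definition the post goes on to discuss; the function name and the choice of 10 bins are our own.

```python
from typing import Sequence


def expected_calibration_error(
    confidences: Sequence[float], correct: Sequence[bool], n_bins: int = 10
) -> float:
    """Binned ECE: sum over bins of (bin size / n) * |mean confidence - accuracy|."""
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    n = len(confidences)
    ece = 0.0
    for b in range(n_bins):
        lo, hi = b / n_bins, (b + 1) / n_bins
        # Bins are (lo, hi]; a confidence of exactly 0 goes into the first bin.
        idx = [i for i, c in enumerate(confidences) if lo < c <= hi or (b == 0 and c == 0)]
        if not idx:
            continue
        avg_conf = sum(confidences[i] for i in idx) / len(idx)
        accuracy = sum(1 for i in idx if correct[i]) / len(idx)
        ece += len(idx) / n * abs(avg_conf - accuracy)
    return ece
```

Predictions made with 0.95 confidence that are right only half the time contribute a large gap, while 0.75-confidence predictions that are right 75% of the time contribute none.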
In Part 1, we discussed the importance of collecting good image data and assigning proper labels for your image classification project to succeed. We also talked about classes and sub-classes of your data. These may seem like pretty straightforward concepts…
Microsoft Power BI is one of the most popular business intelligence (BI) tools, and while it has all the features you need to create dynamic analytic reports for stakeholders across the business, creating some advanced data visualizations is more challenging…
In this latest part of my series, I will share what I have learned about selecting a model for image classification and how to fine-tune that model. I will also show how you can leverage the model to accelerate your labelling process, and finally how to…
While building my own LLM-based application, I found many prompt engineering guides but few equivalent guides for choosing the temperature setting. Of course, temperature is a simple numerical value while prompts can get mind-blowingly complex, so it…
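In typical LLM sampling, temperature T divides the model's raw scores (logits) z_i before the softmax, p_i = exp(z_i / T) / Σ_j exp(z_j / T): values below 1 sharpen the distribution toward the top token, and values above 1 flatten it. A minimal sketch with hypothetical logits for three candidate tokens; individual APIs may clamp the value or special-case it (for example, treating 0 as greedy decoding).

```python
import math
from typing import List, Sequence


def softmax_with_temperature(logits: Sequence[float], temperature: float) -> List[float]:
    """Convert raw scores into next-token probabilities at a given temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    # Subtract the max before exponentiating for numerical stability.
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.0]  # hypothetical scores for three candidate tokens
for t in (0.2, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

At T = 0.2 almost all probability lands on the first token; at T = 2.0 the three tokens become much closer in probability, which is why higher temperatures produce more varied output.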
Why is ChatGPT only trained up until 2021?
Learn how to rearrange your code to achieve significant speed improvements.