DeepGuide for DeepSeek

Towards Data Science is Launching as an Independent Publication
Since founding Towards Data Science in 2016, we’ve built the largest publication on Medium with a dedicated community of readers and contributors
29733 · Murphy · 2025-03-22
ML Feature Management: A Practical Evolution Guide
In the world of machine learning, we obsess over model architectures, training pipelines, and hyper-parameter tuning, yet often overlook a fundamental aspect: how our features live and breathe throughout their lifecycle. From in-memory calculations that v
25902 · Murphy · 2025-03-22
Six Ways to Control Style and Content in Diffusion Models
Stable Diffusion 1.5/2.0/2.1/XL 1.0, DALL-E, Imagen… In recent years, diffusion models have showcased stunning quality in image generation. However, while they produce great quality on generic concepts, they struggle to generate high quality for more spec
20805 · Murphy · 2025-03-22
Polars vs. Pandas — An Independent Speed Comparison
Speed is important when dealing with large amounts of data. If you are handling data in a cloud data warehouse or
22739 · Murphy · 2025-03-22
Build a Decision Tree in Polars from Scratch
Decision tree algorithms have always fascinated me. They are easy to implement and achieve good results on various classification and regression tasks. Combined with boosting, decision trees are still state-of-the-art in many applications. Frameworks such
23514 · Murphy · 2025-03-22
Branching Out: 4 Git Workflows for Collaborating on ML
It’s been more than 15 years since I finished my master’s degree, but I’m still haunted by the hair-pulling frustration of managing my of R scripts. As a (recovering) perfectionist, I named each script very systematically by date (think: an
20855 · Murphy · 2025-03-22
Data vs. Business Strategy
There seems to be a consensus that leveraging data, analytics, and AI to create a data-driven organization requires a clear strategic approach. However, there is less clarity and agreement on exactly what this strategic approach should look like in practi
24481 · Murphy · 2025-03-22
A Visual Guide to How Diffusion Models Work
This article is aimed at those who want to understand exactly how diffusion models work, with no prior knowledge expected. I’ve tried to use illustrations wherever possible to provide visual intuitions on each part of these models. I’ve kept mathematical
26085 · Murphy · 2025-03-22
Pandas Can’t Handle This: How ArcticDB Powers Massive Datasets
Python has grown to dominate data science, and its package Pandas has become the go-to tool for data analysis. It is great for tabular data and supports data files of up to 1GB if you have a lot of RAM. Within these size limits, it is also good with time-s
21951 · Murphy · 2025-03-22
Manage Environment Variables with Pydantic
Developers work on applications that are meant to be deployed on a server so that anyone can use them. Typically, on the machine where these apps live, developers set up environment variables that allow the app to run. These va
29112 · Murphy · 2025-03-22
Introduction to Minimum Cost Flow Optimization in Python
Minimum cost flow optimization minimizes the cost of moving flow through a network of nodes and edges. Nodes include sources (supply) and sinks (demand), with different costs and capacity limits. The aim is to find the least costly way to move volume from
21465 · Murphy · 2025-03-22
How to Measure the Reliability of a Large Language Model’s Response
The basic principle of Large Language Models (LLMs) is very simple: to predict the next word (or token) in a sequence of words based on statistical patterns in their training data. However, this seemingly simple capability turns out to be incredibly sophi
30200 · Murphy · 2025-03-22
Efficient Metric Collection in PyTorch: Avoiding the Performance Pitfalls of TorchMetrics
Metric collection is an essential part of every machine learning project, enabling us to track model performance and monitor training progress. Ideally, metrics should be collected and computed without introducing any additional overhead to the training p
24569 · Murphy · 2025-03-22
Method of Moments Estimation with Python Code
Let’s say you are in a customer care center, and you would like to know the probability distribution of the number of calls per minute, or in other words, you want to answer the question: what is the probability of receiving zero, one, two, … etc.,
29041 · Murphy · 2025-03-22
Should Data Scientists Care About Quantum Computing?
I am sure the quantum hype has reached every person in tech (and outside it, most probably). With some over-the-top claims, like “some company has proved quantum supremacy,” “the quantum revolution is here,” or my favorite, “quantum computers are here, an
21949 · Murphy · 2025-03-22
Understanding Model Calibration: A Gentle Introduction & Visual Exploration
How Reliable Are Your Predictions? To be considered reliable, a model must be calibrated so that its confidence in each decision closely reflects its true outcome. In this blog post we’ll take a look at the most commonly used definition for calibrat
25435 · Murphy · 2025-03-22
Learnings from a Machine Learning Engineer — Part 2: The Data Sets
In Part 1, we discussed the importance of collecting good image data and assigning proper labels for your image classification project to be successful. Also, we talked about classes and sub-classes of your data. These may seem pretty straightforward con
24181 · Murphy · 2025-03-22
How to Create Network Graph Visualizations in Microsoft PowerBI
Microsoft PowerBI is one of the most popular business intelligence (BI) tools, and while it has all the features you need to create dynamic analytic reporting for stakeholders across the business, creating some advanced data visualizations is more chall
27265 · Murphy · 2025-03-22
Learnings from a Machine Learning Engineer — Part 4: The Model
In this latest part of my series, I will share what I have learned on selecting a model for image classification and how to fine-tune that model. I will also show how you can leverage the model to accelerate your labelling process, and finally how to just
22946 · Murphy · 2025-03-22
A Comprehensive Guide to LLM Temperature
While building my own LLM-based application, I found many prompt engineering guides, but few equivalent guides for determining the temperature setting. Of course, temperature is a simple numerical value while prompts can get mindblowingly complex, so it m
28642 · Murphy · 2025-03-22

Genius Cliques: Mapping out the Nobel Network
Combining Network Science, Data Visualization, and Wikipedia to uncover hidden connections between all the Nobel laureates.
Data Science Expertise Comes in Many Shapes and Forms
Our weekly selection of must-read Editors' Picks and original features

Recommend

◦ Expanding Time

◦ Semantically Compress Text to Save On LLM Costs

◦ How bad is being greedy?

◦ The One Mindset Change That Launched Me into Data Science

◦ My commute to work is more than 4 hours. Each way.

◦ Unveiling Metadynamics: A Beginner's Guide to Mastering PLUMED (Part 1 of 3)

◦ Accelerating AI/ML Model Training with Custom Operators

◦ Sketch: A Promising AI Library to Help With Pandas Dataframes Directly in Jupyter

◦ Regulating Generative AI

◦ How to Use SQLAlchemy to Make Database Requests Asynchronously

◦ Information at a Glance: Do Your Charts Suck?

◦ Finding Dark Matter using a Quantum Computer