DeepGuide for DeepSeek

Unlocking the Power of Big Data: The Fascinating World of Graph Learning
Large companies generate and collect vast amounts of data; for example, 90% of this data has been created in recent years. Yet 73% of this data remains unused [1]. However, as you may know, data is a goldmine for companies working with Big Data. Dee
21869Murphy2025-03-23
RAG: How to Talk to Your Data
Comprehensive guide on how to analyse customer feedback using ChatGPT
24755Murphy2025-03-23
Delta Lake – Partitioning, Z-Order and Liquid Clustering
How are different partitioning/clustering methods implemented in Delta? How do they work in practice?
27210Murphy2025-03-23
Towards Generative AI for Model Architecture
How "MAD" AI will help us discover the next transformer
30137Murphy2025-03-23
Data Tells Us "What" and We Always Seek for "Why"
In my previous article, I kicked off the "Read with Me" book club to explore Judea Pearl’s "The Book of Why". I would like to thank everyone who has shown interest and signed up to join the club. I am hopeful that we can embark o
24938Murphy2025-03-23
Precision Clustering Made Simple: kscorer's Guide to Auto-Selecting Optimal K-means Clusters
kscorer streamlines the process of clustering and provides a practical approach to data analysis through advanced scoring and parallelization
26116Murphy2025-03-23
Pareto, Power Laws, and Fat Tails
What they don't teach you in statistics
27079Murphy2025-03-23
Make Python Faster by Caching Functions: Memoization
The article discusses memoization using the Python standard library. The functools.lru_cache decorator makes this so simple!
24035Murphy2025-03-23
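As a minimal sketch of what the teaser above describes, the snippet below applies functools.lru_cache to a recursive function. The Fibonacci function is a hypothetical example chosen here for illustration; it is not taken from the article itself.

```python
from functools import lru_cache


@lru_cache(maxsize=None)  # maxsize=None means the cache never evicts entries
def fib(n: int) -> int:
    """Return the n-th Fibonacci number (fib(0) == 0, fib(1) == 1)."""
    if n < 2:
        return n
    # Each distinct n is computed once; repeated calls are served from the cache.
    return fib(n - 1) + fib(n - 2)


result = fib(30)         # 832040
info = fib.cache_info()  # named tuple with hits, misses, maxsize, currsize
print(result, info)
```

Without the decorator the same call would recompute the smaller subproblems many times over; with it, each argument is evaluated once and cache_info() reports how often the cache was used.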
Creating a Gradient Descent Animation in Python
How to plot the trajectory of a point over a complex surface
20186Murphy2025-03-23
Demystifying Dependence and Why it is Important in Causal Inference and Causal Validation
Causal Inference is an emergent branch of data science concerned with determining the cause-and-effect relationship between events and outcomes, and it has the potential to significantly add to the value that m
27944Murphy2025-03-23
A Beginner's Guide to Building High-Quality Datasets for Machine Learning
Tools and techniques for data cleaning, visualization, augmentation, and synthetic data generation
24482Murphy2025-03-23
Introduction to Clustering Algorithms
A comprehensive guide to 10 clustering algorithms commonly used for Hierarchical, Partitional, and Density-Based Clustering
23813Murphy2025-03-23
Teaching is Hard: How to Train Small Models and Outperforming Large Counterparts
Distilling the knowledge of a large model is complex, but a new method shows incredible performance
25768Murphy2025-03-23
Data Engineering Books
Readers Digest to Learn Data Engineering Gradually
24501Murphy2025-03-23
Philosophy and Data Science - Thinking deeply about data
Part 1: Determinism
26324Murphy2025-03-23
Create your Vision Chat Assistant with LLaVA
Get started with multimodal conversational models using the open-source LLaVA model.
29061Murphy2025-03-23
Chat with Your Dataset using Bayesian Inferences.
The ability to ask questions of your data set has always been an intriguing prospect. You will be surprised how easy it is to learn a local...
25135Murphy2025-03-23
Creating your own ChatGPT without coding – A Step by Step Guide
OpenAI released a new feature to create your own GPT. Here is a tutorial on how to do it as well as the main limitations you might face.
28984Murphy2025-03-23
Testing the Consistency of Reported Machine Learning Performance Scores by the mlscorecheck Package
In this post, we explore how the Python package mlscorecheck can be used for testing the consistency between reported machine learning performance scores and the accompanying descriptions of experimental setups. Disclaimer: the author of this post is the
25479Murphy2025-03-23
Understanding Instrumental Variables
How to estimate causal effects when you cannot randomize treatment
22377Murphy2025-03-23