This Is How LLMs Break Down the Language
The science and art behind tokenization - 21494 Murphy 2025-03-22
One-Tailed Vs. Two-Tailed Tests
Choosing between one- and two-tailed hypotheses affects every stage of A/B testing. Learn why the hypothesis direction matters and explore the pros and cons of each approach. - 25031 Murphy 2025-03-22
LettuceDetect: A Hallucination Detection Framework for RAG Applications
How to capitalize on ModernBERT’s extended context window to build a token-level classifier for hallucination detection - 21268 Murphy 2025-03-22
How to Spot and Prevent Model Drift Before it Impacts Your Business
3 essential methods to track model drift you should know - 23601 Murphy 2025-03-22
Experiments Illustrated: How We Optimized Premium Listings on Our Nursing Job Board
Also, how georandomization can help clean up spillovers - 29879 Murphy 2025-03-22
Experiments Illustrated: How Random Assignment Saved Us $1M in Marketing Spend
Also, a casual intro to the multiple comparisons problem - 21035 Murphy 2025-03-22
Are You Still Using LoRA to Fine-Tune Your LLM?
A look at this year’s crop of LoRA alternatives - 25553 Murphy 2025-03-22
Linear Regression in Time Series: Sources of Spurious Regression
Why does the autocorrelation of the error term matter? - 28976 Murphy 2025-03-22
From Fuzzy to Precise: How a Morphological Feature Extractor Enhances AI's Recognition Capabil
Mimicking human visual perception to truly understand objects - 26368 Murphy 2025-03-22
The Impact of GenAI and Its Implications for Data Scientists
What we can learn from Anthropic’s analysis of millions of Claude.ai chats - 28400 Murphy 2025-03-22
Mastering Hadoop, Part 3: Hadoop Ecosystem: Get the most out of your cluster
Exploring the Hadoop ecosystem — key tools to maximize your cluster’s potential - 21645 Murphy 2025-03-22
Mastering Hadoop, Part 2: Getting Hands-On — Setting Up and Scaling Hadoop
Understanding Hadoop’s core components before installation and scaling - 29119 Murphy 2025-03-22
Six Organizational Models for Data Science
Setting a team up for success or failure - 26135 Murphy 2025-03-22
Platform-Mesh, Hub and Spoke, and Centralised | 3 Types of data team
Why understanding team structure is critical for data and AI - 29695 Murphy 2025-03-22
Fourier Transform Applications in Literary Analysis
How mathematics and data analysis can offer a head start to analysing poetry, before even reading the words. - 24170 Murphy 2025-03-22
How to Make Your LLM More Accurate with RAG & Fine-Tuning
And when to use which one - 24388 Murphy 2025-03-22
Mastering the Poisson Distribution: Intuition and Foundations
Take a dive into the foundations and example use cases of the Poisson distribution - 23030 Murphy 2025-03-22
Anatomy of a Parquet File
Parquet from scratch: A Python deep dive into a raw Parquet file - 22495 Murphy 2025-03-22
Heatmaps for Time Series
Visualizing trends and outliers with non-linear color scales - 24175 Murphy 2025-03-22
Algorithm Protection in the Context of Federated Learning
A pragmatic look into protecting algorithms and models deployed into real-world federated analysis and learning settings in healthcare. - 22115 Murphy 2025-03-22
The current state of continual learning in AI
Why is ChatGPT only trained up until 2021?
Optimizing Pandas Code: The Impact of Operation Sequence
Learn how to rearrange your code to achieve significant speed improvements.