Learning New Data Science Skills, The Right Way
We tend to think of learning curves as neat, smooth, continuously upward-trending lines. Look closely enough at any learning journey, though, and you'll see numerous dips and plateaus along the way: in reality, even seasoned professionals feel like beginners when faced with a new tool or workflow.
This week, we've put together some of our favorite recent tutorials and introductory guides. They come with very few—if any—prerequisites for data and ML practitioners; they cover very different topics, from deep learning to anomaly detection, but share a strong commitment to patient explanations, concrete details, and expert contextualization. So if you find yourself in a learning lull, pick any of our highlights below: they're bound to help you snap right out of it.
- Sometimes it makes perfect sense to jump in at the deep end: why not just implement and train a convolutional neural network (CNN) from scratch? You'll have Betty LD’s step-by-step instructions to guide you through the process; if you've been thinking about tinkering with the PyTorch Lightning library, here's your chance.
- This has been the summer of open-source large language models, it seems, with a new specimen arriving on the scene every few days. Donato Riccio’s new post is a beginner-friendly primer on all things Llama, Alpaca, and more, and covers the basics of fine-tuning and working with these LLMs.
- Looking to dive a bit deeper into the open-source LLM pool? No need for floaties: Shawhin Talebi is here to help you get familiar with Hugging Face's Transformers library, which offers "an easy and cost-free way to work with a wide variety of open-source language models."
- For a fresh spin on a more traditional ML approach, Evie Fowler’s latest contribution outlines the benefits of using anomaly detection methods to tackle problems created by imbalanced outcome classes in supervised learning.
Why stop here? If you're still in the mood for learning, we've got a few more excellent reads you shouldn't miss out on:
- Ruth Eneyi Ikwu examines how unchecked collinearity leads to unintended bias by taking a critical look at a problematic dataset.
- Is there a practical path to sustainable AI? Leonie Monigatti’s Kaggle competition-winning article surveys potential approaches for boosting deep learning models' efficiency in production.
- If you're passionate about audio and music data, Naman Agrawal’s deep dive on time and frequency domain feature extraction is a must-read.
- Mark Ridley offers a thoughtful analysis of how the rise of generative AI might impact product-engineering teams. (Buckle up: this is the first of six posts in this excellent series.)
- Data Science and ML role descriptions evolve all the time; Stephanie Kirmer zooms in on ML engineers and wonders if their emergence might be a symptom of pink-collaring in data science.
- Pol Marin continues to explore interesting topics in sports analytics; his latest object of study: FC Barcelona's defense (and its discontents).
Thank you for supporting our authors! If you enjoy the articles you read on TDS, consider becoming a Medium member – it unlocks our entire archive (and every other post on Medium, too).
Until the next Variable,
TDS Editors