Confronting Bias in Data Is (Still) Difficult – and Necessary

Year after year, datasets get bigger, cloud servers run faster, and analytics tools become more sophisticated. Despite this constant progress, practitioners continue to run into the issue of bias—whether it's lurking in the dark recesses of their data files, popping up in their models' outputs, or framing their projects' root assumptions.

A definitive solution to bias will require a lot more than local changes to a data team's workflows; it's not realistic to expect tactical fixes to solve a deep-rooted systemic problem. There's hope, however, in the growing recognition (in tech and beyond) that this is, indeed, a problem to think about, discuss, and tackle collectively.

This week, we're highlighting several articles that cover bias and data (and bias in data) in creative, actionable, and thought-provoking ways.

  • The different types of bias you might encounter. For anyone who's exploring this topic for the first time, Shahrokh Barati's primer is an essential read on the differences between statistical bias and ethical bias: "two different categories of bias with distinct root causes and mitigations" that can each jeopardize data projects (and harm end users) if left unaddressed.
  • A powerful strategy to add to your anti-bias toolkit. After ML models go into production, they continue to evolve as teams fine-tune them to optimize their performance. Every tweak is a potential opening for bias to sneak in – which is why Jazmia Henry advocates for the adoption of model versioning, an approach that "allows for model rollbacks that can save your company money long term, but more importantly, help reduce bias if and when it arises."
  • Who shapes the politics of language models' outputs? The rapid integration of chatbots into our day-to-day lives raises the question of how objective they really are. Yennie Jun attempted to measure the political leanings of GPT-3's outputs; the fascinating results she reports prompt a whole new set of questions about the responsibility and transparency of the people who train and design these powerful models.

For any of you who'd like to branch out into other topics over the next few days—from A/B testing to natural language processing—we're delighted to share some of our recent favorites. Enjoy!


We hope you consider becoming a Medium member this week – it's the most direct and effective way to support the work we publish.

Until the next Variable,

TDS Editors

