How (Not) to Cheat with Data Visualizations

Author:Murphy  |  View: 24726  |  Time: 2025-03-22 22:15:31
Source: image generated by the author in ChatGPT.

Not so long ago, I published a post on data visualization using KNIME. In the opening story, I used the example of the documentary An Inconvenient Truth. The movie shows, among other things, how complicated data can be presented effectively using visualizations. The movie, along with the concepts and data presented in it, stirred up a lot of controversy and dispute. So did that opening paragraph of my post.

Data Visualization with KNIME

The author of the critical comments has since removed them from my post (or at least I don't see them anymore). However, these critiques inspired me to explore a relatively uncharted territory more deeply: the dos and don'ts of creating deceitful data visualizations. This article addresses precisely this topic, guiding readers through the nuances of integrity in data presentation.

In this post, I will introduce fourteen data manipulation tactics that you should keep out of your presentation toolkit. Should you discover any of them lurking within your work, eliminate them without delay. I will also show you how to sidestep or correct visualizations that these tactics have compromised.

As for the "uncharted territory"…

The abundance of materials, including articles, posts, and books like Alberto Cairo's "How Charts Lie," underscores the prevalence of manipulative data visualizations. By introducing a "wrongdoer perspective," I add my contribution and emphasize how easily data can be manipulated. This perspective isn't about blame but raising awareness of the potential for accidental manipulation, stressing the need for accuracy and ethical integrity in our work. The goal is to underscore our visualizations' significant impact on decision-making and opinions, advocating for vigilance in responsibly conveying information.

Examining the controversy

The movie mentioned above came under judicial scrutiny: a court examined how accurately it presented climate change. The judge agreed that the film mostly presented the fundamental problems of global warming correctly. However, the court also found nine errors, which it attributed to the movie trying too hard to show how serious climate change is. As a result, the movie could be used for teaching, but it had to come with special notes explaining things more clearly. This situation shows how tricky it can be to share information, especially when trying to make a point or teach something important [1]. Data visualization can be tricky as well, as I will hopefully demonstrate further in this post.

Anatomy of data visualization

Why is data visualization so important?

Visualization plays a crucial role in data analysis by turning complex concepts into visuals that are easy for everyone to understand. It shows patterns and insights we probably wouldn't see just by looking at numbers, making information more accessible and valuable. This approach taps into our brain's ability to quickly interpret visuals, helping with tasks like cleaning data, exploring it, and presenting findings. It's also essential for making fast, informed decisions. The future of data visualization involves finding the best ways to choose visuals that convey information effectively and improving software to help with this process. That is what the theory says. The challenge, however, lies in creating attractive and informative graphics, balancing creativity and clarity, and maintaining honesty [2].

The Decalogue of Visualization

That's why, among other reasons, I created the concept named "The Decalogue of Visualization." It outlines essential practices for creating compelling and clear visual data presentations, highlighting the importance of starting with a clear purpose and considering the audience's comprehension. It advocates for facilitating eye tracking with visual cues, minimizing cognitive load by removing extraneous elements, choosing precise visualization attributes for clarity, and decluttering the chart to enhance focus on the data. Besides emphasizing a good data-ink ratio and strategic use of color to highlight critical data, the guidelines also call for maintaining design integrity, ensuring accessibility, particularly for those with visual impairments, and avoiding manipulative practices that might distort data representation. These principles aim to improve audience engagement and ensure that visualizations are informative, inclusive, and free of misleading elements.

Decalogue of visualization / Dekalog wizualizacji

How (not) to cheat with data visualization?

But what if things go wrong? Well, it's time to confront the dark side of data visualization.

Photo by R.D. Smith on Unsplash

Fourteen tactics that you should avoid… or stop using!

Before we delve any further, let me provide an example of a deceitful visualization that was never meant to be one. The image I'm referring to can be seen on several websites (e.g. here). It illustrates the number of deaths by gunshot in Florida before and after a new law was enacted. However, the chart is easy to misread because its vertical scale is inverted. Interestingly, the chart was never created to deceive. Instead, the author aimed for an 'artistic' effect: depicting blood running down a wall to emphasize the tragedy of death by any means. Thus, misleading visualizations are not always the result of malicious intent. Regardless of the motive, we must avoid falling into this trap. Let's look at how to prevent such misunderstandings by recognizing various manipulation tactics and understanding the harm they can inflict.

Tactic 1: Choosing the incorrect chart type

Opt for a chart that obfuscates rather than clarifies your data's story. For instance, choosing a 3D pie chart for representing complex multi-variable datasets can distort the viewer's perspective, making it challenging to discern the relative sizes of sections. Similarly, using 2D pie charts for comparisons forces the viewer to judge angles and areas instead of lengths. The key here is to pick the chart type that best hides the reality of your data, misleading your audience by presenting the data in a format that doesn't suit its nature.

I'll draw from one of my preferred examples to show this concept in practice. Picture us working for an e-commerce company specializing in electronics. Our objective is to scrutinize the sales revenue from our products, identifying our top-performing items and those that may require additional focus. The initial question we aim to address is: Which product experienced the most significant increase in sales revenue in 2023 compared to 2022? An analyst has prepared the following visual representation to aid in uncovering the answer. Let's dive in…

Source: image by the author.

Naturally, finding an answer to the previously mentioned question proves to be more complicated than anticipated. We're required to read the legend, extract values from the chart on the left, then from the one on the right, all while trying to remember these details. Does anyone have something to jot this down on? This scenario perfectly illustrates the pitfalls of choosing an inappropriate chart type for our analysis, particularly when comparing multiple items. The optimal selection? A simple bar chart. Let's proceed with that approach.

Source: image by the author.

And the winner is the "EcoBright Smart Bulb". Notice how much faster we reach the correct conclusion. However, extracting the answer efficiently involves more than just the chart type. The answer to our question is stated in the title, and the product that requires our focus is marked with blue shading. These deliberate design choices serve a singular goal: to enable readers of the visualization to identify the key insights quickly and accurately.
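The bar-chart approach can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the revenue figures and the names "Product B" to "Product D" are made up (only "EcoBright Smart Bulb" appears in the example above), and `revenue_change` and `plot_change` are hypothetical helper names, not the code behind the charts shown here.

```python
# Made-up revenue figures as (2022, 2023) pairs; the numbers are assumptions.
REVENUE = {
    "EcoBright Smart Bulb": (120, 210),
    "Product B": (300, 330),
    "Product C": (180, 150),
    "Product D": (90, 140),
}


def revenue_change(revenue):
    """Return (product, change) pairs, largest increase first."""
    changes = {name: after - before for name, (before, after) in revenue.items()}
    return sorted(changes.items(), key=lambda item: item[1], reverse=True)


def plot_change(revenue):
    """Draw a sorted horizontal bar chart that highlights only the top product."""
    import matplotlib.pyplot as plt  # imported here; only needed for drawing

    ranked = revenue_change(revenue)
    names = [name for name, _ in ranked]
    values = [value for _, value in ranked]
    # One accent color for the answer, grey for everything else.
    colors = ["tab:blue"] + ["lightgrey"] * (len(ranked) - 1)

    fig, ax = plt.subplots()
    ax.barh(names, values, color=colors)
    ax.invert_yaxis()  # put the largest increase at the top
    ax.set_title(f"{names[0]} had the largest revenue increase in 2023")
    ax.set_xlabel("Change in revenue, 2023 vs 2022")
    return fig


print(revenue_change(REVENUE)[0])
```

Computing the difference first and plotting a single sorted series means the reader never has to compare two charts or memorize values from a legend.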

Tactic 2: Employing distractors

Clutter your visualization with superfluous elements – extra lines, unnecessary decorations, company logos, or irrelevant data points – to distract from the critical insights. By embedding these distractors, you can effectively divert your audience's attention away from unfavorable trends or figures. The more your audience focuses on deciphering these irrelevant details, the less they'll question the validity or implications of the presented data. This tactic relies on the viewer's limited attention span, using visual noise to obscure the truth. Let's look at an example.

Source: image by the author.

This chart is undoubtedly bustling with activity, yet saying that "a lot" is going on hardly captures the essence of its message. What exactly is the chart trying to tell us? Is it a cause for concern, or does it bear an upbeat message?

The primary issue with this chart lies in its clutter of superfluous elements: excessive borders, gridlines, and decorative images, including logos. These elements, especially the gridlines and borders presented in dark hues, obscure the visibility of the data series lines. The colors appear to be used merely for aesthetic purposes, contributing little to data comprehension, while the legend fails to fulfill any meaningful role, adding to the overall distraction.

Let's try it again…

Source: image by the author.

Indeed, there's a noticeable difference. We've significantly reduced clutter by eliminating all extraneous elements – borders, gridlines, and the plethora of colors, logos, legends, or any other components that detracted rather than added value. The essential information is now prominently highlighted and emphasized in the functional title, ensuring it captures the viewer's attention effectively.

Tactic 3: Overusing colors

Leverage an overly vibrant palette or an unnecessary variety of colors to complicate the visualization. When a different, striking color represents each data point, it becomes harder for the viewer to prioritize information or discern meaningful patterns. This color overload can lead to analysis paralysis, where the sheer volume of colorful data distracts and overwhelms rather than informs. The strategic misuse of color can thus serve as a powerful tool in misleading by decoration, drawing attention away from the nuances and realities of the data.

Take a look at the chart below. Isn't it beautiful? So many colors? And so little sense. What is the message here?

Source: image by the author.

Color is a strategic tool in data visualization, primarily to draw the viewer's attention to specific elements or areas. The use of color should be strictly functional, with no room for purely decorative applications.

How to Use Color in Data Visualizations

A straightforward guideline is to employ lighter colors for elements of lesser importance and opt for darker, contrasting colors to highlight important ones. When labeling, utilize a dark-colored font against light backgrounds and a light (such as white) font for dark backgrounds.
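The labeling rule can be automated. The sketch below is one possible way to do it, assuming the WCAG definitions of relative luminance and contrast ratio as the yardstick; the function names are my own, and the guideline above does not depend on this particular formula.

```python
def relative_luminance(rgb):
    """WCAG relative luminance of an sRGB color given as 0-255 integers."""
    def linearize(channel):
        c = channel / 255
        return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4

    r, g, b = (linearize(channel) for channel in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(rgb_a, rgb_b):
    """WCAG contrast ratio between two colors (1 = identical, 21 = black on white)."""
    lum_a, lum_b = relative_luminance(rgb_a), relative_luminance(rgb_b)
    lighter, darker = max(lum_a, lum_b), min(lum_a, lum_b)
    return (lighter + 0.05) / (darker + 0.05)


def label_color(background):
    """Pick white or black text, whichever contrasts more with the background."""
    white, black = (255, 255, 255), (0, 0, 0)
    if contrast_ratio(background, white) >= contrast_ratio(background, black):
        return "white"
    return "black"


print(label_color((0, 0, 128)))      # dark navy background
print(label_color((211, 211, 211)))  # light grey background
```

A helper like this is handy when bar colors come from data, because the label color can then follow the fill automatically.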

Now, examining the chart discussed previously, let's explore what improvements can be made. Indeed, there is room for enhancement. Initially, we should determine which company merits closer examination. Let's select "Company 2" for this purpose. Consequently, our revised chart will be designed to reflect this focus.

Source: image by the author.

Tactic 4: Avoiding data ordering

Present your data in a haphazard or non-intuitive sequence to hide trends that might be obvious with a more logical arrangement. For example, scattering data points in a time series or categorically similar items can make it difficult for viewers to trace the progression of data over time or across categories. Disrupting the natural flow of information creates a barrier to understanding, forcing the audience to work harder to piece together the story you choose not to tell directly.

Organizing data can significantly enhance our understanding and decision-making process. It stands out as an effective and straightforward method to highlight what's significant and what's not. Should our focus be on the best-selling products, or is it more crucial to examine the underperformers? By systematically arranging the data, we can quickly identify these key areas.

Consider the following scenario: Initially, we have a chart that lacks any specific order. Can you determine which products are top sellers and which are not within just 10 seconds?

Source: image by the author.

It's possible, yet requires some effort. So let's try again. Which chart is easier to read?

Source: image by the author.
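Ordering the data before plotting is usually a one-liner. Here is a minimal sketch with made-up sales numbers and placeholder product names; `ranked` is a hypothetical helper, not part of any charting library.

```python
# Made-up unit sales; product names are placeholders.
sales = {"Product A": 340, "Product B": 910, "Product C": 125, "Product D": 560}


def ranked(data, top_first=True):
    """Return (label, value) pairs ordered by value."""
    return sorted(data.items(), key=lambda item: item[1], reverse=top_first)


# Best sellers first; pass top_first=False to focus on underperformers.
for label, value in ranked(sales):
    print(f"{label:<10} {value:>5}")
```

Feeding the sorted pairs to a bar chart gives the reader the ranking for free, whichever end of the list matters for the decision at hand.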

Tactic 5: Manipulating scale

Dramatically manipulate the axes' scale on graphs to alter the perceived significance of data trends. Compressing the y-axis can minimize apparent fluctuations, making steep rises or falls in data seem trivial. Conversely, stretching the y-axis can exaggerate minor variations, suggesting a level of volatility or change that isn't supported by the data. This manipulation influences the audience's visual interpretation, misleading them about the data's importance or impact.

Interestingly, there are occasions when manipulating the scale is not just permissible but preferred. This is particularly true when we aim to highlight changes that might otherwise go unnoticed. However, regardless of our intentions, it's crucial to be transparent with our audience and tell them that the scale has been adjusted.

Let's examine the chart below. Our government has decided to amend the tax law, potentially resulting in a sharp increase in effective tax rates. Or will it?

Source: image by the author.

In reality, this change will be significant. However, its visual representation will not appear as dramatic as suggested by the previous chart. Let's take a look.

First, I added labels to the vertical axis. As you can see, the labeling starts at 27%. Interestingly, Excel made this modification automatically. So beware.

Source: image by the author.

And now, let's look at the accurate representation.

Source: image by the author.
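The effect of a truncated axis can be put into numbers. The sketch below assumes two made-up tax rates, 28% and 30%, and an axis that starts at 27% as in the example above. It compares the change as drawn with the change in the data; the ratio of the two is often called the "lie factor", a measure popularized by Edward Tufte. The function names are my own.

```python
def drawn_ratio(old, new, axis_start=0.0):
    """How many times taller the new value is drawn compared with the old one."""
    return (new - axis_start) / (old - axis_start)


def lie_factor(old, new, axis_start=0.0):
    """Relative change as drawn divided by the relative change in the data."""
    shown = drawn_ratio(old, new, axis_start) - 1
    actual = new / old - 1
    return shown / actual


old_rate, new_rate = 28, 30  # made-up effective tax rates, in percent

# With the axis starting at 27%, the new rate is drawn three times as tall.
print(drawn_ratio(old_rate, new_rate, axis_start=27))
# With a zero baseline, the drawn ratio equals the real ratio of about 1.07.
print(drawn_ratio(old_rate, new_rate))
# A lie factor close to 1 means the drawing is faithful to the data.
print(lie_factor(old_rate, new_rate, axis_start=27))
```

A quick check like this before publishing tells you whether an automatically chosen axis range needs to be overridden or at least flagged to the audience.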

Tactic 6: Create trend illusion

Present data in a manner that falsely implies a significant trend or pattern where none exists. This can be achieved through selective data presentation, such as choosing specific time frames or data points that skew the perception of the trend, or by manipulating the scale and axes of graphs to blow minor fluctuations up into seemingly significant trends. The tactic preys on the viewer's tendency to infer a meaningful trend from visual patterns, leading to misconceptions about the underlying data. By carefully curating the data and its presentation, the creator can subtly influence the audience's interpretation, misleading them into drawing conclusions that the data, when viewed in its entirety and proper context, does not support.

However, the illustration of this technique I will use here is not subtle. Take a look.

Source: image by the author.

So we have an obvious, positive trend… Well, we don't. Take a closer look at the x-axis; what appears to be a trend at first glance isn't one. Are the product names unfamiliar to you? If so, I recommend revisiting tactic 4 to see the proper representation of this data.
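A simple guard against this kind of illusion is to let the x-axis values decide the chart type. The sketch below is an assumption of mine rather than an established rule: it treats numbers and dates as ordered values that may be connected with a line, and everything else as categories that belong on a bar chart. `chart_kind` is a hypothetical helper.

```python
from datetime import date
from numbers import Real


def chart_kind(x_values):
    """Suggest "line" for ordered x values (numbers, dates), otherwise "bar"."""
    ordered_types = (Real, date)
    if all(isinstance(x, ordered_types) for x in x_values):
        return "line"
    return "bar"


print(chart_kind([2021, 2022, 2023]))                      # years are ordered
print(chart_kind(["Product A", "Product B", "Product C"]))  # names are not
```

Connecting categories with a line suggests a progression from one to the next, which is exactly the impression the chart above creates.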

Tactic 7: Cherry-picking data

Curate your data set selectively to present only the information that supports your narrative, conveniently ignoring data that contradicts or complicates your story. This approach allows you to construct a seemingly coherent and convincing narrative based on partial truths. By focusing on specific data points that align with your desired message, you can easily mislead your audience into accepting a skewed reality while maintaining credibility by using actual data points.

Photo by Nick Fewings on Unsplash

To illustrate this tactic, I will begin with an accurate representation this time. As you can observe, the company we are analyzing has experienced periods of both good and bad performance over the years. Although the overall trend appears upward, there is considerable underlying volatility that potential investors must be cautious of.

Source: image by the author.

The chart displayed above outlines the company's sales revenue. The level of volatility in this indicator is remarkably high, giving the impression of an almost complete lack of control.

Source: image by the author.

However, considering only the data from every other year, the picture improves significantly. Over the past few years, the company has demonstrated steady progress in sales, characterized by a smooth and solid growth trajectory. As a stock exchange investor, this performance would make the company a desirable addition to my portfolio!
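How much a selective sample can hide is easy to demonstrate with a toy series. The numbers below are made up for illustration and are not the data behind the charts above; `count_declines` is a hypothetical helper.

```python
# Made-up yearly sales revenue with strong year-to-year swings.
revenue = {
    2014: 100, 2015: 60, 2016: 120, 2017: 70, 2018: 140,
    2019: 85, 2020: 160, 2021: 90, 2022: 180,
}


def count_declines(values):
    """Count how often a value is lower than the one before it."""
    return sum(1 for previous, current in zip(values, values[1:]) if current < previous)


full_series = [revenue[year] for year in sorted(revenue)]
every_other_year = full_series[::2]  # the cherry-picked subset

print(count_declines(full_series))       # declines visible in the full data
print(count_declines(every_other_year))  # declines left after cherry-picking
```

Every number in the subset is real, which is what makes the tactic so convincing. Only the missing years reveal the volatility.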

Tactic 8: Omitting visual cues

Deliberately leave out visual aids that could help clarify the data, such as axis labels, legends, or grid lines. By removing these cues, you obscure the scale and context, making it difficult for viewers to gauge the significance of the data accurately. This tactic relies on the viewer's inability to question what they cannot easily measure or compare, allowing you to present data that supports your narrative, even if it does so misleadingly.

Let's examine the chart below, which I refer to as a "spaghetti chart." For reasons unknown to me, these charts frequently appear in scientific papers (though that may just be my own bias in spotting them). However, if they resemble the chart below, their function seems to be nothing more than providing a colorful filler for space.

Source: image by the author.

A single, simple modification – highlighting one of the series in contrast to the others – significantly enhances the utility of this chart. For instance, it allows us to gain insight into how a particular product trends compared to all others.

Source: image by the author.
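The highlighting itself takes very little code. Below is a minimal sketch, assuming matplotlib for the drawing; `series_styles` and `plot_highlighted` are hypothetical helper names, and the colors and line widths are arbitrary choices.

```python
def series_styles(names, focus):
    """Map each series name to plot keyword arguments: one bold line, the rest muted."""
    styles = {}
    for name in names:
        if name == focus:
            styles[name] = {"color": "tab:blue", "linewidth": 2.5, "zorder": 3}
        else:
            styles[name] = {"color": "lightgrey", "linewidth": 1.0, "zorder": 1}
    return styles


def plot_highlighted(series, focus):
    """Draw every series but emphasize only `focus`; `series` maps name -> values."""
    import matplotlib.pyplot as plt  # imported here; only needed for drawing

    fig, ax = plt.subplots()
    for name, style in series_styles(series, focus).items():
        ax.plot(series[name], label=name, **style)
    ax.set_title(f"{focus} compared with all other products")
    return fig


print(series_styles(["Product A", "Product B"], focus="Product B"))
```

The higher `zorder` keeps the highlighted line on top of the grey ones, so it stays readable even where the series cross.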

Tactic 9: Cluttering charts with excessive information

Source: image generated by the author in ChatGPT.

Overwhelm the viewer by packing too much information into a single chart. Use this approach to hide the data you don't want to be seen amidst a sea of other data points, lines, bars, or indicators. The more information you include, the harder it becomes for viewers to focus on any single piece of data, allowing significant details to be overlooked or dismissed as viewers struggle to make sense of the visual chaos. This technique is especially effective in dulling the impact of unfavorable data by burying it under layers of complexity.

Utilizing ChatGPT and Python, I crafted the "artwork" displayed below. I've encountered similar creations in practical settings.

Source: image generated by the author in ChatGPT.

There's a straightforward strategy to prevent such disarray. Before introducing any additional element, pose the question: what function will it fulfill? More precisely, will it enhance the primary objective of the chart? If the response is anything other than a definitive yes, refrain from adding it.

Tactic 10: Showing cumulative values

Using cumulative values in data visualization underscores trends, growth, or total impact over time, which might not be as apparent when only looking at individual values. Plot the cumulative sum of data points on a graph rather than the individual values. For example, if one were to track monthly sales figures over a year, showing each month's sales added to the previous months' totals can dramatically highlight the overall upward trend and total sales for the year. This method can be particularly persuasive in emphasizing progress or success, as it visually aggregates growth, making it easier for the audience to perceive long-term trends. However, this approach can also be manipulative if not used carefully, as it might exaggerate the significance of growth by not showing the variability or decline in individual data points, leading viewers to a possibly misleading interpretation of constant progress.

Let's look at the chart below. It displays the cumulative (running total) revenue value over several months.

Source: image by the author.

We can't tell much from this graph. It's moving up and to the right, so things must be going well! However, the non-cumulative graph paints a different picture.

Source: image by the author.

Now, it is an entirely different picture. And I have just purchased stock in that company!
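The reason is simple arithmetic: a running total of positive values can only go up, no matter what the individual values do. Here is a minimal sketch with made-up monthly revenue figures; `is_rising` is a hypothetical helper.

```python
from itertools import accumulate

# Made-up monthly revenue that shrinks every single month.
monthly = [100, 90, 80, 60, 40, 30]

# The running total adds each month to the sum of all previous months.
running_total = list(accumulate(monthly))
print(running_total)  # [100, 190, 270, 330, 370, 400]


def is_rising(values):
    """True if every value is larger than the one before it."""
    return all(current > previous for previous, current in zip(values, values[1:]))


print(is_rising(running_total))  # True: the cumulative line keeps climbing
print(is_rising(monthly))        # False: the underlying revenue is falling
```

If you do show a cumulative chart, pairing it with the individual values lets the audience see both the total and the direction the business is actually heading.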

Tactic 11: Using 3D visualizations

Employ 3D visualizations to distort perspective and make accurate comparisons challenging. While 3D charts can appear sophisticated and engaging, they often introduce visual distortion that can exaggerate or minimize differences between data points, depending on the angle of view. This can be particularly misleading when used to represent data that would appear less dramatic if shown in a simple 2D format. The key is to use the added dimension to your advantage, skewing perception in favor of your desired interpretation.

To exemplify this technique, I prepared the visual below. Which supplier of Company A is the largest? At first glance, the intuitive answer seems to be Supplier B. But is it?

Source: image by the author.

Adding data labels revealed a surprising truth: Supplier B is the second largest, trailing well behind Supplier A.

Source: image by the author.

To avoid such confusion, the solution is straightforward: never use 3D transformations to visualize data.

Source: image by the author.

Tactic 12: Utilizing gradients and shades

Use gradients and shades to create visual depth or highlight certain parts of your data over others. By manipulating the visual weight of data through color intensity or gradient effects, you can guide the viewer's attention toward or away from specific data points. The strategic use of shading can thus mask the true nature of the data, subtly influencing interpretation.

Let's examine the example below. Some individuals may find this "form of expression" appealing, though I do not. We must consider whether it truly makes sense. For instance, it might be appropriate if we are presenting to a group of kindergarten children. In other contexts, it likely does not make sense, especially considering the extra effort required to achieve such a result.

Source: image by the author.

Let's correct that chart. See the difference? Interpreting the data takes seconds and no longer causes eye strain.

Source: image by the author.

Tactic 13: Ignoring the power of titles

Use meaningless or purely technical titles for your data visualizations. The title of a chart or graph is not just a label; it's a framing device that sets the context and primes the audience's understanding of the visualized data. By deliberately choosing neutral, misleading, or vague titles, a presenter can downplay or obscure the data's real narrative. For instance, a graph showing a significant decline in wildlife population due to deforestation could be misleadingly titled "Changes in wildlife population" instead of accurately reflecting the crisis with a title like "Dramatic decline in wildlife population due to deforestation." This manipulation technique relies on the audience's tendency to take the title at face value, thus shaping their interpretation of the data before they even begin to analyze the specifics.

A title is the quickest and most effective tool for improving our visualizations. No time to remove 3D, shading, or unnecessary elements? Add a meaningful title!

Imagine we have something as complicated as the image below. What's in this chart? Understanding our stock's performance against the market is critical. Viewers should quickly grasp the main message.

Source: image generated by the author in ChatGPT & Spyder.

We can assist viewers by providing a compelling, actionable title. With a strong title, they might not even need to delve into the visualization below – though it must remain available for reference.

Source: image generated by the author in ChatGPT & Spyder.

Tactic 14: Using "junk charts"

Employ overly complex or unnecessarily convoluted visualizations to present data, often obscuring the truth or making it difficult for the audience to understand the real story behind the numbers. These charts might include excessive decorative elements, confusing 3D effects, inappropriate use of color, or an overwhelming amount of information crammed into a single visualization. Their complexity or decorative nature can distract from the data's actual message, leading to misinterpretation or a focus on irrelevant details. Junk charts can be particularly manipulative when they aim to impress or overwhelm the viewer with style over substance, diverting attention from less favorable interpretations of the data. Effective data communication relies on clarity, simplicity, and directness, qualities that are often intentionally ignored when junk charts are deployed.

Junk chart examples. Source: image generated by the author in ChatGPT.

Conclusions

The importance of visualization cannot be overstated. It turns complicated data into clear, interesting stories that can teach, convince, and inform people. However, using it well means taking on a big responsibility. Looking closely at how some visuals can mislead or twist the truth shows why ethics matter so much in this field. This article pushes for honesty, precision, and ethical behavior in making and understanding data visuals, reminding us to always be careful and thoughtful.

As we navigate the vast seas of data in this digital age, let us commit to principles that ensure our visualizations not only captivate but also truthfully inform our audience, fostering a well-informed society capable of making decisions grounded in reality rather than manipulation.

Source: image generated by the author in ChatGPT.

Tags: Charts And Graphs Data Manipulation Data Storytelling Data Vizualisation Tips And Tricks
