I Tested Frontline M-LLMs on Their Chart Interpretation Skills
Multimodal LLMs (MLLMs) promise that they can interpret anything in an image. That's true in most cases, such as image captioning and object detection.
But can they reasonably and accurately understand data presented on a chart?
If you really want to build an app that tells you what to do when you point your camera at a car dashboard, the LLM's chart interpretation skills need to be exceptional.
Of course, Multimodal LLMs can narrate what's on a chart, but consuming data and answering complex user questions is challenging.
I wanted to find out how difficult it is.
I set up eight challenges for LLMs to solve. Every challenge has a rudimentary chart and a question for the LLM to answer. We know the correct answer because we created the data, but the LLM needs to figure it out only using the visualization given to it.
As of this writing, and to my understanding, there are five prominent multimodal LLM providers in the market: OpenAI (GPT4o), Meta with Llama 3.2 (11B and 90B models), Mistral with its brand-new Pixtral 12B, Anthropic with Claude 3.5 Sonnet, and Google with Gemini 1.5.
I'll let all six models (Llama 3.2 has two variants) work on my challenges.
A quick note here: I'm not affiliated with any of these providers. This is my unbiased analysis.
I use LangChain's multimodal prompting to keep things consistent between different models. If you repeat this, you can also use these providers' playground environments.
You could also check the Colab notebook I've used to evaluate these models.
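For readers who want to repeat the setup, here is a minimal sketch of how a chart image and a question can be packed into one multimodal prompt. The helper name `build_chart_prompt` and the placeholder bytes are my own illustration, not code from the notebook, and the exact block format each provider accepts may differ.

```python
import base64


def build_chart_prompt(image_bytes: bytes, question: str, mime: str = "image/png") -> list[dict]:
    """Return a content list pairing a text question with an inline chart image."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return [
        {"type": "text", "text": question},
        {"type": "image_url", "image_url": {"url": f"data:{mime};base64,{encoded}"}},
    ]


# Placeholder bytes stand in for a real chart file read with open(path, "rb").
content = build_chart_prompt(b"not-a-real-png", "Which fruit is the most popular?")
print(content[0]["text"])
print(content[1]["image_url"]["url"][:22])
```

With LangChain, a list like this can be wrapped as `HumanMessage(content=content)` (from `langchain_core.messages`) and sent through a chat model's `invoke` method, so the same prompt can be reused across providers.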
Let's see how they perform.
TLDR
This is going to be a very long post because I've also included the LLMs' outputs. It will be easier for you if I put the final results in a table.
So, here it is.
The green cells indicate the model has successfully solved the problem given to it. An amber cell means the answers are partially correct. Red means it's nowhere close to the correct answer.

Based on my findings, Claude 3.5 Sonnet does well on most tasks. However, none of the models are fool-proof. Besides, all these tasks are pretty basic. We don't know how these models would perform for more sophisticated charts.
Chart Interpretation Challenges for LLMs
Basic Bar Chart
Bar charts are the most basic, and they are everywhere. What good is an LLM if it can't understand one of these?
Here, I'm trying a vertical bar chart with no explicit data labels. Data labels would make interpretation easy for the LLM. The challenge here is reading each bar's height and answering the user's question.
We ask the LLM to find out the most popular and least popular fruit and how many more people prefer the most popular over the least popular.

This is a pretty easy challenge; one can answer the question by looking. The most popular fruit is Bananas, with 40 people preferring them; the least popular is grapes, with only 15 preferring them. The difference is 25.
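The expected answer can be checked with a few lines of Python. This is only a sketch: the Bananas and Grapes counts come from the paragraph above, while the Apples and Oranges counts are assumed from the readout Pixtral gives further down.

```python
# Vote counts per fruit. Apples and Oranges are assumed from a model readout,
# not stated directly in the post.
fruit_votes = {"Apples": 25, "Bananas": 40, "Oranges": 20, "Grapes": 15}

most_popular = max(fruit_votes, key=fruit_votes.get)
least_popular = min(fruit_votes, key=fruit_votes.get)
difference = fruit_votes[most_popular] - fruit_votes[least_popular]

print(f"{most_popular} beats {least_popular} by {difference}")
```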
But let's look at the outputs of the LLMs.
### Question: Which fruit is the most popular among the group, and how many more people prefer it over the least popular fruit?
### Correct Answer: The most popular fruit is Bananas, with 25 more people preferring it over Grapes.
### Model interpretation:
##### llama3.2 11b:
**Step 1: Identify the most popular fruit.**
The most popular fruit is bananas, with 40 people preferring it.
**Step 2: Identify the least popular fruit.**
The least popular fruit is grapes, with 15 people preferring it.
**Step 3: Calculate the difference between the most popular and least popular fruits.**
To find the difference, subtract the number of people who prefer grapes from the number of people who prefer bananas:
40 (bananas) - 15 (grapes) = 25
**Conclusion:**
The most popular fruit among the group is bananas, with 40 people preferring it. The least popular fruit is grapes, with 15 people preferring it. The difference between the two is 25 people, indicating that bananas are the clear favorite among the group.
##### llama3.2 90b:
To determine the most popular fruit and the difference in preference between it and the least popular fruit, we need to analyze the data provided in the bar graph.
**Step 1: Identify the Most Popular Fruit**
Looking at the graph, we can see that bananas have the highest number of people who prefer them, with a total of 40 individuals.
**Step 2: Identify the Least Popular Fruit**
The least popular fruit is grapes, with only 15 people preferring them.
**Step 3: Calculate the Difference in Preference**
To find out how many more people prefer bananas over grapes, we subtract the number of people who prefer grapes from the number of people who prefer bananas:
40 (bananas) - 15 (grapes) = 25
Therefore, 25 more people prefer bananas over grapes.
**Conclusion**
Based on the analysis, bananas are the most popular fruit among the group, and 25 more people prefer them over grapes, which is the least popular fruit.
##### gpt4o:
Bananas are the most popular fruit, preferred by 40 people. The least popular fruit is grapes, preferred by 15 people. Therefore, 25 more people prefer bananas over grapes.
##### pixtral12b:
To determine which fruit is the most popular and how many more people prefer it over the least popular fruit, we need to analyze the data provided in the image.
1. Identify the number of people who prefer each fruit:
- Apples: 25 people
- Bananas: 40 people
- Oranges: 20 people
- Grapes: 15 people
2. Determine the most popular fruit:
- The fruit with the highest number of people is Bananas, with 40 people.
3. Determine the least popular fruit:
- The fruit with the lowest number of people is Grapes, with 15 people.
4. Calculate the difference in the number of people who prefer the most popular fruit over the least popular fruit:
- Difference = Number of people who prefer Bananas - Number of people who prefer Grapes
- Difference = 40 - 15
- Difference = 25
Therefore, Bananas are the most popular fruit, and 25 more people prefer Bananas over Grapes.
Bananas are the most popular fruit, and 25 more people prefer Bananas over Grapes.
##### claude35sonnet:
Based on the bar graph showing the favorite fruits of 100 people:
The most popular fruit is Bananas, with 40 people preferring it.
The least popular fruit among the options given is Grapes, with 15 people preferring it.
To calculate how many more people prefer Bananas over Grapes:
40 (Bananas) - 15 (Grapes) = 25
Therefore, Bananas are the most popular fruit, and 25 more people prefer Bananas over Grapes (the least popular fruit in the survey).
##### gemini1.5:
Bananas are the most popular, with approximately 25 more people preferring them over grapes, the least popular.
I'm not surprised that all the models we tested here answered correctly. After all, it's a simple bar chart any child can understand.
But what amazes me is these LLMs' reasoning steps. Gemini and GPT4o didn't give any, but we can ask these models to "think step-by-step" to get them. Either way, the steps give us confidence in the answers.
Basic Line Chart
Besides bar charts, an equally overused visualization technique is the line chart. It's everywhere, from stock markets to scientific experiments to high school textbooks.
This is a dummy chart of temperature over a week. The goal is to identify the day with the highest temperature and the difference between that temperature and Sunday's.
Again, we don't use explicit data labels, so it's the LLM's job to make sense of the nodes' positions.

This would've been an easier challenge if the chart had grid lines. Even without them, it isn't super complicated. We can see that the highest temperature is 22, Sunday's is 14, and the difference is 8. Here's how the LLMs tackle the problem.
### Question: On which day did the city experience the highest temperature, and what was the temperature difference between that day and Sunday?
### Correct Answer: The highest temperature was on Thursday at 22°C, which is 8°C higher than Sunday.
### Model interpretation:
##### llama3.2 11b:
**Temperature Analysis**
To determine the day with the highest temperature and the temperature difference between that day and Sunday, we need to analyze the graph provided.
**Highest Temperature Day**
The graph shows a clear peak on Thursday, indicating that this was the day with the highest temperature. The temperature on Thursday reached 22 degrees Celsius.
**Temperature Difference with Sunday**
To calculate the temperature difference between Thursday and Sunday, we need to subtract the temperature on Sunday from the temperature on Thursday.
Temperature on Thursday: 22 degrees Celsius
Temperature on Sunday: 14 degrees Celsius
Temperature difference = Temperature on Thursday - Temperature on Sunday
= 22 - 14
= 8 degrees Celsius
Therefore, the city experienced the highest temperature on **Thursday**, with a temperature of **22 degrees Celsius**. The temperature difference between Thursday and Sunday is **8 degrees Celsius**.
##### llama3.2 90b:
To answer this question, we need to identify the day with the highest temperature and then calculate the temperature difference between that day and Sunday.
**Step 1: Identify the day with the highest temperature**
Looking at the graph, we can see that the highest temperature was recorded on Thursday, with a temperature of 22°C.
**Step 2: Calculate the temperature difference between Thursday and Sunday**
The temperature on Sunday was 14°C. To find the temperature difference, we subtract the temperature on Sunday from the temperature on Thursday:
Temperature difference = Temperature on Thursday - Temperature on Sunday
= 22°C - 14°C
= 8°C
Therefore, the city experienced the highest temperature on Thursday, and the temperature difference between Thursday and Sunday was 8°C.
##### gpt4o:
The highest temperature was on Thursday at 22°C. The temperature on Sunday was 15°C, so the difference is 7°C.
##### pixtral12b:
The image is a line graph showing temperature changes over a week from Monday to Sunday. The question asks for the day with the highest temperature and the temperature difference between that day and Sunday.
**Reasoning:**
1. Identify the day with the highest temperature by looking at the peak of the graph. The highest point is on Thursday.
2. Note the temperature on Thursday, which is approximately 22°C.
3. Note the temperature on Sunday, which is approximately 14°C.
4. Calculate the temperature difference between Thursday and Sunday: 22°C - 14°C = 8°C.
The day with the highest temperature is Thursday, and the temperature difference between Thursday and Sunday is 8°C.
##### claude35sonnet:
According to the graph, the city experienced the highest temperature on Thursday, reaching approximately 22°C.
To calculate the temperature difference between Thursday and Sunday:
Thursday's temperature: 22°C (approximately)
Sunday's temperature: 14°C
Temperature difference = 22°C - 14°C = 8°C
Therefore, the temperature difference between Thursday (the hottest day) and Sunday was 8°C.
##### gemini1.5:
The highest temperature was on Thursday at 22 degrees. The difference between Thursday and Sunday's temperature is 8 degrees.
Again, Gemini and GPT4o didn't provide us with the reasoning steps, but we haven't asked for them. Once again, all the LLMs did provide the correct answer, except GPT4o, which was off by 1°C. The correct answer was 8°C, but GPT4o computed it as 7°C.
Pie Chart
If you made a list of pie chart haters, my name would be at the top. Despite their simplicity and widespread popularity, pie charts aren't easy to read and can go wrong in many ways.
I'm not the only one who says this. Nancy Duarte, a renowned communication expert, also discusses it.
If your focus is on comparing categories, the bar chart is the better choice. – Nancy Duarte
Nonetheless, they're everywhere, so LLMs should be able to read them. Let's see how they perform.
This is a relatively simple challenge, though. We've got a pie chart of market shares. How many users would Brand C have if the total market were 1 million users?

Ahh! Okay, I'll leave the interpretability of Pie charts for now.
It looks like Brand C has between 20% and 25% of the total, so the answer should be around 230k. However, since we know the data, the correct answer is 200k.
### Question: If the total number of smartphone users is 1 million, how many users does Brand C have?
### Correct Answer: Brand C has 200000 users.
### Model interpretation:
##### llama3.2 11b:
To find the number of users for Brand C, we need to calculate the percentage of the market share that Brand C has.
Brand C has a market share of 20%.
To find the number of users for Brand C, we can multiply the total number of smartphone users (1 million) by the market share percentage (20%).
1,000,000 x 0.20 = 200,000
Therefore, Brand C has 200,000 users.
##### llama3.2 90b:
To find the number of users for Brand C, we need to calculate the percentage of the market share that Brand C has and then multiply it by the total number of smartphone users.
From the pie chart, we can see that Brand C has a market share of 25%. To convert this to a decimal, we divide by 100:
25% ÷ 100 = 0.25
Now, we can multiply the decimal by the total number of smartphone users (1 million) to find the number of users for Brand C:
0.25 × 1,000,000 = 250,000
Therefore, Brand C has 250,000 users.
##### gpt4o:
Brand C has a green segment in the pie chart. Estimating from the chart, Brand C seems to have about 20% of the market share.
So, if the total number of smartphone users is 1 million:
\[ 1,000,000 \times 0.20 = 200,000 \]
Brand C has approximately 200,000 users.
##### pixtral12b:
To determine the number of users for Brand C, we need to analyze the pie chart and understand the proportion of the market share that Brand C holds.
1. **Identify the market share of Brand C**: The pie chart shows that Brand C has a green segment. Visually, this segment appears to be approximately one-third of the pie chart.
2. **Estimate the percentage of the market share**: Since the green segment (Brand C) appears to be one-third of the pie chart, we can estimate that Brand C holds about 33.33% of the market share.
3. **Calculate the number of users for Brand C**: To find the number of users for Brand C, we multiply the total number of smartphone users by the percentage of the market share held by Brand C.
\[
\text{Number of users for Brand C} = \text{Total number of smartphone users} \times \left( \frac{\text{Market share of Brand C}}{100} \right)
\]
Substituting the values:
\[
\text{Number of users for Brand C} = 1,000,000 \times \left( \frac{33.33}{100} \right)
\]
Simplifying the calculation:
\[
\text{Number of users for Brand C} = 1,000,000 \times 0.3333
\]
\[
\text{Number of users for Brand C} = 333,333.33
\]
Since the number of users must be a whole number, we can round to the nearest whole number:
\[
\text{Number of users for Brand C} \approx 333,333
\]
Approximately 333,333 users.
##### claude35sonnet:
To answer this question, I need to determine the percentage of the market share that Brand C holds, then calculate that percentage of 1 million users.
From the pie chart, I can see that Brand C is represented by the green slice. While exact percentages aren't given, I can estimate that Brand C's slice appears to be about 20-25% of the total pie.
Let's assume it's approximately 22% for this calculation.
22% of 1 million is:
1,000,000 * 0.22 = 220,000
Therefore, if the total number of smartphone users is 1 million, Brand C would have approximately 220,000 users.
Please note that this is an estimate based on visual approximation of the pie chart. The actual number could vary slightly depending on the precise percentage that Brand C holds.
##### gemini1.5:
It is impossible to tell from the image alone how many users Brand C has.
Things get interesting from here.
Llama 3.2 11B and GPT4o accurately estimated the size of the slice. Surprisingly, the Llama 3.2 90B model was 50k off, while the 11B one was spot on. Claude was only 20k off, but Pixtral was nowhere near: it says 333k, which would make Brand C a third of the market.
However, the real tragedy was Gemini 1.5. It didn't even attempt to solve the problem.
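To put numbers on how far each model landed from the truth, here is a small sketch that compares the reported user counts with the known 20% share. The estimates are copied from the outputs above; the variable names are my own.

```python
TOTAL_USERS = 1_000_000
TRUE_SHARE = 0.20  # Brand C's share in the underlying data

# User counts each model reported for Brand C (Gemini gave no number).
estimates = {
    "llama3.2 11b": 200_000,
    "llama3.2 90b": 250_000,
    "gpt4o": 200_000,
    "pixtral12b": 333_333,
    "claude35sonnet": 220_000,
}

truth = round(TOTAL_USERS * TRUE_SHARE)
errors = {model: abs(value - truth) for model, value in estimates.items()}

# List models from most to least accurate.
for model, err in sorted(errors.items(), key=lambda item: item[1]):
    print(f"{model}: off by {err:,} users ({err / truth:.1%} of the true value)")
```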
Stacked Bar Chart
Here's another chart I don't usually like, though I think it has some reasonable uses. Stacked bar charts are hard to read. They give no reference for where a segment starts or how big it is.
The following chart stacks up the sales of two products over the four quarters of a year. The task is to compute the total annual sales of Product Y. I'm sure it would take you five minutes to figure out the answer. A grouped bar chart would have been easier.
We also need to find the percentage increase in Product X's sales from Q1 to Q4.
Since we know the data, the answer is 180 units for Product Y, and the increase in Product X was 60%.

Let's see what the LLMs have to say about it.
### Question: Calculate the total annual sales for Product Y and determine the percentage increase in sales from Q1 to Q4 for Product X.
### Correct Answer: Total annual sales for Product Y is 180 units. Product X sales increased by 60.00% from Q1 to Q4.
### Model interpretation:
##### llama3.2 11b:
To calculate the total annual sales for Product Y, we need to add up the sales for each quarter.
* Q1: 80,000 units
* Q2: 100,000 units
* Q3: 120,000 units
* Q4: 140,000 units
Total annual sales for Product Y = 80,000 + 100,000 + 120,000 + 140,000 = 440,000 units
To determine the percentage increase in sales from Q1 to Q4 for Product X, we need to calculate the difference in sales between Q1 and Q4 and then divide by the sales in Q1.
* Sales in Q1: 50,000 units
* Sales in Q4: 80,000 units
Difference in sales = 80,000 - 50,000 = 30,000 units
Percentage increase = (Difference in sales / Sales in Q1) * 100
= (30,000 / 50,000) * 100
= 60%
Therefore, the total annual sales for Product Y is 440,000 units, and the percentage increase in sales from Q1 to Q4 for Product X is 60%.
##### llama3.2 90b:
To calculate the total annual sales for Product Y, we need to add up the sales for each quarter.
Product Y sales:
Q1 = 80,000 units
Q2 = 100,000 units
Q3 = 120,000 units
Q4 = 140,000 units
Total annual sales for Product Y = 80,000 + 100,000 + 120,000 + 140,000 = 440,000 units
To determine the percentage increase in sales from Q1 to Q4 for Product X, we need to calculate the difference in sales between Q4 and Q1, divide it by the sales in Q1, and then multiply by 100.
Product X sales:
Q1 = 50,000 units
Q4 = 80,000 units
Percentage increase = ((80,000 - 50,000) / 50,000) * 100 = 60%
Therefore, the total annual sales for Product Y is 440,000 units, and the percentage increase in sales from Q1 to Q4 for Product X is 60%.
##### gpt4o:
To calculate the total annual sales for Product Y and determine the percentage increase in sales from Q1 to Q4 for Product X, we'll analyze the data from the bar chart.
### Total Annual Sales for Product Y:
1. **Q1**: Approximately 20,000 units
2. **Q2**: Approximately 40,000 units
3. **Q3**: Approximately 60,000 units
4. **Q4**: Approximately 60,000 units
Total sales for Product Y = 20,000 + 40,000 + 60,000 + 60,000 = **180,000 units**.
### Percentage Increase in Sales for Product X from Q1 to Q4:
1. **Q1 Sales**: Approximately 60,000 units
2. **Q4 Sales**: Approximately 80,000 units
Percentage increase = \(\frac{(Q4 - Q1)}{Q1} \times 100\)
= \(\frac{(80,000 - 60,000)}{60,000} \times 100\)
= \(\frac{20,000}{60,000} \times 100\)
= \(33.33\%\)
Therefore, the total annual sales for Product Y are 180,000 units, and the percentage increase in sales from Q1 to Q4 for Product X is approximately 33.33%.
##### pixtral12b:
To calculate the total annual sales for Product Y, we need to sum the sales of Product Y for each quarter (Q1 to Q4).
From the image:
- Q1: Product Y sales = 20,000 units
- Q2: Product Y sales = 30,000 units
- Q3: Product Y sales = 40,000 units
- Q4: Product Y sales = 50,000 units
Total annual sales for Product Y:
\[ 20,000 + 30,000 + 40,000 + 50,000 = 140,000 \text{ units} \]
Next, to determine the percentage increase in sales from Q1 to Q4 for Product X, we need to find the sales for Product X in Q1 and Q4.
From the image:
- Q1: Product X sales = 40,000 units
- Q4: Product X sales = 80,000 units
The increase in sales from Q1 to Q4 for Product X is:
\[ 80,000 - 40,000 = 40,000 \text{ units} \]
The percentage increase is calculated as:
\[ \left( \frac{40,000}{40,000} \right) \times 100 = 100\% \]
The total annual sales for Product Y are 140,000 units, and the percentage increase in sales from Q1 to Q4 for Product X is 100%.
##### claude35sonnet:
To answer this question, I'll need to break it down into two parts:
1. Calculate the total annual sales for Product Y:
Q1: 80 - 50 = 30
Q2: 100 - 60 = 40
Q3: 120 - 70 = 50
Q4: 140 - 80 = 60
Total = 30 + 40 + 50 + 60 = 180 thousand units
2. Calculate the percentage increase in sales from Q1 to Q4 for Product X:
Q1 sales for X: 50 thousand units
Q4 sales for X: 80 thousand units
Increase: 80 - 50 = 30 thousand units
Percentage increase = (Increase / Original) * 100
= (30 / 50) * 100 = 60%
Therefore, the total annual sales for Product Y is 180,000 units, and the percentage increase in sales from Q1 to Q4 for Product X is 60%.
##### gemini1.5:
The total annual sales for Product Y are approximately 390 thousand. Product X had a percentage increase in sales from Q1 to Q4 of approximately 150%.
Both Llama versions were off by a big margin in this challenge. Instead of summing the sales of Product Y alone, they summed the combined sales of X and Y. However, they both correctly calculated the percentage increase in Product X's sales. GPT4o got the first part right but misread Product X's Q1 sales, so its percentage increase was wrong.
Gemini 1.5 and Pixtral got both parts wrong.
The only model that got both parts right was Claude 3.5 Sonnet.
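The arithmetic behind the correct answer, and behind the Llama models' mistake, can be sketched as follows. The per-quarter values are the ones Claude read off the chart; I'm assuming they match the underlying data because they reproduce the known answer.

```python
# Cumulative bar tops and Product X segments, in thousands of units,
# as Claude read them off the chart (assumed to match the underlying data).
quarters = ["Q1", "Q2", "Q3", "Q4"]
stack_top = [80, 100, 120, 140]
product_x = [50, 60, 70, 80]

# Product Y is the upper segment: bar top minus the Product X segment below it.
product_y = [top - x for top, x in zip(stack_top, product_x)]
total_y = sum(product_y)

# Percentage increase for Product X from Q1 to Q4.
x_growth = (product_x[-1] - product_x[0]) / product_x[0] * 100

# The mistake both Llama models made: summing the bar tops instead of the Y segments.
wrong_total = sum(stack_top)

print(total_y, x_growth, wrong_total)
```

Reading a stacked bar always needs that subtraction step, which is exactly where most of the models slipped.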
Scatter Plot
The scatter plot is the best chart for studying the relationship between two variables. That's why countless scientific studies and business presentations include one.
The following is a dummy chart to study the relationship between the number of hours students study for an exam and their exam scores.
The question is simple: if a student studies for five hours, what score can they expect on the exam? Reading off the trend line, the answer is about 75 (74.5, to be exact).

Let's see what the LLMs are doing.
### Question: Based on the trend, estimate the expected exam score for a student who studies for 5 hours.
### Correct Answer: Based on the trend, a student who studies for 5 hours is expected to score approximately 74.5%.
### Model interpretation:
##### llama3.2 11b:
To estimate the expected exam score for a student who studies for 5 hours, we can analyze the trend in the graph. The graph shows a positive correlation between hours studied and exam scores, with higher hours studied corresponding to higher exam scores.
**Step 1: Identify the Trend**
The trend in the graph is a straight line with a positive slope, indicating that as the number of hours studied increases, the exam score also increases.
**Step 2: Locate the 5-Hour Mark**
We need to locate the point on the graph where the student has studied for 5 hours. This point is located at x = 5 on the x-axis.
**Step 3: Estimate the Exam Score**
To estimate the exam score, we can draw a vertical line from the 5-hour mark to the trend line. The point where this line intersects the trend line represents the estimated exam score.
**Step 4: Read the Estimated Exam Score**
By reading the y-axis value at the intersection point, we can estimate the exam score for a student who studies for 5 hours.
**Estimated Exam Score**
Based on the trend, the estimated exam score for a student who studies for 5 hours is approximately **80%**.
Therefore, the expected exam score for a student who studies for 5 hours is around 80%.
##### llama3.2 90b:
To estimate the expected exam score for a student who studies for 5 hours, we can analyze the trend in the graph. The graph shows a positive correlation between the number of hours studied and the exam score.
**Step 1: Identify the Trend**
The graph indicates that as the number of hours studied increases, the exam score also increases. This suggests a linear relationship between the two variables.
**Step 2: Determine the Rate of Increase**
By examining the graph, we can see that for every additional hour studied, the exam score increases by approximately 10 points. For example, from 2 hours to 3 hours, the score increases from around 60 to 70, which is a 10-point increase.
**Step 3: Apply the Trend to 5 Hours**
Since the student studies for 5 hours, we can apply the trend to estimate the expected exam score. Starting from the base score of around 60 at 2 hours, we add 10 points for each additional hour:
* 2 hours: 60 points
* 3 hours: 60 + 10 = 70 points
* 4 hours: 70 + 10 = 80 points
* 5 hours: 80 + 10 = 90 points
Therefore, based on the trend, the expected exam score for a student who studies for 5 hours is approximately 90 points.
##### gpt4o:
Based on the trend shown in the graph, a student who studies for 5 hours is expected to score approximately 75% on the exam.
##### pixtral12b:
The image is a scatter plot titled "Hours Studied vs Exam Scores." It shows the relationship between the number of hours studied and the corresponding exam scores. There is a red dashed trend line indicating a positive correlation between the hours studied and exam scores.
To estimate the expected exam score for a student who studies for 5 hours, we need to look at the trend line. The trend line can be used to estimate the exam score for a given number of hours studied. By observing the trend line, we can see that for 5 hours studied, the corresponding exam score is approximately 75%.
Therefore, the estimated exam score for a student who studies for 5 hours is 75%.
##### claude35sonnet:
Based on the trend shown in the scatter plot and the linear regression line (represented by the red dashed line), we can estimate that a student who studies for 5 hours would likely score around 75% on the exam.
The graph shows a clear positive correlation between hours studied and exam scores. The regression line passes through approximately the 75% mark on the y-axis (Exam Score) when it intersects with the 5-hour mark on the x-axis (Hours Studied).
This estimate aligns with the overall trend depicted, where more hours of study generally correspond to higher exam scores. The actual data point closest to 5 hours studied shows a score of about 80%, which is close to our estimate from the trend line.
##### gemini1.5:
Based on the trend, the expected exam score for a student who studies for 5 hours is approximately 75%.
Even though the task was simple, not every model got it right. GPT4o, Pixtral 12B, Claude 3.5 Sonnet, and Gemini all landed on 75%. Llama 3.2 11B was off by 5 points, and 90B was off by 15 points.
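For context, the value a trend line gives at 5 hours comes from a linear fit. Below is a minimal least-squares sketch. The data points are hypothetical stand-ins, not the data behind the chart in this post, and the helper names `fit_line` and `predict` are my own.

```python
def fit_line(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def predict(slope: float, intercept: float, x: float) -> float:
    """Evaluate the fitted line at x."""
    return slope * x + intercept


# Hypothetical (hours, score) points, NOT the data behind the chart in this post.
hours = [1, 2, 3, 4, 6, 7, 8]
scores = [52, 61, 64, 72, 78, 86, 90]

slope, intercept = fit_line(hours, scores)
print(f"Expected score at 5 hours: {predict(slope, intercept, 5):.1f}")
```

A model that extrapolates from two neighbouring points, as Llama 3.2 90B did, skips this fit entirely and inherits whatever noise those two points carry.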
Multiple Line Charts
We spoke about line charts and how LLMs already understand them. However, line charts are often suitable for comparing multiple variables, too.
The following chart compares the revenue growth of three companies over six months. The task is to find the company with the highest revenue growth and its growth value.
From the data, we know it's Company 3, and the value is 31.25%. We can't blame the LLMs for picking Company 2 because it shows similar growth.

Here are the LLM interpretations.
### Question: Which company showed the highest overall growth over the six months, and by what percentage did their revenue increase from January to June?
### Correct Answer: Company 3 showed the highest overall growth of 31.25% from January to June.
### Model interpretation:
##### llama3.2 11b:
To determine which company showed the highest overall growth over the six months, we need to calculate the percentage increase in revenue for each company from January to June.
**Company 1:**
* January: $100,000
* June: $130,000
* Increase: $30,000
* Percentage increase: ($30,000 / $100,000) x 100% = 30%
**Company 2:**
* January: $90,000
* June: $115,000
* Increase: $25,000
* Percentage increase: ($25,000 / $90,000) x 100% = 27.78%
**Company 3:**
* January: $80,000
* June: $105,000
* Increase: $25,000
* Percentage increase: ($25,000 / $80,000) x 100% = 31.25%
Based on the calculations, **Company 3** showed the highest overall growth over the six months, with a percentage increase of 31.25%.
##### llama3.2 90b:
To determine which company showed the highest overall growth over the six months, we need to calculate the percentage increase in revenue for each company from January to June.
**Company 1:**
* Revenue in January: $100,000
* Revenue in June: $130,000
* Increase in revenue: $30,000
* Percentage increase: ($30,000 / $100,000) x 100% = 30%
**Company 2:**
* Revenue in January: $90,000
* Revenue in June: $120,000
* Increase in revenue: $30,000
* Percentage increase: ($30,000 / $90,000) x 100% = 33.33%
**Company 3:**
* Revenue in January: $80,000
* Revenue in June: $110,000
* Increase in revenue: $30,000
* Percentage increase: ($30,000 / $80,000) x 100% = 37.5%
Based on the calculations, **Company 3** showed the highest overall growth over the six months, with a revenue increase of **37.5%** from January to June.
##### gpt4o:
Company 1 showed the highest overall growth.
In January, Company 1 had a revenue of approximately 100 thousand dollars, and in June, it reached about 130 thousand dollars.
The percentage increase is calculated as follows:
\[
\text{Percentage Increase} = \left(\frac{\text{June Revenue} - \text{January Revenue}}{\text{January Revenue}}\right) \times 100
\]
\[
\text{Percentage Increase} = \left(\frac{130 - 100}{100}\right) \times 100 = 30\%
\]
So, Company 1's revenue increased by 30% from January to June.
##### pixtral12b:
To determine which company showed the highest overall growth over the six months and by what percentage their revenue increased from January to June, we need to follow these steps:
1. Identify the revenue for each company in January and June.
2. Calculate the percentage increase in revenue for each company from January to June.
3. Compare the percentage increases to determine which company had the highest growth.
Step 1: Identify the revenue for each company in January and June.
- Company 1 (Blue line):
- January: ~100,000 USD
- June: ~130,000 USD
- Company 2 (Orange line):
- January: ~90,000 USD
- June: ~110,000 USD
- Company 3 (Green line):
- January: ~80,000 USD
- June: ~100,000 USD
Step 2: Calculate the percentage increase in revenue for each company from January to June.
- Company 1:
- Percentage Increase = ((130,000 - 100,000) / 100,000) * 100 = 30%
- Company 2:
- Percentage Increase = ((110,000 - 90,000) / 90,000) * 100 = 22.22%
- Company 3:
- Percentage Increase = ((100,000 - 80,000) / 80,000) * 100 = 25%
Step 3: Compare the percentage increases to determine which company had the highest growth.
- Company 1: 30%
- Company 2: 22.22%
- Company 3: 25%
Company 1 showed the highest overall growth over the six months.
Company 1 showed the highest overall growth over the six months, with a percentage increase of 30% from January to June.
##### claude35sonnet:
To determine which company had the highest overall growth and calculate its percentage increase from January to June, I'll calculate the percentage change for each company:
Company 1:
January revenue: $100,000
June revenue: $130,000
Percentage change = (130,000 - 100,000) / 100,000 * 100 = 30% increase
Company 2:
January revenue: $90,000
June revenue: $115,000
Percentage change = (115,000 - 90,000) / 90,000 * 100 = 27.78% increase
Company 3:
January revenue: $80,000
June revenue: $105,000
Percentage change = (105,000 - 80,000) / 80,000 * 100 = 31.25% increase
Company 3 showed the highest overall growth with a 31.25% increase in revenue from January to June.
##### gemini1.5:
Company 3 showed the highest overall growth. Their revenue increased from approximately $80,000 to $105,000, which is a growth of about 31%.
This task was easy for Llama 3.2 11B, Sonnet, and Gemini. Llama 3.2 90B picked the right company but misread the June revenues, so its growth figure (37.5%) was off. Pixtral and GPT4o did badly: neither even identified the right company.
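The growth comparison itself is a short calculation. This sketch uses the January and June readings reported by Llama 3.2 11B and Claude, which I'm assuming match the underlying data because they reproduce the 31.25% answer.

```python
# January and June revenue in thousands of dollars, as read by Llama 3.2 11B and Claude.
revenue = {
    "Company 1": (100, 130),
    "Company 2": (90, 115),
    "Company 3": (80, 105),
}

# Percentage growth relative to each company's own January revenue.
growth = {
    name: (june - january) / january * 100
    for name, (january, june) in revenue.items()
}
winner = max(growth, key=growth.get)

for name, pct in growth.items():
    print(f"{name}: {pct:.2f}%")
print(f"Highest growth: {winner}")
```

Company 1 gains the most in absolute terms (30 versus 25) but starts from a higher base, which may be why models that judge by the tallest line pick it.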
Histograms
Histograms are statisticians' favorites – the best tool to study the distribution of a variable.
The following is the age distribution of a random population. The task here is twofold. First, we need to find the population's median age group. The second task is to compute the percentage of the population under 30.

The data show that the median age group is 30–39, and 45% of the population is younger than 30.
Let's see how LLMs find out the answer.
### Question: What is the median age group of the population, and what percentage of the population is under 30 years old?
### Correct Answer: The median age group is 30-39. 45% of the population is under 30 years old.
### Model interpretation:
##### llama3.2 11b:
To find the median age group, we need to first arrange the age groups in order from youngest to oldest. The age groups are: 0-9, 10-19, 20-29, 30-39, 40-49, 50-59, 60-69, and 70-79.
Since there are an even number of age groups (8), the median will be the average of the two middle groups. The two middle groups are 20-29 and 30-39.
To find the percentage of the population under 30 years old, we need to add up the percentages for the age groups 0-9, 10-19, and 20-29.
* The percentage for the age group 0-9 is 5%.
* The percentage for the age group 10-19 is 15%.
* The percentage for the age group 20-29 is 25%.
Adding these percentages together gives us a total of 45%. Therefore, 45% of the population is under 30 years old.
The median age group of the population is 20-29, and 45% of the population is under 30 years old.
##### llama3.2 90b:
To find the median age group, we need to arrange the age groups in order from youngest to oldest and then find the middle value. The age groups are: 0-9, 10-19, 20-29, 30-39, 40-49, 50-59, 60-69, and 70-79.
Since there are an even number of age groups (8), the median will be the average of the two middle values. The two middle values are 20-29 and 30-39. Therefore, the median age group is the average of these two values, which is 25-34.
To find the percentage of the population under 30 years old, we need to add up the percentages for the age groups 0-9, 10-19, and 20-29.
The percentage for the age group 0-9 is 5%, the percentage for the age group 10-19 is 15%, and the percentage for the age group 20-29 is 25%.
Adding these percentages together gives us a total of 45%. Therefore, 45% of the population is under 30 years old.
**Answer:** The median age group is 25-34, and 45% of the population is under 30 years old.
##### gpt4o:
The median age group of the population is 30-39, as it divides the distribution into two equal halves.
The percentage of the population under 30 years old is the sum of the percentages of the 0-9, 10-19, and 20-29 age groups:
- 0-9: 5%
- 10-19: 15%
- 20-29: 25%
Total: 5% + 15% + 25% = 45%
So, 45% of the population is under 30 years old.
##### pixtral12b:
To determine the median age group, we need to look at the age groups listed on the x-axis and find the middle value. The age groups are: 0-9, 10-19, 20-29, 30-39, 40-49, 50-59, 60-69, and 70-79. Since there are eight age groups, the median will be the average of the fourth and fifth groups.
The fourth age group is 30-39, and the fifth age group is 40-49. The median age group is the average of these two, which is (30-39 + 40-49) / 2 = (30 + 40 + 39 + 49) / 4 = 158 / 4 = 39.5. Therefore, the median age group is 39.5 years old.
Next, to find the percentage of the population under 30 years old, we need to sum the percentages of the age groups 0-9, 10-19, and 20-29.
From the image:
- The percentage for the 0-9 age group is approximately 5%.
- The percentage for the 10-19 age group is approximately 15%.
- The percentage for the 20-29 age group is approximately 25%.
Adding these percentages together: 5% + 15% + 25% = 45%.
Therefore, 45% of the population is under 30 years old.
The median age group of the population is 39.5 years old, and 45% of the population is under 30 years old.
##### claude35sonnet:
To determine the median age group and the percentage of the population under 30 years old, I'll analyze the data presented in the graph.
1. Median age group:
The median would be the middle value when the population is divided into two equal halves. Looking at the cumulative percentages:
0-9: 5%
10-19: 5% + 15% = 20%
20-29: 20% + 25% = 45%
30-39: 45% + 20% = 65%
40-49: 65% + 15% = 80%
50-59: 80% + 10% = 90%
60-69: 90% + 7% = 97%
70-79: 97% + 3% = 100%
The 50% mark falls within the 30-39 age group. Therefore, the median age group is 30-39.
2. Percentage of population under 30 years old:
To calculate this, I'll add the percentages for the age groups 0-9, 10-19, and 20-29:
5% + 15% + 25% = 45%
Therefore, the median age group of the population is 30-39, and 45% of the population is under 30 years old.
##### gemini1.5:
The median age group is 30-39. 55% of the population is under 30 years old.
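Five of the six models answered the second question correctly with 45%; Gemini 1.5 reported 55%. The first question was harder. Both Llama 3.2 versions and Pixtral took the median to be the middle of the eight age-group labels rather than the group that contains the 50th percentile of the population, so their median age groups were not correct.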
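The wrong median answers above come from averaging the middle bin labels instead of finding where the population crosses 50%. A minimal Python sketch of the cumulative approach follows. The bin percentages are the ones Claude read off the chart (the first three also appear in the other responses); treat them as approximate readings rather than the source data.

```python
from itertools import accumulate

# Age groups and their share of the population (%), as read from the
# histogram in Claude's response. These are assumed readings.
bins = ["0-9", "10-19", "20-29", "30-39", "40-49", "50-59", "60-69", "70-79"]
shares = [5, 15, 25, 20, 15, 10, 7, 3]

# Running total of the population share up to and including each bin.
cumulative = list(accumulate(shares))

# The median bin is the first one whose cumulative share reaches 50%.
median_bin = next(label for label, c in zip(bins, cumulative) if c >= 50)

# Everyone under 30 falls in the first three bins.
under_30 = sum(shares[:3])

print(median_bin, under_30)  # 30-39 45
```

The number of bins never enters the calculation. Only the heights of the bars matter, which is why counting labels leads to the wrong group.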
Box Plot
Box plots, like histograms, help us study the distribution of variables. They also help us identify skewness, outliers, and other properties more clearly.
The following box plot shows math test scores in a class. The task for the LLMs is simple: find the IQR and comment on what it says about the distribution.

Here's what the LLMs do.
### Question: What is the interquartile range (IQR) of the test scores, and what does it indicate about the distribution of the scores?
### Correct Answer: The interquartile range (IQR) is 10%. It indicates that the middle 50% of the scores range from 70% to 80%.
### Model interpretation:
##### llama3.2 11b:
To find the interquartile range (IQR), we need to first identify the first quartile (Q1) and the third quartile (Q3).
Looking at the box plot, we can see that the median is at 75.0, which is the middle value of the dataset. The box extends from the first quartile (Q1) to the third quartile (Q3), with the median marked by a horizontal line inside the box.
The first quartile (Q1) is at 70.0, and the third quartile (Q3) is at 80.0.
Now, we can calculate the interquartile range (IQR) by subtracting Q1 from Q3:
IQR = Q3 - Q1 = 80.0 - 70.0 = 10.0
The interquartile range (IQR) of the test scores is 10.0. This indicates that the middle 50% of the scores are concentrated between 70.0 and 80.0, with the median score at 75.0. The IQR provides a measure of the spread or dispersion of the scores within this range.
*Answer*: 10.0
##### llama3.2 90b:
To find the interquartile range (IQR), we need to identify the first quartile (Q1) and the third quartile (Q3) from the box plot.
From the box plot, we can see that:
* The median (Q2) is approximately 78.
* The lower whisker extends to around 65, indicating that Q1 is approximately 72.5.
* The upper whisker extends to around 85, indicating that Q3 is approximately 80.
Now, let's calculate the IQR:
IQR = Q3 - Q1
= 80 - 72.5
= 7.5
The IQR of the test scores is 7.5. This indicates that the middle 50% of the scores are spread over a range of 7.5 points, from 72.5 to 80. A smaller IQR suggests that the scores are more concentrated around the median, while a larger IQR indicates more variability in the scores. In this case, the IQR is relatively small compared to the overall range of scores, suggesting that the majority of the scores are clustered around the median value of 78.
##### gpt4o:
The interquartile range (IQR) is the difference between the third quartile (Q3) and the first quartile (Q1). In the box plot, Q1 is at approximately 70 and Q3 is at approximately 80. Therefore, the IQR is:
IQR = Q3 - Q1 = 80 - 70 = 10
The IQR of 10 indicates that the middle 50% of the test scores are spread over a range of 10 percentage points. This suggests a moderate spread in the central part of the data, with a relatively uniform distribution around the median.
##### pixtral12b:
To determine the interquartile range (IQR) of the test scores, we need to look at the box plot provided. The IQR is the range between the first quartile (Q1) and the third quartile (Q3). Here are the steps to find the IQR:
1. **Identify Q1 and Q3**:
- Q1 is the value below which 25% of the data falls.
- Q3 is the value below which 75% of the data falls.
2. **Determine the range**:
- The box plot shows that the lower quartile (Q1) is around 70%.
- The upper quartile (Q3) is around 80%.
3. **Calculate the IQR**:
- IQR = Q3 - Q1
- IQR = 80% - 70%
- IQR = 10%
The IQR of 10% indicates the spread of the middle 50% of the data. A smaller IQR suggests that the data is more concentrated around the median, while a larger IQR indicates more variability within the middle 50% of the data.
In this case, an IQR of 10% suggests that the scores are relatively concentrated around the median, indicating less variability in the middle range of the scores.
The interquartile range (IQR) of the test scores is 10%, indicating that the middle 50% of the scores are relatively concentrated around the median.
##### claude35sonnet:
To calculate the interquartile range (IQR), I'll identify the first (Q1) and third (Q3) quartiles from the box plot.
Q1 (bottom of the box): 70%
Q3 (top of the box): 80%
IQR = Q3 - Q1 = 80% - 70% = 10%
The interquartile range is 10 percentage points.
What this indicates about the distribution of the scores:
1. The middle 50% of the scores fall within a 10 percentage point range, suggesting a moderate level of consistency in the central portion of the data.
2. This IQR is relatively small compared to the overall range of scores (from about 65% to 85%), indicating that the central 50% of students performed fairly similarly to each other.
3. The IQR being smaller than the whiskers (especially the upper whisker) suggests there may be some outliers or more spread in the upper and lower 25% of the scores.
4. The median (line in the box) appears to be closer to Q1 than Q3, indicating a slight positive skew in the distribution of scores.
Overall, the IQR of 10% indicates a moderate level of variability in the central scores, with potentially more variability in the extremes of the distribution.
##### gemini1.5:
The IQR appears to be approximately 5 (79.5 - 74.5). This indicates that the middle 50% of the test scores are clustered within a relatively narrow range, suggesting a relatively consistent performance among the students within that range.
Although this task was simple, Gemini 1.5 and Llama 3.2 90b got it wrong. Llama 3.2 90b computed the IQR as 7.5 and Gemini as 5.
The other models correctly computed the IQR as 10%, and their comments were accurate.
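For reference, this is roughly how the IQR falls out of raw scores. The chart's underlying data isn't listed here, so the `scores` list below is hypothetical, picked only so that its quartiles match the 70, 75, and 80 marks visible on the box plot. The sketch uses Python's standard `statistics.quantiles` with the inclusive method.

```python
import statistics

# Hypothetical test scores (%), chosen to reproduce the box plot's
# quartiles. They are NOT the data behind the chart.
scores = [65, 70, 70, 75, 75, 75, 80, 80, 85]

# n=4 returns the three cut points: Q1, the median, and Q3.
q1, median, q3 = statistics.quantiles(scores, n=4, method="inclusive")
iqr = q3 - q1

print(q1, median, q3, iqr)  # 70.0 75.0 80.0 10.0
```

Different quantile methods can shift Q1 and Q3 slightly on small samples, so a model reading the box edges off the image should land near 10 rather than on exactly the same decimals.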
Final Thoughts
Charts are everywhere, and not all of them are well designed.
For multimodal LLMs to be helpful in real life, their chart interpretation skills must be at their best. They should extract the critical message regardless of the clutter around it.
Most frontline MLLMs do a decent job of interpreting basic charts. Most notably, Claude 3.5 Sonnet does an excellent job. Llama 3.2's 11B version also works fine; the smaller model even outperforms its 90B sibling.
Yet, they may not be ready for mission-critical use cases. I also need to do more research before commenting on how these models perform on more sophisticated charts.
Thanks for reading, friend! Besides Medium, I'm on LinkedIn and X, too!