Intro to Data Analysis: The "Google Method"

Author:Murphy  |  View: 21477  |  Time: 2025-03-23 18:22:42
Image by Adam Satria (Unsplash)

Data Analytics at large companies is messy.

The data is frequently elusive, hiding in various corners of the organisation, requiring soft skills and a Sherlock-like investigation to locate and integrate it together.

There's the formidable fortress of data governance that protects the company and its customers, but complicates your aim to get your grubby hands on said data. Here, negotiation and soft skills matter.

As if all that wasn't challenging enough, you'll need to muster those technical chops for effective analytics and master the art of data storytelling to sell the impact of your data-driven insights.

Update: I now post analytics tutorials on YouTube.

Reality of skillsets needed. Image by author

Having worked as a data scientist and engineer at one of Australia's largest banks over the past five years, I've discovered that online courses don't adequately prepare students for the realities of enterprise analytics.

I aim to kill three birds with this article.

For those…

  1. With little industry experience – I'll share my own end-to-end experiences of typical data analysis projects at my bank, warts and all.
  2. Learning data analytics – we'll perform a basic pivot table analysis in Excel that concludes with visualised insights. Dataset included!
  3. Looking for a data analysis process to follow – I'll employ Google's own six-stage process for the contemporary data analyst: Ask, Prepare, Process, Analyse, Share and Act.
From descriptive analysis to predictive modelling. Image by author

Let's dive in.

1. Ask

Unlike academia, analyses in industry is driven by a Business need.

What problem is the stakeholder trying to solve? Your analysis will typically enable the company to do something better, faster or cheaper.

My bank: In my industry of banking, here are some typical questions thrown our way.

  • How can we better identify fraudulent transactions?
  • What factors contribute to customer churn or employee turnover?
  • What are the key drivers behind customer satisfaction?
  • What are the key factors influencing credit card spending?
  • How can we optimise our loan approvals process so that it's faster yet still timely?

We'll work with business stakeholders to ask the right questions, define the problem and formulate a plan of attack.

We'll leverage some domain-expertise and consider the whole problem in order to capture the situation right.

And we'll keep an open line of communication with these stakeholders throughout the project.

Skills needed: project and stakeholder management; domain-expertise.

Exercise

Suppose you work for a small online retailer that sells various merchandise to Aussie and New Zealand customers. Your boss wants to understand the sales performance of umbrellas:

"Which months do they sell the best?"

"Where in Australia and NZ are they most popular?

Great! Some meaty questions for you to dive into.

2. Prepare

The business need now motivates a Data need.

We need to prepare the data for the analytics to drive the insights that answer the business question. (Whew, that was a mouthful!)

My bank: Here are some common questions analysts need to address as part of data preparation.

  • What data do we need? Where does it lie? Is it operational data or analytical data? Which source systems do the operational data lie in? Does the analytical data sit in a data warehouse or data lake?
  • What tools can we use to access the data? You might use something like Teradata Studio for a SQL database hosted by data warehouse vendor Teradata. You might use Azure Synapse Analytics for big data hosted on Microsoft Azure Cloud. You might use Excel or Power BI for smaller tables, particularly those small enough to download to your local machine. What if you need to join datasets from all these sources? Maybe an all-in-one Data Science workbench like Dataiku.
  • What governance processes do we need to navigate? Who owns the data? What approvals do we need to access it, particularly if the data contains sensitive customer information? How will you guarantee its security? How will the lineage of the data transformed during the lifecycle of this project be tracked? (…a massive contributor to enterprise tech debt).

On governance, data discovery projects here typically requires buy-in from a multitude of stakeholders, all with their own interests and KPIs:

  • The Business – obviously! They commissioned me in the first place…
  • Architecture – do the patterns needed for this project align with the strategic Technology direction of the bank? Or is my work largely an ad hoc tactical solution that will cease to become valuable later? Will my team be contributing to technical debt?
  • Data Platforms – do we have the data and tooling services to achieve the aims of the project? Does it align with our funding constraints?
  • Governance – what paperwork is needed to access and use the data? Are there privacy and ethical considerations?
  • Risk – what data risks are present? What are the controls in response? How will sensitive data be protected?

Skills needed: understand company's enterprise data architecture; data modelling; experience with different platforms and tools; ability to align various stakeholder interests.

Exercise (continued)

Suppose the aggregated sales data for the online business is stored in a CRM platform and downloadable to your laptop here.

Here's what it looks like in Excel.

Image by author

3. Process

Ah, the fun part.

Data wrangling!

(Half-joking here…)

During the data processing stage, you'll clean the data and make it ready for analysis.

This includes addressing data quality issues – such as duplicate and missing data – and decide on how to deal with things like irrelevant data and outliers.

My bank: In large organisations, data quality issues are almost impossible to eliminate.

There's simply too much data created by too many people and source systems.

Data warehouses often boast a significant amount of technical debt from bespoke data pipelines that are built for specific projects, creating redundant data assets that are not reusable.

In the 2020's, the solution organisations are adopting worldwide is moving from a project-oriented mindset to a product-oriented mindset where data is treated as a quality reusable product.

Meanwhile data lakes run into bottleneck issues due to funnelling an enormous amount of work on an overburdened centralised data team who are data engineering experts but not data-domain experts.

However, the benefit of having professional data engineers building ETL pipelines for data analysts and data scientists is that a significant amount of data problems are identified and fixed.

Skills needed: data wrangling.

Exercise (continued)

Time for you to prepare your spreadsheet above for analysis.

Since you're looking at the popularity of umbrella sales (i.e. volume sold is what matters), I suggest you hide columns like the Sale Price and Manufacturing Cost.

Image by author

You can then filter the sales for just Umbrellas…since that's what your boss cares about!

Image by author

Oops, we can already see some data quality issues, which you can directly fix in the spreadsheet.

Ideally, data quality issues are programmatically addressed with code, as manual human efforts are unreliable for anything beyond small datasets.

Thankfully, the dataset for this exercise is small.

Image by author

After addressing these issues, your dataset is ready for analysis.

Enjoying this story? Get an email when I post similar articles.

4. Analyse

Time for the fun bit.

Certainly the bit that all the online courses focus on – the analytics!

The primary goal here is to find the patterns, relationships and trends that solve the business problem.

You'll make sense of the data and try to unlock the value that sits latent within.

My bank: We have pockets of analytics excellence, which includes data science squads that perform advanced analytics and predictive modelling. This includes machine learning prototyping with traditional Python notebooks, or newer no-code ML platforms that democratise advanced capabilities across the organisation.

How the world of AI, big data and data science intersect. Image by author

By far, however, the majority of the organisation sit at a data maturity level closer towards descriptive analytics, which means tools like Excel and Power BI.

Skills needed: analytics chops!

Exercise (continued)

To get a better sense of what's happening in the data, you can slice and dice the dataset and its fields in useful ways by leveraging pivot tables.

The idea is you'll only select the variables you're interested in and create visualisations to drive insights.

To create a pivot table in Excel, select the source data and Insert a PivotTable next to it.

Image by author

You'll start with a blank PivotTable, which you can populate by dragging and dropping fields as shown below.

Image by author

You're interested in measuring Umbrella sales numbers, so drag the:

  • Product field into Filters, so you can filter by Umbrellas;
  • Units Sold field into Values, as that's your main measure.
  • Year and Month Name into Rows, so you know the Umbrella sales by year and month.

You now have your PivotTable!

Image by author

Time for the grand finale – visualising your PivotTable in a chart.

To create it, select your PivotTable and Insert a PivotChart. I'd suggest picking a Line chart, but feel free to try others in order to best represent your data.

Image by author

Aha, the magic moment…

Insights!

Image by author

Your PivotChart clearly tells an interesting data story.

In 2022, umbrella sales spiked in April, reached the first pinnacle around May to June, dropped a little bit in July, then reached its ultimate peak in August before dropping.

In any case, May to August saw significantly higher sales than the rest of the year.

This addresses the first business problem of working out which times of the year boast the best sales.

By dragging and dropping fields in our PivotTable, you can create another PivotChart that answers the second business problem of which parts of Australia and NZ drives the best sales.

Image by author

We can clearly see that the majority of Umbrella sales actually come from New Zealand. This actually sheds further insight onto the first business problem – May and August happens to be the country's wet seasons!

In short, you cleaned the data, created some pivot tables to aggregate the data and then crafted some charts to visually show that our Kiwi friends were ultimately responsible for generating the majority of umbrella sales during just a few wet months of the year!

Nice!

5. Share

Here, you want to find an appropriate audience with whom to share your insights.

If your findings don't inform some sort of downstream decision-making, were they valuable at all?

Alright, that was a bit harsh, but the point I'm trying to make is a lot of analytics insights don't get used, typically because they're:

  • Not shared by analysts to stakeholders or
  • Not articulated well or
  • Not actioned by the decision-makers (more on this next).

An effective data analyst will grab their insights and leverage their soft skills to secure an audience of stakeholders and decision-makers to hear their message.

Skills needed: data visualisation; effective communications and storytelling; stakeholder management; persuasion and soft skills.

Exercise (continued)

Arrange a meeting to present your findings to your boss. He'll need to know just how valuable your Kiwi customers are!

6. Act

In our finale of the end-to-end data analytics process, your communications and storytelling skills reign supreme.

Many data scientists argue that these are the most important skills, as without it your work will have little impact on stakeholders and shall fade into insignificance.

You want decision-makers to act on your insights.

Data storytelling is crucial here because…

  • Your data doesn't speak for itself. Sorry!
  • Humans are wired to believe stories.
  • Stories create audience empathy, which generates an oxytocin response that results in their trust in you.
  • Once you've established this connection, you can sell them whatever you want.
  • Bottom line: Sell stories, not data.

My bank: Unfortunately, many analytics and data science projects aren't ultimately actioned by decision-makers.

This can be due to a number of reasons:

  • No buy-in from stakeholders. Your results aren't strong enough – or perhaps communicated in a manner that undersold their impact. (Work on your delivery!)
  • Insights are difficult to act upon. For instance, it might be exorbitantly expensive to implement certain insights, or it doesn't align with current business priorities. (Tough luck!)
  • Legacy culture that's resistant to data-driven insights, which is a reflection of the data maturity of the organisation. (Problem is above your pay grade.)
Where does your company sit? Image by author

Data maturity at most companies sit at a 3 out of 5 or less.

My bank – like many in the West – is sitting at around 2 to 3 out of 5, characterised by pockets of analytics excellence and a leadership team trying their best to uplift the data literacy of the entire enterprise through various initiatives, such as data democratisation. (Making data and tooling easier for all.)

Skills needed: data visualisation; effective communications and storytelling; stakeholder management; persuasion and soft skills.

Exercise (continued)

At a meeting with your boss, you string together a strong narrative that leverages some of the interactive Excel PivotCharts you crafted. Your boss is convinced and commits to building a stronger business in New Zealand while ensuring there that umbrella inventories are stocked up for next year's wet season.

Congratulations on persuading decision-makers to take action on the insights from your analytics!

Final Words

Data analysis courses are notoriously light on teaching various soft skills required to succeed as an effective data analyst in the real world.

That's why I like Google's six-stage framework:

  • Ask. What's the business problem?
  • Prepare. How do we get the data we need?
  • Process. Reliable insights can only come from clean data.
  • Analysis. Finally…time to show off your technical chops!
  • Share. Assemble a relevant audience and tell a compelling story.
  • Act. Did your communicated data insights impact decision-making?

In this article, I delved into my personal industry experiences, shedding light on the considerations and challenges found in each stage at large enterprises.

And for the aspiring data analysts among you, we consolidated some vital skills, such as wrangling small datasets into summarised pivot tables and crafting visualisations to expose reportable insights.

Let me know if you enjoyed this article!

All the best on your journey.

Find me on Twitter & YouTube [[here](https://youtube.com/@col_shoots)](https://youtube.com/@col_invests), here & here.

My Popular AI, ML & Data Science articles

  • AI & Machine Learning: A Fast-Paced Introduction – here
  • Machine Learning versus Mechanistic Modelling – here
  • Data Science: New Age Skills for the Modern Data Scientist – here
  • Generative AI: How Big Companies are Scrambling for Adoption – here
  • ChatGPT & GPT-4: How OpenAI Won the NLU War – here
  • GenAI Art: DALL-E, Midjourney & Stable Diffusion Explained – here
  • Beyond ChatGPT: Search for a Truly Intelligence Machine – here
  • Modern Enterprise Data Strategy Explained – here
  • From Data Warehouses & Data Lakes to Data Mesh – here
  • From Data Lakes to Data Mesh: A Guide to Latest Architecture – here
  • Azure Synapse Analytics in Action: 7 Use Cases Explained – here
  • Cloud Computing 101: Harness Cloud for Your Business – here
  • Data Warehouses & Data Modelling – a Quick Crash Course – here
  • Data Products: Building a Strong Foundation for Analytics – here
  • Data Democratisation: 5 ‘Data For All' Strategies – here
  • Data Governance: 5 Common Pain Points for Analysts – here
  • Power of Data Storytelling – Sell Stories, Not Data – here
  • Intro to Data Analysis: The Google Method – here
  • Power BI – From Data Modelling to Stunning Reports – here
  • Regression: Predict House Prices using Python – here
  • Classification: Predict Employee Churn using Python – here
  • Python Jupyter Notebooks versus Dataiku DSS – here
  • Popular Machine Learning Performance Metrics Explained – here
  • Building GenAI on AWS – My First Experience – here
  • Math Modelling & Machine Learning for COVID-19 – here
  • Future of Work: Is Your Career Safe in Age of AI – here

Tags: Analytics Business Data Data Science Technology

Comment