Digital Marketing Analysis with Python and MySQL

Author: Murphy  |  Views: 27978  |  Time: 2025-03-23 19:16:34
Photo by Zdeněk Macháček on Unsplash

Introduction

Along this brief journey, we will explore a short and simple dataset containing basic website marketing metrics such as ‘users', ‘sessions' and ‘bounces' over a five-month period.

The purpose of this setup is less to understand the website's performance than to build some basic but useful knowledge for answering a number of must-have operational marketing questions.

We'll focus on two of the most widely used digital tools, following two routes that lead to the same results at the end of the day.

On the one hand, we will explore the syntax of MySQL Workbench through a set of diverse queries; in parallel, for each question, we will use Python with its graphic and visual resources. The two environments are labelled MySQL and Python, respectively, and each question includes notes and explanations on both pieces of code for a deeper understanding.

MySQL

-- displaying dataset (case_sql.csv)
SELECT * FROM case_sql;
Image by author.

You can download the SQL dataset here.

Python

# import python libraries
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
import matplotlib.style as style
%matplotlib inline
color = sns.color_palette()
from pandas.plotting import table 
from datetime import datetime
# load the data set
df = pd.read_csv("case.csv", sep=";")

# number of observations and columns
df.shape
(31507, 7)

# display rows sample
df.sample(15)
Image by author.
# SHAPE
# Create a function that prints the shape of the dataframe and some other basic info 
# --> number of observations, features, duplicates, missing values (True, False) datatypes and its proportion.

def shape_df(df):
    print(f"Number of observations: {df.shape[0]}")
    print(f"Number of variables:    {df.shape[1]}")
    print(f"Number of duplicates:   {df.duplicated().sum()}")
    print(f"Are there any missing values? {df.isnull().values.any()}\n-----")
    print(f"{df.dtypes.sort_values(ascending=True)}\n-----")
    print(f"Datatypes' proportion:\n{df.dtypes.value_counts(ascending=True)}")

# calling the function
shape_df(df)

Number of observations: 31507
Number of variables:    7
Number of duplicates:   4083
Are there any missing values? False
-----
date                         int64
users                        int64
sessions                     int64
bounces                      int64
brand                       object
device_category             object
default_channel_grouping    object
dtype: object
-----
Datatypes' proportion:
object    3
int64     4
dtype: int64
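The function reports 4,083 duplicated rows. Before deciding whether to drop them, it is worth inspecting them; a minimal sketch on a toy frame (hypothetical values, standing in for df):

```python
import pandas as pd

# toy frame standing in for df (hypothetical values)
toy = pd.DataFrame({"date": [20190901, 20190901, 20190902],
                    "users": [5, 5, 7]})

# rows that exactly repeat an earlier row
dupes = toy[toy.duplicated()]

# keep only the first occurrence of each row
deduped = toy.drop_duplicates()
```

In this dataset repeated rows may be legitimate (the same date/brand/device combination can recur), so dropping them should not be automatic.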
# lowercase column names for easier typing
df.columns = map(str.lower, df.columns)

# replace spaces in column names with underscores
df.columns = df.columns.str.replace(' ','_')

Convert dates into datetime type

# make string version of original column 'date', call it 'date_'
df['date_'] = df['date'].astype(str)

# create the new columns using string indexing
df['year'] = df['date_'].str[0:4]
df['month'] = df['date_'].str[4:6]
df['day'] = df['date_'].str[6:]

# concatenate 'year', 'month' and 'day'
df["date"] = df["year"] + "-" + df["month"] + "-" + df["day"]

# convert to datetime
df["date"] = pd.to_datetime(df["date"], format="%Y-%m-%d")
# extract 'year', 'month' and 'weekday'
df["year"] = df["date"].dt.year
df["month"] = df["date"].dt.month
df["weekday"] = df["date"].dt.dayofweek.map({0 : "Mon", 1 : "Tue", 2 : "Wed", 3: "Thu", 4 : "Fri", 5 : "Sat", 6 : "Sun"})
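As an aside, the string-slicing steps above can be collapsed into a single pd.to_datetime call, since the original date column is already in YYYYMMDD form (a sketch on sample values, not the case dataset):

```python
import pandas as pd

# sample YYYYMMDD integers, as stored in the original 'date' column
s = pd.Series([20190901, 20200131])

dates = pd.to_datetime(s.astype(str), format="%Y%m%d")

# year/month/weekday can then be extracted exactly as above
weekday = dates.dt.day_name().str[:3]   # e.g. "Sun", "Fri"
```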
# select columns to perform exploratory data analysis
cols = "date year month weekday brand device_category  default_channel_grouping  users  sessions  bounces".split()
df = df[cols].copy()

# display final dataset
df.head(10)
Image by author.

You can download the Python dataset here.


1. Let's look at the distribution of users by type of device

MySQL

-- Device distribution
SELECT 
    device_category, -- select the device_category column
    ROUND(COUNT(users) / (SELECT 
                    COUNT(users)
                FROM
                    case_sql) * 100,
            1) AS percent -- calculate the percentage of users in each category
FROM
    case_sql -- select data from the case_sql table
GROUP BY 1 -- group the result by device_category
ORDER BY 1; -- order the result by device_category in ascending order
Image by author.

We can see that the distribution by device shows mobile and desktop side by side as the most frequent access type.

Python

# device distribution
# counts the number of occurrences of each unique device category in the device_category column of the DataFrame df, including missing values (if any).
df.device_category.value_counts(dropna=False).plot(kind='pie', figsize=(8,4),
                                              explode = (0.02, 0.02, 0.02),
                                              autopct='%1.1f%%',
                                              startangle=150);
# Params
plt.xticks(rotation=0, horizontalalignment="center")
plt.title("Device distribution", fontsize=10, loc="right");
Image by author.

2. What is the above scenario on a brand perspective?

MySQL

-- Brand distribution
SELECT 
    brand, -- select the brand column
    COUNT(users) AS users, -- count the number of users for each unique brand and alias the result as "users"
    ROUND(COUNT(users) / (SELECT 
                    COUNT(users)
                FROM
                    case_sql) * 100,
            2) AS percent -- calculate the percentage of users for each brand out of the total number of users in the case_sql table and alias the result as "percent"
FROM
    case_sql -- select data from the case_sql table
GROUP BY 1; -- group the result by the first column (brand)
Image by author.

At brand level, Brand 2 has the highest number of visits with 56.28% of the total, against 43.72% for Brand 1.

Python

# Brand distribution
absolut = df["brand"].value_counts().to_frame()

# Pie chart
absolut.plot(kind='pie', subplots=True, autopct='%1.2f%%', 
             explode= (0.05, 0.05), startangle=20, 
             legend=False, fontsize=12, figsize=(8,4))

# Params
plt.xticks(rotation=0, horizontalalignment="center")
plt.title("Brand's distribution", fontsize=10, loc="right");

display(absolut) # Table
Image by author.

3. On which day of the week did we see most users arrive on the Brand 1 website?

MySQL

SELECT 
    date, -- select the date column
    DAYNAME(date) AS day_name, -- calculate the day name corresponding to each date
    SUM(users) AS users -- sum the number of users for each unique date where the brand is 'Brand 1'
FROM
    case_sql -- select data from the case_sql table
WHERE
    brand = 'Brand 1' -- filter rows where the brand is 'Brand 1'
GROUP BY 1 -- group the result by date
ORDER BY 3 DESC -- order the result by users in descending order
LIMIT 1; -- select only the first row of the result (the row with the highest number of users)
Image by author.

Of the 298,412 users recorded between Sep 2019 and Jan 2020, the day most users arrived on the Brand 1 website was 2019-11-22, a Friday, with a total of 885 visits.

Python

# filter users that arrived at 'Brand 1' only, assign it 'brand_1'
brand_1 = df[df["brand"] == "Brand 1"].copy()

''' sum total users for each date across all channel groupings,
no matter the type of device; assign it 'brandgroup' '''

brandgroup = brand_1.groupby(["date","weekday"])[["users"]].sum()

# filter the date by maximum users, assign it 'users'
users = brandgroup[brandgroup["users"] == brandgroup.users.max()].copy()

# reset index
users.reset_index(inplace=True)

# results
print(f"""Date: {users.date} \n\nTotal users: {users.users} \n\nDay of week: {users.weekday}""")

Date: 0   2019-11-22
Name: date, dtype: datetime64[ns] 

Total users: 0    885
Name: users, dtype: int64 

Day of week: 0    Fri
Name: weekday, dtype: object

# calling the variable
users
Image by author.

3.1 How many users landed on Brand 2 on that same day?

MySQL

SELECT 
    DATE(date) AS date, -- Select the date from the 'date' column and convert it to a date data type
    DAYNAME(date) AS dayofweek, -- Select the day of the week from the 'date' column
    SUM(CASE
        WHEN brand = 'Brand 1' THEN users -- Sum the 'users' column for Brand 1
        ELSE NULL
    END) AS b1_users,
    SUM(CASE
        WHEN brand = 'Brand 2' THEN users -- Sum the 'users' column for Brand 2
        ELSE NULL
    END) AS b2_users
FROM
    case_sql -- From the 'case_sql' table
GROUP BY 1, 2 -- Group the results by the first and second columns (date and dayofweek)
ORDER BY 3 DESC -- Order the results by b1_users in descending order
LIMIT 1; -- Limit the results to only the highest total number of Brand 1 users
Image by author.

Actually, both brands saw the highest number of visits on the same day.

Python

# filter users that arrived at 'Brand 2', assign it 'brand_2'
brand_2 = df[df["brand"] == "Brand 2"].copy()

# rename the 'users' column from previous (above) Python code
brandgroup.rename(columns = {'users':'brand1_users'}, inplace = True)

# include a new column with the filtered users from 'Brand_2'
brandgroup["brand2_users"] = brand_2.groupby(["date","weekday"])["users"].sum()

# filter the new column (brand2_users) by maximum users
users2 = brandgroup[brandgroup["brand2_users"] == brandgroup.brand2_users.max()].copy()
Image by author.

4. Of all the Channel Groupings, which has contributed the highest number of users?

MySQL

SELECT 
    default_channel_grouping AS channels,
    SUM(users) AS total_users,
    ROUND(SUM(users) / (SELECT 
                    SUM(users)
                FROM
                    case_sql) * 100,
            1) AS percent -- calculate the percentage of users for each channel
FROM
    case_sql
GROUP BY 1
ORDER BY 2 DESC;
Image by author.

Organic Search is by far the channel that generates the most users (almost 141,000), representing almost half of total visits across both websites, followed by Paid Search and Direct. Display takes 4th position and Social 6th, contributing 6,722 users.

Python

# sum users by all channel groups and plot bar chart
ax = (df.groupby("default_channel_grouping")["users"].sum()
        .sort_values(ascending=True)
        .plot(kind="bar", figsize=(9,6), fontsize=12, linewidth=2,
              color=sns.color_palette("rocket"), grid=False, table=False))

# show data labels
for p in ax.patches:
    ax.annotate("%.0f" % p.get_height(), (p.get_x() + p.get_width() / 2., p.get_height()),
                ha='center', va='center', xytext=(0, 7), textcoords='offset points')

# params
plt.xlabel("Channel groups", fontsize=10)
plt.xticks(rotation=90, horizontalalignment="center")
plt.ylabel("Absolute values", fontsize=10)
plt.title("Best channel group (highest number of users)", fontsize=10, loc="right");
Image by author.
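The MySQL query above also reports each channel's share of total users, which the Python bar chart leaves out. The same percent column can be reproduced in pandas; a sketch with made-up channel totals:

```python
import pandas as pd

# hypothetical user totals per channel (stand-ins for the groupby result)
totals = pd.Series({"Organic Search": 140000,
                    "Paid Search": 80000,
                    "Direct": 60000})

# each channel's share of all users, mirroring the SQL 'percent' column
percent = (totals / totals.sum() * 100).round(1)
```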

4.1 Of all the Channel Groupings, which has contributed the highest number of users breakdown by brand?

MySQL

SELECT 
    default_channel_grouping AS channels,
    SUM(CASE -- sum users by brand and map to new columns
        WHEN brand = 'brand 1' THEN users -- if brand = 'brand 1', sum users and store in 'Brand_1' column
        ELSE NULL -- if not 'brand 1', set value to null
    END) AS Brand_1, -- create column for Brand 1 users
    SUM(CASE 
        WHEN brand = 'brand 2' THEN users 
        ELSE NULL 
    END) AS Brand_2 
FROM
    case_sql
GROUP BY 1 -- group by channel
ORDER BY 3 DESC; -- order by Brand 2 users in descending order
Image by author.

Python

# create pivot_table
# sum all users for each brand by channels
type_pivot = df.pivot_table(
    columns="brand",
    index="default_channel_grouping",
    values="users", aggfunc=sum)

display(type_pivot)

#Display pivot_table with a bar chart
type_pivot.sort_values(by=["Brand 2"], ascending=True).plot(kind="bar", figsize=(12,8) ,fontsize = 15)
plt.xlabel("Channel groups", fontsize=10)
plt.xticks(rotation=90, horizontalalignment="center")
plt.ylabel("Absolute values", fontsize=10)
plt.title("Channel groups by brand (highest number of users)", fontsize=10, loc="right");
Image by author.
Image by author.

Organic Search contributed 105,062 users for Brand 2 and 35,911 for Brand 1. With the exception of ‘Other', in which Brand 1 is superior, Brand 2 delivers the higher number of users in every channel.

5. Amongst all channels, which brand has contributed with a percentage of at least 5% of paid sessions during 2019?

MySQL

SELECT 
    brand,
    default_channel_grouping AS channels,
    ROUND(SUM(sessions) / (SELECT 
                    SUM(sessions)
                FROM
                    case_sql) * 100,
            1) AS percent
FROM
    case_sql
WHERE
    default_channel_grouping IN ('Paid Search' , 'Paid Social', 'Display', 'Other Advertising') -- include only rows with these values
        AND date < '2020-01-01' -- only date before '2020-01-01' will be included.
GROUP BY 1 , 2
HAVING percent > 5 -- filters the groups to only include values greater than 5%.
ORDER BY 1 , 3 DESC;
Image by author

Python

# groupby dataframe by selected cols
df = df.groupby(["date","brand","default_channel_grouping"])["sessions"].sum().to_frame().copy()

# calculate percentages (new column)
df["percent"] = (df["sessions"] / df["sessions"].sum() * 100).round(2)

# reset index
df = df.reset_index().copy()

# display a 5 rows sample
df.sample(5)
Image by author.
# filter paid channels
paid = df[df['default_channel_grouping'].isin(['Display','Paid Search','Paid Social','Other Advertising'])]

# filter year 2019
paid = paid[paid['date'] < '2020-01-01']

# groupby channels by brand
paid = paid.groupby(["brand","default_channel_grouping"])[["sessions","percent"]].sum()

# filter sessions higher than 5%
paid[paid["percent"] >5]
Image by author.

6. How many visits did both brands receive by type of device?

MySQL

SELECT 
    brand,
    SUM(CASE
        WHEN device_category = 'Desktop' THEN users
        ELSE NULL
    END) AS desktop,
    SUM(CASE
        WHEN device_category = 'Mobile' THEN users
        ELSE NULL
    END) AS mobile,
    SUM(CASE
        WHEN device_category = 'Tablet' THEN users
        ELSE NULL
    END) AS tablet
FROM
    case_sql
GROUP BY 1
ORDER BY 1;
Image by author.

Python

# pivot_table
type_pivot = df.pivot_table(
    columns="device_category",
    index="brand",
    values="users", aggfunc=sum)

display(type_pivot)

# display pivot_table (chart)
ax = type_pivot.sort_values(by=["brand"], ascending=False).plot(kind="bar", figsize=(12,8) ,fontsize = 15);

# adding data labels
for p in ax.patches:
    ax.annotate("%.0f" % p.get_height(), (p.get_x() + p.get_width() / 2., p.get_height()), ha='center', va='center', xytext=(0, 7), textcoords='offset points')

plt.xlabel("Brands", fontsize=10)
plt.xticks(rotation=0, horizontalalignment="center")
plt.ylabel("Absolute values", fontsize=10)
plt.title("Brand by type of device", fontsize=10, loc="right")
plt.legend(["Desktop","Mobile","Tablet"]);
Image by author.

Mobile is the preferred type of device at Brand 2 whereas desktop is the most used device at Brand 1.

6.1 What is the average number of users per device type for each channel?

MySQL

SELECT 
    default_channel_grouping,
    AVG(CASE
        WHEN device_category = 'Desktop' THEN users
        ELSE NULL
    END) AS desktop,
    AVG(CASE
        WHEN device_category = 'Mobile' THEN users
        ELSE NULL
    END) AS mobile,
    AVG(CASE
        WHEN device_category = 'Tablet' THEN users
        ELSE NULL
    END) AS tablet
FROM
    case_sql
GROUP BY 1
ORDER BY 1;
Image by author.

Python

# pivot_table
type_pivot = df.pivot_table(
    columns="device_category",
    index="default_channel_grouping",
    values="users", aggfunc=np.mean)

display(type_pivot)
Image by author.
# display pivot_table
type_pivot.sort_values(by=["default_channel_grouping"], ascending=False).plot(kind="bar", figsize=(12,8) ,fontsize = 15);

plt.xlabel("Date", fontsize=10)
plt.xticks(rotation=90, horizontalalignment="center")
plt.ylabel("Absolute values", fontsize=10)
plt.title("Average use of device types by channel grouping", fontsize=10, loc="right")
plt.legend(["Desktop","Mobile","Tablet"]);
Image by author.

On average, desktop is used more often for Referral, Direct and Others. For the remaining channels, vertically oriented content should always be taken into consideration.

7. How to assess the bounce rate of the Channel Groupings?

The Bounce Rate is calculated as the total number of bounces divided by the total number of sessions.
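In code the definition is a one-liner; a minimal sketch with made-up totals:

```python
# bounce rate = total bounces / total sessions * 100
bounces, sessions = 550, 1000
bounce_rate = round(bounces / sessions * 100, 2)   # -> 55.0
```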

MySQL

SELECT 
    default_channel_grouping,
    SUM(sessions) AS sessions,
    SUM(bounces) AS bounces,
    ROUND(SUM(bounces) / SUM(sessions) * 100, 2) AS bounces_r
FROM
    case_sql
GROUP BY 1
ORDER BY 4 DESC;
Image by author.

Average Bounce Rate: 54.93% (avg_bounces_r)

SELECT 
    SUM(sessions) AS sessions,
    SUM(bounces) AS bounces,
    ROUND(SUM(bounces) / SUM(sessions) * 100, 2) AS bounces_r,
    AVG(ROUND(bounces/sessions*100, 2)) AS avg_bounces_r
FROM
    case_sql;

Python

# group individual channels by sum of users
dfbounce = df.groupby("default_channel_grouping")["users"].sum().to_frame()

# group individual channels by sum of sessions
dfbounce["sessions"] = df.groupby("default_channel_grouping")["sessions"].sum()

# group individual channels by sum of bounces
dfbounce["bounces"] = df.groupby("default_channel_grouping")["bounces"].sum()

# calculus of bounce rate for each individual channel
dfbounce["bounces_r"] = dfbounce.apply(lambda x: 0.0 if x["sessions"] == 0.0 else (x["bounces"] / x["sessions"])*100, axis=1).round(2)

dff = dfbounce.copy()

dfbounce.drop(["users"],axis=1,inplace=True)

# sort values by rate
dfbounce.sort_values(by="bounces_r", ascending=False)
Image by author.
# display bar chart with the bounce rate for each channel
ax = (dfbounce.groupby("default_channel_grouping")["bounces_r"].sum()
        .sort_values(ascending=True)
        .plot(kind="bar", figsize=(9,6), fontsize=12, linewidth=2,
              color=sns.color_palette("rocket"), grid=False, table=False))

for p in ax.patches:
    ax.annotate("%.2f" % p.get_height(), (p.get_x() + p.get_width() / 2., p.get_height()), 
                ha='center', va='center', xytext=(0, 7), textcoords='offset points')

plt.axhline(dfbounce.groupby("default_channel_grouping")["bounces_r"].mean().mean(), linewidth=1, color ="r")
plt.xlabel("channel groups", fontsize=10)
plt.xticks(rotation=90, horizontalalignment="center")
plt.ylabel("Rate (%)", fontsize=10)
plt.title("Bounce rate by channelGrouping", fontsize=10, loc="right");
Image by author.

As expected, Display has the highest bounce rate of all channels, followed by Direct and Paid Social. Organic Search is level with the average. Below the threshold, with the lowest bounce rates, we find Referral, Native and Other Advertising.

7.1 Does the Bounce Rate on the site appear to be improving or deteriorating over time?

MySQL

SELECT 
    YEAR(date) AS year, -- extract year
    MONTH(date) AS month,  -- extract month 
    DATE_FORMAT(date, '%b') AS month_,   -- format the date column to display month name
    ROUND(SUM(bounces) / SUM(sessions) * 100, 2) AS bounces_r  -- calculate bounce rate
FROM
    case_sql
GROUP BY 1 , 2 , 3                  
ORDER BY 1 , 2 , 3;                 
Image by author.

Python


df_date = df.groupby("date")[['sessions','bounces']].sum()

''' create function to assess the bounce rate, assign it as 'bounce_r'
Return 0 if session's value is 0, else divide the bounces by sessions 
for each date and multiply it by 100 to get the percentage '''

def div(bounces, sessions):
    return lambda row: 0.0 if row[sessions] == 0.0 else float((row[bounces]/(row[sessions])))*100

# create column 'bounce_r' with the function results
df_date["bounce_r"] = (df_date.apply(div('bounces', 'sessions'), axis=1)).round(1)

# drop unnecessary columns
df_date.drop(["sessions","bounces"], axis=1, inplace=True)

# sum all bounces over time and plot chart
ax = df_date.plot(kind="line", figsize=(14,6), fontsize=12, linewidth=2)

plt.xlabel("Date", fontsize=10)
plt.xticks(rotation=90, horizontalalignment="center")
plt.ylabel("Rate", fontsize=10)
plt.title("Evolution of the bounce rate over time", fontsize=10, loc="right");
Image by author.
# smoothing the line by resampling to monthly means
resampled = df_date["bounce_r"].resample("M").mean()

plt.figure(figsize = (12,6))
ax = sns.lineplot(data = resampled)
plt.title("Evolution of the bounce rate over time (smooth)", fontsize=10, loc="right")
plt.xlabel("Date", fontsize=10)
plt.xticks(rotation=0, horizontalalignment="center")
plt.ylabel("Rate", fontsize=10);
Image by author.

The Bounce Rate on site is improving over time.
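One way to put a number behind that claim is to fit a linear trend to the daily rates and check the slope's sign; a sketch on a stand-in series (the real input would be df_date["bounce_r"] from the cell above):

```python
import numpy as np
import pandas as pd

# stand-in daily bounce rates; replace with df_date["bounce_r"] in the notebook
rates = pd.Series([60.0, 58.5, 57.0, 55.5, 54.0])

# slope of the least-squares line through the rates
slope = np.polyfit(np.arange(len(rates)), rates.to_numpy(), 1)[0]

# a negative slope means the bounce rate is falling, i.e. improving
improving = slope < 0
```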

7.2 Bounce Rate breakdown by Channel and Brand

Python

# filter by brand
b1 = df[df["brand"] == "Brand 1"]
b2 = df[df["brand"] == "Brand 2"]

# ** brand 1 **

# group individual channels by sum of sessions for brand 1
dfbrand = b1.groupby("default_channel_grouping")["sessions"].sum().to_frame()
dfbrand.rename(columns={"sessions":"sessions1"}, inplace=True)

# group individual channels by sum of bounces for brand 1
dfbrand["bounces1"] = b1.groupby("default_channel_grouping")["bounces"].sum()

# calculus of bounce rate for each individual channel for brand 1
dfbrand["1bounces_r"] = dfbrand.apply(lambda x: 0.0 if x["sessions1"] == 0.0 else (x["bounces1"] / x["sessions1"]*100), axis=1).round(2)

# ** brand 2 **

# group individual channels by sum of bounces for brand 2
dfbrand["sessions2"] = b2.groupby("default_channel_grouping")["sessions"].sum()

# group individual channels by sum of bounces for brand 2
dfbrand["bounces2"] = b2.groupby("default_channel_grouping")["bounces"].sum()

# calculus of bounce rate for each individual channel for brand 2
dfbrand["2bounces_r"] = dfbrand.apply(lambda x: 0.0 if x["sessions2"] == 0.0 else (x["bounces2"] / x["sessions2"]*100), axis=1).round(2)

# sort values by rate
dfbrand.sort_values(by="1bounces_r", ascending=False)
Image by author.
# clean dataframe
dfchannels = dfbrand.copy()
dfbrand_chart = dfbrand.copy()
dfbrand_chart.drop(["sessions1","sessions2","bounces1","bounces2"], axis=1, inplace=True)

# display bar chart with the average bounce rate for each channel
ax = dfbrand_chart.plot(kind="bar", figsize=(13,6), fontsize=12, linewidth=2, color=sns.color_palette("BrBG"), grid=False, table=False)

for p in ax.patches:
    ax.annotate("%.1f" % p.get_height(), (p.get_x() + p.get_width() / 2., p.get_height()), ha='center', va='center', xytext=(0, 7), textcoords='offset points')

plt.xlabel("channel groups", fontsize=10)
plt.xticks(rotation=90, horizontalalignment="center")
plt.ylabel("Rate (%)", fontsize=10)
plt.title("Bounce rate by channelGrouping and by brand", fontsize=10, loc="right")
plt.legend(["Brand 1","Brand 2"]);
Image by author.

Brand 1 has a lower rate than would be expected for the Display channel; this channel usually drives a high bounce rate. It is important to understand Brand 1's content strategy in detail and try to adapt it to Brand 2.

Brand 2 shows a higher bounce rate than would be acceptable in the Referral channel.

8. Proportions between inbound and paid media

MySQL

SELECT 
    brand,
    CASE
        WHEN
            default_channel_grouping IN ('Paid Search', 
                'Paid Social',
                'Display',
                'Other Advertising')
        THEN
            'Paid'
        WHEN
            default_channel_grouping IN ('Direct',
                'Native',
                'Organic Search',
                'Referral',
                'Social',
                'Email',
                '(Other)')
        THEN
            'Organic'
        ELSE NULL
    END AS media,
    ROUND(SUM(bounces) / SUM(sessions) * 100, 2) AS bounce_r
FROM
    case_sql
GROUP BY brand , media
ORDER BY 1;
Image by author.

Python

# create dictionary
media_dict = {
    'Display': 'paid',
    'Paid Search': 'paid',
    'Paid Social': 'paid',
    'Other Advertising': 'paid',
    'Direct': 'organic',
    'Native': 'organic',
    'Organic Search': 'organic',
    'Referral': 'organic',
    'Social': 'organic',
    'Email': 'organic',
    '(Other)': 'organic'
}

# mapping the dict into a new column
df['media'] = df['default_channel_grouping'].map(media_dict)

# define cols position in dataframe
cols = ['brand','media','sessions','bounces']

# reindex columns order
df = df.reindex(columns = cols)

# groupby dataframe by selected cols
df = df.groupby(["brand","media"])[["sessions","bounces"]].sum()

# bounce rate by channel
df["bounces_r"] = df.apply(lambda x: 0.0 if x["sessions"] == 0.0 else (x["bounces"] / x["sessions"])*100, axis=1).round(2)
Image by author.

Conclusion

As promised, we went through a step-by-step approach to conducting a simple digital marketing analysis, working side by side with MySQL Workbench and Python.

Both tools have their own specificities and demands, but, graphic capabilities and limitations aside, the reasoning is fairly similar.

Feel free to download the datasets and explore by practicing some of the technical details covered here, implementing new code as you ask further analysis questions.

Explore others projects you might also like:

SQL Digital Marketing Analysis

15 Business Questions about Mobile Marketing Campaigns: ROAS (Return On Ad Spend)

Machine Learning: predicting bank loan defaults


✅ Thanks for reading!

Tags: Analytics Marketing Analysis MySQL Python Sql
