Marketing Data Science — Segmentation of Customers

Tracking Data from an Online Shop

Published in DataDrivenInvestor, Jul 10, 2022 (7 min read)

So far in this article series, I have written about Marketing Data Science “Use Cases” and “Customer Data Platforms”, and I have described two typical applications of data science: “Lead Prediction” and “Churn Prediction”. In this article, I present an example of “Customer Segmentation”.

Customer segmentation is a data-driven technique for classifying customers into homogeneous groups. The data on which the segmentation is based can be structured (e.g., demographic data such as gender, age, and income) or unstructured (e.g., social media data). Further data can be collected to identify customer groups, such as data on customers’ behavior (e.g., which websites customers visited) or data on purchases.

In this post, I’ll show you how to group consumers using web analytics data from an online store. On-site personalization and targeted marketing campaigns may be implemented based on the findings.

On the way there, we’ll take a closer look at the data (“Exploratory Data Analysis” or “EDA”), do some preprocessing, compute the segmentation, and then present the clusters. For the calculations, we will use Google Colab.

The Data

The data is from the Kaggle data platform and contains web tracking data for one month (Oct. 2019) from a large multi-category online shop.

Each line in the file represents an event. There are different types of events, such as page views, shopping cart actions, and purchases.

The record contains information about:

event_time / When was the event triggered? (UTC)
event_type / view, shopping cart, purchase
product_id / product ID
category_id / category ID
category_code / category name
brand / brand name
price / price
user_id / customer ID
user_session / session ID

The data is available for analysis as a CSV file, exported from a customer database platform.

All the calculations are in the Colab notebook:
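Loading such an export with pandas could look roughly like the following sketch. The rows are invented stand-ins with the same columns as listed above, and the event label `cart` for shopping cart actions is an assumption, as is the `UTC` suffix on the timestamps.

```python
import io
import pandas as pd

# Tiny invented sample with the same columns as the export (illustration only).
csv_text = """event_time,event_type,product_id,category_id,category_code,brand,price,user_id,user_session
2019-10-01 00:00:00 UTC,view,1001,2001,electronics.smartphone,apple,999.0,1,s1
2019-10-01 00:01:10 UTC,cart,1001,2001,electronics.smartphone,apple,999.0,1,s1
2019-10-01 00:02:30 UTC,purchase,1001,2001,electronics.smartphone,apple,999.0,1,s1
2019-10-02 09:15:00 UTC,view,1002,2002,,arena,25.5,2,s2
"""

# For the real data, pass the path of the CSV file instead of the StringIO object.
events = pd.read_csv(io.StringIO(csv_text))

# Convert the timestamp strings into timezone-aware datetimes (UTC).
events["event_time"] = pd.to_datetime(events["event_time"], utc=True)

print(events.dtypes)
print(events.head())
```

Missing values, such as the empty category name in the last row, arrive as NaN and have to be handled in the later steps.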

https://gist.github.com/astoeckl/3c12fedbba2d5e593814fdef230dd81c

First look at the data

There are over 42 million records available for the month of October 2019.

Over 3 million people visited this site. Customers bought over 166,000 distinct items.
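Key figures like these can be read off with a few pandas calls. The sketch below uses a small invented table instead of the real data, so the printed numbers are not those of the shop.

```python
import pandas as pd

# Invented toy events; only the columns needed for the counts.
events = pd.DataFrame({
    "event_type": ["view", "view", "cart", "purchase", "view"],
    "product_id": [1001, 1002, 1001, 1001, 1003],
    "user_id": [1, 1, 1, 1, 2],
})

n_events = len(events)                         # number of records
n_visitors = events["user_id"].nunique()       # distinct visitors
n_products = events["product_id"].nunique()    # distinct products

print(n_events, n_visitors, n_products)
```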

Example Customer Journey

To understand what a session looks like, we pick one session ID, list all events recorded for it, and try to interpret them.

Viewed several iPhones
Purchased an iPhone with 1 click (without a shopping cart event)
Viewed 2 unknown products of the brand arena
Viewed some Apple headphones and bought one pair
Afterward, viewed a more expensive pair but decided not to buy it
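Extracting such a journey is a filter on the session ID followed by a sort by time. This is a minimal sketch on invented data; the session ID `s1` is hypothetical.

```python
import pandas as pd

# Invented toy events, deliberately not in chronological order.
events = pd.DataFrame({
    "event_time": pd.to_datetime([
        "2019-10-01 10:05", "2019-10-01 10:00",
        "2019-10-01 10:07", "2019-10-01 11:00",
    ]),
    "event_type": ["purchase", "view", "view", "view"],
    "brand": ["apple", "apple", "arena", "samsung"],
    "price": [999.0, 999.0, 25.5, 500.0],
    "user_id": [1, 1, 1, 2],
    "user_session": ["s1", "s1", "s1", "s2"],
})

session_id = "s1"  # hypothetical session ID

# Keep only the events of this session and order them chronologically.
journey = (
    events[events["user_session"] == session_id]
    .sort_values("event_time")
    [["event_time", "event_type", "brand", "price"]]
)
print(journey)
```

Filtering on `user_id` instead of `user_session` gives the history of one user across all sessions.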

Example Customer History

To view all actions of a specific user in that month, we filter all records by their user ID.

Exploratory Data Analysis

How many events were recorded on each day?

Number of event types

How frequently do different events occur in the data, and what are they?

Page views make up the vast majority of events (96%); the rest are shopping cart and purchase events.
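Both questions, events per day and the share of each event type, come down to simple counts. A sketch on invented data (the shares printed here are not the 96% from the real data):

```python
import pandas as pd

# Invented toy events.
events = pd.DataFrame({
    "event_time": pd.to_datetime([
        "2019-10-01 08:00", "2019-10-01 09:00",
        "2019-10-02 10:00", "2019-10-02 11:00",
    ]),
    "event_type": ["view", "view", "cart", "view"],
})

# Number of events per calendar day.
events_per_day = events.groupby(events["event_time"].dt.date).size()

# Relative frequency of each event type.
type_share = events["event_type"].value_counts(normalize=True)

print(events_per_day)
print(type_share)
```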

Features of the visitors

We compute the most important characteristics of each visitor and combine them into a table.

Pageviews
Visits
Number of purchased products
Number of products in the shopping cart
Total expenditure
Expenditure per visit
Pageviews per visit
Shopping cart actions per visit
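One possible way to build such a visitor table with pandas is sketched below. The data is invented, the column names of the result are my own choice, and the event labels `view`, `cart`, and `purchase` are assumptions.

```python
import pandas as pd

# Invented toy events: user 1 has two sessions, user 2 has one.
events = pd.DataFrame({
    "event_type": ["view", "view", "cart", "purchase", "view", "view"],
    "price": [10.0, 20.0, 20.0, 20.0, 5.0, 7.0],
    "user_id": [1, 1, 1, 1, 1, 2],
    "user_session": ["a", "a", "a", "a", "b", "c"],
})

# One indicator column per event type, so that sums become counts.
flags = pd.get_dummies(events["event_type"]).astype(int)
# Only purchase events contribute to the expenditure.
flags["spend"] = events["price"].where(events["event_type"] == "purchase", 0.0)
flags["user_id"] = events["user_id"]

visitors = flags.groupby("user_id").sum().rename(
    columns={"view": "pageviews", "cart": "cart_actions", "purchase": "purchases"}
)
# A visit is counted as one distinct session ID.
visitors["visits"] = events.groupby("user_id")["user_session"].nunique()

# Per-visit ratios.
for col in ["spend", "pageviews", "cart_actions"]:
    visitors[f"{col}_per_visit"] = visitors[col] / visitors["visits"]

print(visitors)
```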

We filter the purchases from the actions

In the next step, we extract the purchase events from the data in order to analyze them in more detail. We save the result in a separate table.

Key figures on purchases

How many products are purchased by one buyer? What is the average purchase value per buyer?

On average, each buyer makes slightly more than 2 purchases.

The average purchase value per buyer is 773.85.
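The purchase table and the two averages can be computed as in this sketch. It runs on invented data, so the results differ from the figures above.

```python
import pandas as pd

# Invented toy events.
events = pd.DataFrame({
    "event_type": ["view", "purchase", "purchase", "purchase", "cart"],
    "price": [100.0, 100.0, 50.0, 30.0, 30.0],
    "user_id": [1, 1, 1, 2, 2],
})

# Separate table that contains only the purchase events.
purchases = events[events["event_type"] == "purchase"].copy()

# Per buyer: number of purchased products and total amount spent.
buyers = purchases.groupby("user_id")["price"].agg(
    n_purchases="count", total_spend="sum"
)

avg_purchases = buyers["n_purchases"].mean()
avg_spend = buyers["total_spend"].mean()
print(avg_purchases, avg_spend)
```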

Brand popularity

From which brands are products bought?

Let’s look at a bar chart of the top 10 brands.

For further analysis, we group the transactions by the most popular brands (the top 5). The remaining brands are put in a group called “Others.”

For each customer, we determine the proportion of purchases in the six brand categories and store them in the buyer table.
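The brand grouping and the per-customer shares could be computed as follows. This is a sketch on invented purchases that uses the top 2 brands instead of the top 5 to keep the example small.

```python
import pandas as pd

# Invented toy purchases.
purchases = pd.DataFrame({
    "user_id": [1, 1, 2, 2, 3, 3, 3],
    "brand": ["apple", "samsung", "apple", "acme", "samsung", "apple", "xiaomi"],
})

top_n = 2  # the article uses the top 5 brands
top_brands = purchases["brand"].value_counts().head(top_n).index

# Keep the brand name for top brands, map everything else to "others".
purchases["brand_group"] = purchases["brand"].where(
    purchases["brand"].isin(top_brands), "others"
)

# Share of each brand group in the purchases of each customer (rows sum to 1).
brand_share = pd.crosstab(
    purchases["user_id"], purchases["brand_group"], normalize="index"
)
print(brand_share)
```

Purchases without a brand name also end up in the “others” group in this sketch.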

Product categories

Which product categories are available?

The product category exists as a hierarchical code. We grab the first level and save it as a separate characteristic.

There are a total of 13 primary categories. The share of the purchase value in each of the primary categories is added as an extra feature to the buyer table.
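A sketch of this step, assuming the hierarchy levels in `category_code` are separated by dots (the category names and prices are invented):

```python
import pandas as pd

# Invented toy purchases.
purchases = pd.DataFrame({
    "user_id": [1, 1, 2],
    "category_code": [
        "electronics.smartphone", "kids.toys", "electronics.audio.headphone",
    ],
    "price": [300.0, 100.0, 50.0],
})

# First level of the hierarchical code, e.g. "electronics".
purchases["main_category"] = purchases["category_code"].str.split(".").str[0]

# Amount spent per customer and main category.
spend = purchases.pivot_table(
    index="user_id", columns="main_category",
    values="price", aggfunc="sum", fill_value=0,
)

# Convert amounts into shares of each customer's total spending.
category_share = spend.div(spend.sum(axis=1), axis=0)
print(category_share)
```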

Adding purchase characteristics to the characteristics of all visitors

We can now merge the purchase characteristics with the characteristics of all visitors, resulting in a single table with all visitors and their characteristics.

So we have data on 3,022,290 users, with 27 characteristics stored for each.
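Combining the two tables can be done with a left join on the user ID. In this sketch, visitors without purchases get 0 for all purchase features, which is an assumption about how missing values are treated. The tables and column names are invented.

```python
import pandas as pd

# Invented visitor table (all users) and buyer table (only users with purchases).
visitors = pd.DataFrame(
    {"pageviews": [3, 1, 5]},
    index=pd.Index([1, 2, 3], name="user_id"),
)
buyers = pd.DataFrame(
    {"total_spend": [150.0], "share_apple": [1.0]},
    index=pd.Index([1], name="user_id"),
)

# Left join keeps every visitor; non-buyers get 0 for the purchase features.
features = visitors.join(buyers, how="left").fillna(0)
print(features)
```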

Limitation of the number of users

We’ll stick to the first 50,000 users in order to keep calculations and visualization within bounds.

Conversion to matrix format for cluster calculation

Before we can begin clustering, we must convert the data into the correct format, a 2-dimensional array.

Scaling of the data

To ensure that all characteristics are on a comparable scale, each column of the matrix is standardized by subtracting its mean and dividing by its standard deviation.
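The conversion to a 2-dimensional array and the scaling can be sketched with scikit-learn’s `StandardScaler`. The feature table is invented, and the manual calculation is included only to show what the scaler does.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Invented feature table with two characteristics on very different scales.
features = pd.DataFrame({
    "pageviews": [1.0, 2.0, 3.0, 4.0],
    "total_spend": [0.0, 10.0, 0.0, 30.0],
})

# 2-dimensional array: one row per user, one column per characteristic.
X = features.to_numpy()

# Subtract the column mean and divide by the column standard deviation.
X_scaled = StandardScaler().fit_transform(X)

# The same calculation by hand, for comparison.
X_manual = (X - X.mean(axis=0)) / X.std(axis=0)

print(np.allclose(X_scaled, X_manual))
```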

Calculation of customer segments with different numbers of clusters

The “k-means algorithm” is used to identify the segments. It is a type of cluster analysis in which a set of items is partitioned into k clusters, where k is specified in advance.

The k-means algorithm begins with an initial set of randomly selected centroids, which serve as the starting points for the clusters, and then performs iterative calculations to optimize the positions of the centroids.

Because we are dealing with a large amount of data, we employ the “mini-batch” variant of the technique, which updates the cluster centers in each iteration using only a random subset of the data rather than all of it.

https://towardsdatascience.com/understanding-k-means-clustering-in-machine-learning-6a6e67336aa1
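With scikit-learn, the mini-batch variant is available as `MiniBatchKMeans`. The sketch below clusters two invented, well-separated groups of points. The values for `n_clusters`, `batch_size`, and `random_state` are arbitrary choices for the example, not the settings used in the notebook.

```python
import numpy as np
from sklearn.cluster import MiniBatchKMeans

# Invented data: two clearly separated groups of 100 points each.
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(0, 0.3, (100, 2)),
    rng.normal(5, 0.3, (100, 2)),
])

# Each iteration updates the centroids using a random batch of 64 points.
model = MiniBatchKMeans(n_clusters=2, batch_size=64, n_init=3, random_state=0)
labels = model.fit_predict(X)

print(model.cluster_centers_)
```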

How to set the optimal number of clusters (“k-value”)?

We compute the clustering for several values of k and then look for the optimum. The silhouette score is a measure of how well the clustering worked. It ranges from -1 to 1, and the closer the value is to 1, the better. We use it to determine how many clusters there should be.

https://medium.com/@jyotiyadav99111/selecting-optimal-number-of-clusters-in-kmeans-algorithm-silhouette-score-c0d9ebb11308

We now use the optimal number of clusters to generate the final clustering. We also check the number of customers assigned to each segment.
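The search for k and the check of the segment sizes might look like this sketch. It uses invented data with three obvious groups, and the range of candidate values for k is an arbitrary choice.

```python
import numpy as np
from sklearn.cluster import MiniBatchKMeans
from sklearn.metrics import silhouette_score

# Invented data: three well-separated groups of 60 points each.
rng = np.random.default_rng(0)
centers = [(0, 0), (5, 5), (0, 5)]
X = np.vstack([rng.normal(c, 0.3, (60, 2)) for c in centers])

# Compute a clustering and its silhouette score for each candidate k.
scores = {}
for k in range(2, 7):
    labels = MiniBatchKMeans(n_clusters=k, n_init=3, random_state=0).fit_predict(X)
    scores[k] = silhouette_score(X, labels)

# The k with the highest silhouette score is taken as the optimum.
best_k = max(scores, key=scores.get)

# Final clustering with the chosen k and the number of points per segment.
final_labels = MiniBatchKMeans(
    n_clusters=best_k, n_init=3, random_state=0
).fit_predict(X)
segment_sizes = np.bincount(final_labels)

print(best_k, segment_sizes)
```

The silhouette score compares all pairs of points, so on large data sets it can be slow. The `sample_size` parameter of `silhouette_score` computes it on a random sample instead.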

Visualization of the clusters

We use the “t-SNE” method to visualize the clustering. t-Distributed Stochastic Neighbor Embedding (t-SNE) is a dimensionality reduction technique that is particularly suited for displaying high-dimensional data sets.

The data points are typically assumed to lie on some manifold in the high-dimensional space, distributed in an unknown way. The goal is to project the data into two dimensions while preserving the neighborhood relations between data points as much as possible, so that points that are close together in the original space stay close together in the plot.

https://towardsdatascience.com/an-introduction-to-t-sne-with-python-example-5a3a293108d1
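A minimal sketch of the projection with scikit-learn’s `TSNE`, run on invented 5-dimensional data. The `perplexity` value is an arbitrary choice for this small example and has to be smaller than the number of points.

```python
import numpy as np
from sklearn.manifold import TSNE

# Invented data: two groups of 50 points in 5 dimensions.
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(0, 0.3, (50, 5)),
    rng.normal(5, 0.3, (50, 5)),
])

# Project the 5-dimensional points into 2 dimensions.
embedding = TSNE(
    n_components=2, perplexity=20, init="pca", random_state=0
).fit_transform(X)

# Each row now holds the x/y position of one point in the plot.
print(embedding.shape)
```

Plotting the two columns of `embedding` as a scatter plot, colored by cluster label, gives the cluster visualization.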

Now let’s create a visualization with far fewer clusters. In this case, it is considerably more difficult to split the distinct regions into several parts.

Characterization of the segments

To support the interpretation of the segments, we use graphical representations such as “radar charts”, which show the category characteristics of each segment at a glance.

For example, one segment has high purchase shares in the categories “Children” and “Sports,” while others stand out in “Electronics” and “Gadgets.”
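A radar chart of this kind can be drawn with matplotlib’s polar axes. The sketch below first averages the category shares per segment and then plots one polygon per segment. The segment labels, category names, and shares are invented.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Invented category shares for four users in two segments.
features = pd.DataFrame({
    "segment": [0, 0, 1, 1],
    "kids": [0.6, 0.8, 0.0, 0.1],
    "sport": [0.3, 0.2, 0.1, 0.0],
    "electronics": [0.1, 0.0, 0.9, 0.9],
})

# Average share per category within each segment.
profile = features.groupby("segment").mean()

# One angle per category; repeat the first angle to close the polygon.
categories = list(profile.columns)
angles = np.linspace(0, 2 * np.pi, len(categories), endpoint=False).tolist()
angles += angles[:1]

fig, ax = plt.subplots(subplot_kw={"projection": "polar"})
for segment, row in profile.iterrows():
    values = row.tolist()
    values += values[:1]  # close the polygon
    ax.plot(angles, values, label=f"segment {segment}")
    ax.fill(angles, values, alpha=0.2)

ax.set_xticks(angles[:-1])
ax.set_xticklabels(categories)
ax.legend()
```

In this toy example, segment 0 leans towards “kids” and “sport”, while segment 1 is dominated by “electronics”.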
