Cluster Analysis in Marketing

In marketing, cluster analysis is performed on customer data to create distinct groups supported by the numbers. Based on these groups, you can adapt your offer by changing the price or product features, promoting it differently, advertising it through another channel or with another message, or even selling it in a different kind of store.

A one-size-fits-all strategy is less effective than looking at (and treating) each customer individually. Customer groups sit somewhere in between.

Such groups can be built from different kinds of information besides the commonly used age, gender, income level, number of people in the household etc. It is actually better not to use these for grouping, but to use them afterwards to describe the groups.

How can cluster analysis help in grouping customers?

The basic reason cluster analysis exists is to support the grouping of customers (or market segments) when this cannot be done from your own knowledge and expertise in your market. For example, it is easy for you to notice, without any analysis, that older people come by your store more often on weekdays and less often at the weekend.

Because you noticed this, you offer one of your products that appeals to older people at a discount on Monday and Tuesday, and return it to the normal price for the weekend so as not to lose revenue from other people who buy it (say, younger couples). This way, sales to both groups increase and the cost of the promotion stays as small as possible.

When you want to make decisions like that for more products and more customers, cluster analysis can come in very handy.

Cluster analysis creates distinct groups using the information available. Information can come from:

  1. Sales data such as products sold, their category and subcategory, date, amount, discount, store etc.
  2. Surveys from across the market, which include information like how often a customer purchases, age, gender, household income, political views, interests etc.
  3. Customer data, which is all of the above but recorded for each customer, using sources like loyalty cards.
  4. In the big data age, data that includes the above plus web usage data, like websites visited, time spent on a website, what the user clicked on, Facebook likes, Twitter favs, Pinterest pins etc.

How do I perform Cluster Analysis?

Because clustering is about finding new things, the process requires some guidance on your part. This saves time by making the search more focused. There are many procedures (algorithms) for clustering. As with all analytics projects, the choice depends on the question you want to answer. Here I assume you want to create clusters from information on purchases from your online store and website usage.

One popular algorithm is K-means. Apart from the data itself, this algorithm requires you to set the number of groups you want to end up with. It then tries to put each of your customers (or, more generally, the people involved in the analysis) in one of the groups so that the groups are as distinct as possible and the people within each group are as similar as possible.
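If you prefer to see the idea in code rather than in a spreadsheet, here is a minimal sketch of K-means in plain Python. It is not the Excel macro discussed in this post, just an illustration of the two steps the algorithm repeats: assign each customer to the nearest group average, then recompute the averages. The function name `kmeans`, its parameters and the customer numbers are all made up for the example.

```python
import math
import random


def kmeans(points, k, iterations=100, seed=0):
    """Group points (lists of numbers) into k clusters; return (centroids, labels)."""
    rng = random.Random(seed)
    # Start from k customers picked at random as the initial centroids.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [None] * len(points)
    for _ in range(iterations):
        # Assignment step: each customer joins the nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # nobody changed group, so the result is stable
        labels = new_labels
        # Update step: each centroid becomes the average of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a group ends up empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels


# Hypothetical data: [items purchased per month, website visits per month]
customers = [
    [12, 2], [11, 3], [13, 2],   # visit rarely, buy a lot
    [2, 15], [1, 14], [3, 16],   # visit often, buy little
    [6, 8], [7, 7], [5, 9],      # somewhere in between
]

centroids, labels = kmeans(customers, k=3)
for c, centroid in enumerate(centroids):
    averages = [round(v, 1) for v in centroid]
    print(f"cluster {c}: {labels.count(c)} customers, averages = {averages}")
```

Keep in mind that the groups you get can depend on the random starting points, so it is worth trying a few different values of `seed` before reading too much into one result.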

The result is a set of clusters whose averages are as different as possible. For instance, one of the four clusters you asked for might be described by a higher average number of purchased items and fewer website visits than the other three. This would mean that customers in that cluster visit rarely and buy a lot.

For educational purposes, you can perform the analysis using code by Sheldon Neilson; you can download the workbook at the end of this post. It is a free and easy-to-use macro for Excel, and it comes with a detailed explanation of how it works.

To test it, you can just replace the data in the workbook with your own and click on the button that says “k-means” on the top left part of the worksheet. I suggest you put headers for the variables you use in the columns.

A message will pop up asking you to select the data. Highlight the range, including the headers, with variables in the columns and customers, responses or transactions in the rows. Click “OK”.

Another message will pop up asking for the number of clusters. This corresponds to the “k” in “K-means”. Most probably you don’t know how many segments there are in your customer base, so you will run this several times with different values. Even in large consumer surveys it is rare to have more than 6 market segments. So, if you are starting out, run K-means with 3, 4 and 5 groups and see which results make sense.
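The same "try 3, 4 and 5 groups" routine can be sketched in Python. The example below is an assumption-laden illustration, not the workbook's code: `kmeans`, `within_cluster_ss` and `best_of` are hypothetical helper names, the data is invented, and running each k from several random starts is my own choice to make the comparison less dependent on luck. For each k it reports how tight the groups are, measured as the total squared distance of customers from their group average.

```python
import math
import random


def kmeans(points, k, rng, iterations=100):
    """Plain k-means; returns (centroids, labels)."""
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(iterations):
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels


def within_cluster_ss(points, centroids, labels):
    """Total squared distance from each customer to their group's centroid."""
    return sum(math.dist(p, centroids[lab]) ** 2 for p, lab in zip(points, labels))


def best_of(points, k, restarts=10, seed=0):
    """Run k-means from several random starts and keep the tightest grouping."""
    rng = random.Random(seed)
    runs = (kmeans(points, k, rng) for _ in range(restarts))
    return min(runs, key=lambda run: within_cluster_ss(points, *run))


# Hypothetical data: [items purchased per month, website visits per month]
customers = [
    [12, 2], [11, 3], [13, 2], [12, 4],
    [2, 15], [1, 14], [3, 16], [2, 13],
    [6, 8], [7, 7], [5, 9], [6, 10],
]

results = {}
for k in (3, 4, 5):
    centroids, labels = best_of(customers, k)
    wcss = within_cluster_ss(customers, centroids, labels)
    sizes = sorted(labels.count(c) for c in range(k))
    results[k] = (wcss, sizes)
    print(f"k={k}: within-cluster sum of squares = {wcss:.1f}, group sizes = {sizes}")
```

This number tends to shrink as k grows, simply because more groups means smaller groups, so a lower value alone does not mean a better segmentation. A common rule of thumb is to look for the k after which the improvement becomes small, and then check whether the groups make business sense.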

The result (or output) is the “centroid” of each group. The centroid represents the average member of the group, which in our case is the average customer. Each customer in the data is assigned to the group whose average customer they are closest to.

From a technical perspective, there are further checks of “how good” the results are: which number of groups works better, whether the groups created are really different from each other, and how similar the customers within each group are.
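One widely used check of this kind is the silhouette score, which combines both questions (how similar customers are within a group, and how different the groups are from each other) into one number between -1 and 1. The post does not prescribe a particular measure, so treat the sketch below as one possible option; the function name `silhouette`, the data and the two example groupings are invented for illustration.

```python
import math


def silhouette(points, labels):
    """Average silhouette width over all customers (needs at least two groups)."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # usual convention for a group of one customer
            continue
        # a: average distance to the other customers in the same group
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: average distance to the customers of the nearest other group
        b = min(
            sum(math.dist(p, q) for q in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


# Hypothetical data: [items purchased per month, website visits per month]
customers = [
    [12, 2], [11, 3], [13, 2],
    [2, 15], [1, 14], [3, 16],
    [6, 8], [7, 7], [5, 9],
]
sensible = [0, 0, 0, 1, 1, 1, 2, 2, 2]   # similar customers grouped together
arbitrary = [0, 1, 2, 0, 1, 2, 0, 1, 2]  # groups that mix very different customers

print(f"sensible grouping:  {silhouette(customers, sensible):.2f}")
print(f"arbitrary grouping: {silhouette(customers, arbitrary):.2f}")
```

Values close to 1 suggest tight, well-separated groups, while values around 0 or below suggest that the groups overlap. You can compute the score for 3, 4 and 5 groups and use it as one input alongside your business judgment.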

To make sense of the results, you need to look at business experience and check whether such an average customer is important in your strategy. You can try to answer questions like:

  1. What problems does that group have that others don’t and would benefit from your offer?
  2. Which message is easier for one group to relate to (and be influenced better) than it is for the other groups?


Cluster analysis helps split your customers or your whole market into smaller groups that are easier to understand, so that you can customize (or develop from scratch) your complete offer in price, features, promotions and point of sale. K-means is one option for this analysis, and you can practice with the free Excel workbook and code from Sheldon Neilson. You need to check the validity of the results not just technically but also based on what you can make of them.

So, what questions do you want to answer with clustering?
