Data Science Helps Optimize Starbucks’ Promotion Strategy
Introduction
Once every few days, Starbucks sends an offer to users of its mobile app. An offer can be a real promotion, such as a discount or a BOGO (buy one get one free), or simply an informational offer that advertises a product. Some users might not receive any offer during certain weeks. Some users receive an offer and never view it, or view it and choose to ignore it. Other users receive an offer, never actually view it, and still complete it: the data set records an offer completion, yet the customer was not influenced by the offer because they never saw it. It is therefore crucial to distinguish offers that were received and then followed by a transaction actually influenced by the offer. This is the challenging part of this project.
An effective BOGO or discount offer follows the journey 'offer received' - 'offer viewed' - 'transaction' - 'offer completed'.
An effective informational offer follows the journey 'offer received' - 'offer viewed' - 'transaction' within the offer's valid window.
So we need to build two separate datasets and separate models to predict whether an offer is effective.
Project Goal
- What are the main factors influencing the effectiveness of an offer on the Starbucks app?
- Build a machine learning model to predict whether a user would take up an offer
Evaluation Metrics
The problem we chose to solve was to build models that predict whether a customer will respond to an offer or not, so it is a binary classification problem.
Here I chose the F1 score as the evaluation metric. The F1 score is a number between 0 and 1 and is the harmonic mean of precision and recall.
As the exploratory data analysis shows, the final dataset for BOGO or discount offers is imbalanced: only about 25% of the received offers are effective. If we chose accuracy, a model that predicts every offer to be ineffective would already score 0.75, so a classifier with 75% accuracy is basically worthless for our case. The F1 score, on the other hand, maintains a balance between precision and recall: if either precision or recall is low, the F1 score is also low. So the F1 score is a better evaluation metric for this classification problem.
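As a quick illustration of how precision, recall, and F1 relate, here is a minimal scikit-learn sketch (the labels below are made up for illustration, not taken from the project data):

from sklearn.metrics import precision_score, recall_score, f1_score

# Hypothetical labels: 1 = effective offer, 0 = ineffective offer.
y_true = [1, 0, 0, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 0, 0, 1, 1, 0]

precision = precision_score(y_true, y_pred)  # TP / (TP + FP)
recall = recall_score(y_true, y_pred)        # TP / (TP + FN)

# F1 is the harmonic mean of precision and recall.
print(f1_score(y_true, y_pred))
print(2 * precision * recall / (precision + recall))  # same value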
Data Set
1. profile.json: Rewards app users (17,000 users x 5 fields)
There are 2,175 records where gender and income are null, and these customers all have age 118, so the nulls need to be cleaned. In this project I group the null genders as 'unknown' and replace the null incomes with 0 (a short sketch follows the dataset descriptions).
Tenure days, income, and age are not normally distributed. The average customer age is 62.5 years and the average income is $65,405.
2. portfolio.json: Offers sent during 30-day test period (10 offers x 6 fields)
3. transcript.json: Event log (306648 events x 4 fields)
The distribution of customer action events shows that about 75.7% of customers viewed the offers they received, while only about 44% completed the offers they received. These completions include customers who never viewed the offer, were therefore not aware of it, and would have bought anyway, so in the dataset preparation we have to exclude such cases from the effective offers.
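A minimal pandas sketch of the profile.json null handling described above (the file path and read options are assumptions about how the line-delimited JSON is loaded):

import pandas as pd

# Load the rewards-app user profiles (path is illustrative).
profile = pd.read_json('data/profile.json', orient='records', lines=True)

# The rows with missing gender/income all carry the placeholder age 118.
print(profile.loc[profile['gender'].isnull(), 'age'].unique())  # expect [118]

# Keep these 2175 customers, but mark them explicitly instead of dropping them.
profile['gender'] = profile['gender'].fillna('unknown')
profile['income'] = profile['income'].fillna(0)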
Preparing Data
I. Data Cleaning:
1. Portfolio Data
- One-hot encode the 'channels' column.
2. Profile Data
- Transform the 'became_member_on' column to a datetime object and generate a tenure_day column.
3. Transcript Data
- Expand the 'value' dictionary column into 'offer_id', 'amount', and 'reward' columns (a short sketch of these steps follows the list).
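A rough sketch of these cleaning steps, assuming portfolio, profile, and transcript have already been loaded as pandas DataFrames with their default indexes (the exact column handling may differ slightly from the notebook):

import pandas as pd

# 1. Portfolio: one-hot encode the list-valued 'channels' column.
channels = portfolio['channels'].str.join('|').str.get_dummies()
portfolio = pd.concat([portfolio.drop(columns='channels'), channels], axis=1)

# 2. Profile: parse became_member_on (e.g. 20170725) and derive tenure in days.
profile['became_member_on'] = pd.to_datetime(profile['became_member_on'].astype(str), format='%Y%m%d')
profile['tenure_day'] = (profile['became_member_on'].max() - profile['became_member_on']).dt.days

# 3. Transcript: unpack the 'value' dict into offer_id, amount and reward columns.
value_df = pd.json_normalize(transcript['value'].tolist())
# Offer ids appear as 'offer id' or 'offer_id' depending on the event type;
# merge them into a single offer_id column.
if 'offer id' in value_df.columns and 'offer_id' in value_df.columns:
    value_df['offer_id'] = value_df['offer_id'].fillna(value_df['offer id'])
elif 'offer id' in value_df.columns:
    value_df = value_df.rename(columns={'offer id': 'offer_id'})
keep = [c for c in ['offer_id', 'amount', 'reward'] if c in value_df.columns]
transcript = pd.concat([transcript.drop(columns='value'), value_df[keep]], axis=1)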
II. Received Offer Dataset Labelling
1. Prepare the dataset of effective BOGO or discount offers
Effective Offer Journey:
'offer received' - 'offer viewed' - 'transaction' - 'offer completed'
Ineffective Offer Journey:
1) 'offer received' - 'transaction' - 'offer completed'
2) 'offer received' - 'offer viewed' - 'transaction'
3) 'offer received' - 'offer viewed'
4) 'offer received' - 'transaction'
5) 'offer received'
- Use shift to find the transactions that occur after 'offer viewed' and before 'offer completed'.
- Get the transactions associated with effective BOGO or discount offers.
- Get the received-offer dataset labelled with 'completed offer' (a simplified sketch of the shift-based idea follows).
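The shift-based idea can be sketched roughly as follows. This is only a simplified illustration, assuming a cleaned transcript frame with columns person, event, offer_id, and time; the actual labelling also matches offer ids and the offer validity window:

import pandas as pd

events = transcript.sort_values(['person', 'time']).copy()

# Look back one row within each person's event log.
events['prev_event'] = events.groupby('person')['event'].shift(1)

# A transaction that directly follows an 'offer viewed' event is treated as
# potentially influenced by that offer.
events['influenced_tx'] = (events['event'] == 'transaction') & (events['prev_event'] == 'offer viewed')

# An 'offer completed' record only counts as effective when such an influenced
# transaction happened just before it.
events['prev_influenced'] = events.groupby('person')['influenced_tx'].shift(1).fillna(False).astype(bool)
events['effective'] = (events['event'] == 'offer completed') & events['prev_influenced']

# Finally, each 'offer received' row is labelled 1/0 depending on whether an
# effective completion for the same person and offer_id occurs later.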
2. Prepare the dataset of effective informational offers
Effective offer journey:
'offer received' - 'offer viewed' - 'transaction' within the offer's valid window
Ineffective offer journey:
1) 'offer received' - 'transaction'
2) 'offer received' - 'offer viewed' - 'transaction' after the valid window
3) 'offer received' - 'offer viewed'
4) 'offer received'
- Keep only informational offers.
- Filter the dataset for transactions that occur after an offer is viewed, forward-filling offer ids by person.
- The forward-filled offer_id comes from the closest preceding 'offer viewed' event, so it is not guaranteed to be the true offer id; the false matches need to be removed later.
- Find all received informational offers and transactions.
- Keep the transactions whose offer_id matches a previously received offer id.
- Distinguish the effective informational offers.
- Label the effective offers as 'yes' in the 'effective_offer' column and the others as 'no' (a rough sketch of the forward-fill idea follows this list).
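A rough sketch of the forward-fill idea, assuming transcript has been merged with portfolio so that each 'offer viewed' row carries its offer_id and duration (offer duration in days, transcript time in hours); this is an illustration of the approach, not the full notebook logic:

import numpy as np
import pandas as pd

info = transcript.sort_values(['person', 'time']).copy()

viewed = info['event'] == 'offer viewed'
info['viewed_offer_id'] = np.where(viewed, info['offer_id'], np.nan)
info['viewed_time'] = np.where(viewed, info['time'], np.nan)
info['viewed_duration'] = np.where(viewed, info['duration'], np.nan)

# Carry the most recently viewed offer forward onto later rows for the same person.
cols = ['viewed_offer_id', 'viewed_time', 'viewed_duration']
info[cols] = info.groupby('person')[cols].ffill()

# A transaction is a candidate effective informational offer when it falls inside
# the viewed offer's validity window. The forward-filled offer id is not
# guaranteed to be correct, so mismatched offers are filtered out afterwards.
is_tx = info['event'] == 'transaction'
in_window = (info['time'] - info['viewed_time']) <= info['viewed_duration'] * 24
info['effective_offer'] = np.where(is_tx & in_window, 'yes', 'no')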
Exploratory Data Analysis
1. Overall distribution exploration
- The distribution of member registrations shows two sudden increases, around 2015-09 and 2017-09, and one drop in the number of registered users around 2018-09.
- The gender distribution shows 15.9% more male registered customers than female registered customers.
- The numbers of received BOGO and discount offers are similar, and each is almost twice the number of informational offers. BOGO offers were viewed slightly more often than discount offers, yet fewer BOGO offers were completed: the ratio of completed to viewed offers is more than 80% for discount offers but only about 64% for BOGO offers. Again, these completions include offers that were never viewed before the purchase and should therefore count as ineffective.
2. EDA for received BOGO or discount offers
- Female customers respond to BOGO or discount offers better than male customers. Customers who do not fill in their gender rarely respond to BOGO or discount offers.
- Customers who entered their age as 118 rarely respond to discount or BOGO offers.
- Senior customers, especially those older than 50, complete offers more often than young customers. Customers with age 118 ('unknown') rarely complete offers.
- Customers with higher income have a higher rate of completing an offer. Customers with unknown income do not respond well to offers.
- Customers with longer tenure respond to offers better.
3. EDA for informational offers
- About 50% of male customers respond to an informational offer, while female customers respond slightly less often.
- Customers under 50 respond to informational offers better. Customers with age 118 ('unknown') respond to informational offers less often.
- Customers with low or middle income respond to informational offers better.
- Customers with longer tenure respond to informational offers better, especially those with more than 1,250 tenure days.
Modelling
1. Build a model to predict if a customer would complete a BOGO or discount offer
- Generate a new feature, Num_of_offer_received, which is the number of offers a customer has received before the current offer.
- Fill NA values.
- Apply MinMaxScaler to the numerical variables.
- Build decision tree and random forest models using the F1 score as the evaluation metric (see the sketch below).
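A minimal sketch of these modelling steps, assuming the feature matrix X (after encoding and scaling) and the binary label y have already been prepared; the split ratio and random states are assumptions:

from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score

# Hold out a test set, keeping the class imbalance comparable in both splits.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42)

for model in (DecisionTreeClassifier(random_state=42),
              RandomForestClassifier(random_state=42)):
    model.fit(X_train, y_train)
    print(type(model).__name__,
          'train F1:', round(f1_score(y_train, model.predict(X_train)), 3),
          'test F1:', round(f1_score(y_test, model.predict(X_test)), 3))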
Baseline Decision Tree Model and Performance:
Random Forest Model and Performance:
Improve the Random Forest Model with Grid Search to Fine-Tune Parameters. The grid search parameters are:
param_grid = {'max_features': ['auto', 'sqrt'],
              'max_depth': [5, 10, 15],
              'n_estimators': [25, 50, 100],
              'min_samples_split': [2, 5, 10],
              'min_samples_leaf': [2, 3, 5]}
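A minimal sketch of how this search might be run with scikit-learn's GridSearchCV; the cross-validation setting is an assumption, while param_grid and the train split come from the steps above:

from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Search over the grid above, scoring each candidate by F1 (cv=3 is an assumed setting).
grid = GridSearchCV(RandomForestClassifier(random_state=42),
                    param_grid=param_grid, scoring='f1', cv=3, n_jobs=-1)
grid.fit(X_train, y_train)
print(grid.best_params_)
best_rf = grid.best_estimator_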
And the best parameters are:
{'max_depth': 10,
'max_features': 'auto',
'min_samples_leaf': 2,
'min_samples_split': 10,
'n_estimators': 25}
The best random forest model has a training F1 score of 0.801 and a testing F1 score of 0.794.
A summary of the model performances is shown below. We can clearly see that the decision tree model overfits, because its training score is much higher than its testing score, while the random forest model reduces overfitting by randomly sampling observations and features.
Feature Importance for Best Random Forest Tree Model:
Analysis of the Starbucks Capstone Challenge customer bogo or discount offer effectiveness suggests that the top five features based on their importance are:
1) Customer income
2) Tenure days
3) Customer age
4) Offer delivering method: Social
5) Customers with unknown gender
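This ranking can be read directly off the fitted model; a short sketch, assuming best_rf from the grid search above and X as a DataFrame of the model features:

import pandas as pd

# Pair each feature name with its importance and print the top five.
importances = pd.Series(best_rf.feature_importances_, index=X.columns)
print(importances.sort_values(ascending=False).head(5))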
2. Build a model to predict if a customer would respond effectively to an informational offer
- Generate a new feature, Num_of_offer_received, which is the number of offers a customer has received before the current offer
- Fill NA values.
- Apply MinMaxScaler to the numerical variables.
- Build random forest models.
Baseline Random Forest Model and Performance:
Improved Random Forest Model and Performance after Grid Search.
param_grid = {'max_features': ['auto', 'sqrt'],
              'max_depth': [5, 10, 15],
              'n_estimators': [25, 50, 100, 200],
              'min_samples_split': [2, 5, 10],
              'min_samples_leaf': [2, 3, 5]}
And the best parameters are:
{'max_depth': 10,
'max_features': 'auto',
'min_samples_leaf': 2,
'min_samples_split': 4,
'n_estimators': 200}
Summary of random forest model performance:
Overall, the predictive model for informational offers performs worse than the best model for BOGO/discount offers. I think one reason is that BOGO/discount offers have 'offer completed' records after the transaction, so labelling is more accurate for a specific offer. For informational offers, we can only assume that if a customer viewed the offer and made a transaction within the offer's valid window, then the offer was effective. The customer may just have happened to make a transaction during that time window, but we cannot tell from the dataset provided. More information, such as which specific offer corresponds to each transaction, would help us draw better conclusions about effective offers.
Feature Importance for Best Random Forest Tree Model:
Analysis of the Starbucks Capstone Challenge customer informational offer effectiveness suggests that the top five features based on their importance are:
1) Tenure days
2) Customer income
3) Customer age
4) Offer received time
5) Number of offers received before current offer
Conclusions & Future Improvement
Conclusions
Overall, this project is challenging, mainly due to the structure of the data in the ‘transcript’ dataset.
Recommendations for the company when distributing BOGO or discount offers:
1. Customers who entered age 118 or left gender or income empty rarely respond to offers. The company could avoid sending offers to customers who tend not to fill out their profiles.
2. Female customers respond better than males to BOGO or discount offers. The company could send more BOGO or discount offers to female customers.
3. Senior customers, especially those older than 50, complete offers more often than young customers. The company could send more BOGO or discount offers to senior customers.
4. Customers with higher income have a higher rate of completing an offer, which is surprising. The company could send more BOGO or discount offers to higher-income customers.
5. Customers with longer tenure complete offers more often. The company could definitely reward longer-tenured customers with more BOGO or discount offers.
6. The most important variables from the best random forest model also suggest that customer income, tenure days, customer age, delivery via the social channel, and unknown gender matter for effective offers.
Recommendations for the company when distributing informational offers:
1. About 50% of male customers respond to an informational offer, while female customers respond slightly less often, which differs from the response to BOGO or discount offers. The company could target informational offers slightly more at male customers.
2. Customers under 50 respond to informational offers better, which is the opposite of BOGO or discount offers. The company could target informational offers more at customers under 50.
3. Customers who entered age 118 or left gender and income empty respond to informational offers less often, similar to BOGO or discount offers. The company should send fewer informational offers to customers who do not fill out their profiles.
4. Customers with low or middle income respond to informational offers better. The company could send informational offers to middle- or low-income customers.
5. Customers with longer tenure respond to offers more often. The company could definitely send more informational offers to longer-tenured customers.
6. The most important variables from the best random forest model suggest that tenure days, customer income, customer age, offer received time, and the number of offers received before the current offer matter for an effective offer.
Further Improvements and Experimentation
Overall, the models perform relatively well at predicting whether BOGO/discount offers will be completed and whether informational offers will be effective, and the BOGO/discount model performs better than the informational one. My suggestion for improving predictions on informational offers is to gather more information to judge offer effectiveness better, as discussed in the model performance summary above.
In the future, I could also experiment more with feature engineering to see whether new features improve the models, and try different ways of handling the imbalanced dataset for BOGO/discount offers.
Also, so far the analysis and models focus on whether customers who receive an offer will use it. I could also build transaction datasets to predict how much a customer would spend at Starbucks with or without an offer.
In addition, I could try unsupervised machine learning, such as k-means clustering, to group the customers who are more likely to respond to a given offer.