
Machine Learning System Design: Recommendation System for Restaurants

Louis Wang
Published in DataDrivenInvestor
7 min read · Mar 22, 2021

TL;DR: this article shows a framework for solving an ML system design case study, using the UberEats recommendation system as an example to illustrate the thinking process.

This is a machine learning system design case study: how do you design the recommendation system for UberEats?

The framework for the ML System Design can be divided into several sections as follows:

  • Goals and Objectives
  • Requirements
  • High-Level Design
  • Evaluation
  • Further Improvements

The very first step is to define the goal and objectives. The goal can be long term or short term, and if there is a conflict between the two, it is always a bonus to discuss the tradeoff. As for objectives, keep in mind that it is possible to have multiple objectives; if that is the case, communicating with business partners, stakeholders, and product managers can give you a clearer direction on what to optimize. Multi-objective optimization is a technique for finding the optimal solution given different constraints and objectives.

Goal

For this case study, we have both a long-term goal and a short-term goal.

  • Long term: improve eater satisfaction and engagement; increase revenue for the company
  • Short term: increase the number of orders on UberEats; increase the time and frequency of app use per week/month

Objectives

UberEats is a three-sided marketplace: eaters, restaurants, and the drivers who deliver the food. When we consider the objectives, we should take care of each of them, or discuss with the team to set priorities.

Here are some examples for the objectives:

  • increase the click-through rate (CTR) in the app
  • maintain diversity: not just all Chinese restaurants
  • dishes arrive on time (predict the probability of a delay)
  • others (working with stakeholders and business partners):
      o increase session length
      o order frequency
      o conversion rate

Note that if we have multiple objectives, we can use multi-objective optimization as the loss function for our model.
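One common way to handle multiple objectives is to scalarize them into a single loss via a weighted sum. The sketch below is a minimal illustration of that idea; the three objective terms and their weights are hypothetical and would be chosen together with stakeholders (e.g. how much diversity to trade for clicks).

```python
def multi_objective_loss(ctr_loss, diversity_penalty, delay_loss,
                         w_ctr=1.0, w_div=0.3, w_delay=0.5):
    """Scalarize multiple objectives into one loss via a weighted sum.

    The weights are illustrative, not tuned values: raising w_div tells
    the model to sacrifice some CTR for a more diverse feed.
    """
    return w_ctr * ctr_loss + w_div * diversity_penalty + w_delay * delay_loss

# Example: a training batch with CTR loss 0.7, diversity penalty 0.2,
# and on-time-delivery loss 0.1 collapses to one scalar to backpropagate.
total = multi_objective_loss(0.7, 0.2, 0.1)
```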

Requirements

The second part covers the requirements, for both the business side and the engineering side. This is important because our design depends on how the system will be used. For example, a system for new users can be very different from one for loyal users, and a system for millions of users can be much harder to build than one for a few thousand users.

For the product side, we can consider:

  • Batch prediction, real-time prediction, or a hybrid of the two?
  • Any specific customer segmentation, or does it apply to all users?
  • Do we need to solve the cold-start problem for the recommendation system?

For Engineering considerations, we mainly focus on scalability, accuracy, reliability and robustness of the machine learning system.

  • How many users/items do we need to handle?
  • What is the peak number of requests per second/day?
  • Service-level agreement: we will incorporate user actions into recommendations within X seconds/minutes.

We can make some assumptions and do some calculations to estimate the queries per second (QPS) as well. This can influence our decision about which type of database to store our data in.
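A back-of-envelope QPS estimate can be sketched as follows. All the input numbers here are assumptions made up for illustration (daily active eaters, sessions per eater, requests per session, and the peak-to-average ratio), not UberEats figures.

```python
# Hypothetical traffic assumptions for a back-of-envelope estimate:
daily_active_users = 10_000_000   # assumed daily active eaters
sessions_per_user = 5             # assumed app opens per eater per day
requests_per_session = 3          # assumed recommendation requests per open

requests_per_day = daily_active_users * sessions_per_user * requests_per_session
avg_qps = requests_per_day / 86_400   # 86,400 seconds in a day
peak_qps = avg_qps * 3                # assume peak traffic is ~3x the average
```

Even rough numbers like these (thousands of QPS at peak) help decide between, say, a key-value store with precomputed candidates and a database queried on the fly.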

High-Level Design

So far we have a clear goal to work toward and requirements to consider when we design the system; next we can discuss the overall design of the system. We know that life is full of tradeoffs, and so is the system. We could build an end-to-end model to match user preferences with restaurants, but if we want to fully capture the relationship between users and restaurants, we need a complicated model. However, given the number of users, deploying such a complex model to production would sacrifice latency, which is bad for the user experience.

The better solution is to design a multi-layer framework: do the complicated work offline and make predictions in real time. We can first generate possible candidates from all the restaurants, filtering out irrelevant ones, for example by location and bad ratings. Then we build a model to match users with the candidate restaurants.

Therefore, the solution can have three stages: candidate generation, ranking, and post-processing.

Candidate Generation

First, candidate generation. The purpose of this stage is to get a promising candidate set of dishes and restaurants in a scalable fashion. Given the large amount of data, we need a simple model in this step. We can use a rule-based model with location: if the distance between the eater and the restaurant is over 50 miles, we filter the restaurant out. We can also consider a logistic regression model with regularization, because training is fast, logistic regression is explainable, and regularization helps avoid overfitting. We then predict the probability that the eater likes each candidate restaurant. Logistic regression's high efficiency makes it a good trade-off between accuracy and latency here.
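A minimal sketch of this stage with scikit-learn, under invented data: the three features, the synthetic labels, and the 50-mile cutoff applied to random distances are all illustrative stand-ins for real log-derived features.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Hypothetical features per (eater, restaurant) pair:
# [distance_miles, avg_rating, past_orders_30d]
X = rng.random((1000, 3)) * [80, 5, 10]
# Synthetic labels: clicks are more likely when close and well rated.
y = ((X[:, 0] < 25) & (X[:, 1] > 3)).astype(int)

# L2 regularization (C is the inverse regularization strength) guards
# against overfitting while keeping the weights explainable.
model = LogisticRegression(C=1.0, max_iter=1000).fit(X, y)

# Rule-based pre-filter: drop restaurants more than 50 miles away,
# then keep the top candidates by predicted click probability.
candidates = X[X[:, 0] <= 50]
scores = model.predict_proba(candidates)[:, 1]
top_idx = np.argsort(scores)[::-1][:100]
```

The rule filter shrinks the pool cheaply before any model runs; the regularized model then orders what remains.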

Speaking of training data, we can think from three perspectives:

  • User profile: age, gender, language, device, payment method, etc.
  • User behavior: the cuisines of past orders in the last week/month, average dollar amount per order, number of times the app was opened in the past week/month, etc.
  • Restaurant metadata: cuisine, reviews, average dollar amount per order, title embeddings of the top k popular items
  • Interaction data: the user's order history at this restaurant in the past week/month, distance between the user and the restaurant, time of the order, etc.

The positive examples can come from clicks in the user activity logs. For negative samples, we can randomly select some unclicked restaurants in the city or nearby cities given the user's location. If necessary, we can deal with an unbalanced dataset via upsampling, downsampling, or SMOTE.
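The negative-sampling step above can be sketched as a small helper. The restaurant IDs and collections here are hypothetical; in practice `clicked` would come from activity logs and `nearby_restaurants` from a geo index.

```python
import random

random.seed(42)  # fixed seed only so the sketch is reproducible

def sample_negatives(clicked, nearby_restaurants, k):
    """Randomly sample k unclicked nearby restaurants as negative examples."""
    clicked_set = set(clicked)
    pool = [r for r in nearby_restaurants if r not in clicked_set]
    return random.sample(pool, min(k, len(pool)))

clicked = {"r1", "r7"}                      # positives from the click logs
nearby = [f"r{i}" for i in range(20)]       # candidates near the user
negatives = sample_negatives(clicked, nearby, k=4)
```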

It is worth noting that to generate the training data for the candidate generation stage, we should capture all the patterns: weekdays and weekends, daytime and nighttime. And when splitting the data into training and test sets, the test set should come from the time window succeeding the training data, because we use history to predict the future.
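The time-based split can be sketched like this, using made-up click events; the point is only that the cutoff puts every test event strictly after every training event, unlike a random split.

```python
from datetime import datetime

# Hypothetical click events: (timestamp, user_id, restaurant_id, label)
events = [
    (datetime(2021, 3, d), f"u{d % 3}", f"r{d % 5}", d % 2)
    for d in range(1, 22)
]

# Train on the first two weeks, test on the following week: the test
# window must succeed the training window, since we predict the future.
cutoff = datetime(2021, 3, 15)
train = [e for e in events if e[0] < cutoff]
test = [e for e in events if e[0] >= cutoff]
```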

Ranking

The purpose of this stage is to sort the outputs from the candidate generation stage for each user and make the top-K recommendations.

Since the number of candidates is much smaller, we can use a more complicated model to leverage more personalized signals and capture the nonlinear, complex relations between eaters and restaurants. We can use a multi-layer perceptron (MLP), with cross-entropy or Bayesian Personalized Ranking (BPR) as the loss function, Adam as the optimizer, and grid search or Bayesian optimization to tune the hyperparameters.
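A minimal sketch of this ranking stage with scikit-learn, on synthetic data: the 8 features and the nonlinear label rule are invented, and the tiny grid over hidden-layer sizes stands in for a real hyperparameter search. scikit-learn's MLPClassifier trains with log-loss (cross-entropy) and supports the Adam solver.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import GridSearchCV

rng = np.random.default_rng(1)

# Hypothetical ranking features for 200 candidate (eater, restaurant)
# pairs, combining profile, behavior, and real-time context signals.
X = rng.random((200, 8))
y = (X[:, 0] + X[:, 1] * X[:, 2] > 0.8).astype(int)  # synthetic nonlinear label

# MLP trained with cross-entropy and Adam; the grid over hidden sizes
# is a minimal stand-in for grid search / Bayesian optimization.
grid = GridSearchCV(
    MLPClassifier(solver="adam", max_iter=2000, random_state=0),
    param_grid={"hidden_layer_sizes": [(16,), (32, 16)]},
    cv=3,
)
grid.fit(X, y)

# Rank the candidates by predicted relevance and keep the top K.
scores = grid.predict_proba(X)[:, 1]
top_k = np.argsort(scores)[::-1][:10]
```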

For the training data, besides the data we used in the first stage, we can add contextual or real-time features to the model. Real-time features can include the time of the order, upcoming holidays, real-time location, current coupons the user has, the age of the restaurant (new or old), the real-time popularity of the restaurant, etc.

Post-Processing

This step is important if we want to keep improving the user experience in some special situations. We can focus on diversifying the recommendations, reranking the items in a restaurant's menu, or considering the current traffic to the recommended restaurants in case the orders exceed the capacity the restaurants can handle.

  • Diversity

Reduce the score if two highly ranked recommendations are the same cuisine: AABCD -> ABCDA

  • Menu ReRanking

We can make personalized recommendations within a restaurant by reranking its menu items based on the user's past order history.

  • Restaurant capacity

If we detect that a restaurant is accepting more orders than a capacity threshold, we can lower its rank, because a huge volume of orders can cause longer preparation times and a bad experience for eaters.
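The diversity rerank described above (AABCD -> ABCDA) can be sketched as a greedy pass that demotes a restaurant whose cuisine matches the item placed just before it. The IDs and cuisine map are made up; this simple version just pushes demoted items to the end, so it does not guarantee the tail itself is diverse.

```python
def diversify(ranked, cuisine_of):
    """Greedy rerank: avoid two same-cuisine restaurants back to back
    by demoting duplicates to the end (e.g. AABCD -> ABCDA)."""
    result, deferred = [], []
    for item in ranked:
        if result and cuisine_of[item] == cuisine_of[result[-1]]:
            deferred.append(item)  # same cuisine as the previous slot: demote
        else:
            result.append(item)
    return result + deferred       # demoted items move to the end

# Two Chinese ("A") restaurants ranked 1st and 2nd get spread apart:
cuisines = {"r1": "A", "r2": "A", "r3": "B", "r4": "C", "r5": "D"}
order = diversify(["r1", "r2", "r3", "r4", "r5"], cuisines)
```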

Evaluation

So far we have a workable system, but how can we make sure it is successful, and how do we decide to deploy it to production? It is critical to choose the right metrics to optimize the system, and we may even have different optimization objectives at different stages. We should design both offline experiments and online A/B tests, and the metrics will show us where to improve.

Through offline experiments, we can test algorithms without deploying them, which protects users from the bad results a poor model would produce in production. For the candidate generation stage we should optimize recall, while for the ranking stage we should optimize precision.
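The two offline metrics can be computed as below. The recommendation list and the set of relevant restaurants are hypothetical; in practice "relevant" would be defined from held-out clicks or orders.

```python
def precision_recall_at_k(recommended, relevant, k):
    """precision@k suits the ranking stage (is the top of the list good?);
    recall@k suits candidate generation (did we keep the relevant items?)."""
    top_k = recommended[:k]
    hits = len(set(top_k) & set(relevant))
    precision = hits / k
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical: we recommend 5 restaurants; the eater later ordered from
# r2 and r9, so only one of the two relevant items was surfaced.
p, r = precision_recall_at_k(["r1", "r2", "r3", "r4", "r5"], {"r2", "r9"}, k=5)
```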

For online A/B testing, we split the traffic into a control group and a test group with equal impressions. We can decide on the significance level, power, and minimum detectable difference, and calculate the minimum sample size to make the comparison statistically significant.
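The sample-size calculation can be sketched with the standard two-proportion z-test formula; the baseline CTR and the lift in the example are invented numbers, not product figures.

```python
from scipy.stats import norm

def min_sample_size(p_baseline, mde, alpha=0.05, power=0.8):
    """Per-group sample size for a two-proportion z-test.

    p_baseline: control conversion rate; mde: minimum detectable
    absolute difference. This is the textbook formula, nothing
    recommendation-specific.
    """
    p_test = p_baseline + mde
    z_alpha = norm.ppf(1 - alpha / 2)   # two-sided significance level
    z_beta = norm.ppf(power)            # desired statistical power
    variance = p_baseline * (1 - p_baseline) + p_test * (1 - p_test)
    return (z_alpha + z_beta) ** 2 * variance / mde ** 2

# Detecting a 1-point lift over an assumed 10% baseline CTR at
# alpha=0.05 and power=0.8 needs roughly 15k users per group.
n = min_sample_size(0.10, 0.01)
```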

Improvements (data, algorithms, infra, hardware, etc)

We can still improve the ML system along several axes: data, algorithms, infrastructure, hardware, etc.

1. Use more complicated models for online prediction: e.g., a graph neural network with attention to aggregate information from neighboring nodes

2. Online parameter tuning: retrain the model with real-time feedback every hour or every day (online learning instead of offline learning)

3. Position bias: use position as a feature in training, and set it to 0 when serving

4. Use multi-task learning to optimize multiple objectives at the same time

5. Clean up the noisy data (with deeper data analysis and user segmentation)

6. Hardware: more GPUs…


Machine Learning Engineer @ Snap. I write and talk about ML, among other things.