Learning to Rank for Recommender Systems: A Practical Guide with XGBoost

As content creators at learns.edu.vn and experts in education, we delve into the practical applications of machine learning in enhancing user experience. Learning to Rank (LTR) is a powerful technique increasingly used in both search engines and recommender systems to deliver more relevant results. The core objective of ranking is to present items in an order that maximizes user satisfaction and engagement. In this guide, we will explore how to leverage the popular XGBoost library to build an effective movie recommendation system using Learning to Rank methodologies.

When I first encountered Learning to Rank, a fundamental question arose: how does it differ from traditional machine learning paradigms? In conventional machine learning tasks like classification or regression, each instance is associated with a single target class or value. For instance, in customer churn prediction, each customer record has features and a corresponding churn/no-churn label. The model predicts a class or probability for each individual customer. However, Learning to Rank operates differently. Instead of predicting a single value for each instance, it deals with lists of items. For each instance (e.g., a user), there are multiple items, and the goal is to learn a model that can optimally order these items based on their relevance. In a recommendation context, given a user’s past interactions, the aim is to develop a model that predicts the best possible ordering of items (like movies) for that user.

Let’s move into the practical coding aspect. For simplicity and clarity, we’ll use the MovieLens small dataset¹. You can download this dataset from the official MovieLens website.


First, let’s load the dataset and perform some essential preprocessing steps to get it ready for our learning to rank task.
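As a minimal sketch of this step: in practice you would read `ratings.csv` from the extracted `ml-latest-small` archive; here a tiny synthetic frame with the same schema stands in for it, and we derive the datetime-based columns used later for feature engineering.

```python
import pandas as pd

# In practice: ratings = pd.read_csv("ml-latest-small/ratings.csv")
# A small synthetic frame with the same schema stands in here.
ratings = pd.DataFrame({
    "userId":    [1, 1, 2, 3, 3, 3],
    "movieId":   [10, 20, 10, 10, 30, 40],
    "rating":    [4.0, 3.5, 5.0, 2.0, 4.5, 3.0],
    "timestamp": [964982703, 964981247, 1445714994,
                  1510577970, 1510578000, 1510578100],
})

# Convert the Unix timestamp into datetime-derived features.
ratings["dt"] = pd.to_datetime(ratings["timestamp"], unit="s")
ratings["hour"] = ratings["dt"].dt.hour
ratings["dayofweek"] = ratings["dt"].dt.dayofweek
```

The `hour` and `dayofweek` columns are the raw material for the temporal features discussed below.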

The MovieLens dataset provides a rich collection of user interactions, containing 100,000 ratings and 3,600 tag applications applied to 9,000 movies by 600 users. This scale is perfect for demonstrating the principles of Learning to Rank without overwhelming complexity.

Let’s examine the distribution of ratings within the dataset to understand user preferences.

Analyzing these distributions, we can derive insightful features. Time-based and day-based features can capture temporal patterns in user behavior. For each movie, we can aggregate user interactions, such as the total count of users who interacted with it, and the breakdown of ratings (5-star, 4-star, 3-star, 2-star, and 1-star reviews). Additionally, we can incorporate features like the number of reviews received daily and the number of reviews received during specific times of the day (e.g., after 5 PM) to capture trends in user engagement.
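The per-movie aggregation described above can be sketched with a pandas `groupby`; the column names (`n_users`, `n_5star`, etc.) are illustrative choices, not names from the original article, and the input frame is a synthetic stand-in for the preprocessed ratings table.

```python
import pandas as pd

# Synthetic interactions standing in for the preprocessed ratings table.
ratings = pd.DataFrame({
    "userId":  [1, 1, 2, 3, 3],
    "movieId": [10, 20, 10, 10, 20],
    "rating":  [5.0, 4.0, 5.0, 3.0, 1.0],
    "hour":    [18, 9, 20, 11, 23],
})

# Per-movie aggregates: distinct-user count, per-star breakdown,
# and the number of reviews arriving after 5 PM.
movie_feats = ratings.groupby("movieId").agg(
    n_users=("userId", "nunique"),
    n_5star=("rating", lambda s: (s == 5.0).sum()),
    n_1star=("rating", lambda s: (s == 1.0).sum()),
    n_after_5pm=("hour", lambda h: (h >= 17).sum()),
).reset_index()
```

The same pattern extends to 4-, 3-, and 2-star counts and to daily review counts (group by movie and calendar date, then aggregate again).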

Next, we need to split our dataset into training and testing sets. A common approach for time-series data or interaction data is to use a temporal split, where past interactions are used for training, and more recent interactions are used to evaluate the model’s performance on unseen data. This simulates a real-world scenario where we are predicting future preferences based on past behavior.
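A temporal split can be sketched as follows: pick a timestamp cutoff (here the 80th percentile, an illustrative choice) and send everything before it to training and everything after it to testing.

```python
import pandas as pd

# Synthetic interactions; `timestamp` stands in for the MovieLens column.
ratings = pd.DataFrame({
    "userId":    [1, 1, 1, 2, 2, 3],
    "movieId":   [10, 20, 30, 10, 40, 20],
    "timestamp": [100, 200, 300, 150, 400, 250],
})

# Temporal split: the oldest ~80% of interactions train the model,
# the newest ~20% evaluate it -- "predict the future from the past".
cutoff = ratings["timestamp"].quantile(0.8)
train = ratings[ratings["timestamp"] <= cutoff]
test = ratings[ratings["timestamp"] > cutoff]
```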

Now, let’s prepare the input data for our Learning to Rank model. Unlike traditional supervised learning models, ranking models require additional information about the grouping of instances. This grouping is crucial because Learning to Rank algorithms optimize the order of items within each group. In our case, each user represents a group, and the items are the movies they have interacted with.

To build our model, we will use XGBoost, specifically the XGBRanker. Let’s examine the .fit() method of XGBRanker to understand the required parameters. The documentation for XGBRanker().fit() highlights the importance of the group parameter:

Signature: model.fit(X, y, group, sample_weight=None, eval_set=None, sample_weight_eval_set=None, eval_group=None, eval_metric=None, early_stopping_rounds=None, verbose=False, xgb_model=None, callbacks=None)
Docstring: Fit the gradient boosting model

Parameters:

X : array_like
    Feature matrix
y : array_like
    Labels
group : array_like
    Group size of training data
sample_weight : array_like
    Group weights
    .. note:: Weights are per-group for ranking tasks
    In ranking task, one weight is assigned to each group (not each data point). This is because we only care about the relative ordering of data points within each group, so it doesn’t make sense to assign weights to individual data points.

As indicated in the documentation, the group parameter is essential for Learning to Rank models. It specifies the size of each group in the training and testing data. Understanding how to construct this group array is often a point of confusion for those new to Learning to Rank.

In essence, the group parameter is an array that indicates the number of items associated with each user (or group). For example, if user 1 interacted with 2 movies, user 2 with 1 movie, and user 3 with 4 movies, the group array would be [2, 1, 4]. The length of the group array should be equal to the number of unique users, and the sum of the elements in the group array should equal the total number of interactions in the dataset.
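The example above can be reproduced in a few lines. The one prerequisite, easy to miss, is that the rows must be sorted so that each user's interactions are contiguous; only then do the group sizes line up with the rows.

```python
import pandas as pd

# User 1 interacted with 2 movies, user 2 with 1, user 3 with 4.
df = pd.DataFrame({
    "userId":  [1, 1, 2, 3, 3, 3, 3],
    "movieId": [10, 20, 10, 10, 20, 30, 40],
})

# Rows MUST be sorted by user so each group's items are contiguous.
df = df.sort_values("userId")
group = df.groupby("userId").size().to_list()

print(group)  # [2, 1, 4]
```

Note the two sanity checks from the text: `len(group)` equals the number of unique users, and `sum(group)` equals the total number of interactions.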

Let’s prepare our model inputs, including the feature matrix (X), the relevance labels (y), and the group information. We can use code similar to the example provided in the original article to generate these inputs.
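Since the original article's input-preparation code is not reproduced here, the following is a hedged sketch of what it might look like: hypothetical feature columns, the rating as the relevance label, and the group sizes computed from the sorted user column.

```python
import pandas as pd

# Hypothetical training frame: two engineered features (illustrative names),
# the rating as the relevance label, and userId as the ranking group key.
train = pd.DataFrame({
    "userId":    [1, 1, 2, 2, 2],
    "movie_pop": [120, 40, 120, 15, 80],
    "user_hour": [18, 9, 20, 20, 11],
    "rating":    [5.0, 3.0, 4.0, 2.0, 5.0],
}).sort_values("userId")

feature_cols = ["movie_pop", "user_hour"]
X = train[feature_cols]                           # feature matrix
y = train["rating"]                               # relevance labels
group = train.groupby("userId").size().to_list()  # [2, 3]
```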

With the training and testing inputs prepared, we are ready to train and evaluate our Learning to Rank model. Before we proceed with model training, it’s crucial to discuss the evaluation metrics commonly used for recommender systems.

Evaluating the performance of a recommender system requires metrics that go beyond standard classification or regression metrics. For ranking tasks, the order of recommendations is paramount. Two of the most widely used evaluation metrics in recommendation and Learning to Rank are Normalized Discounted Cumulative Gain (NDCG) and Mean Average Precision (MAP). In this guide, we will focus on NDCG as our primary evaluation metric.

NDCG is an enhancement of Cumulative Gain (CG). CG simply sums up the relevance scores of recommended items. However, CG does not consider the order of recommendations. In real-world scenarios, presenting highly relevant items at the top of the recommendation list is crucial for user satisfaction. Discounted Cumulative Gain (DCG) addresses this by introducing a discount factor that penalizes relevant items appearing lower in the ranking. The discount is typically logarithmic, reducing the contribution of items further down the list.

However, DCG can be sensitive to the number of items a user interacts with. Normalized Discounted Cumulative Gain (NDCG) normalizes the DCG score by dividing it by the Ideal DCG (IDCG). IDCG is the DCG score of the ideal ranking, where items are ordered by their relevance in descending order. This normalization allows for fair comparisons of ranking quality across users with varying numbers of interactions.
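The chain CG → DCG → NDCG can be made concrete in a few lines. This sketch uses the linear-gain formulation (relevance divided by `log2(position + 1)` with 1-based positions); some implementations use an exponential gain `2^rel - 1` instead.

```python
import math

def dcg(relevances):
    # DCG: each relevance is discounted by log2 of its 1-based position + 1,
    # so relevant items buried lower in the list contribute less.
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances))

def ndcg(relevances):
    # Normalize by the ideal DCG: the same items sorted by descending relevance.
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0

perfect = ndcg([3, 2, 1])  # already in ideal order -> exactly 1.0
worse = ndcg([1, 2, 3])    # relevant items at the bottom -> below 1.0
```

Because of the normalization, both users with three interactions and users with three hundred score in [0, 1], which is what makes NDCG comparable across groups.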

Now, let’s proceed to train our XGBRanker model and generate predictions.

After training, we can generate predictions on the test set. These predictions will represent the ranking scores assigned by the model to each movie for each user.
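The raw scores only matter relative to other items in the same group, so the post-processing step is to rank movies within each user by descending score. A sketch with hypothetical scores:

```python
import pandas as pd

# Hypothetical model scores for each (user, movie) pair in the test set.
test = pd.DataFrame({
    "userId":  [1, 1, 1, 2, 2],
    "movieId": [10, 20, 30, 10, 40],
    "score":   [0.2, 0.9, 0.5, 0.1, 0.8],
})

# Rank movies within each user by descending score;
# the top-k rows per user form the recommendation list.
test["rank"] = test.groupby("userId")["score"].rank(
    ascending=False, method="first"
)
top1 = test[test["rank"] == 1].set_index("userId")["movieId"].to_dict()
print(top1)  # {1: 20, 2: 40}
```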

In addition to NDCG, another valuable metric for evaluating recommender systems is coverage: the percentage of items in the training catalog that actually appear in the model’s recommendations on the test set. High coverage indicates that the model recommends a diverse set of items rather than concentrating on a small subset of popular ones. Models can maximize NDCG or MAP@k by leaning on popular items, but this comes at the cost of personalization and of surfacing less popular yet relevant items; monitoring coverage helps identify and address such issues. In the original example, a coverage of around 2% suggests considerable room for improvement, indicating that the model could be refined to recommend a broader range of movies.
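Coverage reduces to simple set arithmetic; the catalogs below are hypothetical stand-ins for the real training items and the distinct items appearing in the top-k recommendations.

```python
# Hypothetical catalogs: every movie seen in training vs. the distinct
# movies that appear anywhere in the model's top-k recommendations.
train_items = {10, 20, 30, 40, 50}
recommended_items = {20, 40}

coverage = len(recommended_items & train_items) / len(train_items)
print(f"coverage = {coverage:.0%}")  # coverage = 40%
```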

Finally, we can analyze feature importance to understand which features are most influential in the model’s ranking decisions.
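A trained `XGBRanker` exposes `feature_importances_`, one value per feature; pairing it with the feature names and sorting gives the ranking-decision summary described above. The importance values below are made up for illustration.

```python
import numpy as np
import pandas as pd

feature_cols = ["movie_pop", "n_5star", "user_hour"]
# Stand-in for model.feature_importances_ from a trained XGBRanker.
importances = np.array([0.55, 0.30, 0.15])

# Pair each importance with its feature name and sort descending.
ranked = (pd.Series(importances, index=feature_cols)
            .sort_values(ascending=False))
print(ranked.index[0])  # movie_pop
```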

Conclusion

In this guide, we have explored the fundamentals of Learning To Rank For Recommender Systems. We differentiated LTR from traditional machine learning, demonstrated how to model ranking problems using XGBoost, and discussed key evaluation metrics like NDCG and coverage. While we used a movie recommendation example, the principles and techniques discussed are broadly applicable to various ranking problems across different domains. Learning to Rank offers a powerful approach to building more effective and user-centric recommender systems.

References

¹ Harper, F. Maxwell, and Joseph A. Konstan. 2015. The MovieLens Datasets: History and Context. ACM Transactions on Interactive Intelligent Systems (TiiS) 5, 4: 19:1–19:19. https://doi.org/10.1145/2827872

Thank you for reading this guide. For further exploration, you can access the Colab notebook associated with this article via this link.

Connect with me on LinkedIn.
