PYMK at VK: ML over EGO-NETS

VK Team
10 min read · Jul 12, 2021

The ability to add users as friends is one of the most important mechanics of any social network. The vast majority of interactions occur between users who are friends with each other. They see and comment on each other’s posts in their news feeds and go to their friends lists to start chats. This is why the growth of the social graph is so important.

My name is Evgeny Zamyatin. I’m part of the Core ML team at VK. I want to tell you about how our recommender system works to bring the users of the largest social network on the Russian internet closer together.

Introduction and Related Works

Modern recommender systems often consist of two levels, and our system is no exception.

The first level is the retrieval part of the system. Its task is to quickly search the entire set of users for the most relevant candidates, so it usually relies on simple models, such as matrix factorization, or on heuristics based on the number of mutual friends. The candidates retrieved at the first level are then sent to the second level, where the model is no longer subject to strict latency constraints. Its main task is to maximize prediction accuracy and produce the list the user will actually see. In this article, we will consider only the first stage: retrieval.

First of all, let’s formulate the problem statement. For each user, we need to find the k candidates that they are most likely to add as friends. The metric we will focus on is recall@k. It fits this task well: we’re not interested in the order of the candidates, only in their relevance.
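For example, a minimal recall@k computation in plain Python might look like this (the helper name is ours):

```python
def recall_at_k(recommended, relevant, k):
    """Fraction of truly relevant items that appear among the top-k recommendations."""
    if not relevant:
        return 0.0
    return len(set(recommended[:k]) & set(relevant)) / len(relevant)

# 2 of the 3 users who were actually added appear in the top-5 candidates.
print(recall_at_k(["b", "c", "d", "e", "f"], ["b", "c", "x"], k=5))  # 0.666...
```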

Let’s look at the basic, but still relevant, solutions that were invented decades ago. The first method that comes to mind is also the most intuitive: a heuristic based on the number of mutual friends. For each user, the candidates with the highest count are selected. This approach is simple to implement and provides decent quality.

Another important method for friend recommendation is Adamic/Adar. It is also based on mutual-friend analysis, with one modification: the authors suggest taking into account how many friends each mutual friend has. The more friends a mutual friend has, the less relevant information their shared connection carries.
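Both heuristics fit in a few lines; here is a sketch using networkx (our choice of library, purely for illustration):

```python
import networkx as nx

# A toy graph: u and v share the mutual friends a and b.
G = nx.Graph()
G.add_edges_from([("u", "a"), ("u", "b"), ("v", "a"), ("v", "b"), ("a", "b")])

# Mutual friends: the size of the intersection of the neighbor sets.
mutual = len(set(G["u"]) & set(G["v"]))  # 2

# Adamic/Adar: mutual friends weighted down by the log of their own degree.
aa = sum(score for _, _, score in nx.adamic_adar_index(G, [("u", "v")]))
```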

Recently, our colleagues from Google+ suggested a new approach to friend recommendations based on ego-nets. In their paper, the authors propose clustering each user’s ego-net. As the measure of relevance, they use a friendship score: the number of mutual friends that fall into the same cluster of an ego-net.

In addition to methods based on mutual-friend analysis, embedding-based recommendations are also quite common. At the VK Artificial Intelligence Laboratory at MIPT, we conducted a study comparing the effectiveness of different approaches to friendship prediction on VK. The results matched our production experience: solutions based on graph embeddings don’t work well for us. With this in mind, we began developing a system that selects candidates by analyzing mutual friends.

EGOML

The general scheme of our method expands on the ideas behind the number of common friends and Adamic/Adar. The final relevance measure E(u, v), which we will use to select candidates, also decomposes into a sum over the common friends of u and v. The key difference is the summand itself: in our case, it is the measure ez_c(u, v).
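In symbols, where N(x) denotes the set of friends of x (a reconstruction of the formula from the description above):

E(u, v) = Σ_{c ∈ N(u) ∩ N(v)} ez_c(u, v)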

First, let’s try to understand the “physical” meaning of the measure ez_c(u, v). Imagine that we took user c and asked them: “How likely is it that two of your friends, u and v, will become friends?” The more information this user takes into account, the more accurate their prediction will be. For example, if c can only recall how many friends they have, their reasoning might look like this: “The more friends I have, the less likely it is that two random ones know each other.” Then c’s estimate of the “probability” of friendship between u and v can look like 1/log(n), where n is the number of c’s friends. This is exactly how Adamic/Adar works. But what if c takes more context into account?

Before answering this question, let’s understand why it is important to define ez_c(u, v) via user c. In this form, the problem is very convenient to solve in a distributed manner. Imagine that we sent a questionnaire to every user of the platform asking them to estimate, for each pair of their friends, the probability that those two are friends with each other. After collecting all the answers, we can substitute the values into the formula for E(u, v). This is what the calculation of E(u, v) looks like using MapReduce (a code sketch follows below):

  • Preparation. For each user c, we prepare the context that c will take into account when making estimates. In Adamic/Adar, this is simply the list of c’s friends.
  • Map. We “ask” each c what they think about the possibility of friendship for each pair of their friends. That is, we calculate ez_c(u, v) and emit it as (u, v) → ez_c(u, v) for all u, v in N(c). In the case of Adamic/Adar: (u, v) → 1/log|N(c)|.
  • Reduce. For each pair (u, v), we sum all the emitted values. There are exactly as many of them as u and v have common friends.

Thus, we get all non-zero values of E(u, v). Note that a necessary condition for E(u, v) > 0 is the existence of at least one common friend of u and v.
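A minimal single-machine sketch of this pipeline in plain Python, with Adamic/Adar as the per-user measure (the production pipeline is distributed; this only illustrates the data flow):

```python
from collections import defaultdict
from itertools import combinations
from math import log

# Toy adjacency lists: c -> N(c).
friends = {
    "a": ["u", "v", "w"],
    "b": ["u", "v"],
}

pairs = defaultdict(float)
for c, n_c in friends.items():
    # Map: user c emits a score for every pair of their friends.
    ez = 1.0 / log(len(n_c)) if len(n_c) > 1 else 0.0  # Adamic/Adar summand
    for u, v in combinations(sorted(n_c), 2):
        pairs[(u, v)] += ez  # Reduce: sum over all common friends

# pairs[("u", "v")] == 1/log(3) + 1/log(2)
```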

The context of user c in the case of the measure ez_c will be the same list of friends, but supplemented with information about the relationships within this list. The scientific term for such a structure is an “ego-net”. More formally, the ego-net of a vertex x is the subgraph of the original graph whose vertices are all the neighbors of x together with x itself, and whose edges are all the edges of the original graph between these vertices.
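For illustration, networkx extracts an ego-net in a single call:

```python
import networkx as nx

G = nx.karate_club_graph()
ego = nx.ego_graph(G, 0)  # vertex 0, all its neighbors, and every edge among them
```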

The key idea of the ez_c measure is that it can be made learnable. For each user c, its ego-net, and every pair of users u, v inside it, we can compute many different features. For example (a sketch follows the list):

  • the number of mutual friends of u and v inside the ego-net of c
  • the number of mutual friends of u and c
  • the intensity of interactions between v and c
  • the time elapsed since the last friendship between u and someone in the ego-net of c
  • the density of the ego-net of c
  • and others
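A few of these features, sketched with networkx (the feature names are ours, purely for illustration):

```python
import networkx as nx

def pair_features(ego, c, u, v):
    """Toy per-pair features computed inside the ego-net of user c."""
    return {
        "mutual_in_ego": len(set(ego[u]) & set(ego[v])),  # mutual friends of u and v
        "mutual_u_c": len(set(ego[u]) & set(ego[c])),     # mutual friends of u and c
        "ego_density": nx.density(ego),                   # density of the ego-net
    }

ego = nx.ego_graph(nx.karate_club_graph(), 0)
print(pair_features(ego, c=0, u=1, v=2))
```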

This way, we will get a dataset with features. But we also need labels for training. Consider a dataset constructed from the state of the graph at time T. As positive examples, we take those pairs of users who were not friends at time T but became friends by T + ΔT. As negative examples, we take all other pairs of users who aren’t friends. Note that since we are solving the problem of predicting new friendships, pairs of users who are already friends at time T don’t need to be taken into account, either in training or in practice.

In the end, we get the following dataset:

  • for each pair of users u and v and each of their common friends c, the features are calculated inside the ego-net of c
  • a pair u, v appears in the dataset exactly as many times as u and v have mutual friends
  • none of the user pairs in the dataset are friends at time T
  • the label of a pair u, v is 1 if they became friends within ΔT after T, and 0 otherwise

We will use this dataset to train our measure ez_c. As a model, we chose gradient boosting with a pairwise loss function, where the group ID is user u.
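A hedged sketch of such a setup with LightGBM’s ranking interface (our choice for illustration; the data here is synthetic and the real features differ):

```python
import numpy as np
import lightgbm as lgb

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))      # toy features of (u, v, c) rows
y = rng.integers(0, 2, size=200)   # toy labels: became friends within ΔT or not
groups = [40] * 5                  # rows are grouped by user u (5 toy users)

ranker = lgb.LGBMRanker(objective="lambdarank", n_estimators=50)
ranker.fit(X, y, group=groups)
raw = ranker.predict(X)            # raw pairwise scores, roughly normally distributed
```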

Essentially, the measure ez_c(u, v) is defined as the prediction of the model described above. But there is one caveat: with pairwise training, the distribution of model predictions is close to normal and centered near zero, so some predictions are negative. If we took the “raw” prediction as the definition of ez_c(u, v), a common friend could then decrease the final measure E(u, v). This doesn’t make sense: we don’t want E(u, v) to go down as the number of mutual friends grows. So, on top of the model prediction, we decided to take the exponent:
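ez_c(u, v) = exp(f_c(u, v)),

where f_c(u, v) denotes the raw prediction of the boosting model (notation ours, reconstructed from the description). Every summand is now positive, so adding a mutual friend can only increase E(u, v).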

This approach works well on small graphs. But to apply it to real data, we need one more optimization. The problem is that we can’t afford to calculate the features and apply the model for every pair of users in every ego-net; it would take too much time. To solve this, we came up with a special trick. Imagine that our gradient boosting is trained in such a way that each tree uses the features of only one user: either u or v. Then we could divide the ensemble into two groups: group A gets the trees that use only the features of user u, and group B those that use only the features of user v. The prediction of such a model can be represented as:
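f_c(u, v) = A_c(u) + B_c(v),

where A_c(u) is the sum of the trees in group A (which see only the features of u and the ego-net of c), and B_c(v) is the sum of the trees in group B (notation ours, reconstructed from the description).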

With such a model, we can get predictions for all pairs of users within one ego-net much faster. All we have to do is apply models A and B once per user and then add up the predictions for each pair. Thus, for an ego-net of n vertices, we reduce the number of model applications from O(n²) to O(n).

But how do we get a model in which each tree depends on only one user? This is how (a training sketch follows the list):

  1. Exclude from the dataset all the features that depend on both u and v simultaneously. For example, the feature “the number of mutual friends of u and v inside the ego-net of c” has to be removed.
  2. Train model A using only the features based on u, c, and the ego-net of c.
  3. To train model B, leave only the features based on v, c, and the ego-net of c, and pass the predictions of model A as base predictions.
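A hedged sketch of this two-stage scheme, again with LightGBM and synthetic data (init_score passes model A’s predictions as the base margin that model B learns to correct):

```python
import numpy as np
import lightgbm as lgb

rng = np.random.default_rng(1)
X_u = rng.normal(size=(200, 4))    # features of u and the ego-net of c only
X_v = rng.normal(size=(200, 4))    # features of v and the ego-net of c only
y = rng.integers(0, 2, size=200)
groups = [40] * 5                  # rows grouped by user u

# Step 2: model A sees only the u-side features.
model_a = lgb.LGBMRanker(objective="lambdarank", n_estimators=50)
model_a.fit(X_u, y, group=groups)

# Step 3: model B sees only the v-side features and corrects A's predictions.
model_b = lgb.LGBMRanker(objective="lambdarank", n_estimators=50)
model_b.fit(X_v, y, group=groups, init_score=model_a.predict(X_u))

# The combined raw score of a pair is a sum of two per-user predictions.
raw = model_a.predict(X_u) + model_b.predict(X_v)
```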

If we combine models A and B, we get what we need: the first part uses the features of u, and the second part uses the features of v. The combination is meaningful because B has been trained to “fix” A’s predictions. This optimization speeds up the calculations hundreds of times and makes the approach applicable in practice. The final result for ez_c(u, v) and E(u, v) looks like this:
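ez_c(u, v) = exp(A_c(u) + B_c(v)), E(u, v) = Σ_{c ∈ N(u) ∩ N(v)} exp(A_c(u) + B_c(v))

(reconstructed from the definitions above).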

Calculating E(u, v) Online

Note that E(u, v) can be represented as:
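E(u, v) = Σ_{c ∈ N(u) ∩ N(v)} exp(A_c(u)) · exp(B_c(v))

(reconstructed from the definitions above).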

This formula is a scalar product of sparse vectors whose indices are users and whose values are the exponents of the model predictions. The non-zero entries of the vector for u correspond only to the friends of u; essentially, these are just friend lists annotated with extra values.

When building recommendations, we have already calculated the model predictions for every existing friendship. Therefore, for each user, we can assemble their vector and put it into an online key-value store. After that, we can obtain E(u, v) for any pair of users online with a simple multiplication of vectors. This makes it possible to use E(u, v) as a lightweight relevance function in high-load parts of the system or as an additional feature of the final ranking model.
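A sketch of the online computation, with the key-value store mocked as plain dicts (names and values are ours):

```python
# Sparse vectors fetched by user ID: friend c -> exp(model prediction).
x_u = {"a": 1.7, "b": 0.9, "c": 2.1}   # exp(A_c(u)) for each friend c of u
y_v = {"b": 1.2, "c": 0.4, "d": 3.0}   # exp(B_c(v)) for each friend c of v

def relevance(x_u, y_v):
    """E(u, v): a dot product over the common friends of u and v."""
    return sum(x_u[c] * y_v[c] for c in x_u.keys() & y_v.keys())

print(relevance(x_u, y_v))  # 0.9*1.2 + 2.1*0.4 = 1.92
```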

Conclusion

As a result, the EGOML system allows you to:

  1. Select candidates for each user offline in a distributed setting. The asymptotic complexity of the optimized algorithm is O(|E|) feature calculations and model applications, where |E| is the number of edges in the graph.
  2. Quickly calculate the measure of relevance E(u, v) for any pair of users online. The asymptotic complexity of the operation is O(|N(u)| + |N(v)|).
  3. Improve the quality of recommendations by expanding the set of graphs considered (friendships, hidden recommendations, sent messages, and others) and by adding labels to edges and vertices, such as the intensity of interactions on an edge, the date an edge was formed, or a user’s city, place of work, or place of study.

At VK, we moved from the Adamic/Adar method of candidate selection to the EGOML system and introduced the features based on the measure E(u, v) into the model. This allowed us to increase the number of confirmed friendships on the entire platform by several tens of percent.

Acknowledgment

I would like to thank the head of the Core ML team, Andrey Yakushev, for his help in developing this method and preparing this article, as well as the entire Core ML team for their support at various stages of this work.
