Membership Inference as a Model Training Metric

Written by
Nikolas Molyndris
David Sturzenegger
Published on
March 13, 2020

The question of data privacy in machine learning is still widely overlooked. In this article, we propose a new machine learning training metric that greatly reduces the risk of data re-identification. It can be embedded in any model training with minimal cost.

[The research for this blog was done by Nil Adell Mill. Editing and contextualizing by Nikolas Molyndris and David Sturzenegger]


The question of data privacy in machine learning is still widely overlooked. In cases where the privacy loss comes from leakage of a machine learning model, the topic quickly becomes convoluted, adding to the overall mistrust of these still misunderstood systems [11]. Bringing trust into these processes requires unwinding the complexity with safe and simple solutions. In this blog post, we propose a new machine learning training metric that greatly reduces the risk of data re-identification. It can be embedded in any model training with minimal cost.

Data leakage may come in many forms, from the retrieval of particular attributes [1] to the reconstruction of the original training samples [2]. This becomes even more relevant for Machine Learning-as-a-Service (MLaaS) systems, like those offered by Google or Amazon, where users may not be experts and have only limited access to the underlying pipeline used to train the models. In fact, a recently published article [3] presented an approach that allows someone to tell whether a given image was part of the dataset used to train an MLaaS-trained model. More strikingly, the method does not require full access to the model but only access to its inputs and outputs once it has been released, a setting known as black-box access.

This particular type of attack, known as membership inference (MI), is regarded as the easiest kind of attack to implement, which makes it a good candidate to test against. For the widely discussed risk of face image reconstruction, membership inference is a crucial first step: if you cannot even tell whether an image was used to train a model, how can you start reconstructing the face?

It is not yet clearly understood where, when, and why data leakage happens in machine learning models. While a significant amount of time has been spent on creating these "attacks" and showing the insecurity of the models, there is very little research on how a data scientist can defend against them. This is an increasingly important topic, as more important decisions are being delegated to these models and more sensitive data is being fed into them. Among industry practitioners, overfitting is seen as one of the main causes of data leakage: unsurprisingly, the more a model memorizes a dataset, the easier it is for it to leak data. Previous research [4] has studied and observed this relationship (see footnote 1) while also pointing out other apparent causes, such as model capacity. The size and distribution of the dataset is another important factor: as a rule of thumb, the more data per label one has, the less likely it is for a particular piece of data to leak. The type of problem that the model is designed to solve also plays a big role. For example, it has been demonstrated that text predictive models like Gmail’s may leak data such as credit card or social security numbers [5].

At the moment there is still no simple way to assess leakage, and it remains a widely overlooked aspect of machine learning models. We propose a new metric for training machine learning models that tracks the risk of membership inference, or simply the MI-metric. The metric enables data scientists to rank models according to how likely they are to leak data. In the same way one monitors validation accuracy during training, the MI-metric tracks the leakage risk.

Our proposal is a second model, the attack model, that performs membership inference attacks at every epoch on the primary model, the main model. As the leakage risk is most often associated with potential adversarial attacks in a black-box situation, the attack model takes the current epoch’s predictions of the main model as its input. The MI-metric is the accuracy of the attack model.
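To make the per-epoch idea concrete, here is a minimal, framework-agnostic sketch of where the MI-metric could sit in a training loop. All callables (`train_one_epoch`, `validation_accuracy`, `predict`, `attack_accuracy`) are hypothetical placeholders for whatever your framework provides; they are not part of the original implementation.

```python
from typing import Callable, Dict, List, Sequence

Predictions = List[List[float]]


def train_with_mi_metric(
    num_epochs: int,
    train_one_epoch: Callable[[int], None],
    validation_accuracy: Callable[[], float],
    predict: Callable[[Sequence], Predictions],
    attack_accuracy: Callable[[Predictions, List[int]], float],
    mi_samples: Sequence,
    mi_labels: List[int],
) -> List[Dict[str, float]]:
    """Train the main model and log validation accuracy and MI-metric per epoch."""
    history: List[Dict[str, float]] = []
    for epoch in range(num_epochs):
        # Regular training step of the main model (assumed callable).
        train_one_epoch(epoch)
        # Black-box view: the attack only sees the main model's outputs.
        predictions = predict(mi_samples)
        history.append(
            {
                "epoch": epoch,
                "val_accuracy": validation_accuracy(),
                # Accuracy of the attack model on member / non-member labels.
                "mi_metric": attack_accuracy(predictions, mi_labels),
            }
        )
    return history
```

The returned history can be plotted next to the usual accuracy curves, so a rising MI-metric becomes visible at the epoch where it starts.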

This evaluates how successful an attacker could be before we expose our model to the world. The underlying philosophy is comparable to penetration testing in software. By analyzing membership inference, we test against one of the easiest types of attacks that can be performed on a model: if you can't verify that a sample was in the training dataset, there's nothing else you can really extract about that sample.

This approach has the added benefit of simulating a stronger attack than a malicious party could mount, since, as we will show, the attack model knows the original dataset and how it is partitioned. The attack model has full knowledge of the ground truth, while an attacker would have to come up with a proxy, a shadow model [3,6], against which to train their membership inference model.

The algorithm

We propose Algorithm 1, which combines normal model training with a per-epoch evaluation of the MI-metric, measuring the susceptibility to membership inference attacks. First, from the training and validation data we generate a new membership inference dataset. The samples coming from the training set are labeled as members (1's) and those from the validation set as non-members (0's). This MI dataset is then randomly divided into MI training and testing datasets. At every epoch, we pass the samples from the MI dataset through the main model, collect the predictions, sort them (see footnote 2), and train the attack model with these values as inputs and the membership as labels.
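The dataset construction described above can be sketched in a few lines of plain Python. The split ratio (`test_fraction=0.3`), the fixed seed, and the descending sort order are assumptions made for illustration; the post does not state which values were used.

```python
import random
from typing import List, Sequence, Tuple

LabeledSample = Tuple[object, int]


def build_mi_dataset(
    train_samples: Sequence,
    val_samples: Sequence,
    test_fraction: float = 0.3,  # assumed split ratio, not from the post
    seed: int = 0,
) -> Tuple[List[LabeledSample], List[LabeledSample]]:
    """Label training samples 1 (member) and validation samples 0 (non-member),
    then split the result randomly into MI training and MI testing sets."""
    labeled = [(x, 1) for x in train_samples] + [(x, 0) for x in val_samples]
    rng = random.Random(seed)
    rng.shuffle(labeled)
    n_test = int(round(len(labeled) * test_fraction))
    return labeled[n_test:], labeled[:n_test]


def attack_features(prediction: Sequence[float]) -> List[float]:
    """Sort one prediction vector so that only its shape matters, not the class."""
    return sorted(prediction, reverse=True)
```

For example, `attack_features([0.1, 0.7, 0.2])` and `attack_features([0.7, 0.2, 0.1])` yield the same feature vector, which is what lets one attack model handle samples from every class.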

Once the attack model is trained, the MI-metric is defined as its accuracy. It indicates to what extent a model could reveal whether a sample was used during training.
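As an illustration of the metric itself, the sketch below fits a small logistic-regression attack model on sorted prediction vectors and reports its accuracy. This is a simplified stand-in written in plain Python, not the XGBoost model behind the reported results; the function names, the number of gradient steps, the step size, and the toy prediction vectors are all assumptions.

```python
import math
from typing import List, Sequence, Tuple

Model = Tuple[List[float], float]  # (weights, bias) of the illustrative attack model


def _score(model: Model, x: Sequence[float]) -> float:
    weights, bias = model
    return bias + sum(w * xi for w, xi in zip(weights, x))


def fit_linear_attack(
    features: Sequence[Sequence[float]],
    labels: Sequence[int],
    epochs: int = 500,  # assumed number of gradient steps
    lr: float = 1.0,    # assumed step size
) -> Model:
    """Fit a logistic-regression attack model with batch gradient descent."""
    n, dim = len(features), len(features[0])
    weights, bias = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y in zip(features, labels):
            p = 1.0 / (1.0 + math.exp(-_score((weights, bias), x)))
            err = p - y  # gradient of the log-loss with respect to the score
            grad_b += err
            for j, xi in enumerate(x):
                grad_w[j] += err * xi
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def mi_metric(
    model: Model, features: Sequence[Sequence[float]], labels: Sequence[int]
) -> float:
    """Accuracy of the attack model at telling members (1) from non-members (0)."""
    guesses = [1 if _score(model, x) > 0.0 else 0 for x in features]
    return sum(g == y for g, y in zip(guesses, labels)) / len(labels)


# Toy data: members get confident (peaked) predictions, non-members flatter ones.
mi_train_x = [[0.98, 0.01, 0.01], [0.96, 0.03, 0.01], [0.99, 0.005, 0.005],
              [0.40, 0.35, 0.25], [0.50, 0.30, 0.20], [0.45, 0.30, 0.25]]
mi_train_y = [1, 1, 1, 0, 0, 0]
attack_model = fit_linear_attack(mi_train_x, mi_train_y)
score = mi_metric(attack_model, [[0.97, 0.02, 0.01], [0.42, 0.33, 0.25]], [1, 0])
```

A score near 0.5 on a balanced MI test set means the attack does no better than guessing; the further it rises above that, the more the main model's outputs give away about membership.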

Our implemented algorithm

So, what should we use as the attack model? Before answering that question, we have to choose a dataset and a main model to defend against leakage.

Regarding the dataset, we focused on analyzing the results of different main models applied to the SVHN, CIFAR-10, and CIFAR-100 datasets. We will mainly focus on the CIFAR-100 dataset because it has a relatively low number of images per label, which makes it more likely for a model to memorize data from a dataset.

With respect to the main model, we trained three different models, Mobilenet [7], Resnet50 [8], and VGG16 [9], with different optimizers. Regarding the hyperparameters, we focused on exploring the learning rate and different values of weight decay. The latter has often been pointed out in the membership inference literature [3,6,10] as a possible way to avoid data leakage. On top of that, we used a stepped learning rate decrease (a reduction of 80% at particular epochs). The models that are easier to train, like Mobilenet, were explored in more depth.
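A stepped decrease of 80% means the learning rate is multiplied by 0.2 each time a chosen epoch is reached. A minimal sketch follows; the milestone epochs in the usage note are hypothetical, since the post does not say at which epochs the reduction happened.

```python
from typing import Sequence


def stepped_learning_rate(
    base_lr: float,
    epoch: int,
    milestones: Sequence[int],
    reduction: float = 0.8,
) -> float:
    """Return the learning rate at `epoch`, reduced by `reduction` at every
    milestone epoch that has already been reached."""
    drops = sum(1 for m in milestones if epoch >= m)
    return base_lr * (1.0 - reduction) ** drops
```

With a base rate of 0.01 and assumed milestones `[30, 60]`, the rate stays at 0.01 until epoch 29, drops to roughly 0.002 at epoch 30, and to roughly 0.0004 at epoch 60.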

Finally, a large repertoire of models can be used as the attack model. We tested two: a linear classifier and gradient boosted trees (XGBoost). Although the linear classifier obtained qualitatively similar results to XGBoost, the latter consistently showed superior prediction accuracy, so we report only its results. In addition, selecting the attack model is relatively cheap compared to the whole training pipeline, so it can be done as a separate task with virtually no added computational cost.


When observing the results on all the datasets (as shown in Figure 1), one can qualitatively assess the apparent correlation between overfitting and membership inference accuracy. Often, when the performance on the test set stops improving, the MI-metric starts increasing. This relationship to overfitting has been observed repeatedly in previous works [3,4]. Nevertheless, overfitting alone cannot always explain the propensity of a model to leak data [4]. Some of our results demonstrate this phenomenon.

Figure 1. Performance of Resnet50 under different values of weight decay. Interestingly enough, in this particular case, weight decay appeared to be proportional to the amount of data that was leaked. The optimizer was SGD (see footnote 3) and the learning rate was fixed at 0.01.

The maximum MI accuracy we were able to achieve was 68.8%. Although this is an improvement over flipping a coin, it is far from a great predictor of membership. Even allowing for our ad-hoc choice of attack model, this indicates that even in the worst case membership inference attacks may not be easy to pull off.

There is a significant spread of the MI-metric across the different results (Tables 1-3), which does not always go hand-in-hand with the accuracy of the model. The Mobilenet results for ADAM with a learning rate of 0.001 in Table 1 show this: as the weight decay increases, the accuracy of the model improves (from 61.3% to 65.5%) while the MI-metric decreases (from 62.8% to 56.1%).
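The trend just described can be checked directly against the three Mobilenet rows for ADAM with a learning rate of 0.001 in Table 1. The short sketch below also ranks those configurations by leakage risk, which is the intended use of the MI-metric; the helper names are illustrative only.

```python
from typing import List, Sequence, Tuple

# (weight decay, model accuracy %, MI-metric %) copied from Table 1:
# Mobilenet, ADAM, learning rate 0.001.
ROWS: List[Tuple[float, float, float]] = [
    (5e-5, 61.31, 62.80),
    (1e-4, 62.43, 60.83),
    (1e-3, 65.48, 56.07),
]


def trend(values: Sequence[float]) -> str:
    """Return 'increasing', 'decreasing', or 'mixed' for consecutive values."""
    pairs = list(zip(values, values[1:]))
    if all(a < b for a, b in pairs):
        return "increasing"
    if all(a > b for a, b in pairs):
        return "decreasing"
    return "mixed"


def rank_by_leakage(rows: Sequence[Tuple[float, float, float]]):
    """Order configurations from lowest to highest MI-metric."""
    return sorted(rows, key=lambda row: row[2])


accuracy_trend = trend([row[1] for row in ROWS])
mi_trend = trend([row[2] for row in ROWS])
safest_weight_decay = rank_by_leakage(ROWS)[0][0]
```

For these rows the configuration with the largest weight decay is both the most accurate and the least exposed to the attack, which is the kind of trade-off the metric is meant to surface.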

The opposite effect can be found too. In models trained with SGD (Mobilenet in Table 1, Resnet50 in Figure 1), weight decay makes the model perform better while also increasing the MI-metric. That can be graphically seen in Figure 1, where the weight decay value seems to be proportional to the MI-metric. The same observation, however, does not seem to hold when ADAM is used as an optimizer.


Optimizer  Learning rate  Weight decay  Accuracy (%)  MI-metric (%)
SGD        0.1            0.00E+00      60.72         59.03
SGD        0.1            5.00E-05      63.28         62.8
SGD        0.01           0.00E+00      55.73         65.13
ADAM       0.01           1.00E-03      65.95         68.8
ADAM       0.01           0.00E+00      63.84         55.18
ADAM       0.01           5.00E-05      59.28         50.53
ADAM       0.01           1.00E-03      24.51         50.9
ADAM       0.001          5.00E-05      61.31         62.8
ADAM       0.001          1.00E-04      62.43         60.83
ADAM       0.001          1.00E-03      65.48         56.07

Table 1. Mobilenet results


Optimizer  Learning rate  Weight decay  Accuracy (%)  MI-metric (%)
SGD        0.01           0.00E+00      73.53         55.97
SGD        0.01           5.00E-05      73.3          56.62
SGD        0.01           5.00E-04      77.64         62.12
SGD        0.01           5.00E-03      78.6          66.26
ADAM       0.01           0.00E+00      68.05         55.48
ADAM       0.01           5.00E-05      62.12         48.93
ADAM       0.01           5.00E-04      52.88         50.45
ADAM       0.01           1.00E-03      51.54         49.63
ADAM       0.001          5.00E-04      69.19         53
RMSprop    0.01           0.00E+00      66.69         52.77
RMSprop    0.01           5.00E-04      46.26         49.23
RMSprop    0.01           5.00E-02      20.21         50.1
AdaDelta   0.01           0.00E+00      57.14         55.56
AdaDelta   0.01           5.00E-04      56.94         56.27

Table 2. Resnet50 results


Optimizer  Learning rate  Weight decay  Accuracy (%)  MI-metric (%)
SGD        0.1            1.00E-05      68.24         67.21
SGD        0.1            5.00E-05      69.61         66.25
SGD        0.1            1.00E-04      69.77         65.81
SGD        0.01           5.00E-05      69.04         60.83
ADAM       0.01           5.00E-05      40.18         49.78

Table 3. VGG16 results


There is a growing concern about the capacity of deep learning models to memorize and leak data. Previous work has demonstrated successful membership inference attacks on models trained on MLaaS platforms. This work introduced a metric that monitors potential model data leakage. The metric can be reported alongside other standard performance metrics during training. The proposed metric uses the membership inference accuracy of an attack model as a proxy. Computing this metric comes at a very low computational cost due to the overall simplicity of its execution. The proposed approach is model and framework agnostic: it can be used in any training situation.

Our results were mostly in line with those previously reported in the literature. Nevertheless, there were situations where hyperparameter configurations yielded unexpected results, e.g. higher membership inference scores. This motivates further research into the underlying reasons for data leakage. It also justifies a direct empirical measure of the model’s susceptibility to membership inference attacks, such as the MI-metric proposed in this post.

At decentriq we want to enable machine learning applications without worrying about the sensitivity of the underlying data. Making sure that the deployed algorithms protect this data is a significant step towards this.


  1. It is also worth mentioning that deep learning models have been reported to first learn patterns and structures of the data. Then, if trained further, they start fitting random noise, which is akin to memorizing the data they are given.
  2. In general, we only care about the shape of the output, so the actual label that the model outputs is not so important. Sorting allows the features to be ordered in a consistent manner. That said, other authors have pointed out that data leakage may differ depending on the class (for instance, when classes are unbalanced); our method does not take that information into account for its classification.
  3. Whenever we refer to SGD, the actual optimizer used is stochastic gradient descent with Nesterov momentum (of 0.9).


[1] Oh, Seong Joon, Bernt Schiele, and Mario Fritz. "Towards reverse-engineering black-box neural networks." Explainable AI: Interpreting, Explaining and Visualizing Deep Learning. Springer, Cham, 2019. 121-144.

[2] Fredrikson, Matthew, et al. "Privacy in pharmacogenetics: An end-to-end case study of personalized warfarin dosing." 23rd USENIX Security Symposium (USENIX Security 14). 2014.

[3] Shokri, Reza, et al. "Membership inference attacks against machine learning models." 2017 IEEE Symposium on Security and Privacy (SP). IEEE, 2017.

[4] Yeom, Samuel, et al. "Privacy risk in machine learning: Analyzing the connection to overfitting." 2018 IEEE 31st Computer Security Foundations Symposium (CSF). IEEE, 2018.

[5] Carlini, Nicholas, et al. "The secret sharer: Measuring unintended neural network memorization & extracting secrets." arXiv preprint arXiv:1802.08232 (2018).

[6] Salem, Ahmed, et al. "Ml-leaks: Model and data-independent membership inference attacks and defenses on machine learning models." arXiv preprint arXiv:1806.01246 (2018).

[7] Howard, Andrew G., et al. "Mobilenets: Efficient convolutional neural networks for mobile vision applications." arXiv preprint arXiv:1704.04861 (2017).

[8] He, Kaiming, et al. "Deep residual learning for image recognition." Proceedings of the IEEE conference on computer vision and pattern recognition. 2016.

[9] Simonyan, Karen, and Andrew Zisserman. "Very deep convolutional networks for large-scale image recognition." arXiv preprint arXiv:1409.1556 (2014).

[10] Sablayrolles, Alexandre, et al. "Déjà Vu: an empirical evaluation of the memorization properties of ConvNets." arXiv preprint arXiv:1809.06396 (2018).

[11] Chakraborty, Supriyo, et al. "Interpretability of deep learning models: a survey of results." 2017 IEEE SmartWorld, Ubiquitous Intelligence & Computing, Advanced & Trusted Computed, Scalable Computing & Communications, Cloud & Big Data Computing, Internet of People and Smart City Innovation. IEEE, 2017.
