Multi-Label Classification

Evaluation Metrics for Multi-Label Classification

A survey of evaluation metrics for training a multi-label classifier in python

Pritish Jadhav
Published in DataDrivenInvestor
5 min read · Mar 14, 2021


  • In a traditional classification problem, classes are mutually exclusive: each example belongs to exactly one class.
  • In such cases, classification errors occur when classes overlap in the feature space.
  • However, we often encounter tasks where a data point can belong to multiple classes at once.
  • For these tasks, we recast the problem as multi-label classification, treating each label as a Bernoulli random variable that represents a separate binary classification task.
Image source: https://gombru.github.io/2018/05/23/cross_entropy_loss/
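To make the difference between the two formulations concrete, here is a minimal sketch of how the targets are typically encoded. The class names and label matrices are made up for illustration and do not come from any real dataset.

```python
import numpy as np

# Hypothetical label set, used only for illustration.
classes = ["sports", "politics", "tech"]

# Single-label (mutually exclusive) targets: exactly one 1 per row.
y_single = np.array([
    [1, 0, 0],
    [0, 0, 1],
])

# Multi-label targets: each column is an independent 0/1 (Bernoulli) target,
# so a row may contain zero, one, or several 1s.
y_multi = np.array([
    [1, 0, 1],   # sports and tech
    [0, 0, 0],   # no label applies
    [1, 1, 1],   # all three labels apply
])

# Number of active labels per example.
labels_per_example = y_multi.sum(axis=1)
```

Each column of `y_multi` can be read as its own binary classification problem, which is the view the Bernoulli formulation takes.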

From an implementation standpoint:

  1. We use the sigmoid activation function in the final layer instead of using a softmax activation.
  2. We use binary cross-entropy loss instead of categorical cross-entropy.
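The two changes above can be sketched in plain NumPy. This is an illustrative example, not code from any particular framework: the helper names (`sigmoid`, `softmax`, `binary_cross_entropy`), the logits, and the targets are all assumptions made for the demonstration.

```python
import numpy as np

def sigmoid(z):
    # Element-wise: each label gets its own independent probability.
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    # Row-wise: probabilities compete and sum to 1 across classes.
    z = z - z.max(axis=1, keepdims=True)  # subtract the row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def binary_cross_entropy(y_true, probs, eps=1e-12):
    # Mean of the per-label Bernoulli negative log-likelihoods.
    probs = np.clip(probs, eps, 1.0 - eps)  # avoid log(0)
    per_label = -(y_true * np.log(probs) + (1 - y_true) * np.log(1 - probs))
    return per_label.mean()

# Hypothetical final-layer outputs for one example with three labels.
logits = np.array([[2.0, -1.0, 0.5]])
y_true = np.array([[1, 0, 1]])

probs_sigmoid = sigmoid(logits)   # may sum to more than 1
probs_softmax = softmax(logits)   # always sums to 1
loss = binary_cross_entropy(y_true, probs_sigmoid)
```

Because softmax forces the probabilities to sum to 1, it cannot assign high probability to several labels at once, which is why sigmoid is the usual choice for multi-label outputs.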

There are numerous blog posts that explain how to train a multi-label classifier using popular frameworks (scikit-learn, Keras, TensorFlow, PyTorch, etc.). However, training is only half of the story: evaluating the performance of a machine learning model is an equally critical piece of the puzzle.

In this blog post, we focus on the evaluation metrics that can be used to measure the performance of a multi-label classifier. These metrics can be broadly classified into two categories:

  • Example-Based Evaluation Metrics
  • Label-Based Evaluation Metrics

Example-Based Evaluation Metrics

Example-based evaluation metrics compare the true labels with the predicted labels for each data point, then average that per-example score over all the examples in the dataset.

1. Exact Match Ratio (EMR)

  • The Exact Match Ratio evaluation metric extends the concept of accuracy from the single-label classification problem to a…
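As a minimal sketch, assuming the usual definition of the Exact Match Ratio (the fraction of examples whose predicted label vector matches the true label vector on every label), the metric can be computed as follows. The function name `exact_match_ratio` and the toy arrays are hypothetical and chosen only for illustration.

```python
import numpy as np

def exact_match_ratio(y_true, y_pred):
    """Fraction of examples whose full label vector is predicted exactly."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    # An example counts as correct only if every one of its labels matches.
    row_is_exact = np.all(y_true == y_pred, axis=1)
    return row_is_exact.mean()

# Toy binary indicator matrices: rows are examples, columns are labels.
y_true = np.array([
    [1, 0, 1],
    [0, 1, 0],
    [1, 1, 0],
    [0, 0, 1],
])
y_pred = np.array([
    [1, 0, 1],   # exact match
    [0, 1, 1],   # one extra label, so no match
    [1, 1, 0],   # exact match
    [0, 0, 0],   # one missing label, so no match
])

emr = exact_match_ratio(y_true, y_pred)
```

Under this definition, a prediction that gets two of three labels right scores the same as one that gets none right. For binary indicator inputs like these, scikit-learn's `accuracy_score` computes the same subset accuracy.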
