Facial recognition and neural networks to enhance images

Raffaella Aghemo
Published in DataDrivenInvestor
May 9, 2024

We are in the final stages of approval of the world’s first comprehensive piece of legislation on Artificial Intelligence, the European Artificial Intelligence Act, which attempts to balance respect for human rights and democracy with an appropriate but safe push towards technological innovation.

We also know that in this thoroughly risk-based, horizontal framework, one of the major concerns will focus on facial recognition systems which, although banned in real time, can still be authorised for retrospective (post-remote) use by a judicial or administrative authority.

Two researchers at the University of California, Berkeley, Justin Norman and Hany Farid, have compiled a study, presented in a paper entitled ‘An Investigation into the Impact of AI-Powered Image Enhancement on Forensic Facial Recognition’, investigating if and when advances in neural-network-based image enhancement and restoration can be used to restore degraded images while preserving facial identity, for use in forensic facial recognition. We have read about a number of cases of mistaken identity and facial recognition errors in the course of criminal investigations, so this study focuses on the methods, and especially the errors, these algorithmic systems can incur.

The introduction reads: ‘Although automatic facial recognition has its roots in the mid-1960s, it is only recently that the accuracy of facial recognition has reached levels that allow for its credible use in real-world forensic contexts; albeit not without concerns regarding human rights violations, privacy and bias. It has been claimed that automatic facial recognition is as accurate as, or more accurate than, human-level recognition.’

The earliest reported experiments actually date back to the years between 1964 and 1966, when Woodrow W. Bledsoe, together with Helen Chan and Charles Bisson of Panoramic Research, began researching and studying computer programs for the recognition of human faces; but very little documentation survives, as the research was financed by an unnamed intelligence agency. Bledsoe explained the difficulties encountered in this research in these terms: ‘This recognition problem is made difficult by the great variability in head rotation and tilt, light intensity and angle, facial expression, ageing, etc. Some other attempts at facial recognition by machine have allowed little or no variability in these quantities. Yet the method of correlation (or pattern matching) of raw optical data, often used by some researchers, will certainly fail in cases where the variability is large. In particular, the correlation is very low between two images of the same person with two different head rotations.’

These advances in automatic facial recognition have been largely fuelled by advances in machine learning, along with access to ever larger and more diverse datasets, and have in turn fuelled a revolution in image enhancement, in which noisy, low-resolution or blurry images can be, seemingly miraculously, restored to high-resolution, high-quality versions of their originals.

But is this really the case?

This study makes use of two large and diverse face datasets, two popular deep-learning facial recognition systems and twelve different image-enhancement techniques based on GANs and diffusion models.

For facial enhancement, two typical operations a forensic analyst might use are super-resolution, in which a low-resolution image is upsampled to a higher resolution to restore detail to the image, and deblurring, in which optical or motion blur is removed from an image.

The former, super-resolution, uses several neural-network-based techniques spanning a range of different underlying mechanisms, from generative adversarial networks (GANs) to convolutional neural networks (CNNs), transformers and combinations of all three; the latter does the same, albeit with different approaches.
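As a rough illustration of why super-resolution is hard, the sketch below (plain NumPy, not any of the models from the study) simulates only the degradation side of the problem: downsampling averages fine detail away, and a naive upsampler cannot bring it back, which is why neural models must instead synthesise plausible detail learned from training data:

```python
import numpy as np

rng = np.random.default_rng(0)

# A toy 8x8 "image" standing in for a face crop (values in [0, 1]).
original = rng.random((8, 8))

# Degradation: 2x downsampling by averaging each 2x2 block of pixels.
low_res = original.reshape(4, 2, 4, 2).mean(axis=(1, 3))

# Naive "enhancement": nearest-neighbour upsampling back to 8x8.
upsampled = np.repeat(np.repeat(low_res, 2, axis=0), 2, axis=1)

# The fine detail is gone: the restored image differs from the original.
reconstruction_error = np.abs(upsampled - original).mean()
print(f"mean reconstruction error: {reconstruction_error:.3f}")
```

The error is nonzero by construction: the averaging step discards information that no interpolation can recover, so a neural enhancer can only guess at it.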

In the first case, among the techniques used, Latent Diffusion Models (LDMs) are quite effective in a variety of image-restoration tasks. The general approach of diffusion image models is to use denoising autoencoders to decompose image formation into a sequence of progressive denoising steps. This process, however, traditionally operates directly in pixel space, which is computationally expensive and requires massive computing infrastructure, generally available only to a few organisations with adequate resources. To address this shortcoming, latent diffusion models were introduced: they operate in a lower-dimensional latent space, which makes it possible to train image-restoration models on more standard and accessible computing resources.
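A minimal sketch of the two ideas above, using NumPy with a toy latent size of 64×64×4 against a 512×512 RGB pixel space (illustrative numbers, not the paper's): the forward diffusion process mixes a signal with Gaussian noise according to a cumulative schedule, and doing so in the latent space touches far fewer values per step:

```python
import numpy as np

rng = np.random.default_rng(1)

# Pixel space vs. a lower-dimensional latent code (assumed shapes).
pixel_dim = 512 * 512 * 3        # a 512x512 RGB image
latent_dim = 64 * 64 * 4         # a typical LDM-style latent

x0 = rng.standard_normal(latent_dim)

# Forward diffusion: mix the signal with Gaussian noise according to
# a cumulative schedule alpha_bar in (0, 1]; smaller means noisier.
def noisy_latent(x0, alpha_bar, rng):
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bar) * x0 + np.sqrt(1.0 - alpha_bar) * eps

x_mid = noisy_latent(x0, alpha_bar=0.5, rng=rng)   # partially noised
x_end = noisy_latent(x0, alpha_bar=0.01, rng=rng)  # almost pure noise

# Working in the latent space touches ~48x fewer values per step.
print(f"pixel dims: {pixel_dim}, latent dims: {latent_dim}, "
      f"ratio: {pixel_dim / latent_dim:.0f}x")
```

Restoration then amounts to learning the reverse of this noising process; the dimensionality ratio is what makes training feasible on ordinary hardware.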

The blind face restoration technique CodeFormer can also be used for two main tasks: reducing or removing perceptible image degradation, and matching degraded image features to a desired image quality and style. This technique uses a Transformer-based architecture to map a low-quality image to a learned representation contextualised specifically for human faces.

In the second case, deblurring, the techniques used add a final stage that works directly at the resolution of the original image in order to capture fine-grained spatial details.
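Motion blur itself is commonly modelled as convolution of the sharp image with a blur kernel, and deblurring is the attempt to invert that operation. A one-dimensional toy version in NumPy (an illustration of the degradation model, not a method from the study):

```python
import numpy as np

# One row of a toy image, with sharp edges.
signal = np.array([0., 0., 1., 0., 0., 0., 1., 1., 0., 0.])

# Motion blur modelled as convolution with a 3-pixel averaging kernel:
# each blurred pixel is the mean of its neighbours along the motion path.
kernel = np.ones(3) / 3.0
blurred = np.convolve(signal, kernel, mode="same")

# Sharp edges are smeared: jumps between adjacent pixels shrink.
sharp_jump = np.abs(np.diff(signal)).max()
blurred_jump = np.abs(np.diff(blurred)).max()
print(f"max edge strength, sharp: {sharp_jump:.2f}, blurred: {blurred_jump:.2f}")
```

Deblurring must undo this convolution, which is ill-posed: several different sharp rows can blur to nearly the same result, so neural deblurring models, like super-resolution ones, must choose among plausible reconstructions.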

Beyond the more technical details (which you can find in the paper at the link below: https://farid.berkeley.edu/downloads/publications/cvprw24b.pdf), the study made use of two datasets for its evaluations. The first, real-world dataset is derived from the CASIA-WebFace dataset, consisting of 491,414 images of 10,575 identities. These images vary in size, quality, pose, subject attire and environment. Due to the initial quality of the dataset, some manual curation was required, including the removal of duplicate images and of incorrectly labelled images. A second, synthetically generated dataset was then used, as it offers more detailed control over the variations in each subject’s appearance within and between identities, using a combination of classical rendering and generative synthesis to create photorealistic human faces. All images are rendered at a resolution of 512×512 pixels.
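As a hedged illustration of the duplicate-removal step (the paper does not publish its cleanup code), exact byte-level duplicates can be found by hashing file contents; the filenames and bytes below are invented stand-ins, and near-duplicate or mislabelled images would need perceptual hashing or manual review instead:

```python
import hashlib

# Toy "dataset": filenames mapped to raw image bytes, standing in for
# files on disk in a CASIA-WebFace-style layout (identity/image.jpg).
dataset = {
    "id001/a.jpg": b"\x89PNG-face-1",
    "id001/b.jpg": b"\x89PNG-face-2",
    "id002/a.jpg": b"\x89PNG-face-1",   # byte-identical duplicate of id001/a.jpg
}

# Exact-duplicate removal: keep only the first file seen per content hash.
seen, kept = set(), []
for name, data in dataset.items():
    digest = hashlib.sha256(data).hexdigest()
    if digest not in seen:
        seen.add(digest)
        kept.append(name)

print(f"kept {len(kept)} of {len(dataset)} images")
```

Duplicates that cross identity labels, as in this toy example, are the dangerous kind for face datasets: left in place, they teach a recognition model that two different identities share a face.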

Over the past decade, the improvement of facial recognition models for forensic identification tasks has been remarkable. However, much of the performance evaluation of such models has been conducted in controlled laboratory environments that do not necessarily replicate the diversity of data and difficulty of tasks inherent in real-world forensic environments.

In conclusion, after exploring the impact of super-resolution and optical/motion deblurring on forensic face recognition, the authors found that, under certain conditions and with an appropriate choice of enhancement model, these tools can be an asset. At the same time, however, this type of image enhancement is not a panacea, and care must be taken to understand its effectiveness in the presence of different levels of image degradation, the type of degradation, the nature of the desired enhancement and the underlying facial recognition model. The observed failure cases, on the other hand, are worrying. What is particularly troubling about these hallucinations is that there is no obvious way to determine that a hallucination has occurred merely by looking at the enhanced image. Further analysis will be needed to assess the effectiveness of other forms of image enhancement, such as denoising and inpainting, and the interaction between different forms of image degradation.
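A toy sketch of how a hallucinated enhancement can silently change a recognition decision: face matchers typically compare embedding vectors with cosine similarity against a threshold, so an enhancer that invents a different face simply produces an embedding that no longer matches the enrolled one. All vectors and the threshold below are invented for illustration; real systems use embeddings with hundreds of dimensions:

```python
import numpy as np

def cosine_similarity(a, b):
    # Cosine of the angle between two embedding vectors, in [-1, 1].
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy face embeddings (assumed 4-dimensional for readability).
probe_true   = np.array([0.9, 0.1, 0.3, 0.2])   # genuine probe image
gallery      = np.array([0.8, 0.2, 0.3, 0.1])   # enrolled image, same person
hallucinated = np.array([0.1, 0.9, 0.2, 0.8])   # enhancer invented another face

THRESHOLD = 0.9  # assumed match-decision threshold

genuine_score = cosine_similarity(probe_true, gallery)
halluc_score = cosine_similarity(hallucinated, gallery)

print(f"genuine: {genuine_score:.3f}, hallucinated: {halluc_score:.3f}")
```

The point the study underlines is that both inputs look like sharp, plausible faces; only the scores differ, and nothing in the enhanced image itself signals that the identity has been swapped.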

All Rights Reserved

Raffaella Aghemo, Lawyer

Innovative Lawyer and consultant for AI and blockchain, IP, copyright, communication, likes movies and books, writes legal features and books reviews