In the field of data compression, traditional methods have long dominated, ranging from lossless techniques such as ZIP file compression to lossy techniques like JPEG image compression and MPEG video compression. These methods are typically rule-based, utilizing predefined algorithms to reduce data redundancy and irrelevance to achieve compression. However, with the advent of advanced machine learning techniques, particularly autoencoders, new avenues for data compression have emerged that offer distinct advantages over traditional methods in certain contexts.
Autoencoders are a class of neural networks designed for unsupervised learning of efficient encodings: they compress input data into a condensed representation and then reconstruct the output from this representation. The primary architecture of an autoencoder consists of two main components: an encoder and a decoder. The encoder compresses the input into a smaller, dense representation in the latent space, and the decoder reconstructs the input data from this compressed representation as closely as possible to its original form.
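As a minimal sketch of this two-part structure, the encoder and decoder can each be written as a small model and chained together. The snippet below assumes a Keras/TensorFlow setup and a simple fully connected architecture with an illustrative latent size; it is not the notebook's actual code.

```python
from tensorflow.keras import layers, models

latent_dim = 32  # size of the compressed (latent) representation -- illustrative value

# Encoder: flatten the 28x28 image and compress it to a dense latent vector
encoder = models.Sequential([
    layers.Input(shape=(28, 28, 1)),
    layers.Flatten(),
    layers.Dense(latent_dim, activation="relu"),
])

# Decoder: expand the latent vector back to the original image shape
decoder = models.Sequential([
    layers.Input(shape=(latent_dim,)),
    layers.Dense(28 * 28, activation="sigmoid"),
    layers.Reshape((28, 28, 1)),
])

# The autoencoder chains the two and is trained to reproduce its own input
autoencoder = models.Sequential([encoder, decoder])
```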
The flexibility and learning-based approach of autoencoders provide several benefits over traditional compression methods, most notably the ability to adapt the compression scheme to the specific characteristics of the data being compressed.
In the notebook, an experimental framework is set up to investigate the compression capabilities of autoencoders using the MNIST dataset. MNIST, a common benchmark in machine learning, consists of 28x28 grayscale images of handwritten digits in 10 classes (60,000 training and 10,000 test images), providing a diverse set of samples for evaluating model performance.
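The data-loading step might look roughly as follows; the exact preprocessing used in the notebook is an assumption here.

```python
import numpy as np
from tensorflow.keras.datasets import mnist

(x_train, _), (x_test, _) = mnist.load_data()

# Scale pixel values to [0, 1] and add a channel axis for the convolutional layers
x_train = x_train.astype("float32") / 255.0
x_test = x_test.astype("float32") / 255.0
x_train = x_train[..., np.newaxis]   # shape: (60000, 28, 28, 1)
x_test = x_test[..., np.newaxis]     # shape: (10000, 28, 28, 1)
```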
The notebook utilizes a convolutional autoencoder, leveraging the spatial hierarchy of convolutional layers to efficiently capture the patterns in image data. The autoencoder's architecture includes multiple convolutional layers in the encoder to compress the image, and corresponding transposed-convolution (deconvolution) layers in the decoder to reconstruct it. The model is trained with the objective of minimizing the mean squared error (MSE) between the original and reconstructed images, promoting fidelity in the reconstructed outputs.
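A convolutional autoencoder along these lines could be sketched as below, continuing from the data-loading snippet above. The layer counts, filter sizes, and training settings are illustrative rather than the notebook's exact configuration.

```python
from tensorflow.keras import layers, models

def build_conv_autoencoder(latent_dim):
    # Encoder: stacked convolutions progressively downsample the image,
    # then a dense layer projects it to the latent vector
    encoder = models.Sequential([
        layers.Input(shape=(28, 28, 1)),
        layers.Conv2D(16, 3, strides=2, padding="same", activation="relu"),  # 14x14
        layers.Conv2D(32, 3, strides=2, padding="same", activation="relu"),  # 7x7
        layers.Flatten(),
        layers.Dense(latent_dim),
    ], name="encoder")

    # Decoder: mirror of the encoder, using transposed convolutions to upsample
    decoder = models.Sequential([
        layers.Input(shape=(latent_dim,)),
        layers.Dense(7 * 7 * 32, activation="relu"),
        layers.Reshape((7, 7, 32)),
        layers.Conv2DTranspose(16, 3, strides=2, padding="same", activation="relu"),   # 14x14
        layers.Conv2DTranspose(1, 3, strides=2, padding="same", activation="sigmoid"), # 28x28
    ], name="decoder")

    autoencoder = models.Sequential([encoder, decoder])
    # Train to minimize the mean squared error between input and reconstruction
    autoencoder.compile(optimizer="adam", loss="mse")
    return autoencoder

autoencoder = build_conv_autoencoder(latent_dim=32)
autoencoder.fit(x_train, x_train, epochs=10, batch_size=128,
                validation_data=(x_test, x_test))
```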
The notebook details a systematic exploration of different sizes of the latent space, ranging from high-dimensional to low-dimensional representations. The goal is to understand how the dimensionality of the latent space affects both the compression percentage and the quality of the reconstruction. The compression percentage is calculated from the ratio of the latent-space dimensionality to the original image dimensionality (28x28 = 784 values), while the reconstruction error is measured using the MSE.
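One reasonable way to compute these two quantities is sketched below, continuing from the previous snippets; the notebook's exact formula and choice of latent sizes are not reproduced here, so treat this as an interpretation rather than the actual experiment code.

```python
import numpy as np

def compression_percentage(latent_dim, original_dim=28 * 28):
    # Fraction of the original 784 values removed by the latent representation
    return 100.0 * (1.0 - latent_dim / original_dim)

def reconstruction_mse(model, images):
    # Pixel-wise mean squared error between inputs and their reconstructions
    reconstructed = model.predict(images, verbose=0)
    return float(np.mean((images - reconstructed) ** 2))

# Sweep over a few illustrative latent sizes, from mild to aggressive compression
for latent_dim in (128, 64, 32, 16, 8):
    model = build_conv_autoencoder(latent_dim)
    model.fit(x_train, x_train, epochs=10, batch_size=128, verbose=0)
    print(f"latent_dim={latent_dim:4d}  "
          f"compression={compression_percentage(latent_dim):5.1f}%  "
          f"test MSE={reconstruction_mse(model, x_test):.4f}")
```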
The results presented in the notebook illustrate a trade-off between compression and reconstruction quality. Specifically, as the latent space shrinks and the compression percentage rises, the reconstruction error initially remains low, indicating effective compression. However, once the latent dimension is reduced beyond a certain threshold, the reconstruction error increases sharply. This suggests a limit to the autoencoder's compression capability, beyond which the loss of information significantly degrades the quality of the reconstructed images.
The chart below shows the reconstruction error per label for 95% and 99% compression.
Now let's take a look at a sample of images and see how shrinking the compressed representation affects the reconstructed image:
We notice that as the compression ratio increases, the reconstructed images become increasingly blurred. The second image, the digit 2, was reconstructed well until the compression ratio reached 99%, at which point it started to look like an 8. Similarly, the digit 4 started to look like a 9.
Below is a scatter plot that shows the difference between the original images (blue) and the reconstructed images (red) using t-SNE. The two versions are clearly very close: for low compression ratios, the red and blue points sit on top of each other, but at the 99% compression ratio the points start to diverge.
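A plot of this kind could be produced roughly as follows, assuming scikit-learn's t-SNE and matplotlib and continuing from the trained model above; the subset size and styling are illustrative, not the notebook's exact plotting code.

```python
import matplotlib.pyplot as plt
import numpy as np
from sklearn.manifold import TSNE

n = 500  # a small subset keeps t-SNE fast
originals = x_test[:n].reshape(n, -1)
reconstructions = autoencoder.predict(x_test[:n], verbose=0).reshape(n, -1)

# Embed originals and reconstructions together so they share one t-SNE space
embedded = TSNE(n_components=2, random_state=0).fit_transform(
    np.concatenate([originals, reconstructions], axis=0)
)

plt.scatter(embedded[:n, 0], embedded[:n, 1], c="blue", s=8, label="original")
plt.scatter(embedded[n:, 0], embedded[n:, 1], c="red", s=8, label="reconstructed")
plt.legend()
plt.show()
```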
This publication investigated the efficacy of autoencoders as a tool for data compression, with a focus on image data represented by the MNIST dataset. Through systematic experimentation, we explored the impact of varying latent space dimensions on both the compression ratio and the quality of the reconstructed images. The primary findings indicate that autoencoders, leveraging their neural network architecture, can indeed compress data significantly while retaining a considerable amount of original detail, making them superior in certain aspects to traditional compression methods.
Key results demonstrated a clear trade-off between compression ratio and reconstruction quality. Notably, as the size of the latent dimension decreased, the compression ratio increased; however, this also led to a rise in reconstruction error, particularly beyond a critical threshold of compression. These results underscore the potential of autoencoders to adapt to specific data characteristics, offering a flexible and powerful approach to data compression.