Enhancing JPEG with Machine Learning

Migel Tissera


[Header images: two 2000×2400 example photos, compressed from 1 MB to 325 KB and from 3 MB to 745 KB (74% reduction)]

Early on at Compression.ai, our main image compression algorithm looked a lot like most other machine learning (ML) based image compression algorithms: an auto-encoder. We had a branched neural network as the encoder, another branched neural network as the decoder, and an entropy coding module sandwiched in the middle. We then trained this whole network on the ImageNet dataset, which contains just over 1 million images.

It was super simple. And it worked! Our model compressed images to 0.8 bits per pixel (bpp) while keeping MS-SSIM above 0.99. Compared against the published work at the time, we were at the top in visual quality for a given compression ratio. It was great, and we thought we had something.


Figure 1: A typical auto-encoder

But getting people to adopt a new file format was far harder than we originally thought; it was nearly impossible. No one wants to download a codec (to handle decompression) just to open their images, even an open-source one, let alone a proprietary one. We figured this out early on, and knew that if we were to make something substantial in the compression space, it had to be about taking an existing standard and improving on it, rather than creating a completely new one.

JPEG, first introduced to the world almost three decades ago, is still the most widely used image compression standard in the world. It is surprisingly simple, yet extremely effective. We figured that if we wanted to create something for the masses, it had to be JPEG. This is when we began our work on improving JPEG with ML. In this blog post, I’ll detail the process we took, and how we ultimately built what we believe is the best image compression algorithm there is.

Enter JPEG

To really understand how we can bring ML to JPEG, we need to start with JPEG itself and understand how it compresses images. For a given image, the JPEG algorithm works as follows:

  • First, the image is converted from RGB to the YUV color space
  • The U and V (chroma) channels are subsampled (e.g., to 4:2:0)
  • Each channel is then broken into 8×8, non-overlapping blocks
  • For each 8×8 block, the algorithm:
    • Applies the Discrete Cosine Transform (DCT)
    • Quantizes the coefficients using a Q-Table
    • Entropy codes them to obtain a bit string
  • Lastly, these bit strings (one per 8×8 block), together with metadata (such as the Q-Table), comprise the JPEG file.
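The per-block transform-and-quantize steps above can be sketched in a few lines of NumPy/SciPy. This is a minimal sketch, not a full encoder: entropy coding and the color-space steps are omitted, and the Q-Table shown is the standard JPEG luminance example table.

```python
import numpy as np
from scipy.fft import dctn

# Standard JPEG luminance Q-Table (the example table from the JPEG spec, Annex K).
Q_TABLE = np.array([
    [16, 11, 10, 16, 24, 40, 51, 61],
    [12, 12, 14, 19, 26, 58, 60, 55],
    [14, 13, 16, 24, 40, 57, 69, 56],
    [14, 17, 22, 29, 51, 87, 80, 62],
    [18, 22, 37, 56, 68, 109, 103, 77],
    [24, 35, 55, 64, 81, 104, 113, 92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103, 99],
], dtype=np.float64)

def encode_block(block):
    """DCT + quantization for one 8x8 luma block (pixel values 0-255)."""
    shifted = block.astype(np.float64) - 128.0   # level shift, per the spec
    coeffs = dctn(shifted, norm="ortho")         # 2-D DCT-II
    return np.round(coeffs / Q_TABLE).astype(np.int32)

block = np.full((8, 8), 128, dtype=np.uint8)     # a flat grey block
quantized = encode_block(block)
# A flat block has no energy after the level shift: every coefficient
# quantizes to zero, which is why flat regions compress so well.
```

The division by the Q-Table followed by rounding is where small, visually unimportant coefficients become zero; the entropy coder then represents those runs of zeros very cheaply.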

The JPEG decoder, on the other hand, performs the above steps in reverse to reconstruct the image. If we take the output after quantization, de-quantize it (by multiplying element-wise with the Q-Table), apply the Inverse Discrete Cosine Transform (IDCT), and convert the color space back to RGB, we end up with 8×8 blocks that closely approximate the original image. For more depth on how JPEG compression works, refer to the following articles:

  1. https://www.freecodecamp.org/news/how-jpg-works-a4dbd2316f35/
  2. https://medium.com/breaktheloop/jpeg-compression-algorithm-969af03773da
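The decode path is easy to verify numerically. The sketch below (illustrative only, using a hypothetical flat Q-Table for simplicity) round-trips one block through quantization and back; the DCT/IDCT pair itself is exact, so the only information lost is the rounding during quantization.

```python
import numpy as np
from scipy.fft import dctn, idctn

Q = np.full((8, 8), 16.0)            # hypothetical flat Q-Table, for illustration

rng = np.random.default_rng(0)
block = rng.integers(0, 256, size=(8, 8)).astype(np.float64)

# Encode: level shift, 2-D DCT, quantize (element-wise divide + round).
quantized = np.round(dctn(block - 128.0, norm="ortho") / Q)

# Decode: de-quantize (element-wise multiply by the Q-Table), IDCT, un-shift.
reconstructed = idctn(quantized * Q, norm="ortho") + 128.0

# Reconstruction is close to, but not exactly, the original block.
max_error = np.abs(reconstructed - block).max()
```

Since each quantized coefficient is off by at most half a Q-Table entry, the reconstruction error is bounded; larger Q-Table values buy more compression at the cost of a larger error.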

Enter Machine Learning

So how do we go about improving this process with machine learning? When you take a step back, you realize that the computational steps above can be executed in a tensor-flowy fashion. Take an image, say of dimensions (a, b, 3), and create 8×8 patches from it to get a tensor of shape (N, 8, 8, 3), N being the number of patches. (It helps if the image dimensions are divisible by 8; otherwise zero-padding will do. The details don’t matter much here; let’s focus on the tensor flow for now.) On each 8×8 patch, we apply the DCT, followed by a quantization step. A de-quantization step followed by the IDCT yields back the original patch, up to rounding.
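The patching step might look like the following minimal NumPy sketch (the function name and zero-padding behaviour are illustrative, not our production code):

```python
import numpy as np

def to_patches(img):
    """Split an (a, b, 3) image into (N, 8, 8, 3) patches, zero-padding
    the bottom/right borders when the dimensions are not divisible by 8."""
    a, b, c = img.shape
    pad_a, pad_b = (-a) % 8, (-b) % 8                 # padding needed per axis
    img = np.pad(img, ((0, pad_a), (0, pad_b), (0, 0)))
    rows, cols = img.shape[0] // 8, img.shape[1] // 8
    # Reshape into a grid of 8x8 tiles, then flatten the grid dimension.
    patches = img.reshape(rows, 8, cols, 8, c).swapaxes(1, 2)
    return patches.reshape(rows * cols, 8, 8, c)

img = np.zeros((20, 33, 3))       # 20x33 pads to 24x40, i.e. a 3x5 grid of tiles
patches = to_patches(img)
# patches.shape == (15, 8, 8, 3)
```

Once the image is in (N, 8, 8, 3) form, the DCT, quantization, and their inverses are just batched tensor operations over the first axis.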

Now, if we introduce trainable parameters, we can view this whole process as the flow of a tensor and use a framework like TensorFlow to code the model. We can then train it with Stochastic Gradient Descent (SGD) and optimize the network for MS-SSIM, for example.

Quantization in the JPEG algorithm consists of two operations. The first is to divide the DCT coefficients element-wise by a table commonly referred to as a Q-Table. This division is then followed by rounding to the nearest integer (converting the floating-point DCT output to integers in order to save space).

What we realized was that all the meaningful compression in the JPEG algorithm happens at this quantization step: it is where the output of the DCT gets divided by the Q-Table and rounded. If we then use a neural network to approximate the best Q-Table possible, while guaranteeing a minimum MS-SSIM score, we have a winner.

Compression.ai’s newest JPEG compression engine was built on this concept. At its heart, a neural network outputs the best Q-Table, guaranteeing the smallest output file size for the highest MS-SSIM score, tailored to each and every input image. A super-fast implementation of the traditional JPEG compression algorithm is then used to create the final output image file.
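We can’t share the network itself, but the underlying idea can be illustrated without one. The sketch below is illustrative only: instead of predicting a full Q-Table, it searches a single global scale of the standard luminance table, and it uses PSNR as a stand-in for MS-SSIM. It picks the coarsest quantization (most compression) that still meets a quality floor, which is the same size-versus-quality trade our network makes per image.

```python
import numpy as np
from scipy.fft import dctn, idctn

# Standard JPEG luminance Q-Table (the example table from the JPEG spec, Annex K).
BASE_Q = np.array([
    [16, 11, 10, 16, 24, 40, 51, 61],
    [12, 12, 14, 19, 26, 58, 60, 55],
    [14, 13, 16, 24, 40, 57, 69, 56],
    [14, 17, 22, 29, 51, 87, 80, 62],
    [18, 22, 37, 56, 68, 109, 103, 77],
    [24, 35, 55, 64, 81, 104, 113, 92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103, 99],
], dtype=np.float64)

def roundtrip(block, q):
    """Quantize an 8x8 block with table q, then reconstruct it."""
    coeffs = np.round(dctn(block - 128.0, norm="ortho") / q)
    return idctn(coeffs * q, norm="ortho") + 128.0

def psnr(a, b):
    mse = np.mean((a - b) ** 2)
    return np.inf if mse == 0 else 10.0 * np.log10(255.0 ** 2 / mse)

def best_scale(block, floor_db=30.0):
    """Coarsest Q-Table scale (most compression) still meeting the quality floor."""
    best = 0.25                            # fall back to the finest table
    for scale in (0.25, 0.5, 1.0, 2.0, 4.0):
        if psnr(block, roundtrip(block, BASE_Q * scale)) >= floor_db:
            best = scale
    return best

flat = np.full((8, 8), 128.0)              # a flat block survives even coarse tables
print(best_scale(flat))                    # → 4.0
```

A neural network replaces this brute-force search with a single forward pass, and can shape all 64 entries of the table per image rather than one global scale.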

So there you go. That’s a peek into the machine learning side of our algorithms.

At Compression.ai, we develop spectral compression solutions, such as the above, in addition to 16-bit greyscale video compression, and CCSDS-123 compliant hyperspectral lossless data compression. Our Enterprise offerings are containerized, and ready to deploy on any Kubernetes infrastructure. As a solutions provider, we’d love to talk to you about your data management needs and how you could save on both data storage and transmission.

Would love to hear your thoughts on our ML approach, as well as on our other products. Get in touch at migel@compression.ai.
