Enhancing JPEG with Machine Learning

Migel Tissera


[Image comparison: two 2000×2400 photos, one at 1 MB vs. 325 KB compressed, the other at 3 MB vs. 745 KB compressed (74% reduction)]

Early on at Compression.ai, our main image compression algorithm looked a lot like most other Machine Learning (ML) based image compression algorithms: an auto-encoder. We had a branched neural network as the encoder, a branched neural network as the decoder, and an entropy coding module in the middle to join them. We then trained this whole network on the ImageNet dataset, which contains just over 1 million images.

It was super simple. And it worked! Our model compressed images to 0.8 bits per pixel (bpp) while keeping MS-SSIM above 0.99. Compared against all the published work at the time, we were at the top in terms of visual quality for a given compression ratio. It was great, and we thought we had something.


Figure 1: A typical auto-encoder

But getting people to adopt a new file format was far harder than we originally thought; it was nearly impossible. No one wants to download a codec (to handle decompression) just to open their images, even an open-source one, let alone a proprietary one. We figured this out early on, and knew that if we were to make something substantial in the compression space, it had to be about taking an existing standard and massively improving upon it, rather than creating a new one.

JPEG, first introduced to the world almost three decades ago, is still the most widely used image compression standard. It is surprisingly simple, yet extremely effective. We figured that if we wanted to create something for the masses, it needed to be JPEG. This is when we began our work on improving JPEG with ML. In this blog post, I will explain the process we took, and how we ultimately created the best image compression algorithm in the world.

Enter JPEG

To really understand how we can bring ML to JPEG, we need to start with JPEG itself and understand how it compresses images. For a given image, the JPEG algorithm works as follows:

  • The image is broken into 8×8, non-overlapping blocks
  • Then, per 8×8 block:
    • Convert the colour space to YUV
    • Subsample the U and V channels (e.g., in the 4:2:0 case)
    • Apply the Discrete Cosine Transform (DCT)
    • Quantize using a Q-Table
    • Entropy code to obtain a bit string
  • The bit strings (one per 8×8 block), together with metadata (such as the Q-Table), comprise the JPEG file.
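The per-block steps above can be sketched in a few lines of NumPy. This is a minimal, illustrative sketch rather than a real codec: entropy coding is omitted, only a single channel is handled, and the Q-Table shown is the standard JPEG luminance table from the spec.

```python
import numpy as np

def dct_matrix(n=8):
    """Orthonormal DCT-II matrix: D @ block @ D.T computes the 2-D DCT."""
    d = np.array([[np.cos((2 * j + 1) * i * np.pi / (2 * n)) for j in range(n)]
                  for i in range(n)])
    d[0] *= 1.0 / np.sqrt(n)
    d[1:] *= np.sqrt(2.0 / n)
    return d

# Standard JPEG luminance Q-Table (Annex K of the JPEG spec).
Q_LUMA = np.array([
    [16, 11, 10, 16, 24, 40, 51, 61],
    [12, 12, 14, 19, 26, 58, 60, 55],
    [14, 13, 16, 24, 40, 57, 69, 56],
    [14, 17, 22, 29, 51, 87, 80, 62],
    [18, 22, 37, 56, 68, 109, 103, 77],
    [24, 35, 55, 64, 81, 104, 113, 92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103, 99],
], dtype=np.float64)

def encode_block(block, q_table=Q_LUMA):
    """DCT + quantization of one 8x8 block of pixel values in 0..255."""
    d = dct_matrix()
    coeffs = d @ (block - 128.0) @ d.T                   # level shift, 2-D DCT
    return np.round(coeffs / q_table).astype(np.int32)   # quantize

def decode_block(quantized, q_table=Q_LUMA):
    """De-quantization + inverse DCT: the decoder's half of the round trip."""
    d = dct_matrix()
    coeffs = quantized * q_table                         # element-wise de-quantization
    return np.clip(d.T @ coeffs @ d + 128.0, 0.0, 255.0)
```

A flat block survives the round trip almost exactly; detailed blocks lose whatever high-frequency content the Q-Table divides away.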

The JPEG decoder then does the above in reverse to reconstruct the image. If we take the output after the quantization step, de-quantize it (basically, element-wise multiplication with the Q-Table), apply the Inverse Discrete Cosine Transform (IDCT), and convert the colour space back to RGB, we end up with 8×8 blocks of the original image. For some great references on how JPEG compression works, you can refer to these articles.

Enter Machine Learning

So how do we go about improving this with Machine Learning? If you think about it, the above lends itself to executing the computations in a tensor-flowy fashion. You take an image, say of dimensions (a, b, 3), and create 8×8 patches from it, giving a tensor of shape (N, 8, 8, 3), N being the number of patches. (It helps if the image dimensions are divisible by 8; otherwise zero padding will do. The details don't matter here; let's just focus on the flow of the image.) Then, on each patch, we apply the DCT, followed by quantization. Then you de-quantize, apply the IDCT, and you end up back in the original colour space.
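The patch-extraction step, for instance, comes down to a pad and a couple of reshapes. A sketch, assuming a NumPy array in height-width-channel layout:

```python
import numpy as np

def to_patches(image):
    """Split an (a, b, 3) image into non-overlapping 8x8 patches -> (N, 8, 8, 3).
    Zero-pads on the bottom/right so both dimensions become divisible by 8."""
    a, b, c = image.shape
    padded = np.pad(image, ((0, (-a) % 8), (0, (-b) % 8), (0, 0)))
    h, w = padded.shape[:2]
    # (h/8, 8, w/8, 8, c) -> (h/8, w/8, 8, 8, c) -> (N, 8, 8, c)
    patches = padded.reshape(h // 8, 8, w // 8, 8, c).swapaxes(1, 2)
    return patches.reshape(-1, 8, 8, c)
```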

Now, if we add trainable parameters, we can treat this process as the flow of a tensor, and use a framework like TensorFlow to code the model. Then you can start training it with Stochastic Gradient Descent (SGD), and optimize the network for MS-SSIM, for example.
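As a toy illustration of the trainable-parameters idea (emphatically not our production model): below, a 64-entry Q-Table is tuned by gradient descent on synthetic DCT coefficients, trading reconstruction error against a crude rate proxy. Everything here is an assumption for illustration: the coefficient statistics are made up, plain MSE stands in for MS-SSIM, and because rounding has zero gradient almost everywhere, finite differences stand in for backpropagation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic DCT coefficients for 4096 blocks: Laplacian-distributed, with
# energy decaying from DC to high frequencies (roughly like natural images).
n_blocks, n_freq = 4096, 64
scales = 50.0 / (1.0 + np.arange(n_freq))
coeffs = rng.laplace(scale=scales, size=(n_blocks, n_freq))

lam = 5.0  # rate weight: larger values push toward coarser quantization

def per_freq_loss(q):
    """Distortion + lam * rate, computed separately for each of 64 frequencies."""
    idx = np.round(coeffs / q)                          # quantization indices
    dist = np.mean((coeffs - idx * q) ** 2, axis=0)     # MSE after de-quantization
    rate = np.mean(np.log2(1.0 + np.abs(idx)), axis=0)  # crude bit-cost proxy
    return dist + lam * rate

q = np.full(n_freq, 8.0)   # initial Q-Table, flattened to 64 entries
lr, eps = 0.2, 0.1
for _ in range(300):
    # Central finite differences per frequency (the loss is separable).
    grad = (per_freq_loss(q + eps) - per_freq_loss(q - eps)) / (2 * eps)
    q = np.clip(q - lr * grad, 1.0, None)   # keep quantization steps >= 1
```

The point is only that the quantization table can sit inside an optimization loop as a trainable parameter; a real system would backpropagate a perceptual metric through the whole pipeline instead.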

Quantization in the JPEG algorithm consists of two operations. The first is to divide the coefficients element-wise by a table commonly referred to as a Q-Table. This division is then followed by rounding to the nearest integer (converting the floating-point numbers produced by the DCT back to integers to save space).
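To make that concrete, here is the arithmetic on a handful of hypothetical DCT outputs, using the first few entries of the standard JPEG luminance Q-Table:

```python
import numpy as np

coeffs = np.array([-415.4, 30.2, -61.1, 27.2])  # hypothetical DCT outputs
q_row = np.array([16.0, 11.0, 10.0, 16.0])      # first entries of the standard luminance Q-Table

# Operation 1: element-wise division. Operation 2: round to the nearest integer.
quantized = np.round(coeffs / q_row).astype(np.int32)
# quantized values: -26, 3, -6, 2 -- small integers that entropy-code cheaply
```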

What we figured out was that all the meaningful compression in the JPEG algorithm happens at this quantization step: it is where the output of the DCT gets divided by the Q-Table. So if we use a neural network to approximate the best Q-Table possible, while guaranteeing a minimum MS-SSIM score, we have a winner. While we can't go into too much detail here, what we have in production at Compression.ai as our "V2" Compression Engine (CE) is just that.

At the heart of it, we have a neural network that outputs the best Q-Table guaranteeing a minimum MS-SSIM score, tailored to each specific image. Then we use a super-fast implementation of JPEG compression to create the image file.
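We obviously can't share the network itself, but the "guarantee a minimum quality score" mechanic can be illustrated with a much dumber stand-in: binary-searching a single global scale on the standard Q-Table until the reconstruction just meets a quality target. This sketch uses PSNR as a stand-in for MS-SSIM, handles greyscale only, and assumes image dimensions divisible by 8; all names and thresholds are illustrative.

```python
import numpy as np

# Orthonormal 8x8 DCT-II matrix: D @ block @ D.T computes the 2-D DCT.
D = np.array([[np.cos((2 * j + 1) * i * np.pi / 16) for j in range(8)]
              for i in range(8)])
D[0] *= 1.0 / np.sqrt(8)
D[1:] *= np.sqrt(2.0 / 8)

# Standard JPEG luminance Q-Table (Annex K of the spec), used as the base.
BASE_Q = np.array([
    [16, 11, 10, 16, 24, 40, 51, 61],
    [12, 12, 14, 19, 26, 58, 60, 55],
    [14, 13, 16, 24, 40, 57, 69, 56],
    [14, 17, 22, 29, 51, 87, 80, 62],
    [18, 22, 37, 56, 68, 109, 103, 77],
    [24, 35, 55, 64, 81, 104, 113, 92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103, 99],
], dtype=np.float64)

def roundtrip(img, q):
    """Quantize every 8x8 block of a greyscale image with table q, then reconstruct."""
    out = np.empty_like(img, dtype=np.float64)
    for i in range(0, img.shape[0], 8):
        for j in range(0, img.shape[1], 8):
            blk = img[i:i + 8, j:j + 8] - 128.0
            c = np.round((D @ blk @ D.T) / q) * q          # quantize + de-quantize
            out[i:i + 8, j:j + 8] = np.clip(D.T @ c @ D + 128.0, 0, 255)
    return out

def psnr(a, b):
    mse = np.mean((a - b) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(255.0 ** 2 / mse)

def coarsest_scale(img, target, lo=0.05, hi=8.0, iters=20):
    """Binary-search the largest global scale on BASE_Q that still meets target.
    Larger scale means coarser quantization, hence a smaller file."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        q = np.maximum(1.0, BASE_Q * mid)
        if psnr(img, roundtrip(img, q)) >= target:
            lo = mid          # quality still acceptable: push coarser
        else:
            hi = mid
    return lo
```

A neural network replaces this search with a single forward pass, and can shape all 64 entries per image rather than one global scale, but the contract is the same: the coarsest quantization that still clears the quality bar.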

So there you go. That's a bit about the algorithm and ML side of what we do.

At Compression.ai, we have the above and other spectral compression solutions, such as 16-bit greyscale video HEVC compression, as well as CCSDS-123 compliant hyperspectral lossless data compression capability. Our Enterprise offerings are all containerised, ready to be deployed on your Kubernetes infrastructure. As a solutions provider, we'd love to talk to you about your data compression needs and how you could re-architect your entire spectral data management solution.

Would love to hear all your thoughts on our ML approach, as well as on our other products. Get in touch at migel@compression.ai.
