Speeding up Image Classification and Segmentation

Francis Doumet

02-01-2020

Machine-learned compression algorithms, such as those used by Compression.ai, have been shown to outperform traditional JPEG, JPEG2000 and even BPG (based on the H.265 standard) on visual perception metrics such as the widely accepted Multi-scale Structural Similarity Index (MS-SSIM).

Besides compression, such networks can also simplify and speed up downstream applications that operate on the images, such as classification or segmentation, reducing computational complexity by up to a factor of 2. Because machine-learned encoders generate compressed representations that contain enough information to regenerate the originals with minimal perceived loss, the same compressed representations can be fed directly into neural networks to classify images with no significant drop in accuracy.

Using a machine-learned image compression encoder for downstream inference is particularly attractive for remote applications that compress images on the client side before transmitting them to a central server. Besides saving transmission time and cost, this scheme removes the need to decompress and reconstruct the images before feeding them into classification or segmentation networks, which significantly reduces processing time.

Classification Results

Standard classification on the ImageNet dataset yields a top-5 accuracy of 89.96%. Running the same classification on ImageNet’s compressed representations yields a top-5 accuracy of 88.34% at 0.635 bits per pixel (bpp), while reducing the storage space required for the dataset from 144 GB to 24.8 GB, a reduction factor of 5.8x.
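The trade-off above can be checked with a few lines of arithmetic. The sketch below uses only the figures quoted in this article; the helper names and the 224 x 224 example image size are assumptions chosen for illustration, not part of the published results.

```python
# Figures quoted in the article.
BASELINE_TOP5 = 89.96     # top-5 accuracy (%) on the original images
COMPRESSED_TOP5 = 88.34   # top-5 accuracy (%) on compressed representations
ORIGINAL_GB = 144.0       # dataset size before compression
COMPRESSED_GB = 24.8      # dataset size as compressed representations
BITS_PER_PIXEL = 0.635    # bitrate of the compressed representations


def reduction_factor(original_size, compressed_size):
    """How many times smaller the compressed data is."""
    return original_size / compressed_size


def accuracy_drop(baseline, compressed):
    """Accuracy lost, in percentage points."""
    return baseline - compressed


def compressed_bytes(width, height, bpp):
    """Approximate compressed size of one image, in bytes, at a given bitrate."""
    return width * height * bpp / 8


if __name__ == "__main__":
    factor = reduction_factor(ORIGINAL_GB, COMPRESSED_GB)
    drop = accuracy_drop(BASELINE_TOP5, COMPRESSED_TOP5)
    # Assumption: a 224 x 224 image, used here only as an example size.
    size = compressed_bytes(224, 224, BITS_PER_PIXEL)
    print(f"storage reduction: {factor:.1f}x")
    print(f"top-5 accuracy drop: {drop:.2f} percentage points")
    print(f"bytes per 224x224 image at {BITS_PER_PIXEL} bpp: {size:.0f}")
```

In other words, the dataset shrinks by a factor of nearly six at a cost of under two percentage points of top-5 accuracy.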

Segmentation Results

Segmentation performance typically degrades as the bitrate decreases, because detail is removed from the image. With machine-learned compression, however, we found that segmenting the compressed representation of the image directly, rather than the decoded RGB version, in fact leads to better results, as can be seen below:

Decoded RGB image

Segmentation mask starting from decoded RGB

Segmentation mask starting from compressed representation

Further, computational complexity for segmentation is reduced by 3.3 · 10^9 FLOPs when operating on compressed representations of the images.
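To give a rough sense of how FLOP counts like the one above are tallied, here is a minimal sketch that estimates the cost of a single convolutional layer at two spatial resolutions. The layer shapes are hypothetical and chosen only for illustration; they do not describe the networks behind the published figure, and the sketch does not attempt to reproduce the 3.3 · 10^9 number.

```python
def conv2d_flops(h_out, w_out, c_in, c_out, kernel):
    """Approximate FLOPs of one 2D convolution, ignoring the bias term.

    Each output value needs c_in * kernel * kernel multiply-accumulates,
    and each multiply-accumulate is counted as two FLOPs.
    """
    macs = h_out * w_out * c_in * c_out * kernel * kernel
    return 2 * macs


if __name__ == "__main__":
    # Hypothetical layer: 3x3 kernel, 64 input and 64 output channels.
    # Assumption: the same layer is evaluated once at image resolution
    # and once on a feature map that is 8x smaller in each dimension.
    full_res = conv2d_flops(224, 224, 64, 64, 3)
    reduced = conv2d_flops(28, 28, 64, 64, 3)
    print(f"full resolution:    {full_res:,} FLOPs")
    print(f"reduced resolution: {reduced:,} FLOPs")
    print(f"ratio:              {full_res / reduced:.0f}x")
```

Because the cost of a convolution scales with the number of output positions, layers that run on a spatially smaller input are much cheaper, and a network that starts from a compact representation can skip the expensive high-resolution stages altogether.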

Reach out here for more information on how Compression.ai can help accelerate image processing applications.

(*) Results originally published by Agustsson et al. at ICLR 2018
