Dubbed Super-Resolution via Repeated Refinement (SR3) and Cascaded Diffusion Models (CDM), the two new technologies were recently developed by Google Research's Brain Team. The Mountain View giant recently published an in-depth blog post on its AI forum detailing both technologies. They are similar to the AI algorithm that researchers at Duke University in North Carolina developed earlier this year.
Now, starting with the SR3 model: it is a super-resolution diffusion model that turns low-resolution images into high-resolution ones starting from pure noise. The model is trained with an image corruption process that progressively adds noise to a high-resolution image until only pure noise remains. At inference time, it reverses that process, starting from pure noise and progressively removing it to reach the target image, with the low-resolution input image guiding each denoising step.
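To make that sampling procedure a little more concrete, here is a minimal sketch of a DDPM-style reverse diffusion loop of the kind SR3 builds on. It is plain Python/NumPy; the `predict_noise` stub, the linear noise schedule, and the image shapes are illustrative assumptions, not Google's actual implementation.

```python
import numpy as np

# Hypothetical stub standing in for the trained SR3 denoising network. The real
# model is a neural network that predicts the noise present in x_t given the
# noisy image, the timestep, and the low-resolution conditioning image.
def predict_noise(x_t, t, low_res_cond):
    return np.zeros_like(x_t)  # placeholder prediction

def sr3_sample(low_res_cond, shape, num_steps=50, seed=0):
    """Minimal DDPM-style reverse process: start from pure noise and
    progressively denoise, guided by the low-resolution input."""
    rng = np.random.default_rng(seed)
    # Simple linear noise schedule (an assumption; the real schedule is tuned).
    betas = np.linspace(1e-4, 0.02, num_steps)
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)

    x = rng.standard_normal(shape)  # pure Gaussian noise
    for t in reversed(range(num_steps)):
        eps = predict_noise(x, t, low_res_cond)
        # Subtract the predicted noise component (DDPM posterior mean).
        x = (x - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps) / np.sqrt(alphas[t])
        if t > 0:  # re-inject a little noise at every step except the last
            x += np.sqrt(betas[t]) * rng.standard_normal(shape)
    return x  # approximation of the high-resolution image

# Usage: upscale a 64 x 64 RGB input to 256 x 256 (shapes are illustrative).
low_res = np.zeros((64, 64, 3))
high_res = sr3_sample(low_res, shape=(256, 256, 3))
```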
The company says that with large-scale training of the SR3 model, it was able to achieve strong benchmark results on super-resolution tasks for face and natural images. By chaining super-resolution stages, the model can upscale a 64 x 64 input image to 1024 x 1024. To demonstrate the process, Google shared a short video showcasing the SR3 model in action, which you can check out right below.
Now, coming to the second AI model, the Cascaded Diffusion Model (CDM) is a class-conditional diffusion model that has been trained on ImageNet data. This enables the model to churn out high-resolution natural images by chaining multiple generative models over several spatial resolutions.
In this process, CDM uses one diffusion model to generate data at a low resolution, followed by a sequence of SR3 super-resolution diffusion models that gradually raise the image's resolution up to the final target. You can check out the GIF attached below to get a better idea of the image generation process, and a rough code sketch of the cascade after that.
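As a rough illustration of the cascade, the following sketch chains a hypothetical low-resolution class-conditional stage with a sequence of super-resolution stages. The function names, resolutions, and random outputs are placeholders for illustration, not Google's actual models.

```python
import numpy as np

# Hypothetical stand-ins for the trained stages; shapes are illustrative.
def base_class_conditional_model(class_label, rng):
    """Low-resolution class-conditional diffusion model (e.g., 32 x 32 output)."""
    return rng.standard_normal((32, 32, 3))

def super_resolution_stage(image, target_size, rng):
    """One SR3-style stage that upsamples its input to target_size."""
    return rng.standard_normal((target_size, target_size, 3))

def cascaded_generation(class_label, resolutions=(64, 256), seed=0):
    """Chain a base model with a sequence of super-resolution models,
    gradually raising the resolution to the final target."""
    rng = np.random.default_rng(seed)
    image = base_class_conditional_model(class_label, rng)
    for size in resolutions:
        image = super_resolution_stage(image, size, rng)
    return image

sample = cascaded_generation(class_label=207)  # e.g., an ImageNet class index
print(sample.shape)  # (256, 256, 3)
```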
Other than the two models described above, the researchers at Google AI also developed a new data augmentation technique called conditioning augmentation, which further improves the sample quality of CDM. It applies Gaussian noise and Gaussian blur to the low-resolution conditioning input of each super-resolution model, which prevents each model from overfitting to that input and yields better high-resolution samples.
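A hedged sketch of what conditioning augmentation can look like in training code is shown below: the low-resolution conditioning image is randomly blurred and noised before being fed to a super-resolution stage. The function name and the sigma ranges are assumptions chosen for illustration only.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def augment_conditioning(low_res_cond, rng, max_blur_sigma=2.0, max_noise_sigma=0.3):
    """Corrupt the low-resolution conditioning input with random Gaussian blur
    and Gaussian noise before passing it to a super-resolution stage, so the
    stage cannot overfit to artifacts produced by the stage below it."""
    blur_sigma = rng.uniform(0.0, max_blur_sigma)
    noise_sigma = rng.uniform(0.0, max_noise_sigma)
    # Blur spatially only (no blur across the channel axis).
    blurred = gaussian_filter(low_res_cond, sigma=(blur_sigma, blur_sigma, 0))
    return blurred + noise_sigma * rng.standard_normal(low_res_cond.shape)

rng = np.random.default_rng(0)
low_res = np.zeros((64, 64, 3))
conditioning = augment_conditioning(low_res, rng)  # fed to the SR stage during training
```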
So, with the above AI-based image generation models, Google says that it has pushed diffusion models to the state of the art on super-resolution and class-conditional ImageNet generation benchmarks. The researchers will continue testing the limits of these models on more generative modeling problems going forward.