Snapchat’s looking to accelerate the response time of generative AI image creation, with a new approach that uses a faster model to build visuals from text prompts.
Which I wouldn’t have thought would be a major impediment to usage. Most generative AI tools currently take, maybe, 30 seconds or so to generate such images, even on mobile devices. But Snap says that its new system is able to produce similar visuals in under two seconds, which, while it may not be a major game-changer, is an interesting development in the broader evolution of generative AI.
As explained by Snap:
“SnapFusion shortens the model runtime from text input to image generation on mobile to under two seconds–the fastest time published to date by the academic community. Snap Research achieved this breakthrough by optimizing the network architecture and denoising process, making it incredibly efficient, while maintaining image quality. So, now it’s possible to run the model to generate images based on text prompts, and get back crisp clear images in mere seconds on mobile rather than minutes or hours, as other research presents.”
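For context on what that “denoising process” involves: diffusion models like this start from pure random noise and refine it, step by step, into an image that matches the text prompt, with each step being a full pass through a large neural network. Below is a minimal, purely illustrative sketch of such a sampling loop in Python – the function names, the dummy noise predictor, and the eight-step schedule are assumptions for illustration, not Snap’s actual code – but it shows why runtime scales with the number and cost of the denoising steps, which is exactly what Snap says it optimized.

```python
import numpy as np

# NOTE: purely illustrative stand-in for the trained denoising network.
# The real model is a large text-conditioned neural network; this
# placeholder just returns deterministic pseudo-noise so the loop runs.
def predict_noise(x, step, prompt_embedding):
    rng = np.random.default_rng(step)
    return 0.1 * rng.standard_normal(x.shape)

def sample(prompt_embedding=None, num_steps=8, latent_shape=(64, 64, 4)):
    """Minimal DDIM-style denoising loop (illustrative, not Snap's code).

    Diffusion models start from pure noise and repeatedly subtract the
    noise the network predicts. Total latency scales with num_steps, so
    using fewer, cheaper steps is how sub-two-second runtimes become
    plausible on a phone.
    """
    # alpha_bar schedule: ~0 means "all noise", ~1 means "almost clean".
    alpha_bars = np.linspace(0.01, 0.999, num_steps)

    x = np.random.standard_normal(latent_shape)  # start from pure noise
    for i, a in enumerate(alpha_bars):
        eps = predict_noise(x, i, prompt_embedding)
        # Estimate the clean image implied by the noise prediction, then
        # re-noise that estimate to the next (less noisy) level.
        x0_est = (x - np.sqrt(1.0 - a) * eps) / np.sqrt(a)
        a_next = alpha_bars[i + 1] if i + 1 < num_steps else 1.0
        x = np.sqrt(a_next) * x0_est + np.sqrt(1.0 - a_next) * eps
    return x  # a real pipeline would then decode these latents into pixels

latents = sample(num_steps=8)
print(latents.shape)  # (64, 64, 4)
```

Each trip through that loop is expensive on mobile hardware, which is why trimming the step count, and slimming down the network run at each step, translates directly into the speed gains Snap is claiming.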
These are some examples of the visuals produced by the SnapFusion process, which still look much like the generative AI images you get from any other app (i.e. pretty close to the prompt, but kinda weird). But they were returned to the user much faster, which Snap says could have a range of benefits.
An improved user experience is one factor, but Snap also notes that, because the model runs on the device itself, the new process could facilitate improved privacy, by limiting data sharing with third parties, while also reducing processing costs for developers.
Though Snap’s research does come with a few asterisks, most notably that the majority of its experiments were conducted on an iPhone 14 Pro, which, in Snap’s own words, ‘has more computation power than many other phones’. As such, it’s doubtful that anything less than this is going to meet these speed benchmarks – but even lesser devices will still likely be quicker than they are with current systems.
Snap’s provided a full overview of ‘denoising’, along with far too many mathematical equations, in its full paper on the process, which you can download for yourself here.
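If you’d rather not wade through the paper, the heart of those equations is the standard noise-prediction objective from the diffusion literature (shown here in its generic form, which may differ in detail from Snap’s exact formulation): a network is trained to recover the random noise that was mixed into a clean image, and generation then runs that process in reverse.

$$
\mathcal{L} = \mathbb{E}_{x_0,\, \epsilon \sim \mathcal{N}(0, I),\, t}
\Big[ \big\| \epsilon - \epsilon_\theta\big(\sqrt{\bar\alpha_t}\, x_0 + \sqrt{1 - \bar\alpha_t}\,\epsilon,\; t\big) \big\|^2 \Big]
$$

Here $x_0$ is a clean training image, $\epsilon$ is sampled Gaussian noise, and $\bar\alpha_t$ controls how much of that noise is mixed in at step $t$.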
It’s an interesting experiment, and one that points to the future of generative AI, which will eventually be able to respond to user cues in real time. That could enable a whole range of new usage options, like real-time translation, increasingly responsive creation, and more.