Google’s Whisk AI generator will ‘remix’ the pictures you plug in

Last updated: December 16, 2024 7:36 pm



Google has announced a new AI tool called Whisk that lets you generate images using other images as prompts instead of requiring a long text prompt.

With Whisk, you can supply images to suggest what you’d like as the subject, the scene, and the style of your AI-generated image, and you can prompt Whisk with multiple images for each of those three things. (If you want, you can fill in text prompts, too.) If you don’t have images on hand, you can click a dice icon to have Google fill in some images for the prompts (though those images also appear to be AI-generated). You can also enter some text into a text box at the end of the process if you want to add extra detail about the image you’re looking for, but it’s not required.

Whisk will then generate images and a text prompt for each image. You can favorite or download an image if you’re happy with the results, or you can refine it by entering more text into the text box or by clicking the image and editing its text prompt.

A screenshot of Whisk. I clicked the dice to generate a subject, scene, and style. I swapped out the auto-generated scene by entering a text prompt. Whisk created the first two images, which I iterated on by asking Whisk to add some steam around the subject (because it’s a fire being in water), resulting in the next two images.

Screenshot by Jay Peters / The Verge

In a blog post, Google stresses that Whisk is designed for “rapid visual exploration, not pixel-perfect edits.” The company also says that Whisk may “miss the mark,” which is why it lets you edit the underlying prompts.

In the few minutes I’ve used the tool while writing this story, it’s been entertaining to tinker with. Images take a few seconds to generate, which is annoying, and while the images have been a little strange, everything I’ve generated has been fun to iterate on.

Google says Whisk uses the “latest” iteration of its Imagen 3 image generation model, which it announced today. Google also introduced Veo 2, the next version of its video generation model, which the company says has an understanding of “the unique language of cinematography” and hallucinates things like extra fingers “less frequently” than other models (one of those other models is probably OpenAI’s Sora). Veo 2 is coming first to Google’s VideoFX, which you can join the Google Labs waitlist for, and it will be expanded to YouTube Shorts and “other products” sometime next year.