Google Unveils Lumiere, an AI Video Generator Based on a Space-Time Diffusion Model

Google released a paper and demo over the weekend showcasing its latest AI development, the text-to-video generator Google Lumiere.

The tool is based on a new video diffusion model that creates short clips in a single pass, unlike existing generators that stitch still frames together.

Google Lumiere’s preview clips show impressive results and evidence that AI video generation is a reality already. 


Text-To-Video Generation in One Single Pass 

Google released its latest generative AI model, Google Lumiere, last Saturday. According to its product page and the accompanying research paper, this text-to-video generator uses a new diffusion architecture, Space-Time U-Net (STUNet), which creates short video clips in a single pass. The announcement comes only a couple of months after the launch of Google Gemini, the company’s largest large language model, and weeks after a leak suggested that AI image generation is coming to Google Bard.

The STUNet model generates a base frame from the given text prompt, predicts where the objects in the frame will go and how they’ll move, and creates a full-frame-rate video clip in one pass. The process builds on a pre-trained text-to-image diffusion model to achieve “realistic, diverse, and coherent motion.”

This is a distinct change from existing AI video generators like Runway Gen-2 or Stable Video Diffusion, which generate a set of keyframes and then fill in the frames between them using temporal super-resolution.
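To make that difference concrete, here is a toy sketch in Python. It is not Lumiere's model or any real generator: the swinging "object", the frame counts, and the function names are all assumptions made for illustration, and plain linear interpolation stands in for temporal super-resolution. It only shows why filling in frames between sparse keyframes can lose fast motion that a model working on every frame at once has a chance to keep.

```python
import math

def true_motion(num_frames):
    """Hypothetical ground truth: x-position of an object swinging back and forth."""
    return [math.sin(2 * math.pi * t / 8) for t in range(num_frames)]

def keyframes_then_interpolate(motion, stride):
    """Toy cascaded pipeline: keep every `stride`-th frame as a keyframe,
    then fill the gaps by linear interpolation (a crude stand-in for
    temporal super-resolution)."""
    last = len(motion) - 1
    frames = []
    for t in range(len(motion)):
        lo = (t // stride) * stride          # keyframe at or before t
        hi = min(lo + stride, last)          # next keyframe, clamped to the clip
        if hi == lo:
            frames.append(motion[lo])
        else:
            w = (t - lo) / (hi - lo)
            frames.append((1 - w) * motion[lo] + w * motion[hi])
    return frames

def max_error(reference, candidate):
    """Largest per-frame deviation from the reference motion."""
    return max(abs(a - b) for a, b in zip(reference, candidate))

reference = true_motion(17)                      # 17 frames, one swing every 8 frames
cascaded = keyframes_then_interpolate(reference, stride=8)
error = max_error(reference, cascaded)
print(f"max deviation of the keyframe pipeline: {error:.2f}")
```

In this toy setup the keyframes happen to land where the object is at rest, so the interpolated clip shows almost no movement at all. Real temporal super-resolution models are far more capable than linear interpolation, but they still have to guess at motion they never saw between keyframes.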

Besides text-to-video, Lumiere is said to handle image-to-video generation, stylized video generation (creating video in a given style), cinemagraphs (which animate only one element in an otherwise still image), and video inpainting. The last is essentially AI video editing of selected areas in a frame, letting you change the color of an element, for example.

AI Video Generation is a Reality

Google Lumiere is not yet available to the public, but its unveiling makes it clear that text-to-video is quickly becoming a reality for all.

While still not entirely realistic, the preview reels of videos created with Lumiere are promising, looking considerably more authentic than those of existing tools like Meta’s Emu or Stable Video Diffusion. 

Besides launching a revolution in video synthesis with Lumiere, Google acknowledged in its paper that the tool could be used to create harmful or misleading content and that safeguards should be developed against bias and malicious use. However, the company gives no details on how or when these would be implemented.

One thing is sure: with a handful of AI video generators already being explored and Google Lumiere making its first public appearance, we are getting closer every day to having text-to-video generators at our fingertips.

What do you think of Google Lumiere? Are you eager to try it? 


