Multinational tech firm Meta recently announced their newest generative AI model, CM3leon. It is a multimodal AI model that can do text-to-image and image-to-text generation.
The novel technology used to develop this model is interesting in itself, but what’s even more interesting is the company’s claim that it outperforms most currently available image generators in both image quality and generation accuracy.
Let’s learn more!
The most recent introduction from Meta, CM3leon (pronounced “chameleon”), is a multimodal generative model that can understand images and text. It can generate images from text prompts, perform image edits from written instructions, answer questions, and generate descriptive captions about a given image, all with impressive precision.
Unlike most popular AI image generators (like DALL-E 2 or Midjourney), which are based on computation-heavy “diffusion” technology, CM3leon is built on a transformer model. Transformers rely on a mechanism called “attention,” which weighs how relevant each part of the input (text, images, or other data) is to every other part, and uses those weights when generating the output.
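To make the “attention” idea a little more concrete, here is a minimal sketch of scaled dot-product attention, the standard building block of transformer models, in plain Python. It is only an illustration of the general technique and not Meta’s implementation; the function names and the toy vectors are our own assumptions.

```python
import math


def softmax(scores):
    """Turn raw relevance scores into weights that sum to 1."""
    highest = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention over lists of vectors.

    For each query, score every key by dot product, scale by
    sqrt(key dimension), normalize with softmax, and return the
    weighted average of the value vectors.
    """
    dim = len(keys[0])
    outputs = []
    for q in queries:
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(dim)
            for k in keys
        ]
        weights = softmax(scores)
        mixed = [
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ]
        outputs.append(mixed)
    return outputs


# Toy example (made-up numbers): one query that matches the first key
# more closely than the second, so the first value gets the larger weight.
toy_output = attention(
    queries=[[1.0, 0.0]],
    keys=[[1.0, 0.0], [0.0, 1.0]],
    values=[[10.0, 0.0], [0.0, 10.0]],
)
print(toy_output)
```

In a real model the queries, keys, and values are learned projections of the input tokens, and many of these attention operations run in parallel.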
Meta took a decoder-only transformer, an architecture that had previously worked for text-only models like the ones behind OpenAI’s ChatGPT, and applied it to generating both images and text. According to Meta, this approach also makes the model more efficient and less costly.
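For readers wondering what “decoder-only” means in practice, the sketch below shows the general idea: the model sees a single sequence of tokens, each position may only look at earlier positions, and new tokens are produced one at a time. Everything here is a simplified assumption for illustration. The token names (such as `<image_start>` and `<img_0>`) and the `toy_next_token` function are hypothetical stand-ins, not CM3leon’s actual vocabulary or model.

```python
def causal_mask(length):
    """mask[i][j] is True when position i may attend to position j.

    A decoder-only model only looks backwards, so j must be <= i.
    """
    return [[j <= i for j in range(length)] for i in range(length)]


def generate(prompt_tokens, next_token, max_new_tokens, stop_token):
    """Autoregressive loop: append one predicted token at a time."""
    sequence = list(prompt_tokens)
    for _ in range(max_new_tokens):
        token = next_token(sequence)
        sequence.append(token)
        if token == stop_token:
            break
    return sequence


def toy_next_token(sequence):
    """Hypothetical stand-in for a trained model.

    Emits four numbered image tokens, then an end marker. A real model
    would predict each token from learned probabilities instead.
    """
    image_tokens_so_far = sum(1 for t in sequence if t.startswith("<img_"))
    if image_tokens_so_far < 4:
        return f"<img_{image_tokens_so_far}>"
    return "<end>"


# Text tokens first, then a (hypothetical) marker that switches to image tokens.
prompt = ["a", "red", "chameleon", "<image_start>"]
result = generate(prompt, toy_next_token, max_new_tokens=10, stop_token="<end>")
print(result)
```

The same loop works in the other direction: if the prompt holds image tokens, the model can continue with text tokens, which is how one architecture can cover both text-to-image and image-to-text tasks.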
CM3leon has about 7 billion parameters (the values a model learns during training, whose number gives a rough indication of the model’s capacity). For context, that is roughly twice the parameters of DALL-E 2.
And so, according to Meta, it performs a lot better at understanding the details in text prompts and at synthesizing images that not only follow the prompt to a T but also have a much higher quality overall.
Another interesting point is that CM3leon achieves its high-quality image generation results despite being trained on far less data than most diffusion models. Meta’s latest model was trained on a licensed dataset of millions of images from Shutterstock, its long-term partner for AI tech development, versus the billions of images in the datasets of DALL-E and similar tools (which are under the microscope regarding legality, too).
By sharing this information, the firm is making the point that an efficient and transparent image generator is possible. Meta hopes that this, together with the legal safety of working only with authorized content (and of providing fair compensation, as Shutterstock does through its Contributor Fund program), will encourage collaboration in AI development.
The model so far can perform a number of generative visual tasks with great results:
Regarding quality, the initial raw output of the model is acceptable in terms of definition and detail, but Meta says that a separately trained super-resolution model can be added to make the results high-quality. We’d say even photorealistic.
At this time, Meta has only introduced its new AI image generator through information and sample visuals. The tool itself is not available to the public, and there is no estimate of when it could be.
As promising as this tool is, we can only wait until the day CM3leon is publicly launched, and it wouldn’t be surprising if it takes a little while.
Are you excited about this new image generator from Meta?