Meta Introduced CM3leon, a Multimodal AI Art Generator

Multinational tech firm Meta recently announced its newest generative AI model, CM3leon. It is a multimodal AI model that can do both text-to-image and image-to-text generation.

The novel technology used to develop this model is interesting in itself, but what’s even more interesting is the company’s claim that it outperforms most current image generators in both image quality and generation accuracy.

Let’s learn more!

Meta’s CM3leon: A Better and More Efficient AI Image Generator

The most recent introduction from Meta, CM3leon (pronounced “chameleon”), is a multimodal generative model that can understand images and text. It can generate images from text prompts, perform image edits from written instructions, answer questions, and generate descriptive captions about a given image, all with impressive precision.

Unlike most popular AI image generators (like Dall-E 2 or Midjourney), which are based on computation-heavy “diffusion” technology, CM3leon is built on a transformer model. Transformers rely on a mechanism called “attention,” which weighs each piece of the input (text, images, and other data) by how relevant it is to the output being generated.
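For the curious, here is a minimal sketch of what “attention” means in practice. It is a toy, pure-Python version of scaled dot-product attention, not Meta’s code: the function names and the tiny two-number vectors are invented for illustration. Each input piece gets a relevance score against a query, the scores are turned into weights that sum to 1, and the output is a blend of the inputs in those proportions.

```python
import math

def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(query, keys, values):
    """Scaled dot-product attention for a single query vector."""
    scale = math.sqrt(len(query))
    # Relevance of each input piece: dot product of the query with its key.
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    # Output: the values blended together according to the weights.
    dim = len(values[0])
    output = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return weights, output

# Toy vectors, invented for illustration (real models use learned ones).
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = [[10.0, 0.0], [0.0, 10.0], [5.0, 5.0]]
query = [1.0, 0.0]

weights, output = attention(query, keys, values)
```

In this toy run, the first and third inputs line up with the query and receive more weight than the second one, which is the “relevance” idea in a nutshell.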

Meta took the decoder-only transformer architecture that had already proven itself in text-only models, like the ones behind OpenAI’s ChatGPT, and applied it to generating both images and text. According to Meta, this also makes the model more efficient and less costly.
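To give a rough idea of how a decoder-only model generates content, here is a small sketch under stated assumptions. It assumes that text and images are both represented as tokens in a single sequence, and it uses a made-up `toy_next_token` function in place of a trained model; the token names are hypothetical too. What it does show faithfully are the two core ideas: each token may only look at itself and earlier tokens (the causal mask), and output is produced one token at a time.

```python
def causal_mask(length):
    """Row i marks which positions token i may attend to: itself and earlier ones."""
    return [[col <= row for col in range(length)] for row in range(length)]

def generate(prompt_tokens, next_token, n_new):
    """Autoregressive decoding: append one predicted token at a time."""
    sequence = list(prompt_tokens)
    for _ in range(n_new):
        sequence.append(next_token(sequence))
    return sequence

def toy_next_token(sequence):
    """Hypothetical stand-in for a trained model: emits numbered image tokens."""
    n_image = sum(1 for token in sequence if token.startswith("<img_"))
    return f"<img_{n_image}>"

# A text prompt followed by a (hypothetical) marker that starts the image part.
prompt = ["a", "small", "cactus", "<begin_image>"]
result = generate(prompt, toy_next_token, 3)
mask = causal_mask(3)
```

In a real system, the generated image tokens would then be decoded back into pixels; that step is left out here.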

CM3leon has about 7 billion parameters (the numeric values a model learns during training, whose count gives a rough indication of the model’s capacity). For context, that is more than 2x the parameters of Dall-E 2.

According to Meta, it is much better at understanding the details in text prompts, and it synthesizes images that not only follow the prompt to a T but are also of much higher overall quality.

Samples of AI-generated images created with CM3leon: (1) “A small cactus wearing a straw hat and neon sunglasses in the Sahara desert.” (2) “A close-up photo of a human hand, hand model. High quality.” (3) “A raccoon main character in an Anime preparing for an epic battle with a samurai sword. Battle stance. Fantasy, Illustration.” (4) “A stop sign in a Fantasy style with the text ‘1991.’”

CM3leon Training: Leaner, Transparent, and Legally Safer

Another interesting point is that CM3leon achieves its high-quality image generation results despite being trained on far less data than most diffusion models. Meta’s latest model was trained on a licensed dataset of millions of images from Shutterstock, its long-term partner for AI tech development, versus the billions of images in the datasets behind Dall-E and similar tools (which are also under the microscope regarding legality).

By sharing this information, the firm is trying to make the point that an efficient and transparent image generator is possible. Meta hopes that this, together with the legal safety of working only with authorized content and of providing fair compensation, as Shutterstock does through its Contributor Fund program, will encourage collaboration in AI development.

Samples of images edited with CM3leon using written instructions

What Can CM3leon Do?

The model so far can perform a number of generative visual tasks with great results:

Text-to-image generation – As mentioned before, CM3leon does very well with the generation of images from text prompts that contain multiple constraints (such as specific details on composition or complex descriptions of objects).
Text-guided image editing – Because the model understands images better than others, it can also successfully perform image edits from text instructions (e.g., “change the color of the sky to pink”). This resembles the graphic design assistants that Adobe Firefly, Shutterstock, and the design platform Canva are also developing.
Analyzing image content – When it comes to text output, the model is capable of correctly answering questions about the content of an input image.
Caption generation – Similarly, if prompted to describe an image, the software can produce captions with a considerable level of detail.

Regarding quality, the model’s initial raw output is only OK in terms of definition and detail, but Meta says that a separately trained super-resolution model can be added to make the results high-quality; we’d say even photorealistic.

CM3leon and the Future

For now, Meta has only introduced its new AI image generator through information and sample visuals. The tool itself is not available to the public, and there is no estimate of when it might be.

As promising as this tool is, we can only wait until the day CM3leon is publicly launched, and it wouldn’t be surprising if it takes a little while.

Are you excited about this new image generator from Meta?

THE AUTHOR

Ivanna Attie

I am an experienced author with expertise in digital communication, stock media, design, and creative tools. I have closely followed and reported on AI developments in this field since its early days. I have gained valuable industry insight through my work with leading digital media professionals since 2014.