I tested the first major AI image generator, Dall-E, when it first launched. Since then, I’ve watched as the world of generative AI has exploded, but one feature has always bugged me: text in images.
But as faces looked clearer, and hands went down to the correct number of fingers, every model still seemed to really struggle with creating text. Whether that was on a poster, a sign or even a T-shirt, it often looked like a giant smudge of hieroglyphics.
In recent updates, the problem has started to fade. ChatGPT could reliably recreate text, but only to a certain extent, quickly having a meltdown if your request became too specific.
Then, in steps Gemini 3, or more specifically, Google’s recent update to Nano Banana Pro, its big AI image upgrade. This upgrade improved a lot of key areas for the tool, but text recreation was by far the biggest in my eyes.
What’s new with Nano Banana Pro
Nano Banana Pro has improved image quality and added the ability to see if an image is AI-generated. It can also now edit multiple reference images into one coherent final product. What’s more, you can now translate text in images into another language and also create complicated text-based images.
As our How To editor, Kaycee Hill, pointed out, this has made it an incredible tool for creating infographics. With a simple prompt, Gemini 3 can pump out a complex infographic, including clear text and accompanying images to explain it.
However, on top of that, the AI model now has a better understanding of fonts, text colors and sizes. This allows for far more creativity than before, offering the ability to customize your infographics, labels and magazine covers like never before.
In one example from Gemini, an image of an astronaut is turned into a storyboard sketch, complete with legible written text, with the reference image turned into a drawing.
Elsewhere, Gemini creates an energy drink brand, writing the text on the can in English. Then, using the prompt: “translate all the English text on the three yellow and blue cans into Korean, while keeping everything else the same,” the cans transformed, keeping the text in the same place, now simply translated.
Is text in an image really that exciting?
Over the years, AI has faced challenges that have been clear identifiers of its weaknesses. For a while, AI image generators couldn’t make hands, but now they can. AI video generators couldn’t recreate the intricate nature of gymnastics, but they’re improving rapidly. Now, AI image generators are finally getting text.
This opens up a huge avenue for these kinds of tools that just weren’t reliable before. Translating text in images, creating detailed infographics and reliably remaking different fonts are incredibly useful across numerous sectors.
As this kind of technology improves, it could be used to create entire storyboards, magazines or posters completely from scratch.
Not only that, but this is an area where Gemini has a massive advantage, leaving the likes of ChatGPT behind.