Google has announced a new wave of upgrades to its Gemini image generation model, improving AI editing, generation consistency, and flexibility. The update, developed by the DeepMind team, is now available in the Gemini app. Google also emphasized that all images generated or edited via Gemini will carry a visible digital watermark that clearly indicates their AI-generated origin.
Stronger character consistency and improved editing accuracy
One of the biggest highlights of the new version is that it keeps people and characters consistent across successive image edits.
In the past, when AI generated or repeatedly modified images, characters often drifted in their details, with inconsistent facial features, clothing, or proportions. Gemini's new model preserves a character's likeness, allowing users to place themselves more naturally in different scenes or outfits without the "face-shifting" effect that repeated editing used to cause.
Another improvement is the addition of multi-stage image editing, which lets users change image elements step by step, for example adjusting the background first and then replacing specific objects, without losing the earlier changes. Gemini can also combine two images to create a new scene, or reuse elements from existing images as the basis for new designs, which increases creative flexibility.
Comparison with other image generation tools
The evolution of Gemini allows Google to more directly compete with other generative AI tools:
• OpenAI DALL·E 3: Currently tightly integrated with ChatGPT, it supports text-to-image generation and inpainting. However, Gemini's new model has an advantage in character consistency control, making it particularly attractive for users who need to create across a series of images.
• Adobe Firefly: Aimed at creators and the design industry, Firefly emphasizes commercial licensing of generated images and integrates with tools such as Photoshop and Illustrator. While Gemini lacks a complete professional software ecosystem, its ability to maintain characters across multiple scenes makes it a promising lightweight option for creators.
• Stable Diffusion: Known for being open source and highly customizable, it lets users achieve diverse edits through local models or community plug-ins. For general users, however, Gemini runs in the cloud and integrates with Google services, providing a more user-friendly experience and a gentler learning curve.
Transparency of AI-generated images and future impact
Google emphasized that all images generated through Gemini will automatically be digitally watermarked to ensure transparency and traceability. As generative AI imagery is increasingly used in media, advertising, education, and entertainment, this design also addresses concerns about "deepfakes" and misinformation.
Gemini's enhanced functionality allows Google to move beyond simply providing tools in the AI image generation market and to consider how to ensure long-term trust in generated images. As competitors strengthen their respective areas, Gemini has found a niche in character consistency, editing flexibility, and transparency. Whether it can compete more directly with DALL·E, Firefly, and Stable Diffusion will be worth watching.
Comparison table of AI image generation and editing tools
| Tool name | Main features | Editing functions | Advantages | Restrictions/Disputes |
| --- | --- | --- | --- | --- |
| Google Gemini (DeepMind) | Integrates with the Google ecosystem; supports generation and editing | Maintains character continuity; multi-segment editing without interruption; image synthesis (merging multiple images); visual feature conversion | Emphasis on consistency, especially the stability of character images; all content has digital watermarks for easy identification | Initial functions are concentrated in the Gemini app; still needs to prove its maturity compared to professional design tools |
| Adobe Firefly | Deep integration with Photoshop and Illustrator | Generative Fill; style transfer; vector generation | Seamless integration with the design software ecosystem; suitable for professional designers | Requires a subscription to the Adobe suite, which is costly |
| OpenAI DALL·E (currently v3) | Deep integration with ChatGPT | Inpainting (block editing); text to image | Low barrier to entry and intuitive conversational operation; suitable for quick ideation and storyboarding | More generative; less flexible and precise editing than Firefly |
| Midjourney | Community-driven; good at art styles | Prompt fine-tuning; local deformation; upgraded resolution | Strong artistic sense and delicate image generation; sharing inspiration with the community | Relies on the Discord platform for operation; commercial use authorization requires attention |
| Stable Diffusion | Open source community ecosystem | Inpainting; ControlNet (detailed control); model fine-tuning | Highly customizable, capable of training dedicated models; not restricted to a single platform | High technical threshold; image generation quality varies greatly depending on the model |



