Gemini 2.5 Flash Image released!

Dear Readers,

Sometimes the world of AI feels like a sprint where hardly anyone stops to catch their breath, and today is one of those days. Gemini 2.5 Flash Image is a model that redefines our idea of image editing: no complex menus, just plain language that takes shape in seconds. Edits that used to take hours in Photoshop can now often be done with a single sentence. This is more than just convenient; it shows how quickly the boundaries between humans and machines are blurring.

But that's just the beginning: we'll look at how COMPUTERRL is taking digital assistants to a new level, why VibeVoice is shaking up the audio world, and what it means for Grok-2 to be released as an open model with weights. Each of these steps opens doors to a future in which creative and technical boundaries continue to crumble. So stay tuned—today's issue is full of insights that will change your view of AI.

In Today’s Issue:

Google's new AI lets you edit images just by talking to it
A new AI agent can now control your computer for you
VibeVoice can generate a 90-minute, multi-person podcast
Elon Musk just open-sourced his powerful Grok-2 AI model
And more AI goodness…

All the best,

Gemini 2.5 Flash Image released!

The Takeaway

👉 Gemini 2.5 Flash Image enables image generation and editing using natural language, without UI fiddling.

👉 Developers can keep characters consistent across scenes, merge images, and make precise, targeted edits.

👉 The model is available immediately via the Gemini API, Google AI Studio, and Vertex AI, starting at around 3.9 cents per image.

👉 SynthID watermarks ensure transparency by marking images as AI-generated.

Ever had an idea that you put into words, and it turned into an image? That's exactly what happens with Gemini 2.5 Flash Image. This new image model from Google, internally called “nano-banana”, follows natural-language instructions. You can merge multiple images into one, show the same character in different settings, or remove a stain from a T-shirt with a simple “get rid of it” prompt. It is available through the Gemini API, Google AI Studio, and Vertex AI and costs about 3.9 cents per image. The editing quality is state-of-the-art and could replace Photoshop for many routine tasks.
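
For readers who want to sanity-check the price: Google's announcement quotes $30 per million output tokens and 1,290 output tokens per image, which is where the roughly 3.9 cents comes from. The short Python sketch below reproduces that arithmetic. The two constants are taken from the announcement and may change, and the function name is purely illustrative.

```python
# Assumed pricing figures (as quoted at launch; verify before budgeting).
PRICE_PER_MILLION_OUTPUT_TOKENS_USD = 30.00
OUTPUT_TOKENS_PER_IMAGE = 1290


def estimate_image_cost(num_images: int) -> float:
    """Return the estimated cost in USD for generating `num_images` images."""
    tokens = num_images * OUTPUT_TOKENS_PER_IMAGE
    return tokens * PRICE_PER_MILLION_OUTPUT_TOKENS_USD / 1_000_000


if __name__ == "__main__":
    print(f"1 image:      ${estimate_image_cost(1):.4f}")
    print(f"1,000 images: ${estimate_image_cost(1000):.2f}")
```

A batch of a thousand edits therefore lands in the tens of dollars, not the hundreds, under these assumed figures.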

This is a real leap forward for the AI community: developers are regaining creative control, not with complex tools but through natural language. It also invites us to rethink interfaces, for example modular image editors or storytelling generators.

Why it matters: It makes AI image generation more accessible and interactive, almost like having a conversation with the machine. It balances creative freedom with technical control, which suits developers who want to experiment without giving up precision.

Sources:

🔗 https://developers.googleblog.com/en/introducing-gemini-2-5-flash-image/?utm_source=chatgpt.com

🔗 https://cloud.google.com/blog/products/ai-machine-learning/gemini-2-5-flash-image-on-vertex-ai

In The News

Google Translate Gets an AI Upgrade

Google Translate is rolling out new AI-powered features, including real-time live conversation translation and personalized language learning practice sessions designed to help users master conversational skills.

Codex CLI Gets a Major Upgrade

Responding rapidly to user feedback, the OpenAI team has released a new version of the Codex CLI that adds powerful new features like web search and queued messages.

Qwen Chat Now Reads Web Pages

In a useful new update, Qwen Chat can now directly read and process the content of any web page when you simply paste a link into the chat.

Graph of the Day

The new Gemini 2.5 Flash Image is by far the strongest model when it comes to editing images.

COMPUTERRL, a framework for autonomous desktop intelligence

arxiv.org

With COMPUTERRL, researchers have achieved a breakthrough for AI agents that operate computers. For the first time, the system combines programmatic commands (APIs) with human-style interaction through the graphical interface (GUI), which lets it train more autonomously than earlier systems. According to the paper, it outperforms previous models and can handle complex tasks that span multiple programs. This paves the way for digital assistants that could one day take over entire workflows on their own, with a substantial effect on productivity.
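
To make the API-plus-GUI idea concrete, here is a toy Python sketch of an agent environment that accepts both kinds of action. It is not the paper's implementation: every class, field, and function name below is hypothetical, and the "desktop" only logs what it was asked to do.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Union


@dataclass
class ApiCall:
    """A programmatic action: call a named function with keyword arguments."""
    name: str
    kwargs: dict


@dataclass
class GuiAction:
    """A human-style action: click or type at screen coordinates."""
    kind: str  # "click" or "type"
    x: int
    y: int
    text: str = ""


Action = Union[ApiCall, GuiAction]


class Desktop:
    """Toy environment (hypothetical) that records what the agent did."""

    def __init__(self, api_registry: Dict[str, Callable[..., str]]):
        self.api_registry = api_registry
        self.log: List[str] = []

    def step(self, action: Action) -> str:
        # Use the API path when a matching function is registered.
        if isinstance(action, ApiCall) and action.name in self.api_registry:
            result = self.api_registry[action.name](**action.kwargs)
        # Otherwise fall back to a GUI-style interaction.
        elif isinstance(action, GuiAction):
            result = f"{action.kind} at ({action.x}, {action.y}) {action.text}".strip()
        else:
            result = f"unknown action: {action!r}"
        self.log.append(result)
        return result


if __name__ == "__main__":
    env = Desktop({"set_cell": lambda sheet, cell, value: f"{sheet}!{cell} = {value}"})
    print(env.step(ApiCall("set_cell", {"sheet": "Q3", "cell": "B2", "value": 42})))
    print(env.step(GuiAction("click", x=120, y=300)))
```

The point of the single `step` method is that a policy can mix both action types within one task, taking the fast API route where one exists and clicking through the interface where it does not.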

VibeVoice: A Frontier Open-Source Text-to-Speech Model

Hugging Face

Microsoft's VibeVoice 1.5B takes speech generation to a new level. For the first time, a freely available model can generate up to 90 minutes of expressive conversation with up to four distinct speakers. That is a large jump in length and complexity over previous systems. The technology could greatly simplify the production of podcasts and audiobooks and pave the way for much more lifelike digital assistants.

Grok 2 Open-Sourced, Weights Included

Hugging Face

The open release of Grok-2 marks a special moment in the development of open language models: xAI is making available the weights of a very large system that otherwise only runs behind an API. The model has around 270 billion parameters, of which around 115 billion are active per token because it is a mixture-of-experts model. That opens up a scale that was previously reserved for research collaborations or internal teams.
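
A quick way to read those two numbers: the active count drives compute per token, while the total count drives how much memory the weights need. The Python sketch below works through both, using the rounded parameter counts from the text. The 2 bytes per parameter is an assumption (16-bit weights), and the function names are illustrative.

```python
TOTAL_PARAMS = 270e9       # approximate total parameters (from the text)
ACTIVE_PARAMS = 115e9      # approximate active parameters (from the text)
BYTES_PER_PARAM_16BIT = 2  # assumption: weights stored in 16-bit precision


def active_fraction(total: float, active: float) -> float:
    """Share of the model's parameters that take part in each forward pass."""
    return active / total


def weight_memory_gb(total: float, bytes_per_param: int) -> float:
    """Memory needed to hold all weights, in gigabytes (10^9 bytes)."""
    return total * bytes_per_param / 1e9


if __name__ == "__main__":
    frac = active_fraction(TOTAL_PARAMS, ACTIVE_PARAMS)
    mem = weight_memory_gb(TOTAL_PARAMS, BYTES_PER_PARAM_16BIT)
    print(f"Active share: {frac:.1%}")
    print(f"Weights at 16-bit: about {mem:.0f} GB")
```

Under these assumptions, fewer than half of the parameters do work at any one time, yet all of them still have to be loaded, so running the model remains a multi-GPU job.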

Get Your AI Research Seen by 200,000+ People

Have groundbreaking AI research? We’re inviting researchers to submit their work to be featured in Superintelligence, the leading AI newsletter with 200k+ readers. If you’ve published a relevant paper on arXiv.org, email the link to [email protected] with the subject line “Research Submission”. If selected, we will contact you for a potential feature.

Question of the Day

Are you pleased that Grok 2 is now open source and free?

How'd We Do?

Please let us know what you think! Feel free to reply to this email with suggestions (we read everything)!
