Gemini 1.5 Pro 2M context window, code execution capabilities, and Gemma 2 are available today

JUN 27, 2024
Logan Kilpatrick, Senior Product Manager, Gemini API and Google AI Studio
Shrestha Basu Mallick, Group Product Manager, Gemini API
Ronen Kofman, Group Product Manager, Gemini API

Today, we are giving developers access to the 2 million token context window for Gemini 1.5 Pro and code execution capabilities in the Gemini API, and adding Gemma 2 to Google AI Studio.


Long context and context caching

At I/O, we announced the longest ever context window, 2 million tokens, in Gemini 1.5 Pro, available behind a waitlist. Today, we’re opening up access to the 2 million token context window on Gemini 1.5 Pro for all developers.

As the context window grows, so does the potential for input cost. To help developers reduce costs for tasks that use the same tokens across multiple prompts, we’ve launched context caching in the Gemini API for both Gemini 1.5 Pro and 1.5 Flash.
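To make the cost argument concrete, here is a rough arithmetic sketch of why reusing the same tokens across prompts can be cheaper with a cache. Every number and the billing model itself (a one-time write at full price, a reduced per-prompt rate for cached tokens, and a flat storage fee) are hypothetical placeholders chosen for illustration, not published Gemini API rates.

```python
# Rough input-cost comparison for reusing the same tokens across many prompts.
# All prices and the billing model below are hypothetical placeholders,
# not published Gemini API rates.

def cost_without_cache(shared_tokens, unique_tokens, prompts, price_per_million):
    """Every prompt resends the shared tokens at the full input price."""
    total_tokens = prompts * (shared_tokens + unique_tokens)
    return total_tokens * price_per_million / 1_000_000


def cost_with_cache(shared_tokens, unique_tokens, prompts,
                    price_per_million, cached_price_per_million, storage_fee):
    """Shared tokens are written once, then billed at a reduced rate per prompt."""
    write_cost = shared_tokens * price_per_million / 1_000_000
    per_prompt = (shared_tokens * cached_price_per_million
                  + unique_tokens * price_per_million) / 1_000_000
    return write_cost + prompts * per_prompt + storage_fee


# Hypothetical workload: a 1M-token shared document, 20 short questions about it.
baseline = cost_without_cache(1_000_000, 1_000, 20, 4.0)
cached = cost_with_cache(1_000_000, 1_000, 20, 4.0, 1.0, 2.0)
print(f"without caching: {baseline:.2f}")
print(f"with caching:    {cached:.2f}")
```

Under these assumed numbers the saving comes entirely from reuse: with only a single prompt, the write cost and storage fee would make the cached path more expensive than sending the tokens directly.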


Code execution

LLMs have historically struggled with math and data reasoning problems. Generating and executing code that works through such problems improves accuracy. To unlock these capabilities for developers, we have enabled code execution for both Gemini 1.5 Pro and 1.5 Flash. Once it is turned on, the model can use code execution dynamically to generate and run Python code and learn iteratively from the results until it arrives at a final output. The execution sandbox is not connected to the internet and comes standard with a few numerical libraries, and developers are billed simply based on the output tokens from the model.

This is our first step forward with code execution as a model capability and it’s available today via the Gemini API and in Google AI Studio under “advanced settings”.
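The generate, run, and revise loop described above can be illustrated with a small local sketch. This is a conceptual toy, not the Gemini API or its sandbox: `stub_model`, `run_python`, and `solve` are hypothetical names, the "model" is a hard-coded stand-in that fixes its code after seeing an error, and Python's built-in `exec` is used only for demonstration (it is not a secure sandbox).

```python
import contextlib
import io
import traceback


def run_python(source):
    """Run source locally and return (succeeded, stdout text or error text).

    Note: exec() is NOT a secure sandbox; it is used here only to illustrate the loop.
    """
    buffer = io.StringIO()
    try:
        with contextlib.redirect_stdout(buffer):
            exec(source, {"__name__": "__sketch__"})
    except Exception:
        return False, traceback.format_exc(limit=1)
    return True, buffer.getvalue().strip()


def stub_model(task, feedback):
    """Hypothetical stand-in for the model: returns code, revised after a failure."""
    if not feedback:
        # First attempt contains a deliberate bug: `primes` is never defined.
        return "print(sum(primes[:50]))"
    # Second attempt builds the list of primes before summing it.
    return (
        "primes = []\n"
        "n = 2\n"
        "while len(primes) < 50:\n"
        "    if all(n % p for p in primes):\n"
        "        primes.append(n)\n"
        "    n += 1\n"
        "print(sum(primes))"
    )


def solve(task, max_attempts=3):
    """Ask for code, run it, and feed errors back until a run succeeds."""
    feedback = []
    for attempt in range(1, max_attempts + 1):
        code = stub_model(task, feedback)
        succeeded, output = run_python(code)
        if succeeded:
            return output, attempt
        feedback.append(output)  # the error text informs the next attempt
    raise RuntimeError("no attempt produced a successful run")


answer, attempts = solve("What is the sum of the first 50 prime numbers?")
print(f"answer={answer} after {attempts} attempts")
```

In this toy run the first attempt fails with a `NameError`, the error is passed back, and the second attempt prints the sum. The point is the shape of the loop: arithmetic is delegated to executed code instead of being predicted token by token.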


Gemma 2 in Google AI Studio

We want to make AI accessible to all developers, whether you’re looking to integrate our Gemini models via an API key or using our open models like Gemma 2. To help developers get hands-on with the Gemma 2 model, we’re making it available in Google AI Studio for experimentation.


Gemini 1.5 Flash in production

Gemini 1.5 Flash was built to address developers’ top request for speed and affordability. We continue to be excited by how developers are innovating with Gemini 1.5 Flash and using the model in production:

  • Envision empowers people who are blind or have low vision to better understand their immediate environment through an app or smart glasses and ask specific questions. Leveraging the speed of Gemini 1.5 Flash, Envision’s users are able to get real-time descriptions of their surroundings, which is critical to their experience navigating the world.

  • Plural, an automated policy analysis and monitoring platform, uses Gemini 1.5 Flash to summarize and reason with complex legislation documents for NGOs and policy-interested citizens, so they can have an impact on how bills are passed.

  • Dot, an AI designed to grow with a user and become increasingly personalized over time, leverages Gemini 1.5 Flash for a number of information compression tasks that are key to its agentic long-term memory system. For Dot, 1.5 Flash performs similarly to more expensive models at under one-tenth the cost for tasks like summarization, filtering, and re-ranking.

In line with last month’s announcement, we’re working hard to make tuning for Gemini 1.5 Flash available to all developers, to enable new use cases, additional production robustness, and higher reliability. Text tuning in 1.5 Flash is now ready for red-teaming and will be rolling out gradually to developers starting today. All developers will be able to access Gemini 1.5 Flash tuning via the Gemini API and in Google AI Studio by mid-July.


We are excited to see how you use these new features. You can join the conversation on our developer forum. If you’re an enterprise developer, see how we’re making Vertex AI the most enterprise-ready genAI platform.