Google Offers Gemini AI Model To Challenge GPT-4

Google’s answer to GPT-4 has been unveiled this week, with the arrival of the large language model called Gemini.

Alphabet’s Google continues to expand its AI capabilities with the launch of the Gemini large language model (LLM).

The arrival of Gemini was revealed in a blog post on Wednesday by CEO Sundar Pichai and Demis Hassabis, CEO and co-founder of Google DeepMind.

Pichai wrote that Gemini is Google’s most capable and general model yet, with state-of-the-art performance across many leading benchmarks.

DeepMind co-founder and chief executive Demis Hassabis. Image credit: DeepMind

Google Gemini

“Our first version, Gemini 1.0, is optimised for different sizes: Ultra, Pro and Nano,” wrote Pichai. “These are the first models of the Gemini era and the first realisation of the vision we had when we formed Google DeepMind earlier this year.”

In April, Alphabet merged its internal AI research team (Google Brain) with UK-based DeepMind, which it had acquired in 2014 for $500m.

“This new era of models represents one of the biggest science and engineering efforts we’ve undertaken as a company,” wrote Pichai about Gemini. “I’m genuinely excited for what’s ahead, and for the opportunities Gemini will unlock for people everywhere.”

According to both Pichai and Hassabis, Gemini is a huge leap forward in AI models, and one that will ultimately affect nearly all of Google’s products.

“Gemini is the result of large-scale collaborative efforts by teams across Google, including our colleagues at Google Research,” wrote Hassabis. “It was built from the ground up to be multimodal, which means it can generalise and seamlessly understand, operate across and combine different types of information including text, code, audio, image and video.”

Three versions

Gemini 1.0 has been optimised into three different sizes:

  1. Gemini Ultra — the largest and most capable model for highly complex tasks.
  2. Gemini Pro — the best model for scaling across a wide range of tasks.
  3. Gemini Nano — the most efficient model for on-device tasks.

“We’ve been rigorously testing our Gemini models and evaluating their performance on a wide variety of tasks,” wrote Hassabis. “From natural image, audio and video understanding to mathematical reasoning, Gemini Ultra’s performance exceeds current state-of-the-art results on 30 of the 32 widely-used academic benchmarks used in large language model (LLM) research and development.”

“Gemini 1.0’s sophisticated multimodal reasoning capabilities can help make sense of complex written and visual information,” wrote Hassabis. “This makes it uniquely skilled at uncovering knowledge that can be difficult to discern amid vast amounts of data.”

“Its remarkable ability to extract insights from hundreds of thousands of documents through reading, filtering and understanding information will help deliver new breakthroughs at digital speeds in many fields from science to finance,” said Hassabis.

Gemini availability

Google has insisted that Gemini has been built with “responsibility and safety at its core” and has “the most comprehensive safety evaluations of any Google AI model to date, including for bias and toxicity.”

Gemini 1.0 is now rolling out across a range of products and platforms. Starting now, Google’s Bard chatbot will use a fine-tuned version of Gemini Pro for more advanced reasoning, planning and understanding.

Starting on 13 December, developers and enterprise customers can access Gemini Pro via the Gemini API in Google AI Studio or Google Cloud Vertex AI.
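
For developers wondering what that access might look like in practice, the sketch below builds a text request for Gemini Pro using only the Python standard library. It is an illustration rather than Google’s official client: the endpoint path, the `gemini-pro` model name and the request and response shapes are assumptions based on the public REST interface at launch and may have changed, and the helper names `build_request` and `extract_text` are hypothetical.

```python
import json
import urllib.request

# Assumed endpoint and model name; check Google's current API documentation.
BASE_URL = "https://generativelanguage.googleapis.com/v1beta/models"
MODEL = "gemini-pro"


def build_request(prompt: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a generateContent request for a text prompt."""
    body = {"contents": [{"parts": [{"text": prompt}]}]}
    return urllib.request.Request(
        url=f"{BASE_URL}/{MODEL}:generateContent?key={api_key}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_text(response_json: dict) -> str:
    """Join the text parts of the first candidate in a parsed response."""
    parts = response_json["candidates"][0]["content"]["parts"]
    return "".join(part.get("text", "") for part in parts)
```

With a real API key from Google AI Studio, the request could be sent with `urllib.request.urlopen(build_request(prompt, key))`, and the parsed JSON reply passed to `extract_text`.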

Developer tool

The availability of Google’s Gemini, and its potential for app development, was welcomed by Wyatt Oren, director of sales for telehealth at development specialist Agora.

“Gemini isn’t just a step forward; it is a leap into a new realm of AI capabilities,” said Oren. “By making it accessible to developers through Pro and Nano, Google is empowering unprecedented innovation. The API offers incredible benefits for rapid prototyping and app development, especially when it comes to handling multimedia content like images, videos, and audio.”

“For independent developers or small teams, the intuitive interface and straightforward API key access also provide an ideal environment to experiment with Gemini’s advanced features without hefty initial investment,” said Oren.

“I see Gemini as a valuable opportunity to innovate and create applications that are not only functionally superior but also more aligned with the evolving needs and expectations of users, making every interaction more meaningful and impactful,” said Oren.