Google Unveils Gemini: The Latest Leap in AI Technology

Google, in a surprising move last Wednesday, unveiled its most advanced artificial intelligence model to date, named Gemini. The release came amid widespread speculation that the company would postpone the launch until the following year. Google confirmed that Gemini outperforms OpenAI's GPT-3.5 model and is a strong competitor to the latest GPT-4.

What is Google Gemini?

Gemini is Google's latest and most powerful AI model, capable of understanding not just text but also images, video, and audio. As a multimodal model, Gemini excels at complex tasks in mathematics, physics, and other fields, as well as at understanding and generating high-quality programming code in various languages.

Google's official blog post states that Gemini is designed as a multimodal model, surpassing current AI models that typically handle only one type of user prompt, such as images or text exclusively. Gemini can deal with multiple types of input, including text, images, audio, video, and code in different programming languages. The goal behind developing Gemini is to create an AI that can accurately solve problems, offer advice, and answer questions across various fields, from everyday matters to scientific domains.

How Did Google Develop Gemini?

Google describes the Gemini model as flexible, able to run everywhere from Google data centers to smartphones. To achieve this scalability, the company has introduced it in three versions with varying capabilities: Nano, Pro, and Ultra.

Gemini Nano Model: The Gemini Nano model was designed for use in smartphones, with the Google Pixel 8 phones being the first to feature it. It is engineered for tasks that require quick AI processing on the phone itself, without needing external server connections, such as suggesting replies in chat applications or summarizing text.

The Gemini Nano model relies on Google's latest Tensor G3 processor chip. It supports many features Google launched on Pixel phones last October, such as the 'Summarize in Recorder' feature, which summarizes recorded audio clips in the Recorder app, and smart replies in Google's Gboard keyboard app. The smart-reply feature will initially be available in WhatsApp, with plans to extend it to more messaging applications by 2024.

Notably, the Gemini Nano model's reliance on the neural processing unit within the Tensor G3 chip ensures the privacy of Pixel users' data: it is processed locally on the device, without leaving any information on Google's servers. It also enables fast AI features without the need for an internet connection.

By 2024, Google Assistant on Pixel phones will incorporate Bard's advanced capabilities, although this will be exclusive to Google Pixel phones.

Gemini Pro Model: Google developed the Gemini Pro model to run in its data centers, powering the latest version of Bard. It is designed to support advanced capabilities in text analysis and generation, coding, and planning, and to handle multiple forms of input, such as text, images, video, and audio, simultaneously.

Google's official blog stated that the Gemini Pro model will initially help Bard process textual requests quickly. The model will be rolled out to Bard in two phases:

  • The first phase begins with a specially tuned version of Gemini Pro in English, available in 170 countries worldwide, with further updates to cover more countries and support more languages in the near future.
  • The second phase will start early next year with the launch of Bard Advanced, the most sophisticated version of Bard, which will initially rely on the Gemini Ultra model, the most advanced of the three Gemini versions.

The Gemini Pro model outperformed GPT-3.5 in 6 of the 8 tests Google conducted before unveiling its new model. These include the MMLU benchmark, a leading standard for measuring how well large language models perform across multiple text-analysis tasks, and the GSM8K benchmark, which tests a model's ability to solve mathematical reasoning problems.

Gemini Ultra Model: Gemini Ultra is the most capable of the three models at performing complex tasks. Google confirmed that it set new records on 30 of 32 benchmarks for large language models (LLMs) that are widely relied upon in academic research and development.

Gemini Ultra is the first model to outperform human experts on the MMLU benchmark, scoring 90%. MMLU spans 57 complex subjects, ranging from mathematics and physics to history, law, and medicine, and tests both general knowledge and problem-solving ability.

Google stated that its benchmark approach to MMLU enables Gemini Ultra to use its reasoning capabilities to think carefully before answering difficult questions, improving the accuracy of its responses rather than answering based on a first impression of the question posed.

Google also highlighted the model's extraordinary ability to handle different kinds of input. It succeeded in recognizing the content of images, including the text within them, without relying on optical character recognition (OCR) systems, surpassing most modern models, including OpenAI's GPT-4V. The model also excelled at mathematical problems, both solving them and verifying the accuracy of solutions through image analysis.

Regarding video content, Google's test results showed that the Gemini Ultra model delivers outstanding performance in transcribing spoken English in video clips and in answering questions about specific video content.

In programming, the new model can understand, explain, and generate high-quality code in the most common programming languages, such as Python, Java, C++, and Go.

Google said that Gemini's ability to work across different programming languages and reason through complex information makes it one of the world's leading foundation models for coding. Google's tests showed that the Gemini Ultra model excelled on several coding benchmarks, including HumanEval, an important industry standard for assessing performance on programming tasks. Two years ago, Google introduced AlphaCode, the first system to use AI for generating code, which achieved competitive performance in programming competitions.

On Wednesday, Google announced that it used a specialized version of Gemini to create the second generation of this system, AlphaCode 2, which is more advanced at generating code. It excels at solving competitive programming problems that go beyond code generation to involve complex mathematics and theoretical computer science.

How Can You Experience the New Gemini Model?

Gemini is now available in several Google products: the Nano version in Google Pixel 8 phones, and the Pro version powering Bard in 170 countries worldwide. Developers and enterprise customers will gain access to the Gemini Pro model through the Gemini API in Google AI Studio and Google Cloud Vertex AI starting December 13, 2023. Android developers can access Gemini Nano via the AICore service in Android 14, available as a preview.
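As a rough sketch of what that API access might look like, here is a minimal example using Google's generative AI Python SDK. The package name (`google-generativeai`), the `gemini-pro` model identifier, and the method signatures reflect the SDK as documented around the launch and may have changed since; an API key from Google AI Studio is assumed to be set in the `GOOGLE_API_KEY` environment variable.

```python
import os


def ask_gemini(prompt: str) -> str:
    """Send a text prompt to the Gemini Pro model and return its reply.

    Assumes the `google-generativeai` package is installed
    (`pip install google-generativeai`) and that GOOGLE_API_KEY is set.
    """
    import google.generativeai as genai

    # Authenticate with the key issued by Google AI Studio.
    genai.configure(api_key=os.environ["GOOGLE_API_KEY"])

    # "gemini-pro" was the text-only Gemini Pro model name at launch.
    model = genai.GenerativeModel("gemini-pro")
    response = model.generate_content(prompt)
    return response.text


if __name__ == "__main__" and "GOOGLE_API_KEY" in os.environ:
    print(ask_gemini("Summarize the three Gemini model sizes in one line each."))
```

The call is wrapped in a function with a lazy import and an environment-variable guard so the script can be imported or inspected without the SDK installed or a key configured.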





All content published on the Nogoom Masrya website represents only the opinions of the authors and does not reflect in any way the views of Nogoom Masrya® for Electronic Content Management. The reproduction, publication, distribution, or translation of these materials is permitted, provided that reference is made, under the Creative Commons Attribution 4.0 International License. Copyright © 2009-2024 Nogoom Masrya®, All Rights Reserved.