ChatGPT now supports voice chats and image-based queries

By Kamel Ahmed On Sep 26, 2023

ChatGPT is receiving some important updates that will enable the AI chat program to handle voice commands and image-based queries.


Users on Android and iOS will be able to hold voice conversations with ChatGPT, while image input will be available on all major platforms.

Initially, these features will be available to Plus and Enterprise users, with others gaining access to image-based features later.

To try it, you’ll need to opt in to voice conversations in the ChatGPT app (go to Settings, then New Features).

Then tap the headphone button to choose from five different voices.

OpenAI says that voice conversations are powered by a new text-to-speech model that can generate “human-like audio from just text and a few seconds of sample speech.”

The five voices were created with the assistance of professional actors.

In the other direction, the company’s speech recognition system, Whisper, converts the user’s spoken words into text.

OpenAI also says that you can, for instance, show the chatbot a picture of a grill and ask why it won’t start, have it help you plan a meal based on a snapshot of what’s in your fridge, or ask it to solve a math problem.

OpenAI uses GPT-3.5 and GPT-4 to power the image recognition features. To use ChatGPT’s image-based functions, tap the photo button (on iOS or Android, you’ll need to tap the plus button first) to take a picture or select an existing one from your device.

You can ask ChatGPT about multiple images and use the drawing tool to focus on a specific part of the image.

In a blog post announcing the updates, OpenAI mentions the possibility of harm.

Bad actors could mimic the voices of public figures (and ordinary individuals) to commit fraud.

That’s why OpenAI is limiting this technology to voice conversations and working with select partners on other limited use cases.

As for images, OpenAI has collaborated with Be My Eyes, a free app that allows blind and visually impaired individuals to better understand their surroundings with the help of volunteers who join video calls with them.

The company also says it has limited ChatGPT’s ability to analyze and make direct statements about individuals appearing in images, “as ChatGPT is not always accurate, and these systems must respect individuals’ privacy.”

The company has published a research paper on the safety properties of the image-based function, which it has named GPT-4 Vision.

ChatGPT is more proficient at understanding English text in images compared to other languages.

OpenAI states that the chatbot’s performance is currently “poor” in other languages, especially those using non-Roman scripts.

For now, the company therefore advises non-English users against relying on ChatGPT to read text in images.
