OpenAI releases “emotional” AI chatbot update

14 May 2024

Can you imagine laughing, crying, or even flirting with your smartphone - let alone your device being able to detect these emotions and respond using a ‘personality’ and a ‘voice’?

Yesterday, OpenAI announced its latest AI model, GPT-4o, which powers ChatGPT and integrates these capabilities alongside the ability to look, listen and talk. It will be available both via an app and on desktop from this coming Monday, free of charge.

The “o” in GPT-4o stands for “omni”, referring to the model’s ability to handle text, speech, and video. The model builds on the capabilities of OpenAI’s GPT-4, the company’s language model that generates text from text and visual information. GPT-4o rolls OpenAI’s similar products together into a single system.

The most significant update from the announcement was that users will be able to talk with ChatGPT in real time. Whilst ChatGPT has long offered a voice mode, which reads out the chatbot’s responses using a text-to-speech model, GPT-4o is a vast improvement.

It can act as a real-time translator, listening to voices and translating them into a preferred language. And if users think GPT-4o is blathering on or not answering the question, they can interrupt it whilst it’s answering. The ‘real-time’ responsiveness also extends to GPT-4o delivering responses in a range of emotional tones, and even singing.

GPT-4o also has vision capabilities, integrated with a phone’s camera. It can process a photo or a live video feed and answer questions like ‘what brand of shoes is this person wearing?’, ‘how fast are those ducks moving?’ or ‘who is in the cast of the movie on this poster?’.

The app is part of an effort to combine chatbots with voice assistants such as Apple’s Siri and Google Assistant. Notably, the company is framing this announcement as part of the ‘future of interaction between ourselves and machines.’ This represents the next step towards OpenAI’s vision for a world in which users have hyper-personalised interactions with their devices.

Many questions and hurdles remain about this technology, none of which were covered in OpenAI’s chic live-streamed press conference. Chatbots are trained on data from the internet and are therefore prone to mistakes. At times, they make up information completely, a failure known as ‘hallucination’. These mistakes and hallucinations are almost certain to seep into these new types of interactions with our machines.

Another huge question remains: what will its environmental impact be? Though the technology’s speed and complexity have prompted some to see the tool as ‘magic’, they rely on energy and water, and lots of both. With ChatGPT 3.5, researchers found that 10 to 15 prompts used around 0.5 litres of water to cool the servers. ChatGPT also already consumes over half a million kilowatt-hours of electricity each day, roughly the daily electricity use of around 180,000 homes in the US.
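
To put the water figure in per-prompt terms, here is a rough back-of-the-envelope sketch. It uses only the numbers quoted above (around 0.5 litres per 10 to 15 prompts); the function names and the one-million-prompt example are illustrative assumptions, not measurements.

```python
# Figures quoted in the article for ChatGPT 3.5 (server cooling water).
WATER_LITRES = 0.5                   # water used per batch of prompts
PROMPTS_LOW, PROMPTS_HIGH = 10, 15   # batch size range quoted


def water_per_prompt_ml(litres=WATER_LITRES, low=PROMPTS_LOW, high=PROMPTS_HIGH):
    """Return (min_ml, max_ml) of cooling water per single prompt."""
    total_ml = litres * 1000
    # More prompts per batch means less water per prompt, and vice versa.
    return total_ml / high, total_ml / low


def water_for_prompts_litres(n_prompts):
    """Return (min_litres, max_litres) for a hypothetical number of prompts."""
    min_ml, max_ml = water_per_prompt_ml()
    return n_prompts * min_ml / 1000, n_prompts * max_ml / 1000


if __name__ == "__main__":
    lo_ml, hi_ml = water_per_prompt_ml()
    print(f"Per prompt: {lo_ml:.1f} to {hi_ml:.1f} ml")
    # Hypothetical volume chosen purely for illustration.
    lo_l, hi_l = water_for_prompts_litres(1_000_000)
    print(f"Per million prompts: {lo_l:,.0f} to {hi_l:,.0f} litres")
```

On these figures, a single prompt works out at somewhere between a couple of tablespoons and a small espresso cup of water, which adds up quickly at scale.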

By rolling out GPT-4o in beta but with much fanfare, OpenAI is implying that the tool is largely ready for public use. "GPT-4o has also undergone extensive external red teaming with 70+ external experts in domains such as social psychology, bias and fairness, and misinformation to identify risks that are introduced or amplified by the newly added modalities," says OpenAI. New risks, the company says, will continue to be mitigated as they are discovered.

The release prompts comparisons to the sci-fi film ‘Her’, in which Joaquin Phoenix plays an introverted writer who buys an artificial intelligence system to help him write and subsequently falls in love with it. With OpenAI’s latest iteration, we are likely to see some people use the technology to find ‘connection’, opting for machine over human, with unexpected results.