With its uncanny ability to hold a conversation, answer questions, and write coherent prose, poetry, and code, the chatbot ChatGPT has forced many people to rethink the potential of artificial intelligence.

The startup that made ChatGPT, OpenAI, today announced a much-anticipated new version of the AI model at its core. 

The new algorithm, called GPT-4, follows GPT-3, a groundbreaking text-generation model that OpenAI announced in 2020, which was later adapted to create ChatGPT last year. 

The new model scores more highly on a range of tests designed to measure intelligence and knowledge in humans and machines, OpenAI says. It also makes fewer blunders and can respond to images as well as text. 

However, GPT-4 suffers from the same problems that have bedeviled ChatGPT and caused some AI experts to be skeptical of its usefulness—including tendencies to “hallucinate” incorrect information, exhibit problematic social biases, and misbehave or assume disturbing personas when given an “adversarial” prompt.

“While they’ve made a lot of progress, it’s clearly not trustworthy,” says Oren Etzioni, a professor emeritus at the University of Washington and the founding CEO of the Allen Institute for AI. “It’s going to be a long time before you want any GPT to run your nuclear power plant.”

OpenAI provided several demos and data from benchmarking tests to show GPT-4’s capabilities. The new model not only beats the passing score on the Uniform Bar Examination, which is used to qualify lawyers in many US states, but scores in the top 10 percent of human test takers.

It also scores more highly than GPT-3 on other exams designed to test knowledge and reasoning, in subjects including biology, art history, and calculus. And it gets better marks than any other AI language model on tests designed by computer scientists to gauge progress in such algorithms. “In some ways it’s more of the same,” Etzioni says. “But it’s more of the same in an absolutely mind-blowing series of advances.”

GPT-4 can also perform neat tricks seen before from GPT-3 and ChatGPT, like summarizing and suggesting edits to pieces of text. And it can do things its predecessors could not, including acting as a Socratic tutor that helps guide students toward correct answers and discussing the contents of photographs. Given a photo of ingredients on a kitchen counter, for example, GPT-4 can suggest an appropriate recipe; given a chart, it can explain the conclusions that can be drawn from it.

“It definitely seems to have gained some abilities,” says Vincent Conitzer, a professor at Carnegie Mellon University who specializes in AI and who has begun experimenting with the new language model. But he says it still makes errors, such as suggesting nonsensical directions or presenting fake mathematical proofs.

ChatGPT caught the public’s attention with a stunning ability to tackle many complex questions and tasks via an easy-to-use conversational interface. The chatbot does not understand the world as humans do; it simply responds with the words it statistically predicts should follow a question.