During an event held in San Francisco in November, Sam Altman, the CEO of the leading artificial intelligence firm OpenAI, was asked what surprises the field might deliver in 2024. Without hesitation, he said that online chatbots, including OpenAI’s ChatGPT, are poised to take “a leap forward that no one expected.” James Manyika, a Google executive, agreed, adding, “Plus one to that.”
In the coming year, A.I. technology is set to evolve at a remarkable pace, with advances building on one another. This will enable A.I. to create new forms of media, emulate human reasoning more closely, and move into the physical world through a new generation of robots.
In the months ahead, we can anticipate A.I.-driven image generators, such as DALL-E and Midjourney, not only producing still images but also generating videos almost instantaneously. Furthermore, these tools will progressively integrate with chatbots like ChatGPT, leading to a significant expansion of their capabilities beyond mere digital text. This integration will allow chatbots to manage various types of content, including photos, videos, diagrams, charts, and additional media formats. As a result, chatbots will demonstrate behavior that closely resembles human reasoning, addressing increasingly intricate challenges in domains such as mathematics and science. As this technology transitions into the realm of robotics, it will also tackle real-world problems.
Many of these breakthroughs are already taking shape within premier research laboratories and tech products. However, 2024 is expected to witness a dramatic enhancement in the power of these tools, making them accessible to a much broader audience. David Luan, the CEO of Adept, an A.I. startup, remarked, “The rapid progress of A.I. will continue; it is inevitable.”
OpenAI, Google, and other tech giants are advancing A.I. at a pace that surpasses other technologies, primarily due to the architecture of the underlying systems. Unlike traditional software applications, which are painstakingly crafted by engineers line by line, A.I. development is expedited through the use of neural networks—mathematical frameworks capable of learning skills by examining vast amounts of digital data. By identifying patterns in data sources such as Wikipedia articles, books, and digital content from the internet, these neural networks can autonomously generate text.
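The pattern-finding idea behind text generation can be made concrete with a toy sketch. Production systems use neural networks trained on vast corpora; the minimal stand-in below, which is purely illustrative, just counts which word tends to follow which in a tiny sample of text and uses those counts to predict the next word:

```python
from collections import Counter, defaultdict

# Toy illustration only: real systems use neural networks, but the core task
# is the same -- learn from example text which word tends to follow which.
corpus = "the cat sat on the mat . the cat ate the fish .".split()

# Count how often each word follows each other word (bigram counts).
following = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    following[prev][nxt] += 1

def predict_next(word):
    """Return the word most frequently seen after `word` in the corpus."""
    return following[word].most_common(1)[0][0]

print(predict_next("the"))  # "cat" follows "the" most often in this sample
```

Scaling this idea up, from counting word pairs in a sentence to training networks on much of the internet, is what lets chatbots produce fluent text on their own.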
This year, tech companies plan to introduce A.I. systems to an unprecedented volume of data—including images, sounds, and extensive text—far beyond what humans can comprehend. As these systems become adept at understanding the interconnections between various data types, they will be equipped to solve increasingly complex problems, paving the way for their application in the physical world. (It is noteworthy that The New York Times recently initiated a lawsuit against OpenAI and Microsoft for copyright infringement related to A.I. systems.)
However, it is essential to clarify that A.I. is unlikely to replicate the complexities of the human brain in the near future. Although A.I. companies and innovators aspire to develop what they term “artificial general intelligence”—a machine capable of performing any cognitive task that a human can do—this remains a formidable challenge. Despite its rapid advancements, A.I. is still in its nascent stages.
A Glimpse into A.I.’s Transformative Changes Ahead
Here’s an overview of how A.I. is expected to evolve in the coming year, beginning with the most immediate advancements, which will serve as a foundation for further progress in its capabilities.
Instant Videos
Up until now, A.I.-powered applications have primarily generated text and still images in response to user prompts. For example, DALL-E can produce photorealistic images within seconds based on requests like “a rhino diving off the Golden Gate Bridge.” This year, however, companies such as OpenAI, Google, Meta, and New York-based Runway are expected to introduce image generators capable of creating videos as well. Prototypes of tools that can quickly generate videos from brief text prompts already exist, and tech firms are likely to fold the capabilities of image and video generators into chatbots, significantly enhancing their functionality.
‘Multimodal’ Chatbots
Chatbots and image generators, initially designed as distinct tools, are gradually merging into more comprehensive systems. When OpenAI launched a new iteration of ChatGPT last year, the chatbot gained the ability to generate both images and text. A.I. companies are now focusing on developing “multimodal” systems, which can process and generate multiple types of media. These systems learn by analyzing an array of inputs, including photos, text, and potentially other formats like diagrams, charts, sounds, and videos, enabling them to create their own diverse content.
Moreover, because these systems are learning the relationships between different media types, they will be able to interpret one form of media and respond with another. For instance, a user may input an image into a chatbot, and it could respond with relevant text. “The technology will get smarter and more useful,” stated Ahmad Al-Dahle, who leads the generative A.I. division at Meta. “It will be capable of performing a wider array of tasks.”
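In practice, multimodal systems typically represent a prompt not as a single string but as a list of typed content parts, some text, some images or other media. The sketch below illustrates that shape; the field names are hypothetical, not any particular vendor's schema:

```python
# Illustrative only: the field names here are hypothetical stand-ins,
# not the schema of any specific multimodal A.P.I.
prompt = {
    "role": "user",
    "content": [
        {"type": "text", "text": "What landmark is shown in this photo?"},
        {"type": "image_url", "url": "https://example.com/bridge.jpg"},
    ],
}

def summarize_parts(message):
    """List the media types a multimodal message contains."""
    return [part["type"] for part in message["content"]]

print(summarize_parts(prompt))  # ['text', 'image_url']
```

Because the model is trained on relationships between these media types, a message that mixes an image with a text question can be answered with text, and vice versa.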
While multimodal chatbots will undoubtedly have their share of inaccuracies—much like their text-only counterparts—tech companies are diligently working to minimize errors as they strive to construct chatbots that can reason more like humans.
Enhanced ‘Reasoning’ Abilities
When Mr. Altman refers to A.I. making significant strides, he is alluding to chatbots that will exhibit improved reasoning capabilities, allowing them to tackle more complex tasks such as solving intricate mathematical problems and generating detailed computer code. The goal is to develop systems that can logically and methodically resolve issues through a series of sequential steps, each building upon the previous one, akin to human reasoning in certain scenarios.
Leading experts remain divided on whether chatbots can genuinely reason in this manner. Some contend that these systems merely mimic reasoning by reflecting patterns found in internet data. Nonetheless, OpenAI and other organizations are focused on creating systems that can reliably tackle complex inquiries in subjects like mathematics, programming, physics, and other scientific fields. “As systems become more dependable, their popularity will surge,” remarked Nick Frosst, a former Google researcher who now helps lead Cohere, an A.I. startup.
If chatbots enhance their reasoning capabilities, they could evolve into what are termed “A.I. agents.”
‘A.I. Agents’ in Action
As companies train A.I. systems to navigate complex problems step by step, they also enhance chatbots’ abilities to utilize software applications and websites on behalf of users. Researchers are essentially transforming chatbots into a new class of autonomous systems known as A.I. agents. This means that chatbots could manage various software applications, websites, and online tools, such as spreadsheets, calendars, and travel platforms, allowing users to delegate mundane office tasks to them. However, this development raises concerns about job displacement.
Currently, chatbots can perform basic tasks like scheduling meetings, editing documents, analyzing data, and generating bar charts. Nevertheless, these systems do not always function as effectively as desired, and they often struggle with more complex tasks. This year, A.I. companies are expected to introduce more reliable agents capable of handling a broader range of responsibilities. “You should be able to delegate any tedious, day-to-day computer work to an agent,” Mr. Luan commented.
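The agent idea, a model choosing which tool to invoke on a user's behalf, can be sketched as a dispatch loop. Everything below is a hypothetical stand-in: in a real agent, a language model would decide which tool fits the request, whereas here a simple keyword check plays that role.

```python
# Hypothetical sketch of an agent loop. A real system would ask a language
# model to choose the tool; a keyword check stands in for that decision here.
def schedule_meeting(request):
    return f"Meeting booked: {request}"

def file_expense(request):
    return f"Expense filed: {request}"

TOOLS = {"meeting": schedule_meeting, "expense": file_expense}

def agent(request):
    """Route a plain-language request to the first matching tool, if any."""
    for keyword, tool in TOOLS.items():
        if keyword in request.lower():
            return tool(request)
    return "No suitable tool found."

print(agent("Schedule a meeting with the design team on Friday"))
```

The engineering challenge the article describes is making that routing step reliable across many tools, and letting the tools be real software, calendars, spreadsheets, travel sites, rather than toy functions.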
Such tasks might encompass managing expenses in applications like QuickBooks or recording vacation days in software like Workday. In the long term, the potential of A.I. agents will extend beyond software and digital services, paving the way for integration with robotics.
Advancements in Robotics
Historically, robots were programmed to execute repetitive tasks, such as picking up boxes of uniform size and shape. However, utilizing the same technology that powers chatbots, researchers are now equipping robots with the ability to tackle more intricate challenges, including those they’ve never encountered before. Just as chatbots learn to anticipate the next word in a sentence through extensive exposure to digital text, robots can learn to predict physical interactions by analyzing countless videos of objects being manipulated, lifted, and moved.
“These technologies can absorb tremendous amounts of data. As they do, they learn about the world, physics, and how to interact with various objects,” explained Peter Chen, a former OpenAI researcher now leading the robotics startup Covariant. This year, A.I. will significantly enhance robots operating behind the scenes, such as robotic arms that fold shirts in laundromats or sort items in warehouses. Tech leaders like Elon Musk are also endeavoring to introduce humanoid robots into everyday home environments.