Written By

Ian Eisenberg
Head of AI Governance Research, Credo AI

  • Web search tool use, along with the related development of large language model (LLM)-based agents, has enormous consequences for AI capability improvements.
  • ChatGPT’s web-enabled search capability is a step towards curbing the model’s tendency to hallucinate and extending its knowledge base.
  • Like other tools, web search makes ChatGPT more powerful, but it’s incumbent on us to ensure that power is directed in a positive direction.

ChatGPT, the application built on models such as GPT-4, now features image processing, voice recognition and web search. While image processing demonstrates the power of multimodality and voice recognition teases the next generation of human-AI interactivity, web search seems more commonplace.

While less novel, web search is emblematic of an impactful trend in AI system development: tool use. Tool use and the related development of large language model (LLM)-based agents have enormous consequences for AI capability improvements. They are foundational to what OpenAI’s Andrej Karpathy dubs a new operating system, contributing to an evolving paradigm shift in engineering that unlocks a new era of applications.

ChatGPT’s web-enabled search

ChatGPT’s web-enabled search capability is a step towards curbing the model’s tendency to hallucinate and extending its knowledge base. Hallucinations pose a challenge when accurate and up-to-date information is paramount. Web search lets ChatGPT access current data, seeding its ‘context window’ with accurate information, thereby reducing hallucinations.

ChatGPT can fetch the most recent and relevant information from the web when a query is posed instead of relying solely on its pre-training data. This feature doesn’t just enhance the accuracy; it also enriches the depth and breadth of information ChatGPT can provide, paving the way for more informed and precise interactions.
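The idea of seeding the context window with retrieved text can be sketched in a few lines. This is an illustrative simplification, not OpenAI's actual implementation: `build_prompt`, the prompt wording and the placeholder snippet are all hypothetical, and a real system would fetch snippets from a live search backend.

```python
def build_prompt(query: str, snippets: list[str]) -> str:
    """Seed the model's context window with retrieved text so the answer
    is grounded in current sources rather than pre-training data alone."""
    sources = "\n".join(f"- {s}" for s in snippets)
    return (
        "Answer using only the sources below; say 'unknown' if they "
        "do not contain the answer.\n"
        f"Sources:\n{sources}\n\nQuestion: {query}"
    )

# Placeholder snippet; a real system would retrieve this from the web.
snippets = ["[snippet fetched from a news site]"]
prompt = build_prompt("What happened at today's summit?", snippets)
```

The key design choice is that retrieved sources are placed in the prompt ahead of the question, constraining the model to answer from current information rather than from its (possibly stale) training data.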

The underlying mechanism, which instructs ChatGPT when to initiate a web search, is a blend of programmed protocols and heuristic evaluation. While it can be directed by the user to perform a web search, it also has built-in heuristics to gauge when a web search might be beneficial. For instance, when confronted with a query about recent events or data-specific inquiries, ChatGPT might automatically trigger a web search to fill the gap in its own knowledge.
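The real routing logic inside ChatGPT is not public, but a heuristic of this kind can be sketched as a simple recency check. Everything below — the function name, the keyword list and the threshold logic — is an illustrative assumption, not the actual mechanism.

```python
import re

# Illustrative only: flag queries that likely refer to information
# newer than the model's training data.
RECENCY_CUES = re.compile(
    r"\b(today|latest|current|this (week|month|year)|20\d\d|price|score|news)\b",
    re.IGNORECASE,
)

def should_search(query: str, knowledge_cutoff_hint: bool = False) -> bool:
    """Return True when a query probably needs fresh data from the web."""
    return bool(RECENCY_CUES.search(query)) or knowledge_cutoff_hint

should_search("What is the capital of France?")    # no recency cue -> False
should_search("What is the latest iPhone price?")  # 'latest', 'price' -> True
```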

Broadening horizons: tool-use and agents

While powerful, web search is only one example of AI tool use, which will have far-reaching implications that complement and extend the capability developments driven by the sophistication of the models themselves.

Large language models, such as those behind ChatGPT, now interact with external tools, evolving into reasoning engines central to capable AI systems. These interactions are orchestrated through APIs, which allow the models to issue commands, receive responses and perform myriad functions. Some tools are highly specific, while others, such as web search, have huge impacts on the capabilities of the AI system.
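The command-and-response pattern described above can be sketched as a small dispatch loop: the model emits a structured tool call, the host program executes it and the result is returned to the model. The tool names and JSON format here are hypothetical stand-ins for real tool-calling APIs, such as OpenAI's function calling.

```python
import json

# Hypothetical tool registry; in practice the model is given tool
# descriptions and replies with a structured call naming one of them.
TOOLS = {
    "get_time": lambda args: "2024-01-01T00:00:00Z",
    "add": lambda args: args["a"] + args["b"],
}

def dispatch(model_reply: str):
    """Parse a model's JSON tool call and execute the named tool."""
    call = json.loads(model_reply)
    tool = TOOLS[call["name"]]
    return tool(call.get("arguments", {}))

# As if the model had replied with a request to call the 'add' tool.
result = dispatch('{"name": "add", "arguments": {"a": 2, "b": 3}}')
```

In a full system the return value would be appended to the conversation so the model can incorporate it into its next response.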

For instance, sophisticated memory systems give an AI system the ability to use external data sources and develop its own repository of memories to draw on. Code evaluation is perhaps the most general-purpose tool, and it has been incorporated into powerful AI systems, such as OpenAI’s Code Interpreter (or its open-source analogues).
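A memory repository of the kind mentioned above can be sketched as a store-and-retrieve pair. This toy version scores memories by word overlap purely for illustration; production systems typically use vector embeddings and a similarity index instead.

```python
class Memory:
    """Toy external memory: store text, retrieve by word overlap.
    Real systems usually rank by embedding similarity instead."""

    def __init__(self):
        self.items: list[str] = []

    def remember(self, text: str) -> None:
        self.items.append(text)

    def recall(self, query: str, k: int = 1) -> list[str]:
        # Rank stored memories by how many words they share with the query.
        q = set(query.lower().split())
        ranked = sorted(
            self.items,
            key=lambda t: len(q & set(t.lower().split())),
            reverse=True,
        )
        return ranked[:k]

mem = Memory()
mem.remember("user prefers metric units")
mem.remember("favorite color is green")
top = mem.recall("which units does the user prefer")
```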

With Code Interpreter, we see the next level of sophistication in AI tool use. Code Interpreter is more than just a tool – it’s a kind of agent that can pursue a higher-order goal (such as analyzing a dataset) and iteratively reason and take actions (via tools) until that goal is accomplished.

While Code Interpreter is advertised as a debugger and data analysis tool, ‘agents’ can be directed towards various tasks, depending on the tools they can access. Executing arbitrary code is incredibly general, but more use-case-specific toolsets may be preferred depending on the application. As already mentioned, with a web browsing tool, a large language model, such as ChatGPT, can autonomously fetch, filter and synthesize information from the web in response to a user’s query.
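The iterative reason-and-act loop described above can be reduced to a short sketch. The policy here is a stub that searches once and then stops; a real agent would use an LLM to choose the next action from the goal and the observations so far. All names below are hypothetical.

```python
# Minimal agent loop: a policy picks a tool action, the host runs it,
# and the observation feeds back until the policy declares success.
def run_agent(goal: str, tools: dict, policy, max_steps: int = 5):
    observations = []
    for _ in range(max_steps):
        action = policy(goal, observations)      # "reason": choose next step
        if action is None:                       # policy says goal is met
            break
        name, args = action
        observations.append(tools[name](*args))  # "act" and observe result
    return observations

# Stub policy: search once, then stop. A real agent would reason with an LLM.
def policy(goal, obs):
    return None if obs else ("search", (goal,))

tools = {"search": lambda q: f"results for {q!r}"}
log = run_agent("find recent AI papers", tools, policy)
```

The `max_steps` cap is worth noting: bounding the loop is a simple but common safeguard against agents that never decide they are finished.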

In the context of agents, the action loop need not end with simple queries. Instead, longer-term research goals can be pursued, unlocking the reality of AI research assistants. Agents can even deploy other instances of AI agents, allowing one agent to act as a kind of manager over subtasks, integrating the work of many other intelligent agents (an early prototype of this idea was popularized by AutoGPT).
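The manager pattern popularized by AutoGPT-style projects can be sketched as decompose-delegate-integrate. The decomposition and the workers here are plain functions standing in for full agent instances; in a real system each would itself be an LLM-driven loop.

```python
# Sketch of a manager agent: decompose a goal into subtasks and
# delegate each to a worker, then collect the results.
def manager(goal: str, decompose, workers):
    subtasks = decompose(goal)
    return {task: workers[task](task) for task in subtasks}

# Stubs standing in for LLM-driven decomposition and worker agents.
decompose = lambda goal: ["gather sources", "summarize findings"]
workers = {
    "gather sources": lambda task: "3 sources found",
    "summarize findings": lambda task: "summary drafted",
}
report = manager("write a literature review", decompose, workers)
```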

While there is some sense in which the word ‘agent’ is anthropomorphizing too much (making us ask, do these systems have goals like humans or even reinforcement-learning systems?), the term is appropriate in the sense that these systems act as if they decompose higher-order goals into subgoals and take relevant actions to accomplish them.

This has significant implications for the future of AI capabilities. New tools can be developed faster and in a more democratized fashion than frontier AI systems, yet they may have a commensurate effect on the generality and sophistication of these systems. Artificial general intelligence (AGI) is a nebulous term, but human-comparable performance on diverse tasks may be unlocked by systems with similar capabilities to GPT4 enmeshed in more sophisticated agentic AI systems.

Agents and human oversight

As these AI agents grow in capability, evolving from mere tools to entities capable of pursuing higher-order goals autonomously, human oversight becomes complex but critical. ‘Human in (or over) the loop’ refers to human-AI systems where humans can understand, monitor and direct AI systems as they take actions. Yet, as AI agents become adept at handling more abstract tasks, the granularity and frequency of human intervention are poised to change.
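One way human-in-the-loop oversight is operationalized is as an approval gate on risky actions. The sketch below is a minimal illustration: the risk score, the threshold and the approval callback are all policy choices invented for this example, not features of any real product.

```python
# Illustrative oversight gate: actions above a risk threshold pause
# for human approval; everything else runs autonomously.
def execute_with_oversight(action, risk: float, approve, threshold: float = 0.5):
    if risk >= threshold and not approve(action):
        return "blocked"          # human reviewer declined the action
    return action()               # low-risk or approved: proceed

# A low-risk action runs without review; a high-risk one is gated.
auto = execute_with_oversight(lambda: "sent email", risk=0.1, approve=lambda a: False)
gated = execute_with_oversight(lambda: "deleted records", risk=0.9, approve=lambda a: False)
```

The design question the article raises is exactly where that threshold sits, and whether market pressure will push it ever higher as tasks become more abstract.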

As we have discussed, AI agents can pursue ever more abstract tasks, reducing the practical necessity of human oversight and blurring the concept of a single decision. Evaluating a resume is one kind of decision; evaluating a stack of resumes and surfacing the top candidates is another, and deciding on the evaluation criteria before filtering the candidate pool is yet another.

Market incentives may accelerate this shift, leading to less oversight over time as we entrust AI systems with more abstract objectives. Where we believe this would be detrimental, whether due to worries about AI bias, value alignment or robustness, we must ensure that human oversight is easily operationalized and properly incentivized.

Web search, like other tools, makes ChatGPT more powerful. It’s incumbent on us to make sure that power is directed in a positive direction.

License and Republishing

World Economic Forum articles may be republished in accordance with the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International Public License, and in accordance with our Terms of Use.

The views expressed in this article are those of the author alone and not the World Economic Forum.