AI Agents #1: Three Definitions

Sean Wu
5 min read · May 1, 2024


One of the buzzwords we hear nowadays is ‘AI agent.’ But what exactly is an AI agent? That depends on whom you ask, because different people have different ideas, as is evident from the links on the first Google results page for ‘what is AI agents.’ Many of these intriguing definitions are forward-looking and not entirely reliable for practical application today. However, certain viewpoints on AI agents could already have a significant impact.

In this article, we distill the main variants of the definition for busy business owners and product managers, to make it easy for you to make sense of what everyone is saying. With the right interpretation, you can start putting agents to work today, without waiting for AGI, whose arrival date remains unknown. While AI agents can be software-based or physical entities, our focus will be on the former.

The textbook definition

If you took a college-level artificial intelligence course, chances are you know about the ‘agent,’ a core concept in the de facto textbook ‘Artificial Intelligence: A Modern Approach’ by Stuart Russell and Peter Norvig. In this classical treatment:

An agent is anything that can be viewed as perceiving its environment through sensors and acting upon that environment through effectors.

The key points about this definition are:

  1. An agent perceives its environment through sensors. For example, a human agent has eyes, ears, and other organs as sensors, while a robotic agent has cameras and infrared range finders.
  2. An agent acts upon its environment through effectors or actuators. For example, a human agent has hands, legs, and other body parts as effectors, while a robotic agent has motors and servos.
  3. The agent program is a function that maps the agent’s percepts (inputs from sensors) to actions (outputs to effectors). This agent program is what determines the agent’s intelligent behavior.
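
To make this concrete, here is a minimal sketch of a simple reflex agent in Python. The thermostat scenario and all the names are illustrative assumptions, not something from the textbook; the point is just that the agent program is a plain function from percepts to actions.

```python
from dataclasses import dataclass

@dataclass
class Percept:
    """What the sensor reports: here, the current room temperature."""
    temperature_c: float

def agent_program(percept: Percept) -> str:
    """Maps a percept to an action. This function *is* the agent program."""
    if percept.temperature_c < 18.0:
        return "turn_heater_on"
    if percept.temperature_c > 24.0:
        return "turn_heater_off"
    return "do_nothing"

# The agent loop: sense -> decide -> act.
for reading in (16.5, 21.0, 26.3):
    action = agent_program(Percept(temperature_c=reading))
    print(f"temperature={reading}C -> {action}")
```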

Agents can be categorized into different types based on their capabilities, such as reflex agents, model-based agents, goal-based agents, utility-based agents, and learning agents. Even though this definition comes straight from a textbook and is quite general, it is widely adopted by industry players like AWS.

Autonomous Agent: A Gen AI perspective

There is no denying that the current AI agent wave is largely revived by Gen AI. The most popular concept here is the autonomous agent, which uses an LLM (large language model) as its core controller, i.e., as a powerful general problem solver.

In an excellent early write-up by Lilian Weng of OpenAI, a piece of software is called an LLM-powered autonomous agent system if an LLM functions as the agent’s brain, complemented by several key components as follows:

  1. Planning: The agent breaks down large tasks into smaller, manageable subgoals, enabling efficient handling of complex tasks. At the meta level, the agent can also perform self-criticism and self-reflection over past actions, learning from mistakes and refining its plan for future steps, thereby improving the quality of final results.
  2. Memory: short-term memory via in-context learning (information placed directly in the prompt) and long-term memory via an external vector store with fast retrieval, as in retrieval-augmented generation (RAG) settings.
  3. Tool use: The agent learns to call external APIs for extra information that is missing from the model weights (which are often hard to change after pre-training), including current information, code execution capabilities, access to proprietary information sources, and more.
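
A minimal sketch can tie these three components together. In the sketch below, `call_llm` is a stand-in for whatever model API you use (hard-coded here for demonstration), and the tool registry and loop structure are illustrative assumptions rather than a reference implementation:

```python
import json

def call_llm(prompt: str) -> str:
    """Placeholder for a real model API call. Here we hard-code one tool
    call followed by a final answer, purely for demonstration."""
    if "42" in prompt:
        return json.dumps({"final_answer": "6 * 7 = 42"})
    return json.dumps({"tool": "calculator", "input": "6 * 7"})

# Tool use: external capabilities the model's weights do not have.
TOOLS = {"calculator": lambda expr: str(eval(expr))}  # demo only; never eval untrusted input

def run_agent(task: str, max_steps: int = 5) -> str:
    memory = [f"Task: {task}"]  # short-term memory kept in the prompt
    for _ in range(max_steps):
        decision = json.loads(call_llm("\n".join(memory)))
        if "final_answer" in decision:
            return decision["final_answer"]
        # Planning step: the model chose a tool and an input for the next subgoal.
        observation = TOOLS[decision["tool"]](decision["input"])
        memory.append(f"Called {decision['tool']}({decision['input']}) -> {observation}")
    return "gave up"

print(run_agent("What is 6 * 7?"))
```

The loop is the essence: the model plans the next step, memory accumulates observations in the prompt, and tools supply what the weights cannot.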

Recently, Andrew Ng has popularized the multi-agent collaboration pattern. Given a complex task like writing software, a multi-agent approach breaks the task down into subtasks to be executed by agents with different roles, such as a software engineer, product manager, designer, QA (quality assurance) engineer, and so on. This way, we can potentially use smaller, specialized LLMs to solve problems that would otherwise need large and expensive models.
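
As a rough illustration of the pattern (the role prompts and the `call_llm` stub below are assumptions for the sketch, not Ng’s implementation), each role is simply an LLM call with a different system prompt, chained over a shared artifact:

```python
def call_llm(system_prompt: str, message: str) -> str:
    """Placeholder for a real model call; each role could even use a
    smaller, specialized model."""
    return f"[{system_prompt}] output for: {message[:40]}..."

ROLES = {
    "product_manager": "Write a short spec for the requested feature.",
    "engineer": "Implement the spec in code.",
    "qa_engineer": "Review the implementation and report defects.",
}

def run_pipeline(task: str) -> str:
    artifact = task
    # Each agent consumes the previous agent's output, mimicking a team handoff.
    for role, system_prompt in ROLES.items():
        artifact = call_llm(system_prompt, artifact)
        print(f"{role}: {artifact}")
    return artifact

run_pipeline("Build a login page with email and password.")
```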

This Gen AI perspective provides a white-box view of how autonomous agents should work and, as such, also hints at how they should be developed using LLMs. However, this definition is easier for people with a development background to master. For many product owners and product managers with a business background, it provides little insight into how autonomous agents can be used to improve a business’s bottom line. Furthermore, it is not clear whether the term “autonomous” refers to the developer experience (the developer does not need to do much) or the end-user experience (the user does not need to do much).

Be Practical: It is just a CUI application

To help business owners get in on the agent game, there are also more practical, use-case-oriented definitions. For example, Microsoft uses this on its website:

AI agents are code or mechanisms which act to achieve predetermined goals. Examples of AI agents can be found in the code for things like chat bots, smart homes, and the programmatic trading software used in finance.

And AWS uses this:

AI agents are autonomous intelligent systems performing specific tasks without human intervention. Organizations use AI agents to achieve specific goals and more efficient business outcomes. Business teams are more productive when they delegate repetitive tasks to AI agents.

In both cases, agents are simply defined as pieces of software with a conversational user interface. Compared to a more commonly used term like ‘chatbot,’ which is traditionally associated with serving users’ informational needs, agents are generally developed to expose APIs or tools so users can self-serve. In other words, an agent will update something in the digital environment it lives in, such as booking a ticket by adding a row to some database. A related concept here is ‘copilot,’ which also emphasizes tool usage. However, instead of integrating with backend APIs as agents typically do, a copilot normally integrates with frontend APIs to help users navigate complex GUI applications more easily. Of course, a copilot is also sometimes used for soft use cases where the problem itself is open-ended or strict correctness is not required, since a human takes responsibility for making the final decisions.
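
To ground the ‘update the digital environment’ point, here is a hedged sketch of the kind of tool an agent might expose; the schema and names are invented for illustration, and the conversational layer that decides when to call the tool is omitted:

```python
import sqlite3

# A toy backend: the "digital environment" the agent acts upon.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE bookings (id INTEGER PRIMARY KEY, passenger TEXT, flight TEXT)")

def book_ticket(passenger: str, flight: str) -> int:
    """The tool the agent exposes: booking a ticket really is
    adding a row to a database."""
    cursor = db.execute(
        "INSERT INTO bookings (passenger, flight) VALUES (?, ?)", (passenger, flight)
    )
    db.commit()
    return cursor.lastrowid

# After the conversation establishes intent, the agent calls the tool:
booking_id = book_ticket("Alice", "UA 123")
print(f"Booked! Confirmation #{booking_id}")
```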

Parting Words

In Sequoia Capital’s AI Ascent 2024 opening remarks, Pat Grady made an interesting statement: ‘Because of the ability to interact with users in a human-like manner, one of the significant opportunities for AI is to replace services with software.’ Indeed, if software can consistently deliver the same top-notch user experience 24x7, no business can resist that. Our suggestion is that tomorrow’s businesses will reimagine the user experience with AI via a chat UI in the form of an agent, that is, software with a conversational user interface.

This new software will allow users to get what they want on their own terms, so they don’t need to learn how to navigate your website or app. More importantly, the service can be easily tailored to each individual situation for the ultimate user experience. It is time to personalize your service with agents.

Reference:

  1. Lilian Weng, “LLM Powered Autonomous Agents,” Lil’Log, June 2023. https://lilianweng.github.io/posts/2023-06-23-agent/
