Different kinds of CUI Apps

Sean Wu
Dec 10, 2023

Most applications people use today rely primarily on a graphical user interface (GUI), which pairs a display for output with input methods such as touch, keyboard, and mouse. But are these the best ways to interact with computers?

The introduction of ChatGPT has greatly heightened interest in applications featuring conversational user interfaces (CUI). Under this paradigm, users can get the service they want simply by saying what they want, instead of first learning how to express their intention through UI elements. Developing CUI apps, however, is not as easy as developing their GUI counterparts. One reason is the lack of well-defined terminology: terms are used casually without precise definitions, which causes confusion.

In this blog post, our aim is to establish a classification system to bring clarity to CUI apps. Instead of categorizing these applications based on implementation methods or the specific industry vertical they serve — both of which may evolve over time — we will focus on fundamental differences from a horizontal perspective. The goal is to help you decide which kind of CUI app is best for your business use cases.

Which side is it on? User or Business

While any successful app needs to address both user needs and business goals, the first question you need to ask when you build a CUI app is still, ‘Who is this for?’ A CUI app can be developed on behalf of users to interact with many businesses; such apps are commonly known as personal assistants or virtual assistants. Alternatively, it can be deployed on behalf of a business to serve many users, in the form of chatbots, (virtual) agents, or copilots.

If you want to become a gatekeeper like Google, Apple, Meta, or TripAdvisor, you want to develop a CUI app that acts as a personal assistant and takes care of user needs across multiple vendors. In this case, you not only need to understand what users want but also must figure out which vendors provide the best overall experience, balancing user needs with your gatekeeper business objectives. This involves weighing many factors, such as vendor reputation and the quality of the products and services offered, typically with the help of recommendation systems.
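To make the balancing act concrete, here is a minimal sketch of how a personal assistant might rank vendors. The `Vendor` fields, the `rank_vendors` function, and the weights are all hypothetical and chosen only for illustration; a real recommendation system would learn these signals rather than hard-code them.

```python
from dataclasses import dataclass


@dataclass
class Vendor:
    name: str
    user_fit: float    # how well the offer matches the user's request, 0..1
    reputation: float  # vendor reputation and product quality, 0..1
    margin: float      # value to the gatekeeper's own business, 0..1


def rank_vendors(vendors, w_fit=0.6, w_rep=0.3, w_margin=0.1):
    """Order vendors by a weighted score; the default weights are illustrative."""
    def score(v):
        return w_fit * v.user_fit + w_rep * v.reputation + w_margin * v.margin
    return sorted(vendors, key=score, reverse=True)
```

Shifting weight from `w_fit` toward `w_margin` favors the gatekeeper's own objectives over the user's, which is exactly the trade-off described above.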

For all other businesses, you should think about building chatbots, agents, or copilots, since your main focus will be on addressing user needs with the services your organization offers. Because those products and services are more or less fixed and, for the most part, do not overlap, the problems you face are simpler. These CUI apps can be built with both traditional software engineering and prompt engineering, and the two can be combined into a seamless experience using a dual process approach.

What is the interaction mode? Voice or Text

Another main classification is based on the interaction mode, specifically whether voice is employed. People often use the term ‘voicebot’ to describe CUI applications with a voice interface, exemplified by experiences like Alexa. Conversely, the term ‘chatbot’ commonly refers to CUI apps with a text-based interface. From an implementation perspective, the two are not fundamentally different: voicebots are typically a form of chatbot enhanced with Automatic Speech Recognition (ASR) and Text-to-Speech (TTS) capabilities. Both voice and text interfaces usually share the same natural language perception layer, which covers understanding and generation, as well as the same interaction logic.
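The layering described above can be sketched as a text bot wrapped by speech components. This is only an illustration: `text_bot`, `VoiceBot`, `fake_asr`, and `fake_tts` are hypothetical names, and the fake ASR/TTS functions simply treat UTF-8 bytes as "audio" so the example runs without a speech engine.

```python
from dataclasses import dataclass
from typing import Callable


def text_bot(utterance: str) -> str:
    """Shared text core: understanding, interaction logic, generation (stubbed)."""
    if "hours" in utterance.lower():
        return "We are open 9am to 5pm."
    return "Sorry, I did not understand that."


@dataclass
class VoiceBot:
    """Wraps a text bot with speech recognition and speech synthesis."""
    asr: Callable[[bytes], str]   # audio in -> transcript
    core: Callable[[str], str]    # transcript -> text reply
    tts: Callable[[str], bytes]   # text reply -> audio out

    def handle(self, audio: bytes) -> bytes:
        transcript = self.asr(audio)
        reply = self.core(transcript)
        return self.tts(reply)


# Stand-ins for real ASR/TTS engines (assumption: "audio" is just UTF-8 bytes).
def fake_asr(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_tts(text: str) -> bytes:
    return text.encode("utf-8")


voice_bot = VoiceBot(asr=fake_asr, core=text_bot, tts=fake_tts)
```

Because `text_bot` is reused unchanged, a text channel and a voice channel can share the same understanding and interaction logic; only the edges differ.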

Supporting a voice interface, however, does introduce some serious technical challenges, such as pronunciation variance due to accents and dialects, and signal separation from a noisy background, to name a few. Beyond these ASR- and TTS-related challenges, spoken language is generally more informal, which can require more robust natural language understanding downstream. Therefore, the primary decision you need to make is whether the business value a voice interface adds is worth the extra complexity it introduces.

What does it do? Is it informational only?

Historically, chatbots have served as a pivotal component in addressing user queries, often playing a crucial role as the initial touchpoint in customer service interactions. Their primary function has centered on information delivery, making them adept at providing answers and guiding users through basic inquiries. However, these chatbots were typically limited in their capacity to execute actions on behalf of users, which often required live agent involvement. Chatbots’ functionalities were tailored to tasks that needed minimal authorization, focusing mainly on informational issues.

On the other hand, the term “agents”, which has recently experienced a resurgence in popularity, denotes a more dynamic and capable form of conversational interface. Unlike traditional chatbots, agents are designed not only to respond to queries but also to undertake substantive actions and complete tasks on behalf of users, via the same APIs that GUI applications use. They therefore demand proper authorization, given their potential influence on critical business processes.
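The contrast between the two can be sketched as follows. Everything here is hypothetical: `FAQ`, `OrderService`, `Agent`, and the `"orders:write"` scope are invented for illustration and stand in for whatever backend API and authorization scheme a business already exposes to its GUI apps.

```python
from dataclasses import dataclass, field

FAQ = {"refund policy": "Refunds are accepted within 30 days."}


def informational_bot(question: str) -> str:
    """Chatbot style: only looks up answers, never changes anything."""
    key = question.lower().strip(" ?")
    return FAQ.get(key, "Let me connect you to a live agent.")


@dataclass
class OrderService:
    """Hypothetical backend API, the same one a GUI app would call."""
    orders: dict = field(default_factory=lambda: {"A1": "placed"})

    def cancel(self, order_id: str) -> None:
        self.orders[order_id] = "cancelled"


@dataclass
class Agent:
    service: OrderService

    def cancel_order(self, order_id: str, user_scopes: set) -> str:
        # Actions change business state, so check authorization first.
        if "orders:write" not in user_scopes:
            return "You are not authorized to cancel orders."
        if order_id not in self.service.orders:
            return f"Order {order_id} was not found."
        self.service.cancel(order_id)
        return f"Order {order_id} has been cancelled."
```

The informational bot can fail safely by handing off to a person, while the agent must refuse explicitly when the caller lacks permission, because its actions have real effects.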

In essence, while chatbots traditionally served informational roles and were often operated as a cost center, agents are positioned to actively engage in a broader spectrum of user interactions and can be used to generate revenue, contributing significantly to operational efficiency and overall business outcomes.

Where are actions executed? Back-end or Front-end

Both agents and copilots share the common purpose of executing tasks on behalf of the user, differing primarily in where those actions take place. An agent executes actions for the user in the back-end, allowing users to accomplish tasks simply by interacting with it, which streamlines the user experience. In contrast, a copilot, exemplified by tools like GitHub Copilot, functions as a CUI companion to an existing GUI application. Here, actions are executed in the front-end of the hosting application, directly affecting the GUI.

In the case of a copilot, users navigate and complete tasks via the familiar GUI of the hosting application. This integration ensures a seamless transition between the copilot and the GUI, preventing any loss of productivity. To facilitate this smooth interaction, copilots often share context with the host application, enabling users to effortlessly switch between the conversational and graphical interfaces while maintaining a consistent and efficient workflow. This collaboration between copilots and GUI applications enhances user productivity and versatility in carrying out tasks.
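One way to picture the difference is that an agent performs the change itself and reports back, while a copilot returns an action for the host application's front-end to apply. The sketch below assumes a toy `HostApp` with an editable text field; `HostApp`, `copilot`, `agent`, and the action dictionary format are hypothetical and not taken from any real copilot framework.

```python
from dataclasses import dataclass


@dataclass
class HostApp:
    """Hypothetical GUI application state that the copilot can see and change."""
    open_file: str = "report.txt"
    text: str = "hello"

    def apply(self, action: dict) -> None:
        # The front-end applies the action, so the user sees the result in the GUI.
        if action["type"] == "replace_text":
            self.text = action["value"]


def copilot(request: str, context: HostApp) -> dict:
    """Turn a request plus shared GUI context into a front-end action."""
    if request == "make it uppercase":
        return {"type": "replace_text", "value": context.text.upper()}
    return {"type": "noop"}


def agent(request: str, backend: dict) -> str:
    """Execute the task in the back-end and only report the result."""
    if request == "archive report":
        backend["report.txt"] = "archived"
        return "report.txt has been archived."
    return "Sorry, I cannot do that."
```

In use, the host would call `app.apply(copilot("make it uppercase", app))`: the copilot reads the shared context (`app.text`), and the visible change happens in the GUI, where the user can keep editing by hand.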

Parting words

Compared to GUI apps, such as desktop, web, and mobile apps, CUI apps are not yet as popular. However, CUI apps have the potential to reduce the learning curve, thus providing a better overall user experience. With recent developments in applying Large Language Models (LLMs) to language understanding and generation, as well as advancements in CUI interaction logic frameworks, the cost of building feature-rich CUI apps has dropped significantly. So, it is time to pick a CUI app form and give your users the CUI experience they deserve.
