
Prompt engineering has emerged as a new way of building applications. But is it effective from a business perspective? To help you make informed decisions about this emerging technology, we adapted a deeplearning.ai course example and tried various ways to control the chatbot interaction using prompts. Here is what we learned.
Background: the Chat Completion API
The Chat Completion API, designed for ChatGPT and GPT-4, accepts as input an array of conversation messages, each annotated with a role. To generate a response from the chatbot (the assistant), the input messages should alternate between the user and assistant roles starting from the second message, with the last message coming from the user. Here is a typical input:
{"role": "system", "content": "Provide some context and/or instructions to the model."},
{"role": "user", "content": "Example question goes here."},
{"role": "assistant", "content": "Example answer goes here."},
{"role": "user", "content": "First question/message for the model to actually respond to."}
The first message should always come from the system role and is known as the system message. This message is the prompt; it should contain a brief description of the assistant, the assistant's personality traits, any instructions or rules you would like the assistant to follow, and finally any data or information the model needs, such as relevant questions from an FAQ.
Building a simple food ordering chatbot is easy
You can customize the system message according to your specific use cases. For instance, to create a food ordering chatbot, you can use the following example prompt/system message:
menu = """
pepperoni pizza 12.95, 10.00, 7.00 \
cheese pizza 10.95, 9.25, 6.50 \
eggplant pizza 11.95, 9.75, 6.75 \
fries 4.50, 3.50 \
greek salad 7.25 \
Toppings: \
extra cheese 2.00 \
mushrooms 1.50 \
sausage 3.00 \
canadian bacon 3.50 \
AI sauce 1.50 \
peppers 1.00 \
Drinks: \
coke 1.00, 2.00, 3.00 \
sprite 3.00, 4.00, 5.00 \
bottled water 5.00 \
"""
instruction = """
You are OrderBot, an automated service to collect orders for a pizza restaurant. \
You first greet the customer, then collect the order, \
and then ask if it's a pickup or delivery. \
You wait to collect the entire order, then summarize it and check for a final \
time if the customer wants to add anything else. \
If it's a delivery, you ask for an address. \
Finally you collect the payment. \
Make sure to clarify all options, extras and sizes to uniquely \
identify the item from the menu. \
You respond in a short, very conversational friendly style. \
The menu includes \
"""
Notice that we put the restaurant-independent information (a description of the assistant, its personality traits, and the rules it should follow) into one variable, instruction, and the restaurant-specific information (the menu items, their sizes, and prices) into another variable, menu.
These two parts are then combined inside a simple loop where you get user input, format it into a conversation, call the OpenAI chat completion API, and print the assistant’s response:
# Remember to install the openai package first: pip install openai
import openai

openai.api_key = "sk-***"  # the API key you registered with OpenAI

conversation = [
    {"role": "system", "content": f"{instruction}{menu}"},
    {"role": "user", "content": "hi"},  # makes the API generate the first assistant turn
]

while True:  # press Ctrl+C to exit
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=conversation,
        temperature=0,  # try to be as deterministic as possible
    )
    reply = response.choices[0].message["content"]
    print("\nBot: " + reply + "\nUser:")
    conversation.append({"role": "assistant", "content": reply})
    user_input = input()
    conversation.append({"role": "user", "content": user_input})
Sometimes ChatGPT does observe the constraints you set up in the system message, in which case it is fully capable of delivering a seamless user experience:
Bot: Hello! Welcome to our pizza restaurant. What can I get for you today?
Customer: Hi, I'd like to order a pepperoni pizza, a cheese pizza, and a greek salad.
Bot: Great choices! Would you like the pepperoni pizza in small, medium, or large size?
Customer: I'll take a medium.
Bot: And for the cheese pizza?
Customer: I'll take a large.
Bot: Perfect. And would you like any toppings on your pizzas?
Customer: Yes, can I get extra cheese on both and mushrooms on the pepperoni pizza?
Bot: Sure thing. That's extra cheese and mushrooms on the pepperoni pizza and extra cheese on the cheese pizza. Anything else for you?
Customer: No, that's it.
Bot: Alright, is this for pickup or delivery?
Customer: Delivery, please.
Bot: Great. What's the delivery address?
Customer: 123 Main Street.
Bot: Got it. And finally, would you like anything else or are you all set?
Customer: No, I'm good.
Bot: Alright, your total comes to $39.65. How would you like to pay?
Customer: I'll pay with my credit card.
Bot: Perfect. Thank you for your order! Your pizzas and salad will be delivered to 123 Main Street shortly.
While the responses generated by ChatGPT may seem satisfactory from the user's perspective, businesses also need control over the interaction in order to provide a differentiated experience. This, however, is where you may encounter some difficulties.
The bad #1: no direct control yet
One may find it difficult to precisely control the behavior of the chatbot using natural language, as it lacks formally defined semantics.
For instance, the instruction provided above only prompts the user for payment when they select delivery. However, many businesses might prefer to collect payment upfront and send a code when the order is ready. But it is a struggle to make the chatbot ask the user for payment regardless; as of May 8th, 2023, neither of these plausible methods actually works:
1. Moving “Finally you collect the payment.” up so that it is not governed by the if statement:
+Finally you collect the payment. \
If it's a delivery, you ask for an address. \
-Finally you collect the payment. \
2. Using a stronger sentence, like “You should always collect the payment”:
Payment collection should always be conducted. \
You should always collect the payment. \
Although using natural language to define expected behavior requires no learning curve, ChatGPT can be insensitive to perfectly legitimate instructions. This lack of control can sometimes be worked around through trial and error, but that makes for a poor developer experience.
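If you do resort to trial and error, it helps to at least automate it. Below is a minimal sketch of such a harness. It reuses the instruction and menu variables from above, replays a fixed pickup-order script against each prompt variant ("stronger" simply appends the sentence from attempt #2), and uses a crude keyword check, our own stand-in rather than a real evaluation, to see whether the bot ever asked for payment:

import openai

openai.api_key = "sk-***"

# Candidate system messages built from the instruction and menu defined above.
variants = {
    "original": instruction + menu,
    "stronger": instruction + "You should always collect the payment. " + menu,
}

# A fixed pickup order: the case where the bot tends to skip payment.
script = ["hi", "one large cheese pizza", "that is all", "pickup please"]

def replay(system_message, user_turns):
    """Replay a scripted conversation and return all of the bot's replies."""
    conversation = [{"role": "system", "content": system_message}]
    replies = []
    for turn in user_turns:
        conversation.append({"role": "user", "content": turn})
        response = openai.ChatCompletion.create(
            model="gpt-3.5-turbo",
            messages=conversation,
            temperature=0,
        )
        reply = response.choices[0].message["content"]
        conversation.append({"role": "assistant", "content": reply})
        replies.append(reply)
    return replies

for name, system_message in variants.items():
    replies = replay(system_message, script)
    # Crude check: did any reply mention payment?
    asked = any("pay" in reply.lower() for reply in replies)
    print(f"{name}: asked for payment = {asked}")

Even with a harness like this, every iteration burns API calls, and as we will see below, a variant that passes today may not pass tomorrow.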
The bad #2: no dynamic control yet
An effective chatbot often needs to interact with a production system. For instance, if a user wants a cheese pizza, the chatbot should first check whether it is in stock; there is no need to ask about size and toppings for a pizza that is unavailable.
This brings up the second issue with using prompts to program interaction logic: it is hard to use prompts to define dynamic behavior. Currently, instruction-tuned models like ChatGPT cannot trigger APIs according to predefined logic or guide the conversation based on the values those APIs return.
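To make the gap concrete, here is a sketch of the control flow a business would actually want, with a hypothetical check_stock function standing in for a real inventory API. Nothing in the prompt-only setup above lets you interpose this check between the user's request and the model's follow-up questions:

def check_stock(item):
    """Hypothetical inventory lookup; a real system would call an internal API."""
    inventory = {"cheese pizza": 0, "pepperoni pizza": 12}  # made-up data
    return inventory.get(item, 0) > 0

def handle_order_request(item):
    # Desired logic: consult the production system before the bot
    # starts asking about sizes and toppings.
    if not check_stock(item):
        return f"Sorry, {item} is out of stock today. Can I get you something else?"
    return f"Great choice! What size would you like for the {item}?"

print(handle_order_request("cheese pizza"))
# -> Sorry, cheese pizza is out of stock today. Can I get you something else?

The chatbot would need to pause, call the function, and branch on its result, which a static system message cannot express.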
The ugly: it hallucinates
Chatbots built with ChatGPT can sometimes perform surprisingly well, but other times they can fail terribly. Furthermore, it can be difficult to determine when the chatbot is functioning properly and when it is not.
Bot: Hello! Welcome to our pizza restaurant. How can I assist you today?
User: one cheese pizza and three drinks
Bot: Great! Just to confirm, would you like a small, medium, or large cheese pizza? And which drinks would you like and in what sizes?
User: large
Bot: Got it! So that's one large cheese pizza and three drinks. Which drinks would you like and in what sizes? We have Coke in small ($1.00), medium ($2.00), and large ($3.00), Sprite in small ($3.00), medium ($4.00), and large ($5.00), and bottled water for $5.00.
User: coke
Bot: Sure thing! And what size would you like the Coke to be? Small, medium, or large?
User: small
Bot: Great! So that's one large cheese pizza and one small Coke. Is there anything else you'd like to order or is that all for now?
For some reason, it changed three drinks to just one without the user's consent.
Bot: Hello! Welcome to our pizza restaurant. How can I assist you today?
User: small cheese pizza
Bot: Great choice! Would you like a 6.50 inch, 9.25 inch, or 10.95 inch pizza?
For some reason, it did not pick up "small." It also offered the menu prices as if they were pizza sizes in inches.
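One partial workaround is to ask the model for a machine-readable summary once the dialog is over, so that at least some mistakes can be caught by code instead of by eyeballing transcripts. Here is a minimal sketch that reuses the conversation list from the loop above; the JSON fields are our own choice, and the model may hallucinate inside them too:

import json

# Ask the model to summarize the order as JSON at the end of the dialog.
conversation.append({
    "role": "system",
    "content": "Create a JSON summary of the previous food order with "
               "fields: items (name, size, quantity, price) and total.",
})
response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=conversation,
    temperature=0,
)
summary_text = response.choices[0].message["content"]

try:
    order = json.loads(summary_text)
    # Quantities and totals can now be checked programmatically,
    # for example against the menu prices.
    print(order)
except json.JSONDecodeError:
    # The model is not guaranteed to return valid JSON either.
    print("Could not parse summary:", summary_text)

This makes slips like the silently dropped drinks detectable in principle, but only if you also track what the user actually asked for.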
The ugly: it is not deterministic
One must-have property of any programming system is deterministic behavior. According to OpenAI, a temperature of 0 in the completion API produces only the most likely completion, making it the only choice for achieving deterministic behavior. However, even with a temperature of 0, deterministic behavior is not guaranteed.
Bot: Hello! Welcome to our pizza restaurant. How can I assist you today?
User: large cheese pizza
Bot: Great choice! Is there anything else you would like to order or is that all for now?
User: that is all.
Bot: Alright, just to confirm, you would like to order a large cheese pizza. Is that correct? Also, would you like to pick it up or have it delivered?
The only change in the user input is from small to large, yet the response changed drastically. In fact, the user will sometimes get a different response even for identical input.
Bot: Hello! Welcome to our pizza restaurant. How can I assist you today?
User: small cheese pizza
Bot: Great choice! Is there anything else you would like to order or is that all for now? And would you like to pick it up or have it delivered?
The random behavior observed at temperature zero has been attributed to the nondeterministic nature of faster inference algorithms. Moreover, OpenAI may run multiple experimental models to handle live traffic, and the behavior that users experience may vary depending on which experiment they land in. This can be particularly challenging for developers, as users may report bugs that developers cannot reproduce if they are not in the same experiment, adding to the frustration of not being able to get what they want.
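You can observe this yourself with a few repeated calls. The sketch below, reusing the instruction and menu variables from above, sends the identical single-turn conversation five times at temperature 0 and counts the distinct replies; a truly deterministic system would always return exactly one:

import openai

openai.api_key = "sk-***"

messages = [
    {"role": "system", "content": instruction + menu},
    {"role": "user", "content": "small cheese pizza"},
]

replies = set()
for _ in range(5):
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=messages,
        temperature=0,
    )
    replies.add(response.choices[0].message["content"])

print(f"{len(replies)} distinct replies out of 5 identical calls")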
The ugly: fixing an issue needs a Ph.D.
There is currently no straightforward way to address these issues without retraining or fine-tuning the model on new data. Despite the promise that instruction-tuned models eliminate the need for a Ph.D., retraining or fine-tuning still requires some serious setup.
Parting words: prompts are not good at programming
Using prompts to program has its benefits: anyone can express what they want without going through a steep learning curve. Thanks to recent advancements in instruction-tuned models, it is now possible to do so, especially for simple and static use cases that don’t have many special business requirements.
However, programming in natural language lacks a carefully designed semantics to precisely describe the desired interaction, as well as a reliable runtime to carry out that semantics. As we have observed in our experimentation, it may therefore not be a viable option for businesses that require exact control to build differentiated services, even for relatively simple use cases like food ordering.
In the next case studies, we will explore using these instruction-tuned models only for dialog understanding, and show how you can keep full control of the interaction logic when taking food orders. Stay tuned.