The advent of instruction-tuned Large Language Models (LLMs) such as ChatGPT has revolutionized how we develop software features. Instead of relying on specialized programming languages, anyone can now use natural language to direct computers to perform complex tasks. You might think this makes skilled professionals such as software engineers unnecessary, right? Paradoxically, this new paradigm of “programming” computers, called prompt engineering, has actually given rise to a novel profession: the prompt engineer.
They are called engineers because this emerging role defines and refines prompt templates (input instructions) that steer LLMs toward specific, accurate outputs, much as software engineers design and refine code. But is this really the right analogy? Let’s examine software engineers’ responsibilities during the different phases of software development and see how prompt engineers can provide similar value.
Document function specification
Developing a software feature is not cheap, so it is important for businesses to carefully decide which features to build before spending a large chunk of capital actually building them. Since product owners or product managers are responsible for understanding business conditions and objectives, including market and user needs, they are typically the source of what needs to be built.
However, it is not as simple as product owners writing down what they want as requirements. The process of defining what to build typically involves back-and-forth negotiation between product managers, who represent what the business wants, and software engineers, who know whether something can be built. The end result of this process is a set of requirement documents, including the functional requirements or functional specifications, which detail exactly what to build.
For the most part, experienced software engineers can tell whether a functional specification can be built. The same cannot be said for prompt engineers: LLMs are not predictable, and it is impossible to know in advance what output an LLM will produce for a given prompt. So someone needs to translate the functional specification into a prompt template and test whether it works. But do we need a special role for this?
From function specification to implementation
For each feature, a good functional specification should provide enough information so that a developer can understand exactly what needs to be built. For this reason, specifications typically include the following sections:
- Purpose: Describes the goal of the function.
- Inputs and Outputs: Specifies what inputs the system will accept and what outputs it will produce.
- Description: Defines how the system should behave in response to various inputs and situations.
- Examples: Demonstrates how the feature should work through a set of inputs and their corresponding expected outputs. For non-trivial functionality, this section might need to be long.

Consider, for example, a functional specification for automatically answering price inquiries:
Purpose:
To generate an automatic email response for price inquiries based on product
prices and available templates.
Inputs:
productID (String):
The unique identifier of the product for which the price is being inquired.
customerName (String):
The name of the customer making the inquiry.
customerEmail (String):
The email address of the customer making the inquiry.
Outputs:
(String):
The generated email reply containing the product price and a polite message.
Description:
The function should generate an automatic email reply for price inquiries.
It will look up the price of the product corresponding to the productID in
a predefined product database. Using this price, along with the
customer's name and email, the function will format an email reply based
on a predefined message template.
Translating the functional specification into code requires fluency in programming languages, plus an understanding of algorithms, computer architecture, networks, and so on, all of which require extensive training.
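For the specification above, a conventional implementation might look like the following Python sketch (the product database, message template, and function name are all hypothetical, and error handling is kept minimal):

# Hypothetical in-memory product database; a real implementation
# would query an actual database.
PRODUCT_PRICES = {
    "SKU-1001": 19.99,
    "SKU-1002": 34.50,
}

REPLY_TEMPLATE = (
    "Dear {customer_name},\n\n"
    "Thank you for your inquiry. The current price of product "
    "{product_id} is ${price:.2f}.\n\n"
    "Best regards,\nSales Team"
)

def generate_price_inquiry_reply(product_id: str, customer_name: str,
                                 customer_email: str) -> str:
    """Generate an automatic email reply for a price inquiry."""
    price = PRODUCT_PRICES.get(product_id)
    if price is None:
        raise KeyError(f"Unknown product: {product_id}")
    # customer_email would be used by the mail-sending layer,
    # not inside the reply body itself.
    return REPLY_TEMPLATE.format(
        customer_name=customer_name, product_id=product_id, price=price
    )

Even this toy version assumes familiarity with types, dictionaries, string formatting, and error handling.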
On the other hand, turning the functional specification into a prompt template is a lot easier: you don’t need to learn a new language, and most guidelines are common sense. There are only a few strategies, such as chain of thought (CoT), that you need to become familiar with. Of course, you also need to know how different LLMs encode the system prompt and user input. Altogether, you only need a couple of hours to become effective; after that, it is trial and error. So there is really little need for a new human role, particularly when an LLM can do a better job.
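For comparison, a first-pass prompt template for the same specification could be as simple as the following (the placeholder names and wording are illustrative, and how the system prompt and user input are delimited varies by model):

System: You are a customer-service assistant that writes polite
email replies to price inquiries.

User: Write an email reply to {customer_name} <{customer_email}>,
who asked for the price of product {product_id}. The current
price is {price}. Thank the customer, state the price clearly,
and close politely.

Note how this template restates the functional specification almost verbatim; the only real work is filling in the placeholders at run time.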
And LLMs can create prompts from examples
While manually creating prompt templates via trial and error might work for some simple and stable use cases, it is not scalable. According to this study on how different prompt-engineering strategies affect an LLM’s ability to solve grade-school math questions, “What’s best for any given model, dataset, and prompting strategy is likely to be specific to the particular combination at hand.” So any time anything changes, whether the LLM or the functional specification (including its examples), you need to go through this time-consuming manual process again.
To some extent, prompt templates are like assembly code: when the target machine (here, the LLM) changes, they must change too. This is why we rarely write assembly anymore. Instead, we use compilers to compile one piece of code into different binaries for different architectures, so the same functionality can be migrated to new hardware easily.
There are already tools like DSPy that can automatically optimize prompts using labeled datasets. DSPy lets you express the functional specification in the form of Signatures and Modules, and it then uses the labeled examples to automatically figure out which prompt template works best for a particular LLM. This way, we can simply specify what we need instead of chasing the nuances of individual LLMs.
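As a rough sketch of how the email-reply specification maps onto DSPy (the field names, example data, and metric are illustrative, and API details may differ across DSPy versions):

import dspy
from dspy.teleprompt import BootstrapFewShot

# Select the target model (model name is illustrative).
dspy.configure(lm=dspy.LM("openai/gpt-4o-mini"))

# Signature mirroring the functional specification above.
class PriceInquiryReply(dspy.Signature):
    """Generate a polite email reply for a product price inquiry."""
    product_id = dspy.InputField()
    customer_name = dspy.InputField()
    customer_email = dspy.InputField()
    reply = dspy.OutputField(desc="email reply containing the product price")

# Labeled examples, taken from the specification's Examples section.
trainset = [
    dspy.Example(
        product_id="SKU-1001",
        customer_name="Ada",
        customer_email="ada@example.com",
        reply="Dear Ada, thank you for your inquiry. ...",
    ).with_inputs("product_id", "customer_name", "customer_email"),
]

def reply_metric(example, pred, trace=None):
    # Illustrative metric; a real one might verify the quoted price.
    return example.customer_name in pred.reply

# Let the optimizer search for a prompt that works for the configured LLM.
optimizer = BootstrapFewShot(metric=reply_metric)
program = optimizer.compile(dspy.Predict(PriceInquiryReply), trainset=trainset)

Swapping in a different LLM then means re-running the optimizer, not rewriting the prompt by hand.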
Parting words
There is plenty of skepticism about calling prompting “engineering,” as prompting lacks several key characteristics of traditional engineering disciplines. For example, it’s inherently unstable: minor alterations in the system prompt can lead to substantial changes in output.
But the bigger issue here is the ill-perceived parallel to traditional software engineering. Prompt templates mostly overlap with the functional specification rather than the implementation: a functional specification already contains all the information a prompt needs. With tools like DSPy, translating functional specifications into prompt templates can be completely automated. It is still useful for product managers to learn the principles of prompting, though, as this lets them iterate on functional specifications faster: they do not have to wait for engineers to tell them whether a piece of functionality is buildable.
The vast common-sense knowledge distilled into LLMs can be used to solve many problems, but we do not need human “prompt engineers” to take advantage of it. A tool that takes functional specifications with desired input-output examples and feeds them to a compiler like DSPy will let product managers experiment with and develop LLM-based functionality easily. Without manual trial and error, LLM functionality built this way will also be more dependable as a component of larger software.
References:
- Why prompt engineering won’t be a thing [Medium].
- Prompt engineering is a task best left to AI models [The Register].
- https://spectrum.ieee.org/prompt-engineering-is-dead
- https://platform.openai.com/docs/guides/prompt-engineering/six-strategies-for-getting-better-results
- https://dspy-docs.vercel.app/docs/intro
- https://hai.stanford.edu/news/textgrad-autograd-text
- https://arxiv.org/pdf/2402.10949
- https://arxiv.org/pdf/2402.03099