There are a lot of studies and demos that show how large language models (LLMs) can perform impressive tasks. While there is no one-size-fits-all approach, we’ve tried to create a set of guidelines that will help you better steer your way around all the innovation and confusion surrounding LLMs.
This post was written by Ben Dickson, a seasoned engineer, tech blogger, and mentor at our AI/ML Simulator for Product Managers.
I use the following three-stage framework when considering if and how to use LLMs in a product. It helps me define the problem, choose the right models, craft effective prompts, and keep the process efficient when moving into production.
Stage I: Prepare
In this stage, the goal is to get a clear sense of what you want to accomplish and to identify the best place to start.
Define the task: With all the publicity surrounding LLMs, it is easy to think of them as general problem-solvers that can take any complex task and come up with a solution. But if you want good results, you should pick one specific task and formulate it as an input-output problem that fits one of the known categories (classification, regression, question-answering, summarization, translation, text generation, etc.).
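As a sketch of what "formulate it as an input-output problem" can look like in practice, here is a hypothetical support-ticket triage task framed as classification. The label set and example text are assumptions for illustration, not part of any real product:

```python
# Hypothetical task: triage incoming support tickets.
# Framed as classification: input is free-form ticket text,
# output is exactly one label from a fixed set.
LABELS = ["billing", "bug_report", "feature_request", "other"]

def make_example(ticket_text: str, label: str) -> dict:
    """Package one labeled example in input-output form."""
    if label not in LABELS:
        raise ValueError(f"unknown label: {label}")
    return {"input": ticket_text, "output": label}

example = make_example("I was charged twice this month.", "billing")
```

Pinning down the output space this explicitly (rather than asking for open-ended "solutions") is what makes the task measurable in the later stages.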
Choose a benchmark that is closely related to the problem you want to solve. This will help you determine good prompting techniques and models. For example, HellaSwag is a good benchmark for commonsense reasoning, while MMLU gives a good impression of how different LLMs perform across a broad range of knowledge and language tasks. This guide from Confident AI is a good overview of different LLM benchmarks.
Create a basic test set: Create at least five examples that are representative of the problem you want to solve. The examples should be written manually and drawn directly from your product or industry. You can use the benchmark examples as a guide for how to format your own.
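A minimal test set might look like the following. This is a hypothetical sentiment task with made-up review text; the point is the format, one input paired with one expected output, mirroring how benchmark examples are structured:

```python
# Hand-written test set for a hypothetical review-sentiment task.
# Each example pairs an input with the expected output, like a benchmark row.
TEST_SET = [
    {"input": "The checkout flow is fast and painless.", "expected": "positive"},
    {"input": "App crashes every time I open settings.", "expected": "negative"},
    {"input": "Delivery took three weeks longer than promised.", "expected": "negative"},
    {"input": "Support resolved my issue within minutes.", "expected": "positive"},
    {"input": "The update changed the layout; not sure how I feel yet.", "expected": "neutral"},
]
```

Five examples will not give you statistically meaningful scores, but they are enough to catch obviously broken prompts and to compare candidate models side by side.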
Choose a model: Look at LLM leaderboards and shortlist up to three models that perform best on the benchmark related to your task.
Create a basic prompt template: Create a prompt for your test set. Use very simple prompting techniques to get a feel for the baseline performance of each model. A basic prompt usually includes the role, the instructions, and the problem.
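The three parts (role, instructions, problem) can be sketched as a plain template string. This continues the hypothetical sentiment task above; the wording and the `build_prompt` helper are illustrative, and the actual model call is left out:

```python
# Basic prompt template with three parts: role, instructions, problem.
TEMPLATE = """You are a customer-support analyst.

Classify the sentiment of the review below as positive, negative, or neutral.
Answer with a single word.

Review: {review}
Sentiment:"""

def build_prompt(review: str) -> str:
    """Fill the template with one test-set input."""
    return TEMPLATE.format(review=review)

prompt = build_prompt("The checkout flow is fast and painless.")
```

Keeping the template in one place makes it easy to run the same test set through each shortlisted model and compare baseline results before trying more elaborate prompting techniques.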