Building AI features remains the number one priority for SaaS product teams; in fact, 74% of startups are actively working on them.
We've built and deployed multiple AI features and learned a lot along the way. In this three-part series, we're sharing everything we've learned.
When evaluating a use case for AI, Product Managers must weigh the value a feature will bring to end users against the cost and risk of using AI for the task. To support PMs in this process, I've detailed below the strengths and limitations we have observed while building with AI. The examples below are illustrated using ChatGPT, but all of this functionality is available via APIs to the underlying models.
General applications of AI are well understood, but some specific strengths are less obvious and point toward suitable use cases.
Reasoning on unstructured data
AI can ingest unstructured input like text, audio, and now images; decide whether flexibly defined criteria have been met; and then take an action accordingly. For example, it can take an image plus free-form text instructions, reason over them, and provide a structured response. These responses can even be requested as valid JSON so they can be parsed reliably.
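As a minimal sketch of this pattern, the snippet below asks the model for a strict JSON response and validates it before acting on it. The `call_model` function is a hypothetical stand-in for a real LLM API call, returning a canned response so the example is self-contained.

```python
import json

def call_model(prompt: str) -> str:
    # Hypothetical stand-in for a real LLM API call; returns a canned
    # response here so the sketch runs without network access.
    return '{"criteria_met": true, "action": "escalate", "confidence": 0.92}'

def evaluate_unstructured_input(instructions: str, content: str) -> dict:
    # Ask explicitly for JSON so the response can be parsed reliably.
    prompt = (
        f"{instructions}\n\n"
        f"Input:\n{content}\n\n"
        'Respond ONLY with JSON: {"criteria_met": bool, "action": str, "confidence": float}'
    )
    raw = call_model(prompt)
    result = json.loads(raw)  # raises ValueError if the model strayed from JSON
    # Guard against missing keys before downstream code acts on the result.
    if not {"criteria_met", "action"} <= result.keys():
        raise ValueError(f"Incomplete response: {result}")
    return result

decision = evaluate_unstructured_input(
    "Decide whether this support message needs escalation.",
    "My whole dashboard is down and we have a launch in an hour!",
)
print(decision["action"])
```

The key design choice is to treat the model's output as untrusted input: parse it, check the schema, and fail loudly rather than passing malformed data downstream.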
Baked in Worldly Knowledge
Large language models (LLMs) possess a wealth of pre-existing world knowledge, allowing them to augment the information they are directly given. They recognize entities and jargon and understand references to companies, occupations, and more. This is useful when setting up automations where the number of possible scenarios is too large to build rigidly defined rules around. For example, an LLM can respond in context to a customer inquiry from a zookeeper without explicitly being told how to handle that interaction.
Dealing with noise
LLMs are exceptional at handling what would traditionally be considered poor or messy data. They happily ignore input that isn't relevant to the instruction and can even extrapolate based on their understanding of the world. To test this, try posing a question to an LLM in shorthand.
Code Generation
Perhaps the most impressive skill of language models is their ability to write and debug code in any language, and GPT is remarkably good at it. The primary use case today is acting as a co-pilot for developers (and non-developers willing to get their hands dirty). For the more ambitious, GPT can also be used as a code-generating agent within your application itself. We recently implemented a text-to-SQL feature: a user asks a question in natural language, and GPT writes the SQL query to fetch the relevant data from the database. In the future, we expect code-generating agents to enable fully flexible software applications that respond to end-user needs in real time, generating custom dashboards and UI components so the end user's experience is fully personalized.
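A text-to-SQL feature along these lines can be sketched as below. This is not our production implementation; `call_model` is a hypothetical stub returning a canned query, and the schema string is invented for illustration. The important part is the guard: generated code should never be executed blindly.

```python
import re

def call_model(prompt: str) -> str:
    # Hypothetical stand-in for an LLM API call; returns a canned query
    # so the sketch runs without network access.
    return "SELECT customer_id, COUNT(*) AS orders FROM orders GROUP BY customer_id;"

SCHEMA = "orders(order_id, customer_id, created_at, total)"  # illustrative schema

def text_to_sql(question: str) -> str:
    prompt = (
        f"Schema: {SCHEMA}\n"
        f"Write a single read-only SQL query answering: {question}\n"
        "Respond with SQL only."
    )
    sql = call_model(prompt).strip()
    # Never execute generated code blindly: allow only a single plain
    # SELECT statement, rejecting anything that could mutate data.
    if not re.match(r"(?is)^select\b", sql) or ";" in sql[:-1]:
        raise ValueError(f"Rejected generated SQL: {sql!r}")
    return sql

query = text_to_sql("How many orders has each customer placed?")
print(query)
```

In a real deployment you would also run the query under a read-only database role, so the allow-list check is defense in depth rather than the only safeguard.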
Multi-Modal
By stringing models together, solutions can be multi-modal in both input and output: text, audio, image, or video can each serve as either. Interesting use cases here include auto-generated media to support product updates and auto-generated narration of support videos.
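Stringing models together is conceptually just a pipeline of calls. The sketch below shows the shape of an audio-in, audio-out flow; each function would normally be a separate model call (speech-to-text, text generation, text-to-speech), but all three are stubbed here so the example is self-contained.

```python
def transcribe_audio(audio_path: str) -> str:
    # e.g. a speech-to-text model call (stubbed here)
    return "In this release we shipped saved filters and a faster export."

def summarize(text: str) -> str:
    # e.g. a chat-completion call asking for a one-line summary (stubbed)
    return "Release notes: saved filters and faster exports."

def synthesize_narration(text: str) -> bytes:
    # e.g. a text-to-speech call returning audio bytes (stubbed)
    return text.encode("utf-8")

def audio_update_to_narration(audio_path: str) -> bytes:
    transcript = transcribe_audio(audio_path)   # audio -> text
    summary = summarize(transcript)             # text -> text
    return synthesize_narration(summary)        # text -> audio

narration = audio_update_to_narration("product_update.wav")
print(len(narration))
```

Because each stage consumes and produces plain data (text or bytes), stages can be swapped or reordered independently, which is what makes mixing modalities straightforward.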
The limitations of AI are less well understood and perhaps more important for product managers to grapple with. Competitive moats and IP lie at the intersection of user needs and the constraints of AI models.
Latency
Speed of execution for LLMs differs by model and the nature of the task. Tasks take longer to complete if they require multiple prompts to be strung together or involve longer text outputs. Additionally, GPT-4 can take 2-3x longer to complete than GPT-3.5 Turbo. Execution by AI can take anywhere from a few seconds to multiple hours depending on the use case. When deciding on a use case, it is important to consider users' tolerance for response time. For example, if generating a report of customer trends across a large dataset, users may be willing to wait multiple minutes for a response. In comparison, having AI automatically apply filters to a dashboard must happen faster than the user could achieve the same outcome by pressing buttons in the UI. As a rule of thumb, if something takes a user more than 20 seconds to complete, AI may help.
Cost
Cost varies based on the model used and the amount of data input and output. A simple task like summarizing 100 emails would cost roughly $0.07 on GPT-3.5 and $0.80 on GPT-4. This is an important consideration for product managers designing AI features, since usage of the feature carries a direct marginal cost to the business. Where costs are significant, product teams should consider usage-based pricing to align revenue with the increase in marginal cost. Traditionally, software had effectively zero marginal cost of usage: companies incurred additional cloud and support spend per additional user, but this was usually insignificant and had economies of scale baked in. With AI products, costs scale linearly with usage, and pricing should reflect this reality.
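The arithmetic behind such estimates is simple to model. The per-1K-token prices below are illustrative assumptions for the sketch (check the provider's current pricing page), as are the token counts per email.

```python
# Illustrative per-1K-token prices -- assumptions for this sketch,
# not authoritative figures; providers change pricing over time.
PRICE_PER_1K = {
    "gpt-3.5-turbo": {"input": 0.0010, "output": 0.0020},
    "gpt-4":         {"input": 0.0300, "output": 0.0600},
}

def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    # Input and output tokens are priced separately on most APIs.
    p = PRICE_PER_1K[model]
    return (input_tokens / 1000) * p["input"] + (output_tokens / 1000) * p["output"]

# Summarizing 100 short emails: assume ~200 input and ~50 output tokens each.
inp, out = 100 * 200, 100 * 50
for model in PRICE_PER_1K:
    print(model, round(estimate_cost(model, inp, out), 2))
```

Running a sheet like this per feature, multiplied by expected monthly usage, gives the marginal cost figure that usage-based pricing needs to cover.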
Data Size & Context
AI models have a hard limit on the amount of data they can process in one API call. This is usually referred to as the model's "context window" and varies from 8k tokens in GPT-4 (roughly 6,000 words) to 128k tokens (roughly 100,000 words) in GPT-4 Turbo, with the limit covering combined input and output. This means LLMs are not reliable for classification or topic modeling over large datasets like batches of call transcripts or support tickets (a single sales call is typically ~10,000 tokens). Depending on the use case, this constraint can be overcome using Retrieval Augmented Generation (RAG) techniques or by combining the LLM with traditional modeling techniques like K-means, which we will discuss in Part Two of this series.
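A common first step when working around the context window is to chunk long documents before sending them to the model. The sketch below uses a rough 4-characters-per-token heuristic for English text (an assumption; a real tokenizer such as tiktoken gives exact counts) and splits on paragraph boundaries so no chunk exceeds the budget.

```python
def rough_token_count(text: str) -> int:
    # Heuristic: ~4 characters per token for English text. Use a real
    # tokenizer (e.g. tiktoken) when exact counts matter.
    return max(1, len(text) // 4)

def chunk_text(text: str, max_tokens: int = 2000) -> list:
    # Greedily pack paragraphs into chunks that stay under the budget.
    chunks, current, count = [], [], 0
    for paragraph in text.split("\n\n"):
        t = rough_token_count(paragraph)
        if current and count + t > max_tokens:
            chunks.append("\n\n".join(current))
            current, count = [], 0
        current.append(paragraph)
        count += t
    if current:
        chunks.append("\n\n".join(current))
    return chunks

# A long transcript: ten paragraphs of ~500 tokens each.
transcript = "\n\n".join(["Speaker turn " + "word " * 400] * 10)
pieces = chunk_text(transcript, max_tokens=1000)
print(len(pieces))
```

Each chunk can then be summarized or embedded independently, with the per-chunk results combined in a second pass (a map-reduce pattern that RAG pipelines build on).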
Non-Deterministic
LLMs are non-deterministic models, meaning a different output can be expected each time they run. In some instances this is a feature, where some creativity is needed to solve the task (for example, "help me write a joke about X" or "what ideas do you have on how to improve our product"). But non-determinism causes issues when output formats require consistency (e.g. where the response must be a specific item from a list). OpenAI's recent release mitigates these issues somewhat: you can now pass a seed parameter, which promises to make the model return consistent outputs. Additionally, the new JSON mode in GPT-4 Turbo forces the model to output a valid JSON object.
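When the response must be a specific item from a list, a simple validate-and-retry loop is a practical safeguard alongside seeds and JSON mode. Here `call_model` is a hypothetical stub that simulates non-determinism by drifting from the requested format on the first attempt.

```python
ALLOWED_LABELS = {"billing", "bug", "feature_request", "other"}

def call_model(prompt: str, attempt: int) -> str:
    # Hypothetical stand-in for an LLM call. Simulates non-determinism:
    # the first attempt ignores the "label only" instruction.
    return "Category: billing" if attempt == 0 else "billing"

def classify_ticket(ticket: str, max_retries: int = 3) -> str:
    prompt = (
        f"Classify this ticket as one of {sorted(ALLOWED_LABELS)}.\n"
        f"Ticket: {ticket}\nRespond with the label only."
    )
    for attempt in range(max_retries):
        answer = call_model(prompt, attempt).strip().lower()
        if answer in ALLOWED_LABELS:   # validate before trusting the output
            return answer
        # Otherwise retry; in a real system you might also lower the
        # temperature or pass a fixed seed to reduce run-to-run variation.
    raise ValueError("Model never produced an allowed label")

print(classify_ticket("I was charged twice this month"))
```

Constraining the model (seed, JSON mode, low temperature) reduces variation, but validating against the allowed set is what actually guarantees downstream code only ever sees a legal value.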
In summary, when choosing a use case, product managers must weigh the potentially high-value benefit to the end user against the tradeoffs associated with AI (cost, latency, non-determinism).
Checklist for a good application of AI:
âś… High user friction associated with the problem
âś… Cost associated with processing is acceptable given this value creation
âś… Risks associated with inaccurate response is low
âś… Execution speed is acceptable
This is an exciting time of major disruption to product development. I'd love to hear about your experiences and ideas in the comments, or reach out directly.
Tune in for Part Two next week where we will discuss implementation options for certain use cases of AI such as Q&A over a large dataset, techniques to chain prompts, and other general considerations.