Building AI products with a holistic mental model

Author: Murphy

Last update: October 19, 2024

Recently, I coached a client – an SME in the fintech sector – who had hit a dead end with their AI effort for marketing content generation. With their MS Copilot and access to the latest OpenAI models in place, they were all set for their AI adventure. They had even hired a prompt engineer to help them create prompts they could regularly use for new marketing content. The whole project was fun and engaging, but things didn't make it into production. Dissatisfied with the many issues in the AI outputs – hallucination, the failure to integrate relevant sources, and a heavy AI flavour in the writing style – marketeers would switch back to "manual mode" as soon as things got serious. They simply couldn't rely on the AI to produce content that would publicly represent the company.

By reducing AI to prompting the latest GenAI models, the company had neglected the other elements of a successful AI initiative, so we had to put them back in place. In this article, I will introduce a mental model for AI systems that we often use to help customers build a holistic understanding of their target application. This model can be used as a tool to ease collaboration, align the different perspectives inside and outside the AI team, and create successful applications based on a shared vision.

Figure 1: Mental model of an AI system

Inspired by product management, this model has two big sections – an "opportunity space" where you can define your use cases and value creation potentials, and a "solution space" where all the hard work of implementing, fine-tuning, and promoting your AI happens. In the following, I will explain the components that you need to define in each of these spaces.

Note: If you want to learn more about pragmatic AI applications in real-life business scenarios, subscribe to my newsletter AI for Business.

1. Pinpointing your opportunities for value creation with AI

With all the cool stuff you can now do with AI, you might be impatient to get your hands dirty and start building. However, to build something your users need and love, you should back your development with a use case that is in demand by your users. In an ideal world, users tell us what they need or want. For example, in the case of my client, the request for automating content generation came from marketeers who were overwhelmed in their jobs, but also saw that the company needed to produce more content to stay visible and relevant. If you are building for external users, you can look for hints about their needs and pain points in existing customer feedback, such as product reviews and notes from your sales and success teams.

However, since AI is an emerging technology, you shouldn't rely too heavily on what your users ask for – chances are, they simply don't know what is possible yet. True innovators embrace and hone their information advantage over customers and users – for example, Henry Ford famously said: "If I had asked people what they wanted, they would have said faster horses." Luckily for us, he was proactive and didn't wait for people to articulate that they needed cars. If you stretch out your antennae, AI opportunities will come to you from many directions, such as:

  • Market positioning: AI is trendy – for established businesses, it can be used to reinforce the image of a business as innovative, high-tech, future-proof, etc. For example, it can elevate an existing marketing agency to an AI-powered service and differentiate it from competitors. However, don't do AI for the sake of AI. The positioning trick is to be applied with caution and in combination with other opportunities – otherwise, you risk losing credibility.
  • Competitors: When your competitors make a move, it is likely that they have already done the underlying research and validation. Look at them after some time – was their development successful? Use this information to optimize your own solution, adopt the successful parts, and iron out the mistakes. For example, let's say you are observing a competitor that is offering a service for fully automated generation of marketing content. Users click a "big red button", and the AI marches ahead to write and publish the content. After some research, you learn that users hesitate to use this product because they want to retain more control over the process and contribute their own expertise and personality to the writing. After all, writing is also about self-expression and individual creativity. This is the time for you to move ahead with a versatile tool that offers rich functionality and configuration for shaping your content. It boosts the efficiency of users while allowing them to "inject" themselves into the process whenever they wish.
  • Regulations: Megatrends such as technological disruption and globalization force regulators to tighten their requirements. Regulations create pressure and are a bulletproof source of opportunity. For example, imagine a regulation comes into place that strictly requires everyone to advertise AI-generated content as such. Those companies that already use tools for AI content generation will disappear into internal discussions on whether they want to continue doing so. Many of them will refrain because they want to maintain an image of genuine thought leadership, as opposed to producing visibly AI-generated boilerplate. Let's say you were smart and opted for an augmented solution that gives users enough control so they can remain the official "authors" of the texts. As the new restriction is introduced, you are immune and can dash forward to capitalize on the regulation, while your competitors with fully automated solutions will need time to recover from the setback.
  • Enabling technologies: Emerging technologies and significant leaps in existing technologies, such as the wave of generative AI in 2022–23, can open up new ways of doing things, or catapult existing applications to a new level. Let's say you have been running a traditional marketing agency for the last decade. Now, you can start introducing AI hacks and solutions into your business to increase the efficiency of your employees, serve more clients with the existing resources, and increase your profit. You are building on your existing expertise, reputation, and (hopefully good-willed) customer base, so introducing AI enhancements can be much smoother and less risky than it would be for a newcomer.

I also advise proactively brainstorming opportunities and ideas to make your business more efficient and innovative. You can use the following four "buckets" of AI benefits to guide your brainstorming:

  • Productivity and automation: AI can support or automate routine processes where many small decisions need to be made, such as fraud detection, customer service, and invoice processing. This reduces human workload, frees up resources for more significant tasks, and generally makes your users happy. AI can also be applied to tasks that users would like (someone) to do but don't because they lack the time and resources. For example, if you want to publish content daily rather than biweekly, AI can become your best friend.
  • Improvement and augmentation: AI can help you improve the outcome of certain tasks. For example, when creating new content, the wide knowledge of an LLM can be combined with the specific context knowledge of the human, thus enhancing the final result.
  • Personalization: Individualism is a trend, and modern users want products to adapt to their needs and preferences. Many B2C tech companies have already mastered this discipline – think of the personalized recommendations on YouTube, Netflix, or Amazon. Generative AI pushes this further, allowing for content personalization on a micro-level.
  • Innovation and transformation: Business environments, including competitors, regulations, and customers, are changing at high speed. To stay relevant over time, companies need to get into the habit of adjusting and innovating on a continuous basis, even in situations of stress and uncertainty. AI can be our best friend on this journey. It can connect the dots between different bodies of information, come up with innovative ideas, and assist us with the implementation of these innovations.

Some of these benefits – productivity, for example – can be directly quantified for ROI. For less tangible gains like personalization, you will need to think of proxy metrics like user satisfaction. As you think about your AI strategy, you might want to start with productivity and automation, which are the low-hanging fruits, and move on to the more challenging and transformative buckets later on.
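For the directly quantifiable buckets, a back-of-the-envelope calculation is often enough to start the ROI conversation. The sketch below illustrates the arithmetic; all numbers are invented for illustration.

```python
# Back-of-the-envelope ROI for the productivity bucket; all figures are invented.
hours_saved_per_month = 40      # e.g. drafting time the marketing team no longer spends
hourly_cost = 60                # fully loaded cost per hour, in EUR
monthly_ai_cost = 800           # licenses, API usage, maintenance

monthly_benefit = hours_saved_per_month * hourly_cost        # 2400 EUR
roi = (monthly_benefit - monthly_ai_cost) / monthly_ai_cost
print(f"Monthly benefit: {monthly_benefit} EUR, ROI: {roi:.0%}")  # ROI: 200%
```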

2. Data – the fuel of your AI system

In our content generation project, the company jumped right into using LLMs without having a high-quality, task-specific dataset at hand. This was one of the reasons for its failure – your AI is only as good as the data you feed it. For any kind of serious AI and machine learning, you need to collect and prepare your data so it reflects the real-life inputs and provides sufficient learning signals for your AI models. When you start out, there are different ways to get your hands on a decent dataset:

  • You can use an existing dataset. This can either be a standard machine learning dataset or a dataset with a different initial purpose that you adapt for your task. There are some dataset classics, such as the IMDB Movie Reviews Dataset for sentiment analysis and the MNIST dataset for handwritten digit recognition. There are more exotic and exciting alternatives, like Catching Illegal Fishing and Dog Breed Identification, and innumerable user-curated datasets on data hubs like Kaggle. However, the chances of finding a dataset that exactly matches your specific task are low. In most cases, you will soon need to follow up with other methods to enrich your data.
  • You can annotate or create the data manually to create the right learning signals. In the content generation project, we built a small database of examples of past content pieces, and the marketing team evaluated them based on a range of criteria such as novelty and style. Annotation can also be done by an external provider or a crowdsourcing service such as Amazon Mechanical Turk. It is a rather costly undertaking, so you can also consider automated methods to increase the scale of your training data.
  • You can quickly add more examples to an existing dataset using data augmentation. In our project, for example, we used content performance metrics such as reads and likes to automatically evaluate and annotate a larger number of social media posts (a minimal sketch of this kind of automated labeling follows this list). For simpler tasks like sentiment analysis, you can produce near-duplicates of existing examples by introducing some additional noise into the texts, switching up a couple of words, etc. For more open generation tasks, you can use LLMs for automated training data generation. Once you have identified the best method to augment your data, you can easily scale it to reach the required dataset size – however, keep in mind that automated data generation also introduces more noise and bias into your data. Thus, you need to continuously monitor and curate the augmented data to maintain quality and ensure your AI doesn't drift away in the wrong direction.
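As a minimal sketch of the automated labeling described above, the snippet below turns raw engagement metrics into training labels. The thresholds, field names, and example posts are hypothetical and would need to be tuned to your own data.

```python
# Sketch: derive training labels from engagement metrics.
# Thresholds and field names are hypothetical.
from dataclasses import dataclass

@dataclass
class Post:
    text: str
    reads: int
    likes: int

def auto_label(post: Post, min_reads: int = 1000, min_like_ratio: float = 0.05) -> str:
    """Assign a coarse quality label based on simple engagement heuristics."""
    like_ratio = post.likes / max(post.reads, 1)
    if post.reads >= min_reads and like_ratio >= min_like_ratio:
        return "high_engagement"
    return "low_engagement"

posts = [
    Post("How embedded finance works", reads=4200, likes=310),
    Post("Our Q3 office party recap", reads=180, likes=2),
]
labeled = [(p.text, auto_label(p)) for p in posts]
print(labeled)  # [('How embedded finance works', 'high_engagement'), ...]
```

Labels produced this way are noisier than human annotations, which is why the continuous curation mentioned above remains necessary.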

When creating your data, you face a trade-off between quality and quantity. You can manually annotate less data with a high quality, or spend your budget on developing hacks and tricks for automated data augmentation that will introduce additional noise. A rough rule of thumb is as follows:

  • Prioritize quantity when pre-training a model. It needs to acquire knowledge from scratch, which can only happen with a larger quantity of data.
  • Prioritize quality when fine-tuning an existing model. The controlled manual annotation of a small dataset using detailed guidelines might be the optimal solution in this case.

Ultimately, you will find your ideal data composition through a constant back-and-forth between training, evaluation, and enhancing your data. In the content generation project, the team is now continuously collecting and curating data from newly published content and updating the datasets used for LLM fine-tuning. They are able to quickly optimize and sharpen the system because the most valuable data comes directly from production. When your application goes live, you should have mechanisms in place to collect user inputs, AI outputs, and, if possible, additional learning signals such as user evaluations. Using this data for fine-tuning will make your model come as close as possible to the "ground truth" of user expectations. This results in higher user satisfaction, more usage and engagement, and, in turn, more high-quality data – a virtuous cycle that is also called the data flywheel.

Figure 2: Data flywheel
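To keep the flywheel turning, the production signals mentioned above need to be captured somewhere. The sketch below logs each interaction to a JSONL file; the record fields and the rating scale are assumptions, not a prescribed schema.

```python
# Sketch: log user inputs, AI outputs, and feedback so production data
# can later feed curation and fine-tuning. Field names are assumptions.
import json
import time
import uuid
from typing import Optional

def log_interaction(user_input: str, ai_output: str,
                    user_rating: Optional[int] = None,
                    path: str = "interactions.jsonl") -> None:
    """Append one interaction record to a JSONL file for later review."""
    record = {
        "id": str(uuid.uuid4()),
        "timestamp": time.time(),
        "user_input": user_input,
        "ai_output": ai_output,
        "user_rating": user_rating,  # e.g. thumbs up/down mapped to 1/0
    }
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")

log_interaction("Draft a post about our new savings product",
                "Saving just got smarter: ...", user_rating=1)
```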

3. Intelligence

Data is the raw material from which your model will learn, and hopefully, you can compile a representative, high-quality dataset for your challenges. Now, the actual intelligence of your AI system – its ability to generalize to new data – resides in the machine learning algorithms and models, and any additional tools and plugins that can be called by these.

In terms of the core AI models, there are three main approaches you can adopt:

  • Prompt an existing model. Mainstream LLMs (Large Language Models) of the GPT family, such as GPT-4o and GPT-4, as well as models from other providers such as Anthropic and AI21 Labs, are available for inference via API. With prompting, you can directly talk to these models, including in your prompt all the domain- and task-specific information required for a task. This can include specific content to be used, examples of analogous tasks (few-shot prompting), as well as instructions for the model to follow. For example, if your user wants to generate a blog post about a new product they are releasing, you might ask them to provide some core information about the product, such as its benefits and use cases, how to use it, the launch date, etc. Your product then fills this information into a carefully crafted prompt template and asks the LLM to generate the text (see the sketch after this list). Prompting is a great way to get a head start with pre-trained models. However, just as in the case of my client, it often leads to the "last-mile problem" – the AI gives reasonable outputs, but they are just not good enough for real-life use. You can do whatever you want – provide more data, optimize your formulation, even threaten the model – but at this point, you've used up the optimization potential of prompting.

  • Fine-tune a pre-trained model. When you hit the ceiling with your prompting efforts, you can consider fine-tuning a model with your custom data. Think of the model as a high-school student who knows a little about a lot of things, but doesn't really excel at anything in particular. With fine-tuning, you can send it to university by feeding it specialized data and tasks. For example, for marketing content generation, we collected a set of blog posts that performed well in terms of engagement and reverse-engineered the instructions for them. From this data, the model learned about the structure, flow, and style of successful articles. Fine-tuning requires some engineering skill, but is a great way to optimize the accuracy, privacy, and running cost of your application. Since you have full control and ownership over the fine-tuned model, it is also a great way to enhance your competitive moat.
  • Train your ML model from scratch. In general, this approach works well for simpler but highly specific problems such as sentiment analysis, fraud detection, and text classification. These tasks can often be solved with established machine learning methods like logistic regression, which are computationally less expensive than fancy deep learning methods. The generation of content does not exactly fall into this category – it requires advanced linguistic capabilities to get you off the ground, and these can only be acquired after training on ridiculously large amounts of data.
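As a sketch of the prompt-template approach from the first option, the snippet below fills user-provided product information into a template and sends it to a hosted model. It assumes the OpenAI Python client (v1); the template wording, model name, and parameters are placeholders to adapt to your own use case.

```python
# Sketch of the prompt-template approach for "prompt an existing model".
# Assumes the OpenAI Python client (v1); template and model name are placeholders.
from openai import OpenAI

PROMPT_TEMPLATE = """You are a marketing writer for a fintech company.
Write a blog post announcing the following product.

Product name: {name}
Key benefits: {benefits}
Launch date: {launch_date}

Match the tone of this example post:
{example_post}
"""

def generate_post(name: str, benefits: str, launch_date: str, example_post: str) -> str:
    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    prompt = PROMPT_TEMPLATE.format(
        name=name, benefits=benefits,
        launch_date=launch_date, example_post=example_post,
    )
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

draft = generate_post("SmartSave", "automated round-ups, 3% interest",
                      "June 1", "Our last launch post: ...")
```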

Beyond the training, evaluation is of primary importance for the successful use of machine learning. Suitable evaluation metrics and methods are not only important for a confident launch of your AI features but will also serve as a clear target for further optimization and as a common ground for internal discussions and decisions. While technical metrics such as precision, recall, and accuracy can provide a good starting point, ultimately, you will want to look for metrics that reflect the real-life value that your AI is delivering to users.
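As a quick illustration of the technical starting point, the snippet below computes precision, recall, and accuracy with scikit-learn on made-up labels (for example, human judgements of whether a generated draft is publishable).

```python
# Technical metrics as a starting point for evaluation; labels are made up.
from sklearn.metrics import accuracy_score, precision_score, recall_score

y_true = [1, 0, 1, 1, 0, 1, 0, 0]  # human judgement: draft publishable (1) or not (0)
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]  # model's own judgement

print("precision:", precision_score(y_true, y_pred))  # 0.75
print("recall:   ", recall_score(y_true, y_pred))     # 0.75
print("accuracy: ", accuracy_score(y_true, y_pred))   # 0.75
```

Business-facing metrics, such as the share of AI drafts published without manual rewriting, should complement these numbers.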

Finally, the trend today is moving from using a single AI model to compound AI systems that combine different models, databases, and software tools and allow you to optimize for cost and transparency. In the content generation project, for example, we used a RAG (Retrieval-Augmented Generation) architecture and combined the model with a database of domain-specific sources that it could use to produce specialized fintech content.

Figure 3: Retrieval-Augmented Generation architecture

After the user inputs a query, the system doesn't pass it directly to the LLM, but rather retrieves the most relevant sources for the query from the database. Then, it uses these sources to augment the prompt passed to the LLM. Thus, the LLM can use up-to-date, specialized sources to generate its final answer. Compared to an isolated fine-tuned model, this reduced hallucinations and allowed users to always have access to the latest sources. Other types of compound systems include agent systems, LLM routers, and cascades. A detailed description is out of the scope of this article – if you want to learn more about these patterns, you can refer to my book The Art of AI Product Management.
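The retrieve-then-augment flow in Figure 3 can be sketched in a few lines. Here, a toy keyword-overlap retriever stands in for a real vector database; in production you would use embeddings and a proper index, and pass the resulting prompt to the LLM as in the earlier prompting sketch.

```python
# Schematic sketch of the retrieve-then-augment flow from Figure 3.
# A toy keyword retriever stands in for a real vector database.
def retrieve(query: str, documents: list[str], k: int = 2) -> list[str]:
    """Rank documents by naive keyword overlap with the query."""
    q_terms = set(query.lower().split())
    ranked = sorted(documents,
                    key=lambda d: len(q_terms & set(d.lower().split())),
                    reverse=True)
    return ranked[:k]

def build_augmented_prompt(query: str, documents: list[str]) -> str:
    """Prepend the retrieved sources to the user query."""
    context = "\n".join(f"- {s}" for s in retrieve(query, documents))
    return (f"Answer the question using only the sources below.\n"
            f"Sources:\n{context}\n\nQuestion: {query}")

docs = [
    "Embedded finance lets non-banks offer payment and lending products.",
    "The ECB raised interest rates in September.",
    "Our style guide prefers short, active sentences.",
]
prompt = build_augmented_prompt("What is embedded finance?", docs)
# `prompt` is then sent to the LLM instead of the raw query.
```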

4. User experience

The user experience of AI products is a captivating theme – after all, users have high hopes but also fears about "partnering" with an AI that can supercharge and potentially outsmart them. The design of this human-AI partnership requires a thoughtful and sensible discovery and design process. One of the key considerations is the degree of automation you want to grant with your product – and, mind you, total automation is by far not always the ideal solution. The following figure illustrates the automation continuum:

Figure 4: The automation continuum of AI systems

Let's look at each of these levels:

  • In the first stage, humans do all the work, and no automation is performed. Despite the hype around AI, most knowledge-intensive tasks in modern companies are still carried out at this level, presenting huge opportunities for automation. For example, the content writer who resists AI-driven tools because they are convinced that writing is a highly manual and idiosyncratic craft works at this stage.
  • In the second stage of assisted AI, users have complete control over task execution and do a big part of the work manually, but AI tools help them save time and compensate for their weak points. For example, when writing a blog post with a tight deadline, a final linguistic check with Grammarly or a similar tool can become a welcome timesaver. It can eliminate manual revision, which requires a lot of your scarce time and attention and might still leave you with errors and oversights – after all, to err is human.
  • With augmented intelligence, AI is a partner that augments the intelligence of the human, thus leveraging the strengths of both worlds. Compared to assisted AI, the machine has much more to say in your process and covers a larger set of responsibilities, like ideation, generation, and editing of drafts, and the final linguistic check. Users still need to participate in the work, make decisions, and perform parts of the task. The user interface should clearly indicate the labor distribution between human and AI, highlight error potentials, and provide transparency into the steps it performs. In short, the "augmented" experience guides users to the desired outcome via iteration and refinement.
  • And finally, we have full automation – an intriguing idea for AI geeks, philosophers, and pundits, but impractical for most real-life systems. Full automation means that you are offering one "big red button" that kicks off the process and gives full control to the machine. Once the AI is done, your users face the final output and either take it or leave it. Anything that happened in-between they cannot influence. Full automation is an important element of design approaches such as ambient intelligence and calm technology, as implemented in smart home appliances, voice assistants, etc. However, as of now, LLMs and other foundational models are far from being able to catch and process the rich context information they would need for seamless and reliable automated operation. As you can imagine, the UX options for full automation are rather limited since there is virtually no interaction going on. The bulk of the responsibility for success rests on the shoulders of your technical colleagues, who need to ensure an exceptionally high quality of the outputs.

AI systems need special treatment when it comes to design. Standard graphical interfaces are deterministic and allow you to foresee all possible paths the user might take. By contrast, large AI models are probabilistic and uncertain – they expose a range of amazing capabilities but also risks such as toxic, wrong, and harmful outputs. From the outside, your AI interface might look simple because a broad range of the capabilities of your product reside in the model and are not directly visible to users. For example, an LLM can interpret prompts, produce text, search for information, summarize it, adopt a certain style and terminology, execute instructions, etc. Even if your UI is a simple chat or prompting interface, don't leave this potential hidden – in order to lead users to success, you need to be explicit and realistic. Make users aware of the capabilities and limitations of your AI models, allow them to easily discover and fix errors made by the AI, and teach them ways to iterate towards optimal outputs. By emphasizing trust, transparency, and user education, you can help your users collaborate productively with the AI. While a deep dive into AI UX design is out of the scope of this article, I strongly encourage you to look for inspiration not only from other AI companies but also from other areas of design, such as human-machine interaction. You will soon identify a range of recurring design patterns, such as autocompletes, prompt suggestions, and templates, that you can integrate into your own interface to make the most out of your data and models.

5. Governance – balancing innovation with responsibility

When you start out with AI, it is easy to forget about governance because you are busy solving technological challenges and creating value. However, without a governance framework, your tool can be vulnerable to security risks, legal violations, and ethical concerns that can erode customer trust and harm your business. In the fintech example mentioned earlier, this led to issues such as hallucinations and irrelevant sources leaking into the public content of the company. A strong governance structure creates guardrails to prevent these issues. It protects sensitive data, ensures compliance with privacy regulations, maintains transparency, and mitigates biases in AI-generated content.

There are different definitions of AI governance. In my practice, companies were especially concerned with the following four types of risk:

  • Security: AI systems must ensure the protection of sensitive data from breaches and attacks through encryption, role-based access controls, and regular adversarial testing. This helps safeguard against unauthorized access or malicious inputs that could compromise the integrity of the system.
  • Transparency and explainability: It is important for AI systems to offer clear explanations of how decisions are made and outputs are generated. This transparency helps build user trust and ensures that users can understand and, when necessary, intervene in the system's processes, especially in high-stakes or critical applications.
  • Privacy: Ensuring compliance with data protection laws, such as GDPR, is essential. This involves anonymizing user data, minimizing data collection, and giving users the option to opt out of data usage, which helps protect individual privacy and uphold ethical data practices (a minimal anonymization sketch follows this list).
  • Bias, fairness, and ethics: AI systems should actively work to identify and reduce bias in their models to ensure fair and equitable treatment across different user groups. Regular fairness audits and adherence to ethical guidelines are crucial to align AI with societal values and prevent discriminatory outcomes.
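As one small, concrete step towards the privacy practices above, the sketch below scrubs recognizable PII from user inputs before they are stored. The regex patterns are deliberately simplistic placeholders and are no substitute for a proper anonymization pipeline or a dedicated PII-detection service.

```python
# Sketch: scrub obvious PII from user inputs before logging or storage.
# The patterns are simplistic placeholders, not a complete solution.
import re

PII_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "iban":  re.compile(r"\b[A-Z]{2}\d{2}[A-Z0-9]{11,30}\b"),
    "phone": re.compile(r"\+?\d[\d\s()./-]{7,}\d"),
}

def scrub_pii(text: str) -> str:
    """Replace recognizable PII with typed placeholders before storage."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"<{label.upper()}>", text)
    return text

print(scrub_pii("Reach me at jane.doe@example.com or +49 170 1234567."))
# Reach me at <EMAIL> or <PHONE>.
```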

Exposure to these risks depends on the application you are building, so it is worth spending some time to analyze your specific situation. For example, demographic bias (based on gender, race, location, etc.) is an important topic if your model generates user-facing content or makes decisions about people, but it is much less of a concern if you use your model to generate code in a B2B context. In my experience, B2B applications have higher requirements in terms of security and transparency, while B2C applications need more guardrails to safeguard the privacy of user data and mitigate bias.

To set up your AI governance framework, begin by reviewing the relevant regulations and defining your objectives. At a minimum, ensure you meet the regulatory requirements for your industry and geography, such as the EU AI Act in Europe or the California Consumer Privacy Act in the U.S. Beyond compliance, you can also plan for additional guardrails to address key risks specific to your AI application. Next, assemble a cross-functional team of legal, compliance, security, and AI experts to define, implement, and assign governance measures. This team should regularly review and update the framework to adapt to system improvements, new risks, and evolving regulations. For example, the recent FTC actions against companies that exaggerated their AI performance signal the importance of focusing on quality and maintaining realistic communication about AI capabilities.

6. Specifying the mental model

Let's summarize how we addressed the different components in the content generation project:

Figure 5: Specified mental model for AI-driven content generation

This representation can be used at different stages of the AI journey – to prioritize use cases, to guide your team planning and discussions, and to align different stakeholders. It is an evolving construct that can be updated with new learnings as you move forward with your project.

Summary

Let's summarize the key take-aways from this article:

  • Identify AI use cases strategically: Focus on areas where AI can add real value, such as improving productivity, personalization, or driving innovation.
  • Balance AI with human expertise: Design AI tools to support, not replace, human skills, using AI to augment decision-making.
  • Prioritize data quality: High-quality data is essential for effective AI. Curate, annotate, and enhance your datasets to ensure they reflect real-world conditions.
  • Start with low-hanging fruits: Begin with simple AI applications like automation, then expand to more complex, strategic tasks.
  • Develop a data flywheel: Create a feedback loop by collecting real-time user data and using it to continuously improve your AI model.
  • Refine AI through fine-tuning: Customize pre-trained models with your own data to improve performance in your specific business context.
  • Optimize the user experience: Ensure AI products are user-friendly, with clear explanations and error-correction options to boost adoption.
  • Implement strong AI governance: Establish governance to ensure security, privacy, transparency, and bias mitigation, complying with legal standards like GDPR.
  • Create a cross-functional governance team: Form a team of experts in legal, compliance, security, and AI to manage governance and risks.
  • Focus on realistic communication and trust: Be transparent about AI capabilities and avoid overpromising, building user trust through responsible practices.

Where you can go from here:

  • To see the mental model in action, learn how it can be applied to conversational AI and to Text2SQL applications.
  • If you have already started implementing AI, read this article to learn how you can use it to carve out a competitive advantage.
  • For deep-dives into many of the topics that were touched in this article, check out my upcoming book The Art of AI Product Development.

Note: All images are by the author.

Tags: Artificial Intelligence Deep Dives Large Language Models Product Management UX Design
